Information Retrieval Facility

2 Projects, page 1 of 1

GATE Cloud Exploratory: Adapting the General Architecture for Text Engineering to Cloud Computing
Project, 2011 - 2011
Partners: KCL, Institute of Social Psychiatry, Information Retrieval Facility, [no title available], University of Sheffield, OXFORD INTERNET INSTITUTE, University of Oxford
Funder: UK Research and Innovation
Project Code: EP/I034092/1
Funder Contribution: 71,677 GBP
When you plug your fridge into the mains electricity supply you don't worry about all the technology sitting behind the wall socket -- it just works. Cloud computing is starting to supply IT in a similar fashion. No more worrying about backups, no more hours spent configuring a new or repaired machine -- just plug into the network, fire up your web browser and away you go.

Researchers have tougher and more specialised IT needs than most, so to realise the same ease of use that the cloud now provides for email or word processing requires work in several areas. One of these areas is to adapt existing established research tools to the cloud, and that is what this project will do. Our tool is called GATE, a General Architecture for Text Engineering. Over the last decade the UK's GATE system has become a world-leader for research and development of text mining algorithms.

Text has become a more and more important communication method in recent decades. Our children are now spending over 6 hours in front of screens; our evenings often include sessions on Facebook or writing email to friends and relatives. When we interact with the corporations and governmental organisations whose infrastructure and services underpin our daily lives, we fill in forms or write emails. When we want to publicise our work or share details of our leisure activities we create websites, post Twitter messages or blog entries. Scientists also now use these channels in their work, in addition to publishing in peer-reviewed journals -- a process which has also seen a huge expansion in recent years.

This avalanche of the written word has changed many things, not least the way that scientists gather information. For example, a team at the World Health Organisation's cancer research agency recently found the first evidence of a link between a particular genetic mutation and the risk of lung cancer in smokers. Their experiments require large amounts of costly laboratory time to test hypotheses, based on samples of mutations in gene sequences from their test subjects. Text mining from previous publications makes it possible for them to reduce this lab time by factoring in probabilities based on association strengths between mutations, environmental factors and active chemicals.

A second area that has been revolutionised by new media is customer relations and market research, which are no longer about monitoring the goings-on of the corporate call centre. Keeping up to date with the public image of your products or services now means coping with the Twitter firehose (45 million posts per day), the comment sections of consumer review sites, or the point-and-click 'contact us' forms from the company website. To do this by hand is now impossible in the general case: the data volume long ago outstripped the possibility of cost-effective manual monitoring. Text mining provides alternative, automatic methods for dealing with text.

GATE provides four systems to support scientists experimenting with new text mining algorithms and developers using text mining in their applications:
- GATE Developer: an integrated development environment for language processing components
- GATE Embedded: an object library optimised for inclusion in diverse applications
- GATE Teamware: a collaborative annotation environment for high volume web-based semantic annotation projects built around a workflow engine
- GATE Mímir (Multi-paradigm Information Management Index and Repository): a massively scalable multi-paradigm index

We have identified a need for a particular type of cloud service in our research field and this project will implement it such that there is close to zero barrier to entry for researchers. Based on our preliminary investigative work, we expect to complete a production quality service within this project.

In simpler terms, this project will work towards making use of GATE on the cloud more like electric sockets and fridges!
A Unified Model of Compositional and Distributional Semantics: Theory and Applications
Project, 2012 - 2016
Partners: Metrica, Cambridge Integrated Knowledge Centre, University of Cambridge, Information Retrieval Facility
Funder: UK Research and Innovation
Project Code: EP/I037512/1
Funder Contribution: 345,414 GBP
The notion of meaning is central to many areas of Computer Science, Artificial Intelligence (AI), Linguistics, Philosophy, and Cognitive Science. A formal, mathematical account of the meaning of natural language utterances is crucial to AI, since an understanding of natural language (i.e. languages such as English, German, Chinese, etc.) is at the heart of much intelligent behaviour. More specifically, Natural Language Processing (NLP) -- the branch of AI concerned with the computer processing, analysis and generation of text -- requires a model of meaning for many of its tasks and applications.

There have been two main approaches to modelling the meaning of language in NLP, in order that a computer can gain some "understanding" of the text. The first, the so-called compositional approach, is based on classical ideas from Philosophy and Mathematical Logic. Using a well-known principle from the 19th century logician Frege -- that the meaning of a phrase can be determined from the meanings of its parts and how those parts are combined -- logicians have developed formal accounts of how the meaning of a sentence can be determined from the relations of words in a sentence. This idea culminated famously in Linguistics in the work of Richard Montague in the 1970s. The compositional approach addresses a fundamental problem in Linguistics -- how it is that humans are able to generate an unlimited number of sentences using a limited vocabulary. We would like computers to have a similar capacity.

The second, more recent, approach to modelling meaning in NLP focuses on the meanings of the words themselves. This is the so-called distributional approach to modelling word meanings and is based on the ideas of the "structural" linguists such as Firth from the 1950s. This idea is also sometimes related to Wittgenstein's philosophy of "meaning as use". The idea is that the meanings of words can be determined by considering the contexts in which words appear in text. For example, if we take a large amount of text and see which words appear close to the word "dog", and do a similar thing for the word "cat", we will see that the contexts of dog and cat tend to share many words in common (such as walk, run, furry, pet, and so on). By contrast, if we look at which words appear in the context of the word "television", we will find less overlap with the contexts for "dog". Mathematically we represent the contexts in a vector space, so that word meanings occupy positions in a geometrical space. We would expect to find that "dog" and "cat" are much closer in the space than "dog" and "television", indicating that "dog" and "cat" are closer in meaning than "dog" and "television".

The two approaches to meaning can be roughly characterized as follows: the compositional approach is concerned with how meanings combine, but has little to say about the individual meanings of words; the distributional approach is concerned with word meanings, but has little to say about how those meanings combine. Our ambitious proposal is to exploit the strengths of the two approaches, by developing a unified model of distributional and compositional semantics. Our proposal has a central theoretical component, drawing on models of semantics from Theoretical Computer Science and Mathematical Logic. This central component will inform, be driven by, and be evaluated on tasks and applications in NLP and Information Retrieval, and also data drawn from empirical studies in Cognitive Science (the computational study of the mind).

Hence we aim to make the following fundamental contributions:
1. advance the theoretical study of meaning in Linguistics, Computer Science and Artificial Intelligence;
2. develop new meaning-sensitive approaches to NLP applications which can be robustly applied to naturally occurring text.
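
The distributional idea in the abstract above (comparing the contexts of "dog", "cat" and "television") can be illustrated with a minimal sketch. This is not the project's method: the toy corpus, the two-word context window, the function names `context_vector` and `cosine`, and the choice of cosine similarity as the closeness measure are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt

# Toy corpus, invented purely for illustration (not project data).
CORPUS = [
    "the furry dog likes to walk and run",
    "a furry cat likes to play",
    "my pet dog is furry",
    "my pet cat is furry",
    "the television shows the news",
    "we watch the television news at night",
]


def context_vector(target, sentences, window=2):
    """Count the words occurring within `window` positions of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[word] * v[word] for word in u if word in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


dog = context_vector("dog", CORPUS)
cat = context_vector("cat", CORPUS)
television = context_vector("television", CORPUS)

if __name__ == "__main__":
    print("dog vs cat:       ", round(cosine(dog, cat), 3))
    print("dog vs television:", round(cosine(dog, television), 3))
```

On this toy corpus "dog" and "cat" share most of their context words, so their similarity is higher than that of "dog" and "television". A realistic system would use a far larger corpus and would typically reweight the raw counts; both are outside the scope of this sketch.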