
Emotech (United Kingdom)
2 Projects
Project (2020 - 2023)
Partners: Edinburgh Napier University, Emotech (United Kingdom)
Funder: UK Research and Innovation
Project Code: EP/T014598/1
Funder Contribution: 280,059 GBP

One of the most compelling problems in Artificial Intelligence is to create computational agents capable of interacting in real-world environments using natural language. Computational agents such as robots can offer multiple benefits to society: they can help care for the ageing population, act as companions, support skills training, or provide assistance in public spaces. These are extremely challenging tasks because of their complex interdisciplinary nature, which spans several fields including Natural Language Generation, engineering, computer vision, and robotics.

Communication through language is the most vital and natural form of interaction. Humans are able to communicate effectively with each other using natural language, drawing on common-sense knowledge and making inferences about other people's backgrounds from previous interactions with them. At the same time, they can successfully describe their surroundings, even when encountering unknown entities and objects. For decades, researchers have tried to recreate the way humans communicate through natural language, and although there have been major breakthroughs in recent years (such as Apple's Siri or Amazon's Alexa), Natural Language Generation systems still lack the ability to reason, exploit common-sense knowledge, and use multi-modal information from a variety of sources such as knowledge bases, images, and videos.

This project aims to develop a framework for common-sense- and visually-enhanced Natural Language Generation that enables natural, real-time communication between humans and artificial agents such as robots, supporting effective human-robot collaboration. Human-Robot Interaction poses additional challenges for Natural Language Generation because of the uncertainty introduced by dynamic environments and the non-deterministic nature of interaction. For instance, the viewpoint of a situated robot, and hence its representation of the world, changes as the robot moves; current state-of-the-art methods cannot adapt to such changing environments and fail as a result. The project will therefore investigate methods for linking the various modalities while taking their dynamic nature into account.

To achieve natural, efficient, and intuitive communication, agents will also need to acquire human-like abilities in synthesising knowledge and expression. The conditions under which external knowledge bases (such as Wikipedia) can enhance natural language generation still have to be explored, as does the question of whether existing knowledge bases are useful for language generation at all. Novel ways of integrating multi-modal data for language generation will lead to more robust and efficient interactions and will have an impact on natural language generation, social robotics, computer vision, and related fields. This might, in turn, spawn entirely novel applications, such as explaining exact procedures for e-health treatments or enhancing tutoring systems for educational purposes.
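The abstract above describes grounding generation in common-sense knowledge and visual context. As a purely illustrative sketch (not the project's framework), the following Python fragment shows one simple way an agent might fuse a visual caption, facts retrieved from an external knowledge base, and a user utterance into a single conditioning input for a neural language generator; the toy knowledge base, object names, and function names are hypothetical.

```python
# Illustrative sketch only: one simple way to fuse multi-modal context
# (a visual description) and retrieved common-sense knowledge into a
# conditioning input for a language generator.

from dataclasses import dataclass

# Toy stand-in for an external knowledge base such as Wikipedia.
KNOWLEDGE_BASE = {
    "kettle": "A kettle is a container used to boil water, often for tea.",
    "wheelchair": "A wheelchair is a chair with wheels used by people with limited mobility.",
}

@dataclass
class Observation:
    """What the situated agent currently perceives."""
    visual_caption: str          # e.g. output of an image-captioning model
    detected_objects: list[str]  # e.g. output of an object detector

def retrieve_knowledge(objects: list[str]) -> list[str]:
    """Look up common-sense facts for the objects the robot can see."""
    return [KNOWLEDGE_BASE[o] for o in objects if o in KNOWLEDGE_BASE]

def build_generator_input(obs: Observation, user_utterance: str) -> str:
    """Fuse vision, retrieved knowledge, and dialogue into one conditioning string."""
    facts = retrieve_knowledge(obs.detected_objects)
    return (
        f"Scene: {obs.visual_caption}\n"
        f"Knowledge: {' '.join(facts) if facts else 'none'}\n"
        f"User: {user_utterance}\n"
        f"Agent:"
    )

if __name__ == "__main__":
    obs = Observation(
        visual_caption="an elderly person sitting next to a kettle on a table",
        detected_objects=["kettle", "person"],
    )
    prompt = build_generator_input(obs, "Could you make me some tea?")
    print(prompt)  # This string would be passed to a trained NLG model.
```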
Project (2018 - 2022)
Partners: SRI International, KCL, Emotech (United Kingdom), University of California, Berkeley, British Broadcasting Corporation (BBC), Quorate Technology Ltd
Funder: UK Research and Innovation
Project Code: EP/R012067/1
Funder Contribution: 734,106 GBP

Speech recognition has made major advances in the past few years. Error rates have been more than halved on standard large-scale tasks such as Switchboard (conversational telephone speech), MGB (multi-genre broadcast recordings), and AMI (multiparty meetings). These research advances have quickly translated into commercial products and services: speech-based applications and assistants such as Apple's Siri, Amazon's Alexa, and Google voice search have become part of daily life for many people. Underpinning the improved accuracy of these systems are advances in acoustic modelling, with deep learning having had an outstanding influence on the field.

However, speech recognition is still very fragile: it has been successfully deployed in specific acoustic conditions and task domains - for instance, voice search on a smartphone - and degrades severely when the conditions change. This is because speech recognition is highly vulnerable to additive noise caused by multiple acoustic sources, and to reverberation. In both cases, acoustic conditions which have essentially no effect on the accuracy of human speech recognition can have a catastrophic impact on the accuracy of a state-of-the-art automatic system.

A reason for such brittleness is the lack of a strong model of acoustic robustness. Robustness is usually addressed through multi-condition training, in which the training set comprises speech examples across the many required acoustic conditions, often constructed by mixing speech with noise at different signal-to-noise ratios (a sketch of this mixing appears below). For a limited set of acoustic conditions these techniques can work well, but they are inefficient, offer no model of multiple acoustic sources, and do not factorise the causes of variability. For instance, the best reported speech recognition result for transcription of the AMI corpus test set using single distant microphone recordings is about 38% word error rate (for non-overlapped speech), compared with about 5% for human listeners. In the past few years several approaches have tried to address these problems: explicitly learning to separate multiple sources; factorised acoustic models using auxiliary features; and learned spectral masks for multi-channel beamforming.

SpeechWave will pursue an alternative approach to robust speech recognition: the development of acoustic models which learn directly from the speech waveform. The motivation to operate directly in the waveform domain arises from the insight that redundancy in speech signals is highly likely to be a key factor in the robustness of human speech recognition. Current approaches to speech recognition separate non-adaptive signal processing components from the adaptive acoustic model, and in so doing lose the redundancy - and, typically, information such as the phase - present in the speech waveform.
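As a hedged illustration of the multi-condition training recipe mentioned above (not code from the project), the following Python sketch mixes a clean speech signal with noise at a requested signal-to-noise ratio; the synthetic signals in the example are placeholders for real recordings.

```python
# Sketch: scale the noise so the mixture reaches a target SNR, following
# SNR_dB = 10 * log10(P_speech / P_noise).

import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Return speech plus scaled noise so the mixture has the requested SNR in dB."""
    # Tile or trim the noise to the length of the speech signal.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero

    # Scale so that 10*log10(speech_power / scaled_noise_power) == snr_db.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    noise = noise * np.sqrt(target_noise_power / noise_power)
    return speech + noise

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s synthetic "speech"
    babble = rng.standard_normal(8000)                          # synthetic noise
    for snr in (20, 10, 0, -5):
        noisy = mix_at_snr(clean, babble, snr)
        print(snr, round(float(np.mean(noisy ** 2)), 3))
```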
Waveform models are particularly exciting as they combine the previously distinct signal processing and acoustic modelling components. In SpeechWave, we shall explore novel waveform-based convolutional and recurrent networks which combine speech enhancement and recognition in a factorised way, and approaches based on kernel methods and on recent research advances in sparse signal processing and speech perception. Our research will be evaluated on standard large-scale speech corpora. In addition we shall participate in, and organise, international challenges to assess the performance of speech recognition technologies. We shall also validate our technologies in practice, in the context of the speech recognition challenges faced by our project partners BBC, Emotech, Quorate, and SRI.
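As a hedged sketch of the general idea of learning directly from the waveform (not the SpeechWave architecture itself), the following PyTorch fragment replaces a fixed filterbank front end with a learned 1-D convolution over raw samples, followed by a recurrent layer and a per-frame classifier; the layer sizes and phone-level output are illustrative assumptions.

```python
# Illustrative only: a minimal acoustic model that consumes the raw waveform
# directly, with a learned convolutional front end instead of hand-crafted features.

import torch
import torch.nn as nn

class RawWaveformAcousticModel(nn.Module):
    def __init__(self, n_phones: int = 40):
        super().__init__()
        # Learned filterbank: strided 1-D convolutions over the waveform,
        # roughly 25 ms windows with a 10 ms hop at a 16 kHz sample rate.
        self.frontend = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=400, stride=160, padding=200),
            nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
        )
        self.rnn = nn.GRU(input_size=128, hidden_size=256, batch_first=True)
        self.classifier = nn.Linear(256, n_phones)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) -> (batch, 1, samples)
        x = self.frontend(waveform.unsqueeze(1))   # (batch, 128, frames)
        x, _ = self.rnn(x.transpose(1, 2))         # (batch, frames, 256)
        return self.classifier(x)                  # per-frame phone logits

if __name__ == "__main__":
    model = RawWaveformAcousticModel()
    one_second = torch.randn(2, 16000)             # two 1 s utterances at 16 kHz
    print(model(one_second).shape)                 # torch.Size([2, 101, 40])
```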
For further information contact us at helpdesk@openaire.eu