Within the span of five short years (2017-2022), the field of Natural Language Processing (NLP) has been deeply transformed by advances in general-purpose neural architectures, which are used both to learn deep representations of linguistic units and to generate high-quality textual content. These architectures are nowadays ubiquitous in NLP applications; trained at scale, these “large language models” (LLMs) offer multiple services (summarization, writing aids, translation) in a single model through human-like conversations and prompting techniques. In this project, we analyze this new state of play from the perspective of the machine translation (MT) task and ask two main questions: (a) as LLMs can be trained without any parallel data, they open the prospect of improved MT for the many language pairs for which such resources are scarce, if they exist at all. Can this promise be kept, especially for low-resource dialects or regional languages? (b) prompting techniques make it straightforward to inject various types of contextual information that could help an MT system take specific context into account, such as adapting to a domain, a genre, a style, a client’s translation memory, or the readers’ language proficiency. Is prompting equally effective for all these situations, assuming good prompts can be generated, or is it hopeless to expect improvements without (instruction) fine-tuning? To address these two questions, project TraLaLaM will also (a) collect data for low-resource languages and use them to extend existing LLMs, and (b) develop new testing corpora and associated evaluation strategies.
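To illustrate question (b), the following is a minimal sketch of how contextual information (domain, style, a translation-memory match) might be injected into an MT prompt. The function name, the context fields, and the wording of the prompt are hypothetical placeholders for illustration, not an interface defined by the project; the resulting string would then be sent to an LLM of one's choice.

```python
# Minimal sketch (hypothetical): assembling a context-aware MT prompt.
# Field names and prompt wording are illustrative assumptions only.

def build_mt_prompt(source_text, src_lang, tgt_lang,
                    domain=None, style=None, tm_match=None):
    """Assemble a translation prompt with optional contextual hints."""
    lines = [f"Translate the following {src_lang} text into {tgt_lang}."]
    if domain:
        lines.append(f"Domain: {domain}.")
    if style:
        lines.append(f"Style: {style}.")
    if tm_match:
        tm_src, tm_tgt = tm_match
        lines.append(
            f"A similar segment was previously translated as: "
            f"'{tm_src}' -> '{tm_tgt}'."
        )
    lines.append(f"Text: {source_text}")
    lines.append("Translation:")
    return "\n".join(lines)

# Example usage: inject a legal domain, a formal style, and a TM match.
prompt = build_mt_prompt(
    "Le contrat est résilié de plein droit.",
    "French", "English",
    domain="legal",
    style="formal",
    tm_match=("Le contrat prend fin.", "The contract terminates."),
)
print(prompt)
```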