Laboratoire de Langues & Civilisations à Tradition Orale

Organization

Country: France

- Funding / Projects (6)

6 Projects, page 1 of 2

Sketch grammars of Nemi and Paicî and alignment typology of the North of New Caledonia
Project, 2024 - 2025
Partners: Laboratoire de Langues & Civilisations à Tradition Orale
Funder: Swiss National Science Foundation
Project Code: 217641
Funder Contribution: 116,600
ABLA (Atlas of the Balkan Linguistic Area)
Project, from 2022
Partners: Russian Academy of Sciences / Institute for Linguistic Research, Laboratoire de Langues & Civilisations à Tradition Orale, Structure et Dynamique des Langues
Funder: French National Research Agency (ANR)
Project Code: ANR-21-CE27-0020
Funder Contribution: 216,608 EUR
The Atlas of the Balkan Linguistic Area project will build an online database of language contact phenomena as attested in the Balkan languages and contribute to theoretical discussions in areal linguistics. ABLA will consist of 100+ phonological, morpho-syntactic, semantic, and lexical features, drawing on a linguistic questionnaire to be designed based on the Russian team’s expertise with atlases. Each feature will be matched to a map covering 70+ localities across all Balkan countries. Each map will be accompanied by a chapter co-authored by the project contributors. The online database will be developed by the French team and hosted by HumaNum via the Pangloss Collection with a mirror site in Russia. ABLA will also be published by an international publisher. ABLA will not only be the first online database for the Balkans, an area shaped by multilingualism in forms that are rapidly disappearing, but will further serve as an example for other linguistic areas in the world.
Autogramm (Induction of descriptive grammars from annotated corpora)
Project, from 2022
Partners: MoDyCo, Laboratoire de Langues & Civilisations à Tradition Orale, LISN Laboratoire Interdisciplinaire des Sciences du Numérique, Centre de Recherche Inria Nancy - Grand Est
Funder: French National Research Agency (ANR)
Project Code: ANR-21-CE38-0017
Funder Contribution: 525,941 EUR
The aim of this project is to automate, insofar as possible, the extraction of descriptive grammars and grammatical descriptions from annotated corpora, for the purpose of linguistic and typological studies. We aim for descriptions which 1) highlight the main properties of the corpus (and by extension the language or variety the corpus represents); 2) are easily understandable to a linguist; 3) can be visualized by texts, diagrams, or tables, as well as grammatical databases generally oriented towards comparative and typological studies; and 4) have a size and precision that can be adapted to the user’s requirements. Because these grammatical descriptions are induced from a corpus, they contain quantitative information associated with each observation made on that corpus, as well as relevant examples extracted from it.

Our grammatical descriptions will be extracted from two types of corpora which are available for a wide array of languages and which contain rich information for inferring their structural properties:
- Treebanks from the Universal Dependencies [UD] treebank collection (a hundred languages, 12M words). A treebank is an annotated corpus where each sentence is associated with a syntactic tree; UD is based on dependency syntax, where words are linked by dependency relations.
- The Pangloss and CorpOrAn collections (hosted by the Lacito and Llacan laboratories), two of the few international archives aimed at preserving the linguistic heritage of the world. They contain corpora of under-resourced languages collected by field linguists which represent the diversity of linguistic structures beyond the realm of European languages. Such corpora are structured as interlinear glossed texts [IGTs]: oral corpora transcribed, translated, segmented into morphemes and glossed. Several languages from these archives will be used for the manual and automatic enrichment of IGTs with syntactic annotations.

Our main objectives are the following:
1. Extract from a corpus a set of grammatical patterns or constructions
   a. taking into account the frequency of the observed phenomena,
   b. through an inductive methodology, allowing the discovery of patterns that do not necessarily appear in existing grammars;
2. Order the list of patterns in terms of relevance;
3. Compare the sets of patterns observed in the different corpora, representing typologically diverse languages, in order to build typological generalizations about differences and structural identity between sets of patterns;
4. Propose an efficient processing chain for the simultaneous development of a treebank and a grammar;
5. Develop treebanks and descriptive grammars for a dozen languages thanks to our processing chain. This development work will focus on under-resourced (non-Indo-European) languages (diverse in terms of typological profile, geographic spread, and genetic affiliation). Some of the languages we will study are endangered (Tuwari and Zaar) or threatened (Salar, Sungwadia, Ye’kawana);
6. Compare languages through observations taking into account the frequency of phenomena, which lead us to a quantitative and inductive typology, i.e. taking into account the specificities of the language and the properties induced by the analysis of the data.

The project brings together a variety of specialists, including field linguists with corpora of the languages they specialize in, of which they would like to develop new descriptions; linguists interested in language comparison and typology; specialists in the development of annotated corpora and more particularly syntactic treebanks, with extensive knowledge of the development of formal grammars; and finally, researchers in natural language processing who are interested in machine learning, graph rewriting and the development of tools for the development of linguistic resources and the study of languages.
DeepTypo (Probing Neural Representations for Typological Signal)
Project, from 2023
Partners: Laboratoire de Langues & Civilisations à Tradition Orale, CNRS, Laboratoire Ligérien de Linguistique, INSHS, Laboratoire Interdisciplinaire des Sciences du Numérique, LLF
Funder: French National Research Agency (ANR)
Project Code: ANR-23-CE38-0003
Funder Contribution: 460,009 EUR
In the last few years, neural models have allowed spectacular progress in natural language processing (NLP). The DeepTypo project proposes to use multilingual models of speech to design methods for automatically extracting, from audio recordings, typological information useful for language documentation and research (phonological and morphosyntactic complexity indices, similarities between languages…).

Based on a collaboration between linguists and NLP researchers, the DeepTypo project sits squarely in the space of digital humanities by addressing fundamental questions of both communities. It will help linguists in their work of documenting and analyzing languages, especially “rare” or “poorly endowed” languages, by providing them with new tools and methods that will allow them, for example, to bring out new information on similarities between languages. Beyond the “tool development” aspect, the DeepTypo project aims, above all, at showing that the representations at the heart of neural networks can be used to answer fundamental questions in linguistics, taking as examples current issues in creolistics (the study of creoles) and the dialectology of Sino-Tibetan languages.

Extracting typological information, the core of the DeepTypo project, will also contribute to the identification of the limits of fine-tuning. This approach has made it possible to develop, at low cost, NLP systems for several languages and many tasks and is often presented today as "THE" solution to all NLP problems. The identification of linguistic features captured by neural networks will allow us to verify if this is indeed the case: if a model is, for example, not able to detect and represent the tones of a language, it is more than likely that it cannot be used to develop a system for tonal languages.

To achieve this ambitious goal, we will use neural representation analysis methods to interpret and understand the decisions of neural networks and will develop them along four original axes:
1. Based on the collaboration with the different partners of the project, we will try to identify richer features than those considered in the state of the art: whereas existing work has focused on “simple” features (speaker gender, language of the utterance, ...), we will also consider information related to the diversity of the languages and to the linguistic characteristics of these languages (phonemic inventory, identification of tonal languages, ...).
2. In addition to existing analysis methods (e.g. linguistic probes), we will develop new methods to measure similarity between languages. Again, close collaboration between linguists and NLP researchers will be essential to define a linguistically relevant similarity (or similarities).
3. We will apply our methods to the 230 languages of the Pangloss collection (an archive of rare languages managed by LACITO) and to 15 creoles (collected mainly by LLL). These large-scale experiments will allow us to test state-of-the-art pre-trained models on languages with a wide variety of linguistic features rarely considered in NLP work.
4. We will apply these methods to language documentation support tasks, an application that has, until now, never been considered.
DiasCo-Tib (Diasporic Convergences: a case-study of Tibetan refugees)
Project, from 2024
Partners: Laboratoire de Langues & Civilisations à Tradition Orale, University of Zurich, Université de Tampere, Centre d'Etudes en Sciences Sociales sur les Mondes Africains, Américains et Asiatiques, Institut Français de Recherche sur l’Asie de l’Est, Université Laval
Funder: French National Research Agency (ANR)
Project Code: ANR-23-CE41-0017
Funder Contribution: 494,623 EUR
DiasCo-Tib proposes to analyse various patterns of linguistic, spatial and social convergence at a “diasporic moment,” i.e. a critical juncture of reactivation and reconfiguration of a diaspora, as it is unfolding. The research will be based on the case of Tibetan refugees, who are currently undergoing such a “diasporic moment” with the anticipated demise of their spiritual leader, the Dalai Lama (b. 1935). Recent and fast-growing on-migratory trends, from South Asia towards Europe and North America, are already leading to a large-scale spatial reconfiguration, with France becoming a major hub in the multipolar Tibetan diasporic network. The project’s central hypothesis is that in the context of a diasporic moment, increased spatial dispersion, paradoxically, triggers enhanced processes of social convergence. In order to produce a comprehensive analysis of diasporic convergence processes, DiasCo-Tib will mobilise an interdisciplinary team to study concomitant social phenomena and evaluate their degree of interrelatedness in the domains of language(s) and linguistic practices; social and economic translocal networks; forms of collective representation (in political, civic or artistic spheres); changing gender roles; and religious practices. Multi-sited research will account for the circulation of norms and social practices, taking into account local and cross-border forms of integration and differentiation as well as ongoing shifts in Tibetan refugees’ inscriptions in host societies. Alongside expected convergences, lines of segmentation will be observed and analysed as they crystallise to reconfigure the common yet plural linguistic and social practices of the Tibetan diaspora. The chosen case study will thus shed light on multi-dimensional processes of diasporisation as they are experienced and enacted by individuals and communities in their everyday lives and particular biographical trajectories.
