Powered by OpenAIRE graph

Laboratoire de Langues & Civilisations à Tradition Orale

Country: France

6 Projects, page 1 of 2
  • Funder: Swiss National Science Foundation Project Code: 217641
    Funder Contribution: 116,600
  • Funder: French National Research Agency (ANR) Project Code: ANR-21-CE27-0020
    Funder Contribution: 216,608 EUR

    The Atlas of the Balkan Linguistic Area (ABLA) project will build an online database of language contact phenomena attested in the Balkan languages and contribute to theoretical discussions in areal linguistics. ABLA will cover 100+ phonological, morpho-syntactic, semantic, and lexical features, drawing on a linguistic questionnaire to be designed on the basis of the Russian team’s expertise with atlases. Each feature will be matched to a map covering 70+ localities across all Balkan countries, and each map will be accompanied by a chapter co-authored by the project contributors. The online database will be developed by the French team and hosted by HumaNum via the Pangloss Collection, with a mirror site in Russia. ABLA will also be published by an international publisher. ABLA will not only be the first online database for the Balkans, an area shaped by forms of multilingualism that are rapidly disappearing, but will also serve as a model for other linguistic areas of the world.

  • Funder: French National Research Agency (ANR) Project Code: ANR-21-CE38-0017
    Funder Contribution: 525,941 EUR

    The aim of this project is to automate, as far as possible, the extraction of descriptive grammars and grammatical descriptions from annotated corpora, for the purpose of linguistic and typological studies. We aim for descriptions that 1) highlight the main properties of the corpus (and, by extension, of the language or variety the corpus represents); 2) are easily understandable to a linguist; 3) can be presented as texts, diagrams, or tables, or as grammatical databases, which are generally oriented towards comparative and typological studies; and 4) can be adapted in size and precision to the user’s requirements. Because these grammatical descriptions are induced from a corpus, they contain quantitative information associated with each observation made on that corpus, as well as relevant examples extracted from it.
    Our grammatical descriptions will be extracted from two types of corpora, which are available for a wide array of languages and contain rich information for inferring their structural properties:
    - Treebanks from the Universal Dependencies [UD] collection (a hundred languages, 12M words). A treebank is an annotated corpus in which each sentence is associated with a syntactic tree; UD is based on dependency syntax, where words are linked by dependency relations.
    - The Pangloss and CorpOrAn collections (hosted by the Lacito and Llacan laboratories), two of the few international archives aimed at preserving the linguistic heritage of the world. They contain corpora of under-resourced languages collected by field linguists, which represent the diversity of linguistic structures beyond the realm of European languages. These corpora are structured as interlinear glossed texts [IGTs]: oral corpora that are transcribed, translated, segmented into morphemes, and glossed. Several languages from these archives will be used for the manual and automatic enrichment of IGTs with syntactic annotations.
    Our main objectives are the following:
    1. Extract from a corpus a set of grammatical patterns or constructions (a) taking into account the frequency of the observed phenomena and (b) through an inductive methodology, allowing the discovery of patterns that do not necessarily appear in existing grammars;
    2. Rank the patterns by relevance;
    3. Compare the sets of patterns observed in the different corpora, representing typologically diverse languages, in order to build typological generalizations about the differences and structural identities between sets of patterns;
    4. Propose an efficient processing chain for developing a treebank and a grammar simultaneously;
    5. Use this processing chain to develop treebanks and descriptive grammars for a dozen languages. This work will focus on under-resourced (non-Indo-European) languages that are diverse in typological profile, geographic spread, and genetic affiliation. Some of the languages studied are endangered (Tuwari and Zaar) or threatened (Salar, Sungwadia, Ye’kawana);
    6. Compare languages through observations that take into account the frequency of phenomena, leading to a quantitative and inductive typology, i.e. one that takes into account both the specificities of each language and the properties induced from the data.
    The project brings together a variety of specialists: field linguists with corpora of the languages they specialize in, for which they would like to develop new descriptions; linguists interested in language comparison and typology; specialists in developing annotated corpora, particularly syntactic treebanks, with extensive knowledge of formal grammar development; and researchers in natural language processing interested in machine learning, graph rewriting, and tools for developing linguistic resources and studying languages.

  • Funder: French National Research Agency (ANR) Project Code: ANR-23-CE38-0003
    Funder Contribution: 460,009 EUR

    In the last few years, neural models have enabled spectacular progress in natural language processing (NLP). The DeepTypo project proposes to use multilingual speech models to design methods for automatically extracting, from audio recordings, typological information useful for language documentation and research (phonological and morphosyntactic complexity indices, similarities between languages…). Based on a collaboration between linguists and NLP researchers, the DeepTypo project sits squarely within the digital humanities, addressing fundamental questions for both communities. It will help linguists document and analyze languages, especially “rare” or “under-resourced” languages, by providing them with new tools and methods that will allow them, for example, to uncover new information about similarities between languages. Beyond tool development, the DeepTypo project aims above all to show that the internal representations of neural networks can be used to answer fundamental questions in linguistics, taking as examples current issues in creolistics (the study of creoles) and in the dialectology of Sino-Tibetan languages. Extracting typological information, the core of the DeepTypo project, will also help identify the limits of fine-tuning pre-trained models. This approach has made it possible to develop NLP systems at low cost for several languages and many tasks, and is often presented today as “THE” solution to all NLP problems. Identifying the linguistic features captured by neural networks will allow us to verify whether this is indeed the case: if a model cannot, for example, detect and represent the tones of a language, it very likely cannot be used to build a system for tonal languages.
    To achieve this ambitious goal, we will use neural representation analysis methods to interpret and understand the decisions of neural networks, and will develop them further along four original axes:
    1. Drawing on collaboration with the project’s partners, we will try to identify richer features than those considered in the state of the art: whereas existing work has focused on “simple” features (speaker gender, language of the utterance, ...), we will also consider information related to the diversity of languages and to their linguistic characteristics (phonemic inventory, whether a language is tonal, ...).
    2. In addition to existing analysis methods (e.g. linguistic probes), we will develop new methods to measure similarity between languages. Here again, close collaboration between linguists and NLP researchers will be essential to define one or more linguistically relevant notions of similarity.
    3. We will apply our methods to the 230 languages of the Pangloss collection (an archive of rare languages managed by LACITO) and to 15 creoles (collected mainly by LLL). These large-scale experiments will allow us to test state-of-the-art pre-trained models on languages with a wide variety of linguistic features rarely considered in NLP work.
    4. We will apply these methods to tasks that support language documentation, an application that has not been considered until now.

  • Funder: French National Research Agency (ANR) Project Code: ANR-23-CE41-0017
    Funder Contribution: 494,623 EUR

    DiasCo-Tib proposes to analyse various patterns of linguistic, spatial and social convergence at a “diasporic moment,” i.e. a critical juncture of reactivation and reconfiguration of a diaspora, as it unfolds. The research will focus on the case of Tibetan refugees, who are currently going through such a “diasporic moment” with the anticipated demise of their spiritual leader, the Dalai Lama (b. 1935). Recent and fast-growing onward-migration trends, from South Asia towards Europe and North America, are already leading to a large-scale spatial reconfiguration, with France becoming a major hub in the multipolar Tibetan diasporic network. The project’s central hypothesis is that, during a diasporic moment, increased spatial dispersion paradoxically triggers stronger processes of social convergence. To produce a comprehensive analysis of diasporic convergence processes, DiasCo-Tib will mobilise an interdisciplinary team to study concomitant social phenomena and evaluate how interrelated they are across the domains of language(s) and linguistic practices; social and economic translocal networks; forms of collective representation (in political, civic or artistic spheres); changing gender roles; and religious practices. Multi-sited research will account for the circulation of norms and social practices, taking into account local and cross-border forms of integration and differentiation as well as ongoing shifts in how Tibetan refugees are embedded in their host societies. Alongside the expected convergences, lines of segmentation will be observed and analysed as they crystallise and reconfigure the shared yet plural linguistic and social practices of the Tibetan diaspora. The chosen case study will thus shed light on multi-dimensional processes of diasporisation as they are experienced and enacted by individuals and communities in their everyday lives and particular biographical trajectories.
