Powered by OpenAIRE graph

TermITH

Terminology and texts indexation in Human Sciences
Funder: French National Research Agency (ANR)
Project code: ANR-12-CORD-0029
Funder Contribution: 710,718 EUR
Description

The collaborative research project TermITH (Terminology and Indexation of Texts in the area of Humanities) brings together six French partners: ATILF (Analysis and Natural Language Processing of the French Language), INIST (National Institute of Scientific and Technical Information), LINA (Laboratory of Computer Science of Nantes), LIDILEM (Laboratory of Linguistics and Applied Linguistics of Native and Second Languages, Grenoble) and two INRIA centers (National Institute for Research in Computer Science and Automation), INRIA Nancy Grand-Est and INRIA Saclay. The project addresses access to information in textual documents through full-text indexing based on terms that are detected, disambiguated and analyzed. The problem is well known: the digital age is characterized by a very large quantity of information that must be indexed to be accessible, and by a growing diversity of areas and disciplines, which makes interdisciplinary work more and more frequent. Indexing texts by the terms that occur in them is still an active research topic, although several approaches have recently produced good results. These approaches use either term occurrences detected from their textual form (projection of controlled vocabularies or structured terminologies using pattern matching, inflection rules and syntagmatic variation, as in FASTR, for instance), or term candidates produced by automatic term detection components. All these methods require expensive human verification: (1) for indexing, manual checking of the automatically assigned index terms, or even a complete analysis of the documents to determine the right index terms; (2) for automatic term detection, classification of the very large number of term candidates; (3) for the projection of controlled vocabularies or structured terminologies, updating of the terminological resources.
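To make the idea of projecting a controlled vocabulary onto text concrete, here is a minimal sketch in Python. It is not the TermITH pipeline and not FASTR: the function name `project_vocabulary`, the sample vocabulary and the single "optional plural s" rule are all assumptions chosen for illustration, standing in for the much richer inflection and variation rules mentioned above.

```python
import re

def project_vocabulary(text, vocabulary):
    """Return (term, start, end) for each vocabulary term found in text.

    Hypothetical helper: matches each term case-insensitively and
    tolerates one toy inflection rule (an optional trailing "s").
    """
    hits = []
    for term in vocabulary:
        # re.escape keeps the term literal; "s?" is the toy plural rule.
        pattern = re.compile(r"\b" + re.escape(term) + r"s?\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            hits.append((term, match.start(), match.end()))
    # Order occurrences by their position in the text.
    return sorted(hits, key=lambda hit: hit[1])

text = "Noun phrases are analyzed; each noun phrase carries a head."
vocabulary = ["noun phrase", "head"]  # assumed sample entries
hits = project_vocabulary(text, vocabulary)
for term, start, end in hits:
    print(term, "->", text[start:end])
```

Every occurrence found this way is only a surface match: whether "head" is used as a linguistic term or as a general-language word still has to be decided, which is the verification cost the description refers to.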
TermITH’s approach is to cross the automatically detected and disambiguated term occurrences in texts with available interdisciplinary lexicons and terminological resources in order to isolate the terms specific to each area studied. This approach has two main advantages. First, thanks to the disambiguation and the crossing with interdisciplinary lexicons and terminological resources, it limits the human cost of manually evaluating document indexes and, where necessary, of manually analyzing documents. Second, it will allow terminological resources to be updated automatically. From a theoretical point of view, TermITH will allow cross-fertilization between disciplines that currently develop in parallel: contextual disambiguation, data mining and textual statistics for term disambiguation; automatic term detection, projection of terminological resources and interdisciplinary lexicons for detecting terms and indexing them in texts. For the first experimental phase, the TermITH partners have chosen to work in a scientific area where the ambiguity between terminological and general-language usage is very high: the humanities. The proposed methodology will be tested on linguistics and then validated on four other disciplines: history, sociology, psychology (analytic and social psychology, and cognitive sciences) and archeology. If the results are good for these five ambiguous disciplines, indexing documents from less ambiguous disciplines (such as biology, genetics or physics) will be easier with our methodology.
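The crossing step described above can be pictured as a simple set comparison. The sketch below is an illustration under stated assumptions, not the project's method: the function `cross_candidates`, the three bucket names and the sample word lists are hypothetical, and the contextual disambiguation that TermITH relies on is left out entirely.

```python
def cross_candidates(candidates, interdisciplinary_lexicon, terminology):
    """Sort term candidates into three buckets (hypothetical scheme).

    - "interdisciplinary": listed in the cross-discipline lexicon,
      so not specific to the area under study
    - "known": already present in the area's terminological resource
    - "new": in neither list, i.e. possible additions to the resource
    """
    lexicon = {entry.lower() for entry in interdisciplinary_lexicon}
    known = {entry.lower() for entry in terminology}
    buckets = {"interdisciplinary": [], "known": [], "new": []}
    for candidate in candidates:
        key = candidate.lower()
        if key in lexicon:
            buckets["interdisciplinary"].append(candidate)
        elif key in known:
            buckets["known"].append(candidate)
        else:
            buckets["new"].append(candidate)
    return buckets

# Assumed sample data for a linguistics corpus.
candidates = ["hypothesis", "noun phrase", "clitic doubling", "Method"]
interdisciplinary_lexicon = ["hypothesis", "method", "analysis"]
terminology = ["noun phrase", "morpheme"]

buckets = cross_candidates(candidates, interdisciplinary_lexicon, terminology)
print(buckets)
```

In this toy setting, the "known" and "new" buckets together approximate the area-specific terms, and the "new" bucket is where candidates for updating the terminological resource would come from, still subject to human validation.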
