
INIST

Institut de l'Information Scientifique et Technique
3 Projects
  • Funder: French National Research Agency (ANR) Project Code: ANR-12-CORD-0029
    Funder Contribution: 710,718 EUR

    The collaborative research project TermITH (Terminology and Indexation of Texts in the area of Humanities) brings together six French partners: ATILF (Analysis and Natural Language Processing of French Language), INIST (National Institute of Scientific and Technical Information), LINA (Laboratory of Computer Science from Nantes), LIDILEM (Laboratory of Linguistics and Applied Linguistics of native and second languages from Grenoble) and two INRIA centers (National Institute for Research in Computer Science and Automation), INRIA Nancy Grand-Est and INRIA Saclay. The project addresses access to information in textual documents through full-text indexing based on terms that are detected, disambiguated and analyzed. The issue is well known: the digital age is characterized by a very large quantity of information that must be indexed to be accessible, and by a growing diversity of areas and disciplines, which makes interdisciplinarity more and more frequent. Text indexing based on term occurrences is still an active research topic, although several approaches have recently produced good results. These approaches use either occurrences of terms detected on the basis of their textual form (projection of controlled vocabularies or structured terminologies using pattern matching, inflection rules, or syntagmatic variations, as in FASTR), or term candidates produced by automatic term detection components. All these methodologies require expensive human verification: (1) for indexing, manual checking of the automatically assigned index terms, or even a complete analysis of the documents to determine appropriate index terms; (2) for automatic term detection, classification of the very large number of term candidates; (3) for the projection of controlled vocabularies or structured terminologies, updating of the terminological resources.
    TermITH’s approach is to cross the automatically detected and disambiguated occurrences of terms in texts with available interdisciplinary lexicons and terminological resources in order to isolate the terms specific to each area studied. Such an approach has two main advantages. First, it limits the human cost of manually evaluating document indexes and, where necessary, manually analyzing documents; this follows from the disambiguation and from the crossing with interdisciplinary lexicons and terminological resources. Second, it will permit automatic updating of terminological resources. From a theoretical point of view, TermITH will allow cross-fertilization between disciplines that currently develop in parallel: contextual disambiguation, data mining and textual statistics for term disambiguation; automatic term detection, projection of terminological resources and interdisciplinary lexicons for detecting terms and indexing them in texts. In the first experimental phase, the TermITH partners have chosen to work within a scientific area in which the ambiguity between terminological and general language usage is very high: the humanities. The proposed methodology will be tested on linguistics and then validated on four other disciplines: history, sociology, psychology (analytic and social psychology, and cognitive sciences) and archeology. If the results are good for these five ambiguous disciplines, indexing documents from less ambiguous disciplines (such as biology, genetics and physics) should be easier with this methodology.

  • Funder: French National Research Agency (ANR) Project Code: ANR-22-CE23-0033
    Funder Contribution: 782,531 EUR

    The MaTOS (Machine Translation for Open Science) project aims to develop new methods for the machine translation (MT) of complete scientific documents, as well as automatic metrics to evaluate the quality of these translations. Our main application target is the translation of scientific articles between French and English, where linguistic resources can be exploited to obtain more reliable translations, both for publication purposes and for gisting and text mining. However, efforts to improve MT of complete documents are hampered by the inability of existing automatic metrics to detect weaknesses in the systems and to identify the best ways to remedy them. The MaTOS project aims to address both of these issues. This project is part of a broader movement to automate the processing of scientific articles; MT is no exception to this trend, particularly in the biomedical field. Applications are numerous: text mining, bibliometric analysis, automatic detection of plagiarism and of articles reporting falsified conclusions, etc. We wish to build on the results of this work, and also to contribute to it in several ways: (a) by developing new open resources for specialised MT; (b) by improving, through the study of terminological variations, the description of textual coherence markers for scientific articles; (c) by studying new methods of multilingual processing for these documents; (d) by proposing metrics dedicated to measuring progress on this type of task. The final result will support, through improved translation, the circulation and dissemination of scientific knowledge.

  • Funder: French National Research Agency (ANR) Project Code: ANR-19-DATA-0017
    Funder Contribution: 97,200 EUR

    The French Bioinformatics Institute (IFB), with its two computing infrastructures and its 30 member core facilities, is an essential structure for Life Sciences, providing a production, analysis and management environment for the biology and medical biology communities. Although those communities may be considered well provided with data structuring and management tools, issues remain in fully implementing best practices and in managing data from its production to its preservation while ensuring its accessibility, reproducibility and reusability. The Data Management Plan (DMP) is considered a key element in facilitating the implementation of the FAIR principles. A DMP contains decisions that drive the management process throughout the data lifecycle and thus provides traces of the provenance of data (including all research outputs derived from the raw data). Inist-CNRS provides the tool DMP OPIDoR (https://dmp.opidor.fr/), which facilitates the drafting of DMPs and also contributes to the harmonization of best practices by providing guidance, examples and institution-specific templates. Based on user feedback, collected cases and the RDA active DMP work, the evolution of DMP OPIDoR toward a machine-actionable DMP is currently being studied. The machine-actionable DMP OPIDoR tool will be compliant with the RDA DMP common model and will also include extensions to serve the different actors and disciplinary needs, as required and validated by a large user group. Through the Inist-IFB partnership, a proof of concept (POC) will be conducted to test the ability of the machine-actionable DMP OPIDoR to meet IFB needs by improving data management and metadata quality. Alongside the technical development, IFB will also develop training tools for its members and user communities to provide them with skills in data management and stewardship.

