AGDAM

Collaborative Research: CISE-ANR: CIF: Small: Learning from large datasets: Application to multi-subject fMRI analysis

Project from 01 May 2024

Funder: French National Research Agency (ANR)

Project code: ANR-23-CE94-0001

Funder Contribution: 328,154 EUR

Description

Multiple complementary datasets associated with a given problem are increasingly available, and the main challenge is extracting the features that are most useful and relevant for the task at hand. This is generally achieved with source mixing models, in which the components (sources) are associated with quantities of interest. Since very little is usually known about the actual interaction among the datasets, it is highly desirable to minimize the underlying assumptions when estimating the sources. This is the main reason for the growing importance of joint matrix and tensor decompositions: they not only enable full interaction among the datasets but also yield factor matrices that are directly interpretable. An effective way to capture the inherent relationships among the samples of multiple datasets is to use appropriate statistical models. Independent vector analysis (IVA) enables such a powerful formulation, with general uniqueness guarantees for matrix decompositions. Another effective approach, primarily based on algebraic arguments, uses tensors to take the multi-way structure of the data into account. Tensors enable unique identifiability by naturally constraining the mixing model.

A crucial aspect when dealing with multiple data is the large number of datasets, which can easily reach tens of thousands or more. As the number of datasets grows, an important challenge is how best to summarize the information while making sure that the features relating to individual variability within each dataset are preserved. Identifying homogeneous subspaces, where the components within a subspace are highly related (correlated/dependent), is an effective way to summarize the heterogeneity in large datasets. This is the argument behind low-rank models but, with a large number of datasets, such subspaces should be defined across subsets of the datasets rather than across all the datasets in the decomposition. This is an important challenge for tensor methods, which can scale readily to large datasets. For IVA, on the other hand, this information is directly captured through a multivariate probability density model, but scalability becomes a major concern as the number of datasets increases. Hence, there are unique advantages and challenges for each approach, each constituting a different way to represent and work with multiset data.

The methodology developed in this proposal targets multiple large spatio-temporal datasets, i.e., datasets acquired across spatial and temporal dimensions. Spatio-temporal data arise in many domains (neuroscience, environmental science, social media, traffic dynamics, etc.), and the aim is to develop a unified and rigorous framework for extracting homogeneous subgroups and features from such data. In a first stage, we develop a set of methods for extracting features through identification of homogeneous subgroups in large datasets, with two approaches for spatio-temporal data: (i) a statistically motivated matrix decomposition framework based on IVA, and (ii) coupled tensor decompositions with shared and dataset-specific components. In a second stage, we establish the connections between these two approaches, both in terms of methods and uniqueness conditions, and develop a methodology for subgroup identification. Finally, we will apply the developed methodology to fMRI data, and more specifically to the Adolescent Brain Cognitive Development (ABCD) Study, a comprehensive longitudinal dataset from a national and diverse cohort of almost 12,000 children aged 9-10 who are followed throughout adolescence.
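The notion of a homogeneous subgroup described above can be made concrete with a small simulation. The sketch below is only an illustration and not the project's method: it assumes the matching component from each dataset has already been estimated (for example by IVA or a coupled tensor decomposition), simulates those components directly, and flags the datasets whose components are strongly correlated with at least one other dataset. The function names `simulate_components` and `find_subgroup`, the correlation level `rho`, and the threshold of 0.5 are all hypothetical choices made for this example.

```python
import numpy as np


def simulate_components(n_datasets=8, n_samples=5000,
                        subgroup=(0, 1, 2, 3), rho=0.8, seed=0):
    """Simulate one component per dataset (row k belongs to dataset k).

    Components of datasets in `subgroup` share a common signal so that
    their pairwise correlation is about `rho`; all other components are
    independent. This stands in for components estimated from real data.
    """
    rng = np.random.default_rng(seed)
    comps = rng.standard_normal((n_datasets, n_samples))
    common = rng.standard_normal(n_samples)
    for k in subgroup:
        comps[k] = np.sqrt(rho) * common + np.sqrt(1.0 - rho) * comps[k]
    return comps


def find_subgroup(comps, threshold=0.5):
    """Return indices of datasets whose component is strongly correlated
    (absolute correlation above `threshold`) with at least one other dataset."""
    corr = np.corrcoef(comps)       # dataset-by-dataset correlation matrix
    np.fill_diagonal(corr, 0.0)     # ignore each dataset's self-correlation
    linked = np.abs(corr).max(axis=1) > threshold
    return np.flatnonzero(linked)


if __name__ == "__main__":
    comps = simulate_components()
    print("Datasets in the correlated subgroup:", find_subgroup(comps))
```

Simple thresholding is used here only to keep the example short; it captures second-order correlation but not the higher-order dependence that a multivariate density model can represent, and it needs the full dataset-by-dataset correlation matrix, which becomes expensive as the number of datasets grows.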

Partners

CHU, UL, CRAN, INS2I, University of Maryland, Baltimore County, ICL, CNRS
