Title | Structural Intrinsic Dimensionality |
Author | Lorenzo BINI |
Director of thesis | Stephane Marchand-Maillet |
Co-director of thesis | |
Summary of thesis | Information is generally organized by similarity, itself often derived from a distance function. It is through neighborhoods that the data makes sense, as opposed to being a set of arbitrary unrelated items. Hence, studying neighborhoods in metric spaces is a fundamental operation that drives nearly any data processing or analysis and has applications in a large variety of domains. A common issue that data modeling and analysis techniques face is the degradation of their performance as the dimensionality of the data at hand increases. As the dimension increases, the data exhibits adverse properties that degrade our ability to model neighborhoods and impede the effectiveness of any processing. This is often referred to under the generic label of the "curse of dimensionality", which has many facets and finds several definitions and origins in the literature. In this project, we wish to associate with every data point a value (named Structural Intrinsic Dimensionality) related to some true local dimensionality value at that location, and to use this value to inform any further processing in order to circumvent this curse. Neighboring relationships naturally relate to graphs constructed over the data. This project therefore proposes to establish graph spanners as maps for driving the modeling and analysis of high-dimensional data. We wish to achieve this objective by defining the Structural Intrinsic Dimensionality as an inherent property that emerges from such structures over the data. We organize this investigation into two complementary and parallel objectives that directly shape the project.
Objective 1 targets the definition of graph structures supporting the computation of Structural Intrinsic Dimensionality by operations such as diffusion. Objective 2 uses these structures, which encode local dimensionality, as a basis for developing operational tools for data exploitation. The overarching research question that we want to answer in this project therefore derives from the generic domain of high-dimensional data modeling and can be formulated as: "How to construct and use graphs as maps for high-dimensional data spaces?" The project also addresses the wider question of the power of graphs to retain geometric information, which is directly related to that of optimal graph embedding and manifold learning. Our research plan is organized into the work of two PhD researchers collaborating with one Post-Doctoral researcher. Each researcher has a well-defined objective, and the collaboration of these lines of research forms a coherent direction towards Structural Intrinsic Dimensionality. To anchor our project in a real situation, we wish to develop a specific case study around Flow Cytometry data analysis. We believe that this use case offers a rich context for modeling, and that all operations we target, such as density estimation, visualization, or classification, are relevant there. More importantly, our existing collaboration with hematologists at Geneva University Hospital offers us a unique opportunity to address this challenge. |
Status | |
Administrative delay for the defence | |
URL | |