Data Science Seminar

Statistical recovery of compositional discrete structures

Watch the video recording here

Many data problems, in particular in biogenetics, come with a highly complex underlying structure, which often makes it difficult to extract interpretable information. In this talk, we demonstrate that such complex structures are often well approximated by a composition of a few simple parts, and that this composition gives highly descriptive insights into the underlying data-generating process. We illustrate this with two examples.

In the first example, the individual components are finite-alphabet vectors (e.g., binary components) that encode discrete information. For instance, in genetics, a binary vector of length n can encode whether or not a mutation (e.g., a SNP) is present at location i = 1, ..., n of the genome. At the population level, studying genetic variation is often highly complex, because various groups of mutations are present simultaneously. In many settings, however, a population might be well approximated by a composition of a few dominant groups, for example in heterogeneous cancer tumors with a few dominant clones. We show under which conditions the individual components can be recovered from data and provide computationally efficient algorithms that achieve minimax-optimal estimation rates.
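A minimal sketch of the finite-alphabet composition model, assuming the mixing weights are already known (estimating them is the harder part of the problem and is not shown here). Each observation is modeled as y_j ≈ w_1 f_1j + ... + w_m f_mj with binary entries f_ij. When all 2^m attainable mixture values are distinct, each location can be decoded by snapping y_j to the nearest attainable value. The function names, weights and data are hypothetical; this is not the speaker's algorithm and makes no claim to the minimax-optimal rates mentioned above.

```python
from itertools import product


def mixture_table(weights):
    """Map every binary activation pattern (one bit per component) to its mixture value."""
    return {bits: sum(b * w for b, w in zip(bits, weights))
            for bits in product((0, 1), repeat=len(weights))}


def decode_components(observations, weights):
    """Recover binary components f_i from noisy mixtures y_j ~ sum_i w_i * f_ij.

    Assumes (hypothetically) that the weights are known and that all 2^m mixture
    values are distinct; each location j is decoded independently by choosing
    the activation pattern whose mixture value is closest to y_j.
    """
    table = mixture_table(weights)
    if len({round(v, 12) for v in table.values()}) < len(table):
        raise ValueError("not identifiable: two patterns give the same mixture value")
    components = [[] for _ in weights]
    for y in observations:
        bits = min(table, key=lambda pattern: abs(table[pattern] - y))
        for i, bit in enumerate(bits):
            components[i].append(bit)
    return components


# Hypothetical example: three clones with proportions 0.6, 0.3, 0.1 at four loci.
weights = [0.6, 0.3, 0.1]
noisy_y = [0.92, 0.38, 0.71, 0.08]
print(decode_components(noisy_y, weights))
# [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 1]]
```

With weights such as (0.5, 0.3, 0.2), the sums 0.5 and 0.3 + 0.2 coincide, so the sketch rejects them: a simple instance of the kind of identifiability condition under which recovery is possible at all.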

In the second example, the individual components correspond to Boolean interaction terms. An example from genetics is so-called epistasis, where several genes are jointly associated, in a non-linear way, with a trait or phenotype of interest. In this context, we consider the Random Forest (RF) algorithm and demonstrate how the individual interaction components can be recovered consistently from their joint prevalence in an RF tree ensemble.
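The idea of reading interactions off their joint prevalence can be sketched by counting how often a set of features is split on together along root-to-leaf decision paths. The sketch assumes the paths have already been extracted from a fitted forest and represents each path only by the feature indices it splits on. The function name, the unweighted counting rule and the toy paths are hypothetical; the prevalence measure and consistency guarantees from the talk are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def joint_prevalence(paths, max_order=2):
    """Fraction of decision paths on which each feature set of size 1..max_order
    is split on jointly (split order and repeated splits on a feature are ignored)."""
    counts = Counter()
    for path in paths:
        features = sorted(set(path))
        for k in range(1, max_order + 1):
            # Count every k-subset of the features used on this path once.
            counts.update(combinations(features, k))
    return {subset: c / len(paths) for subset, c in counts.items()}


# Hypothetical toy ensemble: each path lists the feature indices it splits on.
paths = [(0, 1), (1, 0, 3), (0, 1, 2), (2,), (0, 3)]
prev = joint_prevalence(paths)
pairs = {s: p for s, p in prev.items() if len(s) == 2}
top = max(pairs, key=pairs.get)
print(top, pairs[top])
# (0, 1) 0.6
```

In this toy ensemble, features 0 and 1 are split on jointly most often, so the pair (0, 1) would be the leading interaction candidate. A practical measure may additionally weight paths, for example by leaf size.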

Biosketch Merle Behr

Merle Behr obtained her PhD in Mathematical Statistics in 2018 from the Georg-August-Universität Göttingen, Germany, under the supervision of Professor Axel Munk. During her PhD, she studied Multiscale Change Point Methods and Finite Alphabet Blind Source Separation. Her PhD thesis was awarded the Dissertationspreis Universität Göttingen, endowed with 10,000 euros in prize money. From 2018 to 2020, she was a Neyman Visiting Assistant Professor and DFG research fellow at the Statistics Department of the University of California, Berkeley, USA. Since December 2020, she has worked as a Scientific Expert in the Research and Development Division of Bayer AG Pharmaceuticals. Her main research interests are statistical methods for discrete data, decision-tree-based methods, blind source separation, and segmentation problems, with applications in genetics, medicine, and the natural sciences more generally.
