Distance correlation
Measuring dependence, that is, the relationship between random variables or random vectors, plays a central role in statistics. Numerous measures are available for this task; by far the most prominent is the classical Pearson correlation. A drawback of Pearson correlation (and other classical correlation coefficients) is that it cannot detect complex dependency patterns.
Distance correlation is a powerful measure of dependence proposed by Gábor Székely and his coauthors Maria Rizzo and Nail Bakirov. Its crucial feature is that, for variables with finite first moments, it equals zero if and only if the variables are independent. Hence distance correlation can detect arbitrary types of non-linear association.
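For reference, the population quantities behind this property can be written as follows. This is a sketch of the standard definition from the distance correlation literature, not notation taken from this page: $(X, Y)$, $(X', Y')$ and $(X'', Y'')$ are assumed to be independent copies of the same pair, $|\cdot|$ denotes the Euclidean norm, and first moments are assumed finite.

```latex
\begin{align*}
\operatorname{dCov}^2(X, Y)
  &= \mathbb{E}\,|X - X'|\,|Y - Y'|
   + \mathbb{E}\,|X - X'| \;\mathbb{E}\,|Y - Y'|
   - 2\,\mathbb{E}\,|X - X'|\,|Y - Y''|, \\[4pt]
\operatorname{dCor}^2(X, Y)
  &= \frac{\operatorname{dCov}^2(X, Y)}
          {\sqrt{\operatorname{dCov}^2(X, X)\,\operatorname{dCov}^2(Y, Y)}} .
\end{align*}
```

Under these assumptions, $\operatorname{dCor}(X, Y)$ lies between 0 and 1, and it is 0 exactly when $X$ and $Y$ are independent.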
As an example, we consider the three associations in the following figure.
As we can see, Pearson correlation (denoted by Cor) is very low for all three associations, which shows that it is not an appropriate measure for these complex associations. Distance correlation (denoted by dCor), on the other hand, is substantially greater than zero in all cases. This suggests that distance correlation can be a valuable tool for analyses of large molecular datasets, where visual checks for nonlinearity are hardly feasible.
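The contrast between Cor and dCor can be illustrated with a short Python sketch. It is not the code behind the figure: the function name `distance_correlation` is hypothetical, the estimator is the simple double-centering (V-statistic) sample version, and the parabola used as test data is an assumed example rather than one of the three associations shown.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic version); illustrative sketch."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Treat each input as n observations of a (possibly multivariate) vector.
    x = x.reshape(x.shape[0], -1)
    y = y.reshape(y.shape[0], -1)

    # Pairwise Euclidean distance matrices.
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)

    # Double centering: subtract row and column means, add the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()

    dcov2 = (A * B).mean()   # squared sample distance covariance
    dvar_x = (A * A).mean()  # squared sample distance variance of x
    dvar_y = (B * B).mean()  # squared sample distance variance of y

    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0  # convention when one sample is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# Assumed example data: a symmetric parabola, a purely non-linear association.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

pearson = float(np.corrcoef(x, y)[0, 1])
dcor = distance_correlation(x, y)
print(f"Cor  = {pearson:.3f}")
print(f"dCor = {dcor:.3f}")
```

For this symmetric example the Pearson correlation vanishes up to rounding error, while the distance correlation is clearly positive. The sketch builds full n × n distance matrices, so it is only practical for moderate sample sizes.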
In our research, we have developed various distance correlation methods for problems in biostatistics. A particular focus of our work has been the derivation of distance correlation methods for survival data. We have also established distance correlation methods for clustering mixed-type (e.g., binary, categorical and continuous) data.
Moreover, we have developed the theory of the distance standard deviation, a robust measure of spread based on the concept of distance correlation. Recently, we have unified the theory of distance correlation with the global test developed by Jelle Goeman, a very popular approach to hypothesis testing with molecular data. Our current focus is on adapting this approach to develop distance correlation methods for genomic data, in collaboration with Fernando Castro-Prado and Wenceslao González Manteiga from the University of Santiago de Compostela.
- Hummel, M., Edelmann, D., & Kopp-Schneider, A. (2017). Clustering of samples and variables with mixed-type data. PLoS ONE, 12(11), e0188274.
- Edelmann, D., Fokianos, K., & Pitsillou, M. (2019). An updated literature review of distance correlation and its applications to time series. International Statistical Review, 87(2), 237-262.
- Edelmann, D., Richards, D., & Vogel, D. (2020). The distance standard deviation. The Annals of Statistics, 48(6), 3395-3416.
- Edelmann, D., Hummel, M., Hielscher, T., Saadati, M., & Benner, A. (2020). Marginal variable screening for survival endpoints. Biometrical Journal, 62(3), 610-626.
- Edelmann, D., Saadati, M., Putter, H., & Goeman, J. (2020). A global test for competing risks survival analysis. Statistical Methods in Medical Research, 29(12), 3666-3683.
- Edelmann, D., Móri, T. F., & Székely, G. J. (2021). On relationships between the Pearson and the distance correlation coefficients. Statistics & Probability Letters, 169, 108960.
- Edelmann, D., Terzer, T., & Richards, D. (2021). A Basic Treatment of the Distance Covariance. Sankhya B, 83(1), 12-25.
- Edelmann, D., Welchowski, T., & Benner, A. (2022). A consistent version of distance covariance for right‐censored survival data and its application in hypothesis testing. Biometrics, 78(3), 867-879.
- Edelmann, D., & Goeman, J. (2022). A Regression Perspective on Generalized Distance Covariance and the Hilbert–Schmidt Independence Criterion. Statistical Science, 37(4), 562-579.