Jan Pawel Stanczuk*,1, Georgios Batzolis*,1, Teo Deveney2, Carola-Bibiane Schönlieb1
1University of Cambridge, 2University of Bath
*equal contribution
International Conference on Machine Learning (ICML), 2024
In this work, we provide a mathematical proof that diffusion models encode data manifolds by approximating their normal bundles. Based on this observation, we propose a novel method for extracting the intrinsic dimension of the data manifold from a trained diffusion model. Our insights are based on the fact that a diffusion model approximates the score function, i.e., the gradient of the log density of a noise-corrupted version of the target distribution, for varying levels of corruption. We prove that as the level of corruption decreases, the score function points towards the manifold, as this direction becomes the direction of maximal likelihood increase. Therefore, at low noise levels, the diffusion model provides an approximation of the manifold's normal bundle, which allows us to estimate the manifold's intrinsic dimension. To the best of our knowledge, our method is the first estimator of intrinsic dimension based on diffusion models, and it outperforms well-established estimators in controlled experiments on both Euclidean and image data.
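The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the released implementation: instead of a trained diffusion model it uses the exact score of a toy distribution (a standard Gaussian supported on a random k-dimensional linear subspace of R^d, corrupted with noise of scale sigma), and the names `toy_score` and `estimate_intrinsic_dim` are hypothetical. It also assumes that the normal dimension can be read off as the position of the largest drop in the singular value spectrum of the stacked score vectors, which is a choice made for this sketch.

```python
import numpy as np


def toy_score(x, basis, sigma):
    """Exact score of N(0, B B^T + sigma^2 I), where B has orthonormal columns.

    Stand-in for a trained score network (assumption of this sketch).
    """
    tangent = x @ basis @ basis.T   # component of x inside the subspace
    normal = x - tangent            # component of x orthogonal to the subspace
    # Inverse covariance acts as 1/(1+sigma^2) on tangent, 1/sigma^2 on normal.
    return -tangent / (1.0 + sigma**2) - normal / sigma**2


def estimate_intrinsic_dim(score_fn, x0, sigma, n_samples, rng):
    """Estimate the intrinsic dimension at x0 from score vectors near x0."""
    d = x0.shape[0]
    # Perturb x0 with small noise and evaluate the score at each perturbed point.
    x = x0 + sigma * rng.standard_normal((n_samples, d))
    scores = score_fn(x)            # shape (n_samples, d)
    # At low noise the scores concentrate in the normal space, so the
    # large singular values count the normal directions.
    s = np.linalg.svd(scores, compute_uv=False)
    # Hypothetical gap criterion: largest drop between consecutive values.
    normal_dim = int(np.argmax(s[:-1] - s[1:])) + 1
    return d - normal_dim, s


rng = np.random.default_rng(0)
d, k, sigma = 10, 3, 1e-2
# Random orthonormal basis of a k-dimensional subspace of R^d.
basis, _ = np.linalg.qr(rng.standard_normal((d, k)))
x0 = basis @ rng.standard_normal(k)  # a point on the toy manifold

dim, spectrum = estimate_intrinsic_dim(
    lambda x: toy_score(x, basis, sigma), x0, sigma, n_samples=200, rng=rng
)
print("estimated intrinsic dimension:", dim)
```

In this toy setting the score components normal to the subspace scale like 1/sigma while the tangential components stay bounded, so the spectrum separates into d - k large singular values followed by k small ones. With a trained model, `score_fn` would be replaced by the network evaluated at a small diffusion time.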
If you find the code useful for your research, please consider citing:
@article{stanczuk2022your,
title={Your diffusion model secretly knows the dimension of the data manifold},
author={Stanczuk, Jan and Batzolis, Georgios and Deveney, Teo and Sch{\"o}nlieb, Carola-Bibiane},
journal={arXiv preprint arXiv:2212.12611},
year={2022}
}