ID-diff: Diffusion Models Encode the Intrinsic Dimension of Data Manifolds

Jan Pawel Stanczuk*,1, Georgios Batzolis*,1, Teo Deveney2, Carola-Bibiane Schönlieb1

1University of Cambridge, 2University of Bath

*equal contribution

International Conference on Machine Learning (ICML), 2024

TL;DR

Teaser Image

Abstract

In this work, we provide a mathematical proof that diffusion models encode data manifolds by approximating their normal bundles. Based on this observation, we propose a novel method for extracting the intrinsic dimension of the data manifold from a trained diffusion model. Our insights rest on the fact that a diffusion model approximates the score function, i.e., the gradient of the log-density of a noise-corrupted version of the target distribution, for varying levels of corruption. We prove that as the level of corruption decreases, the score function points towards the manifold, since this becomes the direction of maximal likelihood increase. Therefore, at low noise levels, the diffusion model provides an approximation of the manifold's normal bundle, allowing for an estimation of the manifold's intrinsic dimension. To the best of our knowledge, our method is the first estimator of intrinsic dimension based on diffusion models, and it outperforms well-established estimators in controlled experiments on both Euclidean and image data.
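The estimation idea described above can be sketched numerically. The following is a minimal toy example, not the paper's implementation: it assumes a flat (linear) manifold embedded in Euclidean space, for which the low-noise score has a simple closed form, whereas a real application would replace the `score` function with a trained diffusion model's score network. Score vectors sampled at slightly noised points span the normal space, so the rank of their stacked matrix (read off from a singular-value gap) gives the codimension, and the intrinsic dimension follows by subtraction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: a k-dimensional linear manifold embedded in R^d.
d, k = 10, 2
basis = np.linalg.qr(rng.standard_normal((d, d)))[0]
tangent, normal = basis[:, :k], basis[:, k:]

def score(x, sigma):
    """Closed-form low-noise score for a flat manifold: it points back
    toward the manifold along the normal directions. A trained diffusion
    model's score network would stand in for this in practice."""
    return -(normal @ normal.T @ x) / sigma**2

# Sample score vectors at slightly noised points around a manifold point x0.
sigma = 1e-2
x0 = tangent @ rng.standard_normal(k)
S = np.stack([score(x0 + sigma * rng.standard_normal(d), sigma)
              for _ in range(100)])

# The scores span the normal space; its rank gives the codimension.
sing = np.linalg.svd(S, compute_uv=False)
codim = int((sing > sing[0] * 1e-3).sum())
print("estimated intrinsic dimension:", d - codim)  # prints 2
```

Here the singular-value threshold (`1e-3` relative to the largest) is an illustrative choice; with noisy learned scores the gap in the spectrum is less sharp and must be located more carefully.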

Main Theoretical Results

Theory Image

Experimental Results

Results Image

Citation

If you find the code useful for your research, please consider citing:

@article{stanczuk2022your,
  title={Your diffusion model secretly knows the dimension of the data manifold},
  author={Stanczuk, Jan and Batzolis, Georgios and Deveney, Teo and Sch{\"o}nlieb, Carola-Bibiane},
  journal={arXiv preprint arXiv:2212.12611},
  year={2022}
}