Does Data Scaling Lead to Visual Compositional Generalization?

Abstract

Compositional generalization is crucial for human intelligence, yet it remains unclear whether scaling data and model size can solve this challenge, particularly for vision models. We study how data scaling affects compositional generalization in a simplified visual setting. Our key finding is that while models can achieve compositional generalization, this ability depends critically on data diversity. Models develop compositional structure in their latent space only when trained on diverse data; otherwise they fail to learn compositional representations even though they succeed at discrimination. We show that high data diversity induces linear concept representations, which we demonstrate enable efficient compositional learning. Analyzing large-scale pretrained models through this framework yields mixed results, suggesting that compositional generalization remains challenging.

Publication
International Conference on Machine Learning