QT-DoG

QUANTIZATION-AWARE TRAINING FOR DOMAIN GENERALIZATION

QT-DoG uses weight quantization to achieve flatter minima, improving generalization while reducing model size and computational costs.

Comparison of Existing Methods

Abstract

Domain Generalization (DG) aims to train models that perform well not only on the training (source) domains but also on novel, unseen target data distributions. A key challenge in DG is preventing overfitting to source domains, which can be mitigated by finding flatter minima in the loss landscape. In this work, we propose Quantization-aware Training for Domain Generalization (QT-DoG) and demonstrate that weight quantization effectively leads to flatter minima in the loss landscape, thereby enhancing domain generalization. Unlike traditional quantization methods focused on model compression, QT-DoG exploits quantization as an implicit regularizer by inducing noise in the model weights, guiding the optimization process toward flatter minima that are less sensitive to perturbations and overfitting. We provide both theoretical insights and empirical evidence that quantization inherently encourages flatter minima, leading to better generalization across domains. Moreover, since quantization reduces the model size, we show that an ensemble of multiple quantized models yields higher accuracy than state-of-the-art DG approaches with no computational or memory overhead. Our extensive experiments demonstrate that QT-DoG generalizes across various datasets, architectures, and quantization algorithms, and can be combined with other DG methods, establishing its versatility and robustness.
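To make the idea of weight quantization during training concrete, the sketch below shows an LSQ-style fake-quantizer wrapped around a linear layer in PyTorch. It is a minimal illustration under our own assumptions (the names LSQWeightQuantizer and QuantLinear, the 8-bit default, and the initialization heuristic are ours), not the official QT-DoG implementation, which builds on the LSQ codebase acknowledged below.

# Illustrative LSQ-style weight fake-quantization (not the official QT-DoG code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSQWeightQuantizer(nn.Module):
    """Fake-quantizes a weight tensor to `bits` bits with a learnable step size."""

    def __init__(self, weight: torch.Tensor, bits: int = 8):
        super().__init__()
        self.qmin = -(2 ** (bits - 1))
        self.qmax = 2 ** (bits - 1) - 1
        # LSQ initialization heuristic: 2 * mean(|w|) / sqrt(qmax).
        init_step = 2.0 * weight.detach().abs().mean() / (self.qmax ** 0.5)
        self.step = nn.Parameter(init_step)

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        # Scale the gradient of the step size, as suggested by LSQ.
        g = 1.0 / ((w.numel() * self.qmax) ** 0.5)
        step = self.step * g + (self.step - self.step * g).detach()
        x = torch.clamp(w / step, self.qmin, self.qmax)
        x = (x.round() - x).detach() + x  # straight-through estimator for rounding
        return x * step  # de-quantized ("fake-quantized") weights

class QuantLinear(nn.Linear):
    """nn.Linear whose weights are fake-quantized on every forward pass."""

    def __init__(self, in_features: int, out_features: int, bias: bool = True, bits: int = 8):
        super().__init__(in_features, out_features, bias)
        self.quantizer = LSQWeightQuantizer(self.weight, bits)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.quantizer(self.weight), self.bias)

During training, the forward pass uses the quantized weights while gradients update the underlying full-precision weights and the step size; the rounding noise introduced this way is what acts as the implicit regularizer described above.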

Comparison of Best Performing Domain Generalization Methods
Algorithm        # Models   Relative Size   Avg. Acc. (%)
ERM              1          1x              63.8
SWAD             1          1x              66.9
MIRO             1          1x              65.9
CCFP             1          1x              64.8
CORAL            1          1x              64.1
QT-DoG (ours)    1          0.22x           66.2
EoA              6          6x              68.0
DiWA             60         1x              68.0
ERM Ens.         6          6x              66.8
EoQ (ours)       5          1.1x            68.4
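EoQ in the last row ensembles several independently trained quantized models. The snippet below is only a hedged sketch of the usual ensembling recipe (averaging the members' softmax outputs at inference); the helper name eoq_predict is ours and the exact averaging scheme used in the paper may differ.

import torch

@torch.no_grad()
def eoq_predict(models, x):
    """Average the class probabilities of several quantized models.

    Because each member is low-bit, five quantized networks together
    occupy roughly the footprint of a single full-precision model
    (the "1.1x" size reported in the table above).
    """
    probs = torch.stack([m(x).softmax(dim=-1) for m in models], dim=0)
    return probs.mean(dim=0).argmax(dim=-1)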

Visualizations

The figure below shows GradCAM results on the PACS dataset for four experiments, where each run holds out a different domain as the target and trains on the remaining ones. These visualizations show that the quantized model focuses on more relevant image regions than ERM and attends to a much larger receptive field; in certain cases, ERM does not even focus on the correct image region. Quantization thus pushes the model to learn more generalized patterns, making it less sensitive to the specific details of the training set. These qualitative results corroborate the quantitative evidence in the table above.

GRADCAM Visualizations
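For readers who want to reproduce this kind of visualization, the following is a generic Grad-CAM sketch using plain forward/backward hooks; it is not the authors' visualization script, and the suggested target layer (model.layer4) simply assumes a torchvision ResNet-50 backbone.

import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx=None):
    """Plain Grad-CAM heat map for a (1, C, H, W) input tensor.

    `target_layer` is typically the last convolutional block of the
    backbone, e.g. `model.layer4` for a torchvision ResNet-50.
    """
    feats, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))

    model.eval()
    logits = model(image)
    if class_idx is None:
        class_idx = logits.argmax(dim=-1).item()
    model.zero_grad()
    logits[0, class_idx].backward()
    h1.remove()
    h2.remove()

    # Weight each feature map by its average gradient, then apply ReLU.
    weights = grads[0].mean(dim=(2, 3), keepdim=True)                 # (1, K, 1, 1)
    cam = F.relu((weights * feats[0]).sum(dim=1, keepdim=True)).detach()
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam[0, 0]  # (H, W) heat map in [0, 1]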

Loss Flatness Analysis

We demonstrate that incorporating quantization into the Empirical Risk Minimization (ERM) process leads to flatter minima. Our analysis indicates that QT-DoG reaches flatter minima than both ERM and existing flatness-seeking methods such as SAM and SWA.

Following the approach of previous work, we quantify local flatness \( \mathcal{F}_\gamma (\mathbf{w}) \) by measuring the expected change in loss values between a model with parameters \( \mathbf{w} \) and a perturbed model \( \mathbf{w}' \) on a sphere of radius \( \gamma \) centered at \( \mathbf{w} \):

\[ \mathcal{F}_\gamma (\mathbf{w}) = \mathbb{E}_{\|\mathbf{w}' - \mathbf{w}\| = \gamma} \left[\mathcal{E}(\mathbf{w}') - \mathcal{E}(\mathbf{w})\right], \]

where \( \mathcal{E}(\mathbf{w}) \) represents the accumulated loss over samples from either source or target domains, depending on the evaluation context.
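The sketch below shows one way to estimate \( \mathcal{F}_\gamma(\mathbf{w}) \) by Monte-Carlo sampling: draw a few random directions, rescale them to norm \( \gamma \), and measure the resulting increase in accumulated loss. The helper name, the Gaussian direction sampling, and the number of samples are our assumptions, not the exact evaluation protocol of the paper.

import copy
import torch

@torch.no_grad()
def local_flatness(model, loss_fn, loader, gamma, num_samples=10, device="cpu"):
    """Monte-Carlo estimate of F_gamma(w): the expected increase of the
    accumulated loss when the weights are moved to a random point on the
    sphere of radius `gamma` centered at the current weights."""

    def total_loss(m):
        m.eval()
        return sum(loss_fn(m(x.to(device)), y.to(device)).item() for x, y in loader)

    base = total_loss(model)
    increase = 0.0
    for _ in range(num_samples):
        perturbed = copy.deepcopy(model)
        # Sample a Gaussian direction and rescale it to have norm gamma.
        noise = [torch.randn_like(p) for p in perturbed.parameters()]
        norm = torch.sqrt(sum((n ** 2).sum() for n in noise)).item()
        for p, n in zip(perturbed.parameters(), noise):
            p.add_(n, alpha=gamma / norm)
        increase += total_loss(perturbed) - base
    return increase / num_samples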

Loss Flatness

Quantizing Vision Transformers

Performance on the PACS and TerraInc datasets with and without QT-DoG quantization, applied to ERM_ViT with a DeiT-Small backbone. Accuracies are in %; "Compression" is the model compression factor.
Algorithm Backbone PACS TerraInc Compression
ERM_ViT DeiT-Small 84.3 ± 0.2 43.2 ± 0.2 -
ERM-SD_ViT DeiT-Small 86.3 ± 0.2 44.3 ± 0.2 -
ERM_ViT + QT-DoG DeiT-Small 86.2 ± 0.3 45.6 ± 0.4 4.6x
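As a rough illustration of how weight quantization can be applied to a ViT backbone such as DeiT-Small, the sketch below swaps every linear layer for a fake-quantized wrapper. The wrapper, the per-tensor symmetric scheme, and the timm model name are illustrative assumptions, not the paper's LSQ-based pipeline.

import timm
import torch
import torch.nn as nn
import torch.nn.functional as F

class FakeQuantLinear(nn.Module):
    """Wraps an nn.Linear and fake-quantizes its weight (per-tensor, symmetric)."""

    def __init__(self, linear: nn.Linear, bits: int = 8):
        super().__init__()
        self.inner = linear
        self.qmax = 2 ** (bits - 1) - 1

    def forward(self, x):
        w = self.inner.weight
        step = w.abs().max() / self.qmax + 1e-8
        w_q = torch.clamp(torch.round(w / step), -self.qmax - 1, self.qmax) * step
        w_q = w + (w_q - w).detach()  # straight-through estimator
        return F.linear(x, w_q, self.inner.bias)

def quantize_linear_layers(model: nn.Module, bits: int = 8):
    """Recursively replace every nn.Linear in the model with its quantized wrapper."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, FakeQuantLinear(child, bits))
        else:
            quantize_linear_layers(child, bits)
    return model

# e.g. a DeiT-Small backbone (assumes the timm model name):
vit = timm.create_model("deit_small_patch16_224", pretrained=True)
vit = quantize_linear_layers(vit)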

Combination with Other Methods

Results on the PACS and Terra Incognita datasets when incorporating QT-DoG into CORAL and MixStyle. Accuracies are in %; "Compression" denotes the compression factor of the model.
Algorithm PACS TerraInc Compression
CORAL 85.5 ± 0.6 47.1 ± 0.2 -
CORAL + QT-DoG 86.9 ± 0.2 50.6 ± 0.3 4.6x
MixStyle 85.2 ± 0.3 44.0 ± 0.4 -
MixStyle + QT-DoG 86.8 ± 0.3 47.7 ± 0.2 4.6x

Acknowledgments

We thank Soumava Roy for assisting with the GradCAM visualizations across different datasets. We also thank Chen Zhao for his helpful comments and feedback. Moreover, we are grateful to the creators of the SWAD and LSQ GitHub repositories, as our code was built upon their work.

BibTeX

@misc{javed2024qtdogquantizationawaretrainingdomain,
      title={QT-DoG: Quantization-aware Training for Domain Generalization},
      author={Saqib Javed and Hieu Le and Mathieu Salzmann},
      year={2024},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2410.06020},
}