Quantization Sequence Matters
Edge applications, such as collaborative robotics and spacecraft rendezvous, demand efficient 6D object pose estimation on resource-constrained embedded platforms. Existing 6D object pose estimation networks are often too large for such deployments, necessitating compression while maintaining reliable performance.
To address this challenge, we introduce Modular Quantization-Aware Training (MQAT), an adaptive and mixed-precision quantization-aware training strategy that exploits the modular structure of modern 6D object pose estimation architectures. MQAT guides a systematic gradated modular quantization sequence and determines module-specific bit precisions, leading to quantized models that outperform those produced by state-of-the-art uniform and mixed-precision quantization techniques.
Our experiments showcase the generality of MQAT across datasets, architectures, and quantization algorithms. Additionally, we observe that MQAT-quantized models can achieve an accuracy boost (> 7% ADI-0.1d) over the baseline full-precision network while reducing model size by a factor of four or more.
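The core idea of MQAT, quantizing a network's modules one at a time in a chosen order (the "quantization flow") with a per-module bit precision, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the module names, the `bit_width` attribute, and the `train` callback are all assumptions standing in for the actual quantization-aware fine-tuning loop.

```python
# Sketch of a gradated modular quantization schedule (illustrative only).
# Module names, bit-widths, and the train() callback are hypothetical;
# MQAT determines the real flow and precisions per architecture.

def mqat_schedule(modules, flow, bits, train):
    """Quantize one module at a time, fine-tuning after each step.

    modules: dict mapping module name -> module object
    flow:    ordered list of module names, e.g. ["F", "H", "B"]
    bits:    dict mapping module name -> target bit precision
    train:   callback that fine-tunes the whole network
    """
    quantized = []
    for name in flow:
        modules[name].bit_width = bits[name]   # enable fake quantization
        quantized.append(name)
        train()                                # fine-tune with this module quantized
    return quantized
```

Note that earlier-quantized modules stay quantized while later ones are added, so the network gradually adapts to the full mixed-precision configuration.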
- **Modular Quantization-Aware Training (MQAT):** MQAT leverages the modular structure of 6D object pose estimation networks to apply mixed-precision quantization, optimizing performance while minimizing computational cost.
- **Enhanced Edge Efficiency:** Designed for resource-constrained edge devices, MQAT supports applications such as collaborative robotics and spacecraft rendezvous with efficient, accurate 6D object pose estimation.
- **Quantization Flow Control:** MQAT introduces a quantization flow that optimally sequences the quantization of the different modules (e.g., Backbone, Rotation Head, and PnP-Patch), minimizing performance loss during model compression.
- **Superior Compression with High Accuracy:** Compared to quantization techniques such as LSQ and HAWQ-V3, MQAT achieves higher accuracy (up to a 7% boost in ADI-0.1d) at greater compression, reducing model size by 4x or more.
- **Broad Applicability Across Datasets and Architectures:** MQAT achieves improvements in both 6D pose estimation and object detection tasks, validated across multiple datasets.
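The quantization-flow search illustrated below (heuristic search over flow sequences for the K=3 module WDR) amounts to trying the K! orderings of the modules and keeping the best. A minimal sketch, where the `evaluate` callback is an assumption standing in for "quantize in this order, fine-tune, and measure ADI-0.1d":

```python
# Minimal sketch of a heuristic quantization-flow search over K modules.
# evaluate(order) is hypothetical: it would run MQAT with that flow and
# return the resulting accuracy (e.g. ADI-0.1d on a validation set).
from itertools import permutations

def best_flow(modules, evaluate):
    """Try every ordering of the modules and keep the best-scoring one."""
    best_order, best_score = None, float("-inf")
    for order in permutations(modules):
        score = evaluate(order)
        if score > best_score:
            best_order, best_score = order, score
    return list(best_order), best_score
```

For K=3 this is only 6 candidate flows, so exhaustive evaluation is feasible; the heuristic element lies in how cheaply each candidate flow is evaluated.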
A heuristic search of quantization flow sequences to demonstrate quantization flow optimality for K=3 module WDR.
Comparison between our proposed MQAT, uniform QAT (LSQ), and layer-wise mixed-precision QAT (HAWQ-V3).
MQAT demonstrates its adaptability and effectiveness across various single-stage and multi-stage 6D object pose estimation architectures.
In Table 1, we compare several single-stage PnP architectures on the SwissCube dataset. To demonstrate the generality of our performance enhancement, we apply MQAT to aggressively quantize the FPN of both CA-SpaceNet and WDR. On CA-SpaceNet, accuracy improves by 4.5%, 4.4%, and 4.5% for Near, Medium, and Far images, respectively, yielding a total testing-set accuracy improvement of 3.3%; recall the total testing-set improvement of 5.0% already presented for WDR. While the full-precision CA-SpaceNet outperforms the full-precision WDR, WDR sees greater gains from the application of MQAT.
| Network | Near | Medium | Far | All |
|---|---|---|---|---|
| SegDriven-Z | 41.1 | 22.9 | 7.1 | 21.8 |
| DLR | 52.6 | 45.4 | 29.4 | 43.2 |
| CA-SpaceNet | 91.0 | 86.3 | 61.7 | 79.4 |
| CA-SpaceNet* | 95.5 | 90.7 | 66.2 | 82.7 |
| WDR | 92.4 | 84.2 | 61.3 | 78.8 |
| WDR* | 96.1 | 91.5 | 68.2 | 83.8 |
Table 1: Comparison with the state-of-the-art on SwissCube. We report ADI-0.1d scores for three different depth ranges. A '*' indicates applying MQAT with 2-bit precision FPN to the model.
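For readers unfamiliar with the ADI-0.1d scores reported above: ADI is the mean, over the ground-truth-transformed model points, of the distance to the *closest* point under the estimated pose, and a pose counts as correct when that mean is below 10% of the object diameter. A pure-Python sketch (O(N²) for clarity; real evaluations use vectorized nearest-neighbor queries):

```python
# Sketch of the ADI-0.1d accuracy criterion. Function and argument names
# are our own; the metric definition follows the standard ADI formulation.
import math

def transform(points, R, t):
    """Apply rotation R (3x3 nested list) and translation t to 3D points."""
    return [[sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]
            for p in points]

def adi(points, R_gt, t_gt, R_est, t_est):
    """Mean distance from each GT-posed point to its nearest estimated-pose point."""
    gt = transform(points, R_gt, t_gt)
    est = transform(points, R_est, t_est)
    return sum(min(math.dist(g, e) for e in est) for g in gt) / len(gt)

def adi_correct(points, diameter, R_gt, t_gt, R_est, t_est, tau=0.1):
    """ADI-0.1d: pose is correct if mean ADI < tau * object diameter."""
    return adi(points, R_gt, t_gt, R_est, t_est) < tau * diameter
```

The table entries are then the fraction of test images whose predicted pose passes this check.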
Figure 1 illustrates the WDR network with MQAT applied in an 8-2-8 quantization scheme. For comparison, the full-precision model (without quantization) is also shown.
Figure 1: Comparison between our proposed MQAT paradigm and the full-precision network. The model with MQAT applied yields predictions that are on par with, or more concentrated than, its full-precision counterpart.
In addition, Table 2 shares published accuracy results for a uniformly QAT-quantized CA-SpaceNet. Specifically, CA-SpaceNet explored three quantization modes (B, BF, and BFH), corresponding to quantizing the backbone, the backbone and FPN (paired), and the whole network (uniformly), respectively.
| Quantization Method | ADI-0.1d | Compression | Bit-Precisions (B-F-H) |
|---|---|---|---|
| LSQ | 79.4 | 1x | 32-32-32 |
| LSQ B | 76.2 | 2.2x | 8-32-32 |
| LSQ BF | 75.0 | 3.2x | 8-8-32 |
| LSQ BFH | 74.7 | 4.0x | 8-8-8 |
| MQAT (Ours) | 82.7 | 4.7x | 8-2-8 |
| MQAT (Ours) | 80.2 | 8.2x | 4-2-4 |
| LSQ BFH | 68.7 | 10.6x | 3-3-3 |
Table 2: CA-SpaceNet published quantization vs. MQAT. We report ADI scores on the SwissCube dataset, sorted by network compression factor. For the MQAT rows, we use the quantization flow F→H→B.
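The compression factors in Table 2 follow from simple arithmetic over per-module bit widths: the ratio of full-precision storage (32 bits per parameter) to mixed-precision storage. A sketch, where the parameter counts in the usage example are illustrative rather than CA-SpaceNet's real module sizes:

```python
# Back-of-the-envelope compression factor for a mixed-precision scheme.
# Compression = (32 bits x all params) / (sum over modules of bits x params).

def compression_factor(param_counts, bit_widths, fp_bits=32):
    """param_counts and bit_widths map module name -> count / bit precision."""
    quant_bits = sum(param_counts[m] * bit_widths[m] for m in param_counts)
    full_bits = sum(param_counts.values()) * fp_bits
    return full_bits / quant_bits
```

This also makes clear why aggressive 2-bit quantization of the FPN pays off: the scheme compresses most where the lowest-precision module holds the largest share of the parameters.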
In Table 3, we demonstrate the performance of our method on the state-of-the-art two-stage ZebraPose 6D object pose estimation network.
| Quantization Method | ADD-0.1d | Compression | Bit-Precisions | Quantization Flow |
|---|---|---|---|---|
| Full Precision | 76.90 | 1× | Full precision | N/A |
| HAWQ-V3 | 71.11 | 4× | Mixed (layer-wise) | N/A |
| HAWQ-V3 | 69.87 | 4.60× | Mixed (layer-wise) | N/A |
| MQAT | 72.54 | 4.62× | 8-4 (B-D) | D → B |
Table 3: Quantization of ZebraPose. We report ADD scores on the LM-O dataset and compare MQAT to mixed-precision quantization (HAWQ-V3).
In Table 4, we evaluate GDR-Net using existing uniform and mixed-precision quantization methods alongside our MQAT approach. We use the ADD-0.1d metric for 6D object pose estimation on the Occluded-LINEMOD (LM-O) dataset, following the methodology of Wang et al. MQAT outperforms both LSQ and HAWQ-V3, achieving superior results even with a slightly more compressed network.
| Quantization Method | ADD-0.1d | Compression | Bit-Precisions | Quantization Flow |
|---|---|---|---|---|
| Full Precision | 56.1 | 1× | Full precision | N/A |
| LSQ | 50.7 | 4.57× | Uniform (7-bit) | N/A |
| HAWQ-V3 | 50.3 | 4.9× | Mixed (layer-wise) | N/A |
| MQAT | 51.8 | 4.97× | 8-4-4 (B-R-P) | R → P → B |
Table 4: Quantization of GDR-Net. We report ADD scores on the LM-O dataset and compare MQAT to uniform (LSQ) and mixed-precision quantization (HAWQ-V3). B, R, and P indicate the Backbone, Rotation Head, and PnP-Patch modules.
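The uniform LSQ baseline in Tables 2 and 4 uses learned-step-size quantization. Its forward pass is a standard uniform fake-quantizer; a minimal sketch, with the caveat that in LSQ the step size `s` is a learned parameter updated by gradient descent, whereas here it is simply an input:

```python
# Minimal forward pass of an LSQ-style uniform fake-quantizer.
# Signed b-bit integer range: [-2^(b-1), 2^(b-1) - 1].

def lsq_quantize(values, s, bits):
    qn, qp = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    out = []
    for v in values:
        q = round(v / s)             # snap to the integer grid
        q = max(qn, min(qp, q))      # clamp to the b-bit range
        out.append(q * s)            # dequantize (fake quantization)
    return out
```

During QAT, this fake-quantization runs in the forward pass while gradients flow through a straight-through estimator, which is what lets the low-bit modules in the MQAT flow keep training.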
While our primary focus is 6D pose estimation, where our method's efficacy is already demonstrated, we further extend our evaluation to object detection to underscore the generality of our approach.

To this end, we apply our quantization technique to Faster R-CNN with a ResNet-50 backbone, a widely recognized object detection model, and to EfficientDet-D0. Our evaluation is conducted on the COCO dataset, a standard object detection benchmark.
| Network | QAT Method | mAP | Compression |
|---|---|---|---|
| Faster R-CNN | Full Precision | 37.9 | 1x |
| | FQN | 32.4 | 8x |
| | INQ | 33.4 | 8x |
| | LSQ | 33.8 | 8x |
| | MQAT (ours) | 35.1 | 8x |
| EfficientDet-D0 | Full Precision | 33.16 | 1x |
| | N2UQ | 20.11 | 10x |
| | MQAT (ours) | 21.67 | 10x |
Table 5: Quantization for Object Detection. We evaluate the given networks on the COCO dataset and report mAP.
Figure 2: Visualization of the difference in object detection performance on MS-COCO between N2UQ and MQAT at the same compression ratio.
Acknowledgements

We thank Ziqi Zhao for assisting with the experiments for the multi-stage architecture. This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 945363. Moreover, this work was funded in part by the Swiss National Science Foundation and the Swiss Innovation Agency (Innosuisse) via the BRIDGE Discovery grant No. 194729.
BibTeX
@article{
javed2024modular,
title={Modular Quantization-Aware Training for 6D Object Pose Estimation},
author={Saqib Javed and Chengkun Li and Andrew Lawrence Price and Yinlin Hu and Mathieu Salzmann},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2024},
url={https://openreview.net/forum?id=lIy0TEUou7},
note={}
}