Quantization Sequence Matters
A heuristic search of quantization flow sequences to demonstrate quantization flow optimality for K=3 module WDR.
Edge applications, such as collaborative robotics and spacecraft rendezvous, demand efficient 6D object pose estimation on resource-constrained embedded platforms. Existing 6D object pose estimation networks are often too large for such deployments, necessitating compression while maintaining reliable performance.
To address this challenge, we introduce Modular Quantization-Aware Training (MQAT), an adaptive and mixed-precision quantization-aware training strategy that exploits the modular structure of modern 6D object pose estimation architectures. MQAT guides a systematic gradated modular quantization sequence and determines module-specific bit precisions, leading to quantized models that outperform those produced by state-of-the-art uniform and mixed-precision quantization techniques.
Our experiments showcase the generality of MQAT across datasets, architectures, and quantization algorithms. Additionally, we observe that MQAT quantized models can achieve an accuracy boost (> 7% ADI-0.1d) over the baseline full-precision network while reducing model size by a factor of 4x or more.
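For readers unfamiliar with quantization-aware training, the core operation is a quantize-dequantize step applied to weights during the forward pass. The NumPy sketch below is illustrative only: it uses a simple max-based step size, whereas methods such as LSQ learn the step size during training.

```python
import numpy as np

def fake_quantize(w, bits):
    """Simulate k-bit symmetric weight quantization (quantize-dequantize),
    as applied to each layer's weights during quantization-aware training."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 127 for 8-bit
    scale = np.abs(w).max() / qmax       # simple max-based step size (illustrative)
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                     # dequantized values seen by the forward pass

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
w8 = fake_quantize(w, bits=8)            # 8-bit: close to full precision
w2 = fake_quantize(w, bits=2)            # 2-bit: only 4 representable levels
print(np.abs(w - w8).mean(), np.abs(w - w2).mean())
```

The gap between the 8-bit and 2-bit reconstruction errors is exactly why bit precision must be assigned per module rather than uniformly: some modules tolerate aggressive 2-bit quantization, while others degrade sharply.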
Modular Quantization-Aware Training (MQAT): MQAT leverages the modular structure of 6D object pose estimation networks to apply mixed-precision quantization, optimizing performance while minimizing computational costs.
Enhanced Edge Efficiency: Specifically designed for resource-constrained edge devices, MQAT supports applications like collaborative robotics and spacecraft rendezvous, providing efficient and accurate 6D object pose estimation.
Quantization Flow Control: MQAT introduces a unique quantization flow control that optimally sequences the quantization of different modules (e.g., Backbone, Rotation Head, and PnP-Patch), ensuring minimal performance loss during model compression.
Superior Compression with High Accuracy: Compared to traditional quantization techniques like LSQ and HAWQ-V3, MQAT achieves higher accuracy (up to 7% boost in ADI-0.1d) with greater compression, reducing model sizes by up to 4x or more.
Broad Applicability Across Datasets and Architectures: MQAT demonstrates versatility by achieving improvements in both 6D pose estimation and object detection tasks, validated across multiple datasets.
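The quantization flow control described above can be sketched as a simple loop: modules are quantized one at a time in a chosen order, each to its own bit width, with fine-tuning between steps so the network recovers before the next module is quantized. The module names, bit widths, ordering, and the stand-in `finetune` callback below are illustrative placeholders, not the paper's exact training procedure.

```python
def mqat_flow(modules, bit_widths, order, finetune):
    """Quantize `modules` following `order`; already-quantized modules keep
    their precision while the rest remain at full precision (32-bit)."""
    precision = {name: 32 for name in modules}
    for name in order:
        precision[name] = bit_widths[name]   # quantize this module
        finetune(precision)                  # recover accuracy before moving on
    return precision

# Example: an 8-2-8 scheme over three hypothetical modules, quantizing the
# FPN first, then the head, then the backbone (an assumed order for illustration).
history = []
final = mqat_flow(
    modules=["backbone", "fpn", "head"],
    bit_widths={"backbone": 8, "fpn": 2, "head": 8},
    order=["fpn", "head", "backbone"],
    finetune=lambda p: history.append(dict(p)),
)
print(final)   # {'backbone': 8, 'fpn': 2, 'head': 8}
```

The design point is that the order matters: quantizing the most sensitive module first, while the rest of the network is still at full precision, gives the remaining high-precision modules room to compensate during fine-tuning.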
Comparison between our proposed MQAT, uniform QAT (LSQ), and layer-wise mixed-precision QAT (HAWQ-V3).
MQAT demonstrates its adaptability and effectiveness across various single-stage and multi-stage 6D object pose estimation architectures.
In Table 1, we compare several single-stage PnP architectures on the SwissCube dataset. To demonstrate the generality of our performance enhancement, we apply MQAT to aggressively quantize the FPN of both CA-SpaceNet and WDR. On CA-SpaceNet, we demonstrate accuracy improvements of 4.5%, 4.4%, and 4.5% on Near, Medium, and Far images, respectively, yielding a total test-set accuracy improvement of 3.3%. Recall the previously presented total test-set accuracy improvement of 5.0% for WDR. At full precision, CA-SpaceNet outperformed WDR, but WDR sees greater gains from the application of MQAT.
| Network | Near | Medium | Far | All |
|---|---|---|---|---|
| SegDriven-Z | 41.1 | 22.9 | 7.1 | 21.8 |
| DLR | 52.6 | 45.4 | 29.4 | 43.2 |
| CA-SpaceNet | 91.0 | 86.3 | 61.7 | 79.4 |
| CA-SpaceNet* | 95.5 | 90.7 | 66.2 | 82.7 |
| WDR | 92.4 | 84.2 | 61.3 | 78.8 |
| WDR* | 96.1 | 91.5 | 68.2 | 83.8 |

*\* denotes the MQAT-quantized network.*
Figure 1 illustrates the WDR network with MQAT applied in an 8-2-8 quantization scheme. For comparison, the full-precision model (without quantization) is also shown.
In addition, Table 2 shares published accuracy results for a uniformly QAT-quantized CA-SpaceNet. Specifically, CA-SpaceNet explored three quantization modes (B, BF, and BFH), corresponding to quantizing the backbone only, the backbone and FPN (paired), and the whole network (uniformly), respectively.
| Quantization Method | ADI-0.1d | Compression | Bit-Precisions (B-F-H) |
|---|---|---|---|
| LSQ | 79.4 | 1x | 32-32-32 |
| LSQ B | 76.2 | 2.2x | 8-32-32 |
| LSQ BF | 75.0 | 3.2x | 8-8-32 |
| LSQ BFH | 74.7 | 4.0x | 8-8-8 |
| MQAT (Ours) | 82.7 | 4.7x | 8-2-8 |
| MQAT (Ours) | 80.2 | 8.2x | 4-2-4 |
| LSQ BFH | 68.7 | 10.6x | 3-3-3 |
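For intuition on how a mixed-precision scheme maps to a compression factor: total weight bits at full precision divided by total weight bits after quantization. The sketch below uses hypothetical per-module parameter counts, not WDR's actual module sizes, which is why its printed factors need not match the table.

```python
def compression_factor(params, bits):
    """Ratio of 32-bit storage to mixed-precision storage, weights only."""
    full = 32 * sum(params.values())
    quant = sum(bits[m] * params[m] for m in params)
    return full / quant

# Hypothetical parameter counts for a backbone / FPN / head split.
params = {"backbone": 23e6, "fpn": 3e6, "head": 2e6}
print(compression_factor(params, {"backbone": 8, "fpn": 2, "head": 8}))  # 8-2-8 scheme
print(compression_factor(params, {"backbone": 4, "fpn": 2, "head": 4}))  # 4-2-4 scheme
```

Because the backbone holds most of the parameters, its bit width dominates the overall compression, while dropping the comparatively small FPN to 2 bits costs little storage.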
In Table 3, we demonstrate the performance of our method on ZebraPose, a state-of-the-art two-stage 6D object pose estimation network.
| Quantization Method | ADD-0.1d | Compression | Bit-Precisions | Quantization Flow |
|---|---|---|---|---|
| Full Precision | 76.90 | 1× | Full precision | N/A |
| HAWQ-V3 | 71.11 | 4× | Mixed (layer-wise) | N/A |
| HAWQ-V3 | 69.87 | 4.60× | Mixed (layer-wise) | N/A |
| MQAT | 72.54 | 4.62× | 8-4 (B-D) | D → B |
In Table 4, we evaluate GDR-Net using existing uniform and mixed-precision quantization methods alongside our MQAT approach. We report the ADD-0.1d metric for 6D object pose estimation on the O-Linemod dataset, following the methodology of Wang et al. MQAT outperforms both LSQ and HAWQ-V3, achieving superior results even with a slightly more compressed network.
| Quantization Method | ADD-0.1d | Compression | Bit-Precisions | Quantization Flow |
|---|---|---|---|---|
| Full Precision | 56.1 | 1× | Full precision | N/A |
| LSQ | 50.7 | 4.57× | Uniform (7-bit) | N/A |
| HAWQ-V3 | 50.3 | 4.9× | Mixed (layer-wise) | N/A |
| MQAT | 51.8 | 4.97× | 8-4-4 (B-R-P) | R → P → B |
While the primary focus of our study is 6D pose estimation, where our method's efficacy is already demonstrated, we further extend our evaluation to object detection tasks to underscore the generality of our approach.

To this end, we applied our quantization technique to the Faster R-CNN network with a ResNet-50 backbone, a widely recognized model in object detection. Our evaluation was conducted on the comprehensive COCO dataset, a standard benchmark for object detection.
| Network | QAT Method | mAP | Compression |
|---|---|---|---|
| Faster R-CNN | Full Precision | 37.9 | 1x |
| | FQN | 32.4 | 8x |
| | INQ | 33.4 | 8x |
| | LSQ | 33.8 | 8x |
| | MQAT (ours) | 35.1 | 8x |
| EfficientDet-D0 | Full Precision | 33.16 | 1x |
| | N2UQ | 20.11 | 10x |
| | MQAT (ours) | 21.67 | 10x |