DiffuRefine-ViT

BSc (Hons) Artificial Intelligence and Data Science | Individual Research Project

Artificial Intelligence & Machine Learning

Computer Vision & Image Processing

Benura Wickramanayake

Autonomous driving relies heavily on Traffic Sign Recognition (TSR), but real-world degradations such as rain streaks, motion blur, low resolution, and physical vandalism severely impair recognition performance. Existing restoration approaches, including Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), and standard Vision Transformers (ViTs), offer partial fixes, but each struggles on its own to balance global feature modeling, visual fidelity, and computational efficiency. To address this bottleneck, this project introduces DiffuRefine-ViT, a lightweight hybrid architecture designed specifically for robust traffic sign reconstruction and super-resolution.

How It Works

The framework embeds ViT-based global feature extraction into a Mean-Reverting Stochastic Differential Equation (MR-SDE) diffusion backbone. It combines the ConditionalNAFNet and ConditionalUNet score networks, both enhanced with ViT bottleneck encoders and Feature-wise Linear Modulation (FiLM), within an iterative reverse-SDE pipeline.

Key Performance Metrics

Peak Restoration Quality: On the GTSRB dataset, the model reached a peak validation PSNR of 25.28 dB (SSIM: 0.761, LPIPS: 0.092), an 8.32 dB improvement over the IR-SDE baseline.

Multi-Task Performance: The model delivers consistent, high-quality restorations across deraining (25.28 dB PSNR), deblurring (26.61 dB), and deshadowing (33.57 dB).

Resource Efficiency: The model has only 16.48 million parameters and a 62.85 MB footprint, with an inference time of 1.73 seconds per image on an NVIDIA T4 GPU.

Overall, DiffuRefine-ViT is an effective, deployment-ready solution for restoring degraded traffic sign imagery in autonomous vision pipelines.
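The iterative reverse-SDE pipeline described above can be illustrated with a minimal NumPy sketch, assuming the MR-SDE follows the standard IR-SDE form dx = θ(μ − x)dt + σ dw. Here μ is the degraded image, and σ² = 2λ²θ makes the stationary noise variance λ². Several simplifications are assumptions, not the project's settings: θ is held constant (IR-SDE uses a schedule), integration is plain Euler-Maruyama over 100 steps, and an "oracle" score computed from the known clean image stands in for the trained ConditionalNAFNet/ConditionalUNet score networks, which are not reproduced here. All function names (`mr_sde_stats`, `reverse_mr_sde`, `oracle_score`) are hypothetical.

```python
import numpy as np


def mr_sde_stats(x0, mu, t, theta, lam):
    """Mean and variance of p(x_t | x_0) under the MR-SDE with constant theta."""
    decay = np.exp(-theta * t)
    mean = mu + (x0 - mu) * decay
    var = lam**2 * (1.0 - decay**2)
    return mean, var


def reverse_mr_sde(mu, score_fn, theta=5.0, lam=0.2, n_steps=100, seed=0):
    """Euler-Maruyama integration of the reverse-time MR-SDE from t=1 down to t=0.

    Forward:  dx = theta * (mu - x) dt + sigma dw,  sigma**2 = 2 * lam**2 * theta
    Reverse:  dx = [theta * (mu - x) - sigma**2 * score(x, t)] dt + sigma dw_bar
    """
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(2.0 * lam**2 * theta)
    dt = 1.0 / n_steps
    # Terminal state: approximately the degraded image plus stationary noise, N(mu, lam^2).
    x = mu + lam * rng.standard_normal(mu.shape)
    for i in range(n_steps, 0, -1):
        t = i * dt
        drift = theta * (mu - x) - sigma**2 * score_fn(x, mu, t)
        x = x - drift * dt  # move backwards in time by dt
        if i > 1:  # skip noise on the last step (a common sampling convention)
            x = x + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x


# Toy demo: synthetic stand-ins for a clean / degraded traffic-sign pair.
rng = np.random.default_rng(1)
x0 = rng.uniform(0.0, 1.0, size=(3, 32, 32))
mu = np.clip(x0 + 0.3 * rng.standard_normal(x0.shape), 0.0, 1.0)


def oracle_score(x, mu, t, theta=5.0, lam=0.2):
    # Exact score of p(x_t | x_0); a trained score network would approximate this.
    mean, var = mr_sde_stats(x0, mu, t, theta, lam)
    return -(x - mean) / var


x_hat = reverse_mr_sde(mu, oracle_score)
print(np.abs(mu - x0).mean(), np.abs(x_hat - x0).mean())
```

With an exact score the update rule drives the degraded input back to the clean image. In a real model, `score_fn` would be the learned network conditioned on μ, so restoration quality depends on how well that network approximates the true score.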
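Feature-wise Linear Modulation, used in both score networks, rescales and shifts each feature channel with parameters predicted from a conditioning signal: FiLM(h) = γ(c)·h + β(c). The source does not say what conditions the FiLM layers in ConditionalNAFNet/ConditionalUNet. This sketch assumes a sinusoidal embedding of the diffusion time t, a common choice in score networks, but ViT bottleneck features could be passed in the same way. It also uses the (1 + γ) scale parameterisation so an untrained layer starts near identity, which is a common variant rather than the original FiLM formulation. `FiLM` and `sinusoidal_embedding` are hypothetical names, and the random weights stand in for learned ones.

```python
import numpy as np


def sinusoidal_embedding(t, dim):
    """Sinusoidal embedding of a scalar diffusion time t (assumed conditioning signal)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])


class FiLM:
    """Per-channel modulation: out = (1 + gamma(c)) * h + beta(c)."""

    def __init__(self, cond_dim, n_channels, seed=0):
        rng = np.random.default_rng(seed)
        # One linear layer maps the condition vector to 2 * C modulation parameters.
        self.w = 0.02 * rng.standard_normal((cond_dim, 2 * n_channels))
        self.b = np.zeros(2 * n_channels)
        self.n_channels = n_channels

    def __call__(self, h, cond):
        # h: feature map of shape (C, H, W); cond: vector of shape (cond_dim,)
        params = cond @ self.w + self.b
        gamma, beta = params[: self.n_channels], params[self.n_channels :]
        # Broadcast the per-channel scale and shift over the spatial dimensions.
        return (1.0 + gamma)[:, None, None] * h + beta[:, None, None]


film = FiLM(cond_dim=64, n_channels=32)
h = np.random.default_rng(1).standard_normal((32, 16, 16))
cond = sinusoidal_embedding(0.5, 64)
out = film(h, cond)
print(out.shape)  # (32, 16, 16)
```

Because FiLM only adds a small linear projection per conditioned block, it injects conditioning information cheaply, which fits the project's emphasis on a compact parameter budget.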
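PSNR, the headline metric in the results above, is defined as 10·log10(MAX²/MSE). The sketch below assumes images scaled to [0, 1], so MAX = 1. The project's exact evaluation protocol (color space, cropping, data range) is not stated, and SSIM and LPIPS are not covered. Because PSNR is logarithmic in MSE, the reported 8.32 dB gain over IR-SDE corresponds, on a per-image basis with the same MAX, to roughly a 6.8× reduction in mean squared error. The helper names `psnr` and `mse_reduction_factor` are hypothetical.

```python
import numpy as np


def psnr(reference, estimate, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val**2 / mse))


def mse_reduction_factor(delta_db):
    """For a fixed MAX, a PSNR gain of delta_db divides the MSE by 10**(delta_db / 10)."""
    return 10.0 ** (delta_db / 10.0)


clean = np.zeros((3, 32, 32))
noisy = clean + 0.1                         # uniform error of 0.1 -> MSE = 0.01
print(round(psnr(clean, noisy), 2))         # 20.0
print(round(mse_reduction_factor(8.32), 2)) # 6.79
```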
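As a sanity check on the Resource Efficiency figures, a 62.85 MB footprint is consistent with 16.48 million parameters stored as 32-bit floats, if "MB" is read as mebibytes (2^20 bytes). That reading is an assumption, since the storage format is not stated. The small gap comes from rounding the parameter count, and the calculation ignores any file-format overhead. The function names are hypothetical.

```python
def weights_footprint_mib(n_params, bytes_per_param=4):
    """Raw weight storage in MiB (2**20 bytes); 4 bytes per parameter for float32."""
    return n_params * bytes_per_param / 2**20


def params_from_footprint_mib(size_mib, bytes_per_param=4):
    """Inverse: parameter count implied by a given weight size in MiB."""
    return size_mib * 2**20 / bytes_per_param


fp32_mib = weights_footprint_mib(16.48e6)
implied_params = params_from_footprint_mib(62.85)
print(f"{fp32_mib:.2f} MiB")                  # 62.87 MiB
print(f"{implied_params / 1e6:.2f}M params")  # 16.48M params
```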