Work Sphere (Elevating Hybrid Employee’s Engagement in Online Working through Multi-model Involvement Recognition with Explainable AI)

The COVID-19 pandemic reshaped IT industry processes by accelerating the adoption of remote work. In virtual workplaces, detecting employee engagement has become essential. Conventional monitoring ignores behavioral measures such as mouse activity and screen time, as well as subtle engagement cues such as head movements and facial expressions. Because of this detection gap, leadership cannot understand employee difficulties or offer prompt assistance. Resolving this issue will help organizations develop healthier cultures by empowering leaders to address engagement problems, build relationships, and create supportive settings even when people are physically separated.
To address this problem, we created a multi-model engagement detection system that uses facial expressions, eye gaze, and head posture as key indicators. We developed novel CNN architectures to identify two categories of emotion from facial ROIs that are crucial for engagement analysis. Class imbalance in the complex-emotion dataset was handled by a parallel model focused on basic emotions, providing a thorough engagement detection method. XAI algorithms were used to identify the facial ROIs most indicative of engagement and to validate the model's predictions.
The CNN models were assessed using accuracy, precision, recall, and F1-score across a variety of employee engagement scenarios. The results showed that both the basic and complex emotion prediction models identified workplace engagement patterns with 61.5% accuracy. Combining mouse and screen-time monitoring with eye gaze and face-angle estimation successfully predicted real involvement levels in remote work environments.
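As a rough illustration of how such cues might be fused, the sketch below combines normalized mouse-activity, screen-time, eye-gaze, and head-pose signals into a single engagement score; the weights and threshold are hypothetical and not taken from the study.

```python
import numpy as np

def engagement_score(mouse_activity, screen_time, gaze_on_screen, head_frontal,
                     weights=(0.2, 0.2, 0.3, 0.3)):
    """Fuse normalized behavioral/visual cues (each in [0, 1]) into one score.

    The weighting scheme is purely illustrative; the project's actual fusion of
    CNN emotion outputs with gaze and head-pose estimates may differ.
    """
    signals = np.array([mouse_activity, screen_time, gaze_on_screen, head_frontal])
    return float(np.dot(np.array(weights), signals))

# Example: moderately active mouse, long screen time, gaze mostly on screen,
# head mostly frontal -> engaged above a hypothetical 0.5 threshold.
score = engagement_score(0.6, 0.8, 0.9, 0.7)
print(f"engagement score = {score:.2f}, engaged = {score > 0.5}")
```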
Chart Based Stock Market Price Prediction for CSE using Deep Learning Explainability

This project aims to develop a stock market price prediction system for the Colombo Stock Exchange (CSE) using candlestick chart images and deep learning techniques integrated with Explainable AI (XAI). Unlike traditional numerical forecasting models, this research focuses on visual patterns within candlestick charts to capture complex price movement trends. The system utilizes Convolutional Neural Networks (CNNs), specifically EfficientNetB7, to extract meaningful features from candlestick chart images. These extracted features are then fed into a Long Short-Term Memory (LSTM) model to perform sequential time-series forecasting and predict future stock prices, including open, high, low, and close values. Additionally, the system incorporates XAI methods to provide visual explanations for the model’s predictions, enhancing transparency and building investor trust. The ultimate goal is to offer an intelligent and interpretable decision support tool for investors and financial analysts, helping them understand not only the predicted outcomes but also the reasoning behind them. By combining image processing, deep learning, and explainability, this project bridges the gap between predictive accuracy and model interpretability in financial forecasting for the CSE.
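A minimal PyTorch sketch of this CNN-to-LSTM pipeline is shown below; the input resolution, hidden size, and sequence length are illustrative assumptions rather than the exact configuration used in the project.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b7

class ChartToOHLC(nn.Module):
    """Encode a sequence of candlestick-chart images with EfficientNetB7,
    model temporal dependencies with an LSTM, and regress the next
    open/high/low/close values. Hyperparameters here are placeholders."""
    def __init__(self, hidden_size=256):
        super().__init__()
        self.backbone = efficientnet_b7(weights=None)
        self.backbone.classifier = nn.Identity()   # keep the 2560-dim image features
        self.lstm = nn.LSTM(input_size=2560, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 4)      # open, high, low, close

    def forward(self, charts):                     # charts: (batch, seq_len, 3, H, W)
        b, t, c, h, w = charts.shape
        feats = self.backbone(charts.reshape(b * t, c, h, w)).reshape(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])               # predict from the last time step

# Example: a batch of 2 sequences of 5 chart images at 224x224.
model = ChartToOHLC()
pred = model(torch.randn(2, 5, 3, 224, 224))
print(pred.shape)  # torch.Size([2, 4])
```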
Sustainable Vehicle Parking Utilisation to Minimise Traffic Congestion Using Real-Time Computer Vision and Perspective Transformation

Recent decades have seen rapid growth in cities, their populations, and the number of vehicles, which has caused an upsurge of parking problems in cities and eventually led to traffic congestion and the waste of limited parking resources. The present research offers a new approach to this problem by combining real-time computer vision with perspective transformation for parking management. The system uses side-view cameras and geometric transformation to observe parking spaces from non-conventional camera angles and positions. One significant advance is a comprehensive system for managing dispersed private and public parking lots, targeting the fragmentation of parking infrastructure within urban centres. The solution applies detection and tracking algorithms in real time, with the input adapted to each vehicle type, including cars, motorcycles, and three-wheeled vehicles. The system uses perspective transformation and dynamic space allocation to map and manage parking spaces, and provides live information about available spaces to end users. The results of the study revealed increased parking space utilisation and decreased parking search times, which in turn reduced traffic congestion. The research addresses critical gaps in existing parking management systems, particularly in handling multiple vehicle types and integrating diverse parking spaces within the urban environment.
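The perspective-transformation step can be illustrated with OpenCV as follows; the corner coordinates, image path, and bay dimensions are placeholder values that would come from camera calibration in practice.

```python
import cv2
import numpy as np

# Map the four corners of a parking bay as seen by a side-view camera
# (source points, in pixels) to a top-down rectangle (destination points).
# All coordinates below are illustrative; real values come from calibration.
src = np.float32([[420, 310], [760, 335], [790, 560], [380, 520]])
dst = np.float32([[0, 0], [300, 0], [300, 600], [0, 600]])

M = cv2.getPerspectiveTransform(src, dst)

frame = cv2.imread("parking_frame.jpg")             # one camera frame (assumed path)
top_down = cv2.warpPerspective(frame, M, (300, 600))

# Vehicle detections (e.g., bounding-box centers) can be projected into the
# same top-down space to decide which bay each vehicle occupies.
centers = np.float32([[[600, 450]], [[450, 380]]])  # shape (N, 1, 2) for perspectiveTransform
mapped = cv2.perspectiveTransform(centers, M)
print(mapped.reshape(-1, 2))
```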
Harmful Visual Content Detection System for Social Media Platforms

Harmful content on social media platforms poses a significant threat to user safety, especially when such content includes violent, abusive, or inappropriate visuals that bypass traditional moderation filters. Current content moderation systems often fail to detect nuanced harmful visuals, such as subtle gestures or weapons shown in harmless contexts. Moreover, most existing approaches focus on text-based filtering or rely on basic object detection, both of which struggle to understand the real context behind the images and videos shared online. This research addresses the pressing need for a more intelligent and context-aware detection system that can help minimize the spread of harmful visuals in real time.
To solve this issue, a hybrid AI-based system was developed that integrates YOLOv8 for object detection with a vision-language model to analyze visual context and determine the harmfulness of content. The system detects harmful categories such as alcohol, blood, cigarettes, guns, knives, and insulting gestures in both images and videos. It then classifies the content as harmful or non-harmful based on the scenario. An automatic alert mechanism is also implemented to notify administrators via email when harmful content is detected. The backend was built with Flask, and a user-friendly interface was provided for seamless interaction and visualization of results.
The system demonstrated strong performance with high detection accuracy across various test cases, including challenging scenarios with small object sizes, low lighting, and multiple object categories. Evaluation results showed high precision and recall values, and experts praised the contextual understanding achieved by combining object detection with language reasoning.
The model successfully flagged harmful content and generated contextual justifications, offering a practical solution for enhancing safety on social media platforms. These results indicate that the proposed system is both effective and scalable for real-time harmful visual content moderation.
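A simplified sketch of the two-stage pipeline is shown below; the weight file name, image path, and the classify_with_vlm helper are placeholders, since the exact vision-language model and prompt used in the project are not reproduced here.

```python
from ultralytics import YOLO

# Detection stage: a YOLOv8 model fine-tuned on the harmful-object classes.
# "harmful_yolov8.pt" and the image path are placeholders for the trained weights
# and an incoming post.
detector = YOLO("harmful_yolov8.pt")
results = detector("post_image.jpg")

detections = []
for r in results:
    for box in r.boxes:
        detections.append({
            "label": r.names[int(box.cls)],
            "confidence": float(box.conf),
            "xyxy": box.xyxy[0].tolist(),
        })

# Context stage: hand the detected labels plus the image to a vision-language model
# so the scene can be judged as harmful or harmless (e.g., a kitchen knife vs. a threat).
prompt = (
    "The image contains: "
    + ", ".join(d["label"] for d in detections)
    + ". Considering the scene, is this content harmful? Answer yes/no and justify briefly."
)
# classify_with_vlm(image_path, prompt) stands in for whichever vision-language
# model the deployment uses; its output would drive the admin email alert.
```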
DEETECTOR

In the evolving technological landscape, the number of deepfake videos has risen dramatically. Deepfake videos are created using digital software, machine learning, and face swapping: artificially generated videos in which images are combined to depict events and statements that never happened or were never said. This draws attention to the need for effective detection techniques that can tell whether a video is authentic or has been artificially generated using AI.
Typically, the majority of deepfake videos are spread through mobile applications such as WhatsApp, Facebook, Telegram, and a variety of other mobile applications. This highlights the problem that individuals cannot differentiate between real videos and deepfake videos, which creates a need for detection software that people can use directly on their phones.
The author proposes a multimodal approach in which the video modality uses a distilled Vision Transformer model and the audio modality uses a modified lightweight CNN architecture.
The research also looks at the feature extraction methods that are optimal for lightweight audio deepfake detection.
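One common lightweight choice for the audio branch is a log-mel spectrogram front end feeding a small CNN, sketched below with torchaudio; the layer sizes are illustrative and do not reproduce the authors' modified architecture.

```python
import torch
import torch.nn as nn
import torchaudio

class LightweightAudioCNN(nn.Module):
    """A small CNN over log-mel spectrograms for real-vs-fake audio classification.
    Layer sizes are illustrative; MFCCs are another front end the research compares."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.frontend = nn.Sequential(
            torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64),
            torchaudio.transforms.AmplitudeToDB(),
        )
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)   # real vs. fake

    def forward(self, waveform):                      # waveform: (batch, samples)
        spec = self.frontend(waveform).unsqueeze(1)   # (batch, 1, n_mels, frames)
        return self.classifier(self.features(spec).flatten(1))

# Example: one second of 16 kHz audio for a batch of 4 clips.
model = LightweightAudioCNN()
print(model(torch.randn(4, 16000)).shape)  # torch.Size([4, 2])
```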
End-to-End Sign Language Recognition Pipeline

Sign Language Recognition (SLR) plays a crucial role in enhancing communication accessibility for deaf and hard-of-hearing communities. This paper introduces an energy-efficient, end-to-end SLR pipeline optimized for real-time edge deployment. Our approach centers on a novel hybrid architecture that integrates MaskedConv1D layers with a Bidirectional Long Short-Term Memory (BiLSTM) network, further enhanced by an attention mechanism to effectively extract and leverage spatio-temporal features from sign gesture sequences. The pipeline incorporates a robust preprocessing module utilizing MediaPipe-based landmark extraction and a selective temporal sampling strategy, which together reduce input redundancy while preserving critical gesture dynamics. Additionally, a lightweight, prompt-driven language model is employed for on-the-fly grammatical correction and translation, ensuring high semantic fidelity under computational constraints. Experimental evaluations on the SSL400 dataset demonstrate competitive classification accuracy, low computational overhead, and high inference speed, making the system well-suited for resource-limited edge devices. These contributions provide a scalable foundation for practical SLR applications and highlight future enhancements in multimodal fusion and on-device language processing.
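A condensed PyTorch sketch of the Conv1D-BiLSTM-attention classifier is given below; the landmark feature size, hidden dimensions, and class count are assumptions, and the masking behaviour of the MaskedConv1D layers is omitted for brevity.

```python
import torch
import torch.nn as nn

class SLRNet(nn.Module):
    """Conv1D -> BiLSTM -> temporal attention over MediaPipe landmark sequences.
    Dimensions are illustrative; the paper's MaskedConv1D layers additionally
    respect padding masks, which this simplified sketch omits."""
    def __init__(self, landmark_feats=225, hidden=128, n_classes=400):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(landmark_feats, 128, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.bilstm = nn.LSTM(128, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.cls = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                       # x: (batch, time, features)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        h, _ = self.bilstm(h)                   # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # temporal attention weights
        context = (w * h).sum(dim=1)            # weighted sum over time
        return self.cls(context)

# Example: 8 clips, 64 sampled frames, 75 landmarks x (x, y, z) = 225 features.
model = SLRNet()
print(model(torch.randn(8, 64, 225)).shape)     # torch.Size([8, 400])
```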
Towards Explainable and Occlusion Aware Crowd Anomaly Detection

Crowd anomaly detection is a critical research area that addresses the growing need for ensuring safety and security in densely populated urban environments. Traditional CCTV-based surveillance systems often struggle with real-time detection of suspicious activities due to challenges such as occlusions, crowded scenes, and complex human behaviors. The VUEBLOX project proposes an advanced, explainable, and occlusion-aware framework for robust crowd anomaly detection. The system integrates multiple deep learning modules, including Masked Autoencoders (MAE) to handle occlusions by reconstructing partially visible objects, Graph Neural Networks (GNN) to capture intricate spatial relationships, and SimCLR for contrastive feature representation. Furthermore, the model employs an ensemble voting mechanism to aggregate outputs from different modules and improve anomaly detection accuracy.
Explainability is a key focus of this framework, achieved through techniques such as heatmap visualizations and graph-based reasoning, which provide insights into the decision-making process and enhance user trust. The system was trained and evaluated using benchmark datasets like UCSD Ped1 and Ped2, demonstrating high detection accuracy and robustness under occluded scenarios. Results showed that integrating occlusion-aware modules significantly improved the model’s performance compared to conventional methods. This research contributes to the field of intelligent surveillance by offering a reliable and interpretable solution that bridges the gap between deep learning advancements and practical deployment in real-world public safety applications.
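The ensemble voting step can be illustrated as a weighted fusion of per-frame anomaly scores from the three branches; the weights and threshold below are placeholders rather than the tuned values used in the project.

```python
import numpy as np

def ensemble_anomaly_vote(mae_scores, gnn_scores, simclr_scores,
                          weights=(1.0, 1.0, 1.0), threshold=0.5):
    """Aggregate per-frame anomaly scores from the MAE, GNN, and SimCLR branches
    by weighted averaging. Weights and threshold are illustrative placeholders."""
    stacked = np.stack([mae_scores, gnn_scores, simclr_scores])   # (3, n_frames)
    w = np.asarray(weights, dtype=float)[:, None]
    fused = (w * stacked).sum(axis=0) / w.sum()
    return fused, fused > threshold                               # scores, anomaly flags

# Example with three frames scored by each branch.
scores, flags = ensemble_anomaly_vote(
    mae_scores=np.array([0.2, 0.7, 0.9]),
    gnn_scores=np.array([0.3, 0.6, 0.8]),
    simclr_scores=np.array([0.1, 0.8, 0.7]),
)
print(scores.round(2), flags)
```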
ZenSearch: Revolutionizing E-Commerce Search through Advanced Multimodal Integration & Retrieval Techniques

Existing traditional e-commerce systems, which mostly rely on unimodal approaches, struggle to interpret the user query and return relevant product recommendations at the end of the retrieval stage. This project addresses that gap by developing an efficient multimodal retrieval system built on the ColPali architecture, in which product images and captions are mapped into a unified space to produce accurate, context-aware product recommendations.
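As a rough sketch of the unified-embedding idea, the example below encodes product images and captions with a generic CLIP-style dual encoder and ranks them against a text query; ColPali itself relies on multi-vector late interaction over page patches, so this stands in for the concept rather than the actual architecture.

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# Generic dual-encoder sketch: product images and captions share one embedding
# space and queries are matched by cosine similarity. Image paths are assumed.
model = SentenceTransformer("clip-ViT-B-32")

captions = ["red running shoes", "leather office bag", "wireless earbuds"]
images = [Image.open(p) for p in ["shoe.jpg", "bag.jpg", "earbuds.jpg"]]

# Combine image and caption embeddings per product (simple averaging as a
# stand-in for a learned fusion).
img_emb = model.encode(images, convert_to_tensor=True)
txt_emb = model.encode(captions, convert_to_tensor=True)
product_emb = (img_emb + txt_emb) / 2

query_emb = model.encode("lightweight shoes for jogging", convert_to_tensor=True)
scores = util.cos_sim(query_emb, product_emb)[0]
best = int(scores.argmax())
print(captions[best], float(scores[best]))
```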
Alz-InsightNet: An Explainable Attention-Based Multimodal and Multimodel System for Early Alzheimer’s Detection

Alzheimer’s disease (AD) is a progressive and incurable neurological condition that presents major challenges for early-stage diagnosis. Conventional methods rely heavily on manual interpretation of MRI and PET scans, which can be subjective, time-consuming, and prone to error—often delaying timely intervention. This project addresses these limitations by developing Alz-InsightNet, an explainable, attention-based multimodal deep learning system for early detection of Alzheimer’s disease.
The system combines structural MRI and functional PET imaging data to improve diagnostic accuracy and support clinical decision-making. It employs two modified convolutional neural network (CNN) models—ResNet50 and DenseNet201—enhanced with Convolutional Block Attention Modules (CBAM) for MRI analysis, with their outputs integrated through an ensemble approach. For PET image classification, a CBAM-enhanced VGG-19 model is used. To foster clinical trust, the system incorporates multiple Explainable AI (XAI) techniques—Grad-CAM, Integrated Gradients, and LIME—that generate visual interpretations of the model’s predictions.
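For reference, a compact PyTorch sketch of a CBAM block of the kind inserted into these backbones is shown below; the reduction ratio and spatial kernel size follow common defaults rather than values reported for Alz-InsightNet.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by
    spatial attention, as inserted into ResNet50/DenseNet201/VGG-19 backbones.
    Reduction ratio and spatial kernel size follow common defaults."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):                                   # x: (batch, C, H, W)
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)    # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))           # spatial attention

# Example: attention over a 2048-channel ResNet50 feature map.
feat = torch.randn(2, 2048, 7, 7)
print(CBAM(2048)(feat).shape)   # torch.Size([2, 2048, 7, 7])
```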
Alz-InsightNet demonstrated strong performance, achieving 98% accuracy with ResNet50, 95% with DenseNet201, 99.63% with the MRI ensemble, and 88% with the PET model. By fusing complementary imaging data and offering interpretable results, this system presents a practical and clinically relevant solution for enhancing the reliability and trustworthiness of early Alzheimer’s disease detection.
Advanced Patterned Fabric Defect Detection and Calculating the Defect Size using Explainable AI

This project presents an AI-based approach to detect and measure defects in multi-patterned fabrics using computer vision, deep learning, and Explainable AI (XAI). Traditional fabric inspection processes rely heavily on manual labour, which is often slow, inconsistent, and prone to error—especially when dealing with complex or coloured patterns. Furthermore, manual inspection typically lacks precise defect sizing, which is critical for fabric quality evaluation based on standards like the 4-point system.
To address these limitations, a deep learning model based on the Xception architecture was trained on a publicly available patterned fabric dataset to classify fabric patterns and detect common defect types such as holes, stains, and multiple defects. Grad-CAM, a popular XAI method, was used to generate visual explanations of the model’s predictions, improving transparency and user trust. These heatmaps also enabled the localisation and measurement of defects, which were further converted from pixel dimensions into real-world units using camera parameters including optical working distance and focal length.
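The pixel-to-millimetre conversion can be sketched with a simple pinhole-camera magnification; the pixel pitch, working distance, and focal length below are illustrative values, not the project's calibration.

```python
def defect_size_mm(defect_pixels, pixel_pitch_mm, working_distance_mm, focal_length_mm):
    """Convert a defect extent measured in image pixels to real-world millimetres
    using the pinhole magnification working_distance / focal_length.
    The numeric values in the example are illustrative, not the project's calibration."""
    sensor_extent_mm = defect_pixels * pixel_pitch_mm        # size on the sensor
    magnification = working_distance_mm / focal_length_mm    # object size per sensor size
    return sensor_extent_mm * magnification

# Example: a 120-pixel-wide hole, 3.45 um pixels, camera 500 mm above the fabric,
# 16 mm lens -> roughly 12.9 mm on the fabric.
width_mm = defect_size_mm(120, 0.00345, 500, 16)
print(f"defect width ≈ {width_mm:.1f} mm")
```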
The model achieved 0.91 training accuracy and 0.88 accuracy on both validation and test sets for defect detection, with precision, recall, and F1-score also at 0.88. Pattern classification reached 0.98 test accuracy. This integrated system not only automates defect detection and sizing but also improves interpretability, making it a reliable and scalable tool for quality control in modern textile manufacturing.