Computer Vision
12 minutes
Improving Emotion Recognition through Active Learning
Written by
Chinar Movsisyan

Emotion recognition is a significant area where computer vision has made an impactful contribution. Leveraging algorithms that process visual data, AI models can identify human emotional expressions in real time. This technology finds diverse applications, from educational environments, where it can assess students' engagement and emotional reactions to different teaching approaches, to autonomous vehicles, where emotion recognition models monitor drivers' facial expressions and vigilance by detecting signs of fatigue, distraction, or stress. When such signs appear, the system can alert the driver or activate automatic safety protocols. But like any computer vision system, deploying emotion recognition AI in the real world comes with challenges around data quality, data curation, and model improvement.

In this case study, we look at how Manot, an insight management platform for improving computer vision models, addresses the challenge of pinpointing situations where an emotion detection model incorrectly identifies the emotions being expressed. But before we get into the results of our experiments, let’s take a closer look at the issue of edge cases when using computer vision models to detect emotion. 

Identifying Edge Cases in Emotion Recognition

Emotion detection, like other domains in AI, faces challenges in the development and deployment of computer vision systems in real-world scenarios. Foremost among these is the quality and diversity of the training data: the effectiveness of these models is directly linked to how varied and well-curated that data is. In practice, however, these models often encounter scenarios in production that they were not trained to recognize. This could be due to limited training on the wide range of cultural contexts in emotional expression, or insufficient data representing individuals with unique facial features, often caused by medical conditions. These rare or unexpected scenarios, termed edge cases, demand rigorous monitoring, data curation, and techniques like active learning.

The need to address edge cases in emotion recognition extends beyond technical requirements; it is also crucial for ensuring ethical and effective AI deployment. The accuracy of emotion recognition systems has profound implications, especially when these systems are used in critical applications like driver monitoring in autonomous vehicles, patient care in healthcare settings, or student engagement in educational technology. Failure to accurately recognize emotions in diverse scenarios can lead to misinterpretations with serious consequences. For example, in healthcare, misinterpreting a patient’s facial expressions might result in inadequate pain management or poor patient outcomes, while in education, failing to recognize the subtle emotional cues of students from diverse cultural backgrounds could render teaching strategies ineffective.

Addressing these challenges, however, is difficult. Firstly, the inherent diversity of human expressions across different cultures, ages, and contexts adds a layer of complexity. Emotional expressions are not universal; they vary significantly based on cultural norms and individual differences. Capturing and accurately interpreting this vast range of expressions requires a substantial, diverse dataset, which is often difficult to compile and annotate accurately. Additionally, the subtlety of emotional expressions poses a technical challenge. Unlike overt expressions, nuanced or mixed emotions are harder for AI models to detect and classify correctly.

Manot's Method for Identifying Edge Cases

Building on the complexities and challenges of handling edge cases in emotion recognition, it becomes crucial to explore effective methodologies for improving model performance in these scenarios. The typical approach adopted by AI companies involves a process where an AI team develops a model, which is then evaluated using a test dataset distinct from its training data. However, this test dataset is often unable to encompass all the scenarios the model will face in real-world applications, leading to gaps in the model's evaluation. To mitigate this, AI teams frequently engage in a cycle of amassing vast amounts of data, which is then incorporated back into the model to account for real-world scenarios that were initially overlooked. This approach, while common, tends to be costly and inefficient, resulting in extended feedback loops and prolonging the timeframe needed to refine and redeploy the models.

Manot, however, offers an innovative alternative to this prevalent approach. Our initial step involves closely examining a small subset of test data, alongside the model’s predictions and their corresponding ground truths. This examination is crucial for analyzing and identifying patterns where the model is prone to false positives or negatives. With a clear understanding of the model’s weaknesses, Manot then curates tailored insights for users, pinpointing specific images where the model demonstrates subpar performance. These insights are drawn from our extensive data lake, which boasts over one billion annotated images. By providing these tailored insights, Manot equips the AI development team with the necessary resources to fine-tune the model, ensuring its robust performance in scenarios where it previously faltered.
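Manot's scoring algorithm itself is proprietary, but the first step, contrasting a model's predictions with ground truths to surface systematic failure patterns, can be illustrated with a basic confusion-matrix analysis. The sketch below is a minimal, hypothetical example; the random arrays stand in for real predictions, and none of this reflects Manot's actual implementation:

```python
import numpy as np

# Hypothetical placeholder data: ground-truth and predicted class indices
# for a small labeled test subset (seven emotion classes, as in RAF-DB).
CLASSES = ["surprised", "fearful", "disgusted", "happy", "sad", "angry", "neutral"]
rng = np.random.default_rng(0)
y_true = rng.integers(0, 7, size=500)
y_pred = rng.integers(0, 7, size=500)

# Confusion matrix: rows are true classes, columns are predicted classes.
conf = np.zeros((7, 7), dtype=int)
for t, p in zip(y_true, y_pred):
    conf[t, p] += 1

# Per-class recall exposes systematic weak spots (e.g. a class the model
# rarely gets right), the kind of pattern used to target new data collection.
recall = conf.diagonal() / conf.sum(axis=1).clip(min=1)
for name, r in sorted(zip(CLASSES, recall), key=lambda x: x[1]):
    print(f"{name:>10}: recall {r:.2f}")
```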

To take a closer look at how Manot works, let’s look at the results of an experiment we conducted using our system to improve the accuracy of an emotion detection model. 

Experiment Overview

Our experiment leveraged the Real-world Affective Faces Database (RAF-DB), a large-scale facial expression dataset featuring approximately 12,000 training samples and 3,000 test samples. These images were categorized into seven emotional classes: surprised, fearful, disgusted, happy, sad, angry, and neutral. For our model, we utilized the DAN architecture, which is the current state of the art in emotion recognition tasks.
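For readers who want to reproduce a similar setup, the sketch below shows one way to load the data with a standard torchvision pipeline. It assumes the RAF-DB images have been arranged into per-class folders (RAF-DB itself ships as a label list plus aligned images, so some reorganization is needed), and the DAN model is left as a placeholder rather than reproducing the authors' loading code:

```python
import torch
from torchvision import datasets, transforms

# Assumed directory layout: rafdb/train/<class_name>/*.jpg -- adjust to
# however your copy of RAF-DB is organized on disk.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("rafdb/train", transform=transform)
test_set = datasets.ImageFolder("rafdb/test", transform=transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=64, shuffle=False)

# Placeholder for the DAN model itself; in practice this comes from the
# authors' published implementation and pretrained weights.
# model = load_dan(num_classes=7)
```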

Before running Manot’s algorithm, the DAN model was first trained on a select subset of the RAF-DB training images. Manot is set up by taking as input a small subset of images with ground truths, along with the model’s performance on them. In this experiment, to benchmark our approach, we assembled an evaluation set of images randomly sampled from RAF-DB’s 3,000-image test set. The Manot algorithm, upon execution, selected the images it predicted the model would perform most poorly on.

We compared the model’s performance on this set against the overall Manot evaluation set, random subsets of equivalent size, and an entropy-based selection made by the model itself.
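Measuring each subset comes down to computing top-1 accuracy over a DataLoader. Below is a minimal sketch, assuming a trained model that returns class logits and the `test_set` from the setup above; the subset size is an assumption for illustration:

```python
import random
import torch

def accuracy(model, loader, device="cpu"):
    """Top-1 accuracy of a classifier over a DataLoader."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            logits = model(images.to(device))
            correct += (logits.argmax(dim=1).cpu() == labels).sum().item()
            total += labels.size(0)
    return correct / total

# Random baseline: a subset of the same size as the Manot-selected set,
# drawn from the evaluation pool. The size here is an assumption.
subset_size = 300
rand_idx = random.sample(range(len(test_set)), subset_size)
rand_loader = torch.utils.data.DataLoader(
    torch.utils.data.Subset(test_set, rand_idx), batch_size=64)
# accuracy(model, rand_loader) vs. accuracy on the Manot-selected subset
```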

[Figure: Manot scoring the impact of each mispredicted data sample on the model’s overall performance]

Results

  • The model exhibited an accuracy of 68.7% on the entire Manot evaluation set.
  • Accuracy dropped to 49.1% on the Manot-selected subset of edge cases (insights).
  • For randomly selected subsets of the same size, the model maintained an accuracy of 68.8%.
  • Interestingly, when the model itself selected subsets based on entropy, the accuracy was 47.7%, showing that the Manot algorithm closely mirrors entropy-based selection without direct model involvement (a sketch of entropy-based selection follows below).
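The entropy baseline from the last bullet is straightforward to reproduce: score each sample by the Shannon entropy of the model’s softmax output and keep the most uncertain ones. A minimal sketch, again assuming a model that returns class logits and an unshuffled loader so positions map back to the dataset:

```python
import torch
import torch.nn.functional as F

def entropy_select(model, loader, k, device="cpu"):
    """Return indices of the k samples with the highest predictive entropy."""
    model.eval()
    scores = []
    with torch.no_grad():
        for images, _ in loader:
            probs = F.softmax(model(images.to(device)), dim=1)
            # Shannon entropy per sample; higher means more uncertain.
            ent = -(probs * probs.clamp(min=1e-12).log()).sum(dim=1)
            scores.append(ent.cpu())
    # Assumes the loader is not shuffled, so positions index the dataset.
    return torch.topk(torch.cat(scores), k).indices.tolist()
```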

The Importance of Active Learning

Now that we know where our model fails, it is time to talk about active learning. Active learning is a machine learning approach in which the algorithm selectively queries data samples to label the most informative or uncertain data points. Unlike traditional learning methods that train on a randomly selected dataset, active learning focuses on using fewer but more relevant data samples. This approach is particularly useful in computer vision tasks, where labeling large datasets is labor-intensive. By prioritizing data points the model finds ambiguous or challenging, active learning aims to improve model accuracy more efficiently, enabling faster and more effective learning from a smaller, well-curated dataset.
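To make the mechanics concrete, here is one round of classic pool-based active learning with uncertainty sampling. This is a generic sketch, not Manot’s pipeline; `train_model` is an assumed helper and `select` can be the `entropy_select` function sketched earlier:

```python
import torch

def active_learning_round(model, dataset, labeled, pool, k, train_model, select):
    """One round of pool-based active learning. `labeled` and `pool` are
    lists of dataset indices; `train_model` and `select` are assumed helpers."""
    # 1. Score the unlabeled pool and pick the k most informative samples.
    pool_loader = torch.utils.data.DataLoader(
        torch.utils.data.Subset(dataset, pool), batch_size=64)
    picked = [pool[i] for i in select(model, pool_loader, k)]
    # 2. "Label" them -- in production this is where human annotation
    #    happens -- and move them into the labeled set.
    labeled = labeled + picked
    pool = [i for i in pool if i not in set(picked)]
    # 3. Retrain on the enlarged labeled set.
    model = train_model(torch.utils.data.Subset(dataset, labeled))
    return model, labeled, pool
```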

The insights provided by Manot work similarly to the data samples queried during active learning; however, there is an important distinction. Manot’s insights retrieve false-positive and false-negative scenarios for the model, as well as new categories/classes, which can be data samples that are completely novel to the model and the task.

To see how effective Manot’s insights can be, we ran an active learning experiment on the insights that Manot retrieved for our DAN emotion recognition model. 

Active Learning Integration with Emotion Recognition Model

To see how effective Manot is for active learning tasks, we began by training the DAN model on 700 randomly selected samples from RAF-DB. The overall accuracy of the model was 72.7%, with particular weakness in classifying fear and disgust emotions. To improve this, we incorporated 700 'insight images' identified by Manot into the training set and retrained the model. This not only enhanced the overall performance but also significantly improved accuracy in the previously underperforming categories of 'fear' and 'disgust'. For comparison, a random selection (without Manot’s insights) and subsequent retraining yielded a marginal increase in overall accuracy, but without the marked improvement in the specific classes observed with Manot’s selection.
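Mechanically, the retraining step amounts to concatenating the insight images with the original training subset, running the same training recipe again, and then comparing per-class accuracy to see where the gains land. A hedged sketch, where `base_subset`, `insight_subset`, and `train_model` are assumptions standing in for our actual pipeline:

```python
import torch

# Assumed: `base_subset` is the original 700-sample training subset and
# `insight_subset` wraps the 700 Manot-selected images with their labels.
augmented = torch.utils.data.ConcatDataset([base_subset, insight_subset])
# model = train_model(augmented)  # same training recipe as the baseline

def per_class_accuracy(model, loader, num_classes=7, device="cpu"):
    """Accuracy per class; comparing before/after retraining shows where
    the gains land (e.g. 'fear' and 'disgust' in our experiment)."""
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    model.eval()
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1).cpu()
            for c in range(num_classes):
                mask = labels == c
                correct[c] += (preds[mask] == c).sum()
                total[c] += mask.sum()
    return (correct / total.clamp(min=1)).tolist()
```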

This experiment underscores the value of Manot’s insights, particularly in enhancing model performance in areas of weakness. The success of this approach in our active learning experiment highlights the potential of Manot’s tools in optimizing machine learning models, especially in complex tasks like image classification. The results offer a strong affirmation of the capabilities of our tools, as evidenced by the tangible improvements in model accuracy.

Conclusion

As we conclude our examination of AI in emotion recognition, the pivotal role of Manot in addressing the field’s complexities becomes clear. This journey underscores the challenges inherent in developing AI models capable of interpreting human emotions accurately, especially in diverse, real-world scenarios.

Our case study highlighted Manot's innovative approach to identifying and addressing edge cases in emotion recognition. These are the rare but critical scenarios where standard AI models often falter. Manot's method, by pinpointing these weaknesses, has shown effectiveness in improving the accuracy of AI models, as evidenced in our experiments with the Real-world Affective Faces dataset and the DAN model. The active learning experiment further demonstrated the value of integrating Manot's insights into the training process. This led to a significant improvement in the model's performance, particularly in recognizing complex emotions like fear and disgust.

In essence, Manot's contribution to emotion recognition AI is significant. It emphasizes the necessity for continuous improvement and adaptation of AI models to handle the intricacies of human emotional expression effectively. This case study leaves us with a clear message: in the evolving landscape of AI and computer vision, approaches like Manot's are crucial for creating more responsive and accurate technologies.
