EmotiEffLib
Emotion Efficient Library
EmotiEffLib (Emotion Efficient Library) is a lightweight library for emotion and engagement recognition in photos and videos. It can be used from Python and C++. It supports PyTorch and ONNX backends, enabling efficient real-time analysis across various platforms.
This repository contains two implementations of EmotiEffLib: Python and C++.
Full documentation is available here.
Detailed build and installation instructions are provided on the pages for each library: Python and C++.
Detailed examples of using the Python and C++ modules are provided in the Tutorials.
If you want to run EmotiEffCppLib, first prepare the models for inference with the C++ library:
The training_and_examples folder contains a number of examples that show how to use our models and how they were trained. It also contains an example mobile application for recognizing user emotions.
To run our code on the datasets, please prepare them first using our TensorFlow notebooks: train_emotions.ipynb, AFEW_train.ipynb and VGAF_train.ipynb.
NOTE: The models have been updated to work with timm 0.9.*. For the v0.1 release, however, the EfficientNet models for PyTorch are based on the older timm 0.4.5 package, so exactly that version must be installed, for example with `pip install timm==0.4.5`.
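A timm version mismatch can also be detected before any model is loaded. The snippet below is a minimal sketch and not part of EmotiEffLib: the helper names `timm_matches`, `installed_timm_version` and `check_timm` are hypothetical, and it relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata
from typing import Optional


def timm_matches(installed: str, required_prefix: str) -> bool:
    """Return True if `installed` equals the prefix or extends it at a dot boundary.

    "0.9.16" matches "0.9", but "0.90.1" does not.
    """
    return installed == required_prefix or installed.startswith(required_prefix + ".")


def installed_timm_version() -> Optional[str]:
    """Return the installed timm version string, or None if timm is absent."""
    try:
        return metadata.version("timm")
    except metadata.PackageNotFoundError:
        return None


def check_timm(required_prefix: str) -> str:
    """Describe whether the installed timm satisfies the required version prefix."""
    version = installed_timm_version()
    if version is None:
        return "timm is not installed"
    if timm_matches(version, required_prefix):
        return f"timm {version} is compatible with {required_prefix}"
    return f"timm {version} found, expected {required_prefix}"


if __name__ == "__main__":
    print(check_timm("0.9"))    # updated models
    print(check_timm("0.4.5"))  # models from the v0.1 release
```

Matching on a dot boundary rather than a plain string prefix avoids treating a release such as 0.90.1 as part of the 0.9 series.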
All models were pre-trained for the face identification task on the VGGFace2 dataset. The SAM code was borrowed to train the PyTorch models.
We provide several models that achieved state-of-the-art results on the AffectNet dataset. The facial features extracted by these models lead to state-of-the-art accuracy among face-only models on video datasets from the EmotiW 2019 and 2020 challenges: AFEW (Acted Facial Expression In The Wild), VGAF (Video level Group AFfect) and EngageWild; and from the ABAW CVPR 2022 and ECCV 2022 challenges: Learning from Synthetic Data (LSD) and Multi-task Learning (MTL).
The table below reports validation-set results for our models: accuracy on AffectNet, AFEW and VGAF, F1-score on LSD, and the score on MTL. It also lists the mean inference time on a Samsung Fold 3 device with a Qualcomm 888 CPU and Android 12, and the model size:
Model | AffectNet (8 classes) | AffectNet (7 classes) | AFEW | VGAF | LSD | MTL | Inference time, ms | Model size, MB |
---|---|---|---|---|---|---|---|---|
mobilenet_7.h5 | - | 64.71 | 55.35 | 68.92 | - | 1.099 | 16 ± 5 | 14 |
enet_b0_8_best_afew.pt | 60.95 | 64.63 | 59.89 | 66.80 | 59.32 | 1.110 | 59 ± 26 | 16 |
enet_b0_8_best_vgaf.pt | 61.32 | 64.57 | 55.14 | 68.29 | 59.72 | 1.123 | 59 ± 26 | 16 |
enet_b0_8_va_mtl.pt | 61.93 | 64.94 | 56.73 | 66.58 | 60.94 | 1.276 | 60 ± 32 | 16 |
enet_b0_7.pt | - | 65.74 | 56.99 | 65.18 | - | 1.111 | 59 ± 26 | 16 |
enet_b2_7.pt | - | 66.34 | 59.63 | 69.84 | - | 1.134 | 191 ± 18 | 30 |
enet_b2_8.pt | 63.03 | 66.29 | 57.78 | 70.23 | 52.06 | 1.147 | 191 ± 18 | 30 |
enet_b2_8_best.pt | 63.125 | 66.51 | 56.73 | 71.12 | - | - | 191 ± 18 | 30 |
Please note that we report the accuracies for AFEW and VGAF only on the subsets in which MTCNN detects facial regions. The code also computes the overall accuracy on the complete testing set, which is slightly lower due to the absence of faces or failed face detection.
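The difference between the two figures can be illustrated with a short sketch. It is not the repository's evaluation code: the function name `subset_and_overall_accuracy` is hypothetical, the data is made up, and it assumes that a sample without a detected face is marked with `None` and counted as an error in the overall accuracy.

```python
from typing import Optional, Sequence, Tuple


def subset_and_overall_accuracy(
    labels: Sequence[int],
    predictions: Sequence[Optional[int]],
) -> Tuple[float, float]:
    """Return (accuracy on samples with a detected face, accuracy on all samples).

    A prediction of None marks a sample where no face was detected.
    Assumption: such samples count as errors in the overall accuracy.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")

    # Keep only the samples for which a prediction exists.
    detected = [(y, p) for y, p in zip(labels, predictions) if p is not None]
    correct = sum(1 for y, p in detected if y == p)

    subset_acc = correct / len(detected) if detected else 0.0
    overall_acc = correct / len(labels) if labels else 0.0
    return subset_acc, overall_acc


# Illustrative data: the third sample has no detected face.
labels = [0, 1, 2, 1, 0]
predictions = [0, 1, None, 1, 2]
subset_acc, overall_acc = subset_and_overall_accuracy(labels, predictions)
print(subset_acc, overall_acc)  # 0.75 0.6
```

Under this assumption the overall accuracy can never exceed the subset accuracy, because the number of correct predictions is the same while the denominator grows.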
If you use our models, please cite the following papers:
The code of the EmotiEffLib Python library is released under the Apache-2.0 License. There are no restrictions on either academic or commercial use.