TY - GEN
T1 - TrustSkin
T2 - 19th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2025
AU - Cabanas, Ana M.
AU - Pedro, Alma
AU - Mery, Domingo
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Understanding how facial affect analysis (FAA) systems perform across different demographic groups requires reliable measurement of sensitive attributes such as ancestry, often approximated by skin tone, which itself is highly influenced by lighting conditions. This study compares two objective skin tone classification methods: the widely used Individual Typology Angle (ITA) and a perceptually grounded alternative based on Lightness (L*) and Hue (H*). Using AffectNet and a MobileNet-based model, we assess fairness across skin tone groups defined by each method. Results reveal a severe underrepresentation of dark skin tones (∼2%), alongside fairness disparities in F1-score (up to 0.08) and TPR (up to 0.11) across groups. While ITA shows limitations due to its sensitivity to lighting, the H*-L* method yields more consistent subgrouping and enables clearer diagnostics through metrics such as Equal Opportunity. Grad-CAM analysis further highlights differences in model attention patterns by skin tone, suggesting variation in feature encoding. To support future mitigation efforts, we also propose a modular fairness-aware pipeline that integrates perceptual skin tone estimation, model interpretability, and fairness evaluation. These findings emphasize the relevance of skin tone measurement choices in fairness assessment and suggest that ITA-based evaluations may overlook disparities affecting darker-skinned individuals.
AB - Understanding how facial affect analysis (FAA) systems perform across different demographic groups requires reliable measurement of sensitive attributes such as ancestry, often approximated by skin tone, which itself is highly influenced by lighting conditions. This study compares two objective skin tone classification methods: the widely used Individual Typology Angle (ITA) and a perceptually grounded alternative based on Lightness (L*) and Hue (H*). Using AffectNet and a MobileNet-based model, we assess fairness across skin tone groups defined by each method. Results reveal a severe underrepresentation of dark skin tones (∼2%), alongside fairness disparities in F1-score (up to 0.08) and TPR (up to 0.11) across groups. While ITA shows limitations due to its sensitivity to lighting, the H*-L* method yields more consistent subgrouping and enables clearer diagnostics through metrics such as Equal Opportunity. Grad-CAM analysis further highlights differences in model attention patterns by skin tone, suggesting variation in feature encoding. To support future mitigation efforts, we also propose a modular fairness-aware pipeline that integrates perceptual skin tone estimation, model interpretability, and fairness evaluation. These findings emphasize the relevance of skin tone measurement choices in fairness assessment and suggest that ITA-based evaluations may overlook disparities affecting darker-skinned individuals.
UR - https://www.scopus.com/pages/publications/105014525191
U2 - 10.1109/FG61629.2025.11099364
DO - 10.1109/FG61629.2025.11099364
M3 - Conference contribution
AN - SCOPUS:105014525191
T3 - 2025 IEEE 19th International Conference on Automatic Face and Gesture Recognition, FG 2025
BT - 2025 IEEE 19th International Conference on Automatic Face and Gesture Recognition, FG 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 26 May 2025 through 30 May 2025
ER -