TY - JOUR
T1 - A methodology for developing dermatological datasets
T2 - lessons from retrospective data collection for AI-based applications
AU - Pedro, Alma
AU - Romero, Pamela
AU - Vidaurre, Soledad
AU - Cabanas, Ana M.
AU - Galaz, Atsuko
AU - Hidalgo, Leonel
AU - Carrasco, Karina
AU - Tamez-Peña, José Gerardo
AU - Díaz-Domínguez, Ricardo
AU - Navarrete-Dechent, Cristian
AU - Mery, Domingo
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/12
Y1 - 2025/12
N2 - Purpose: The integration of artificial intelligence into dermatological research has underscored the need for robust and well-structured dermatological datasets. However, these datasets vary widely in their development processes, and there is currently no standard methodology for creating them. This work identifies three pressing needs in building dermatological datasets focused on skin tumor classification: the need for multimodal datasets, the definition of minimum metadata requirements, and the inclusion of underrepresented populations to address the scarcity of health data. Methods: We propose a practical methodology to create dermatological datasets from clinical records, incorporating both images and patient metadata. The process consists of four key stages: obtaining institutional review board approval and analyzing clinical information sources, data recording and structuring, processing of clinical data and images, and quality assessment. This methodology was derived from hands-on experience in building two datasets, one from a Chilean population and one from a Mexican population. Results: The methodology allows the creation of well-structured datasets by simplifying data organization and enabling replication. Each step includes practical guidance for dealing with typical challenges, such as image metadata categorization and technical validation by dermatologists and computer scientists. Conclusion: Our contribution offers a reproducible, scalable, and interdisciplinary framework for creating dermatological datasets, especially useful for countries initiating dataset creation. In addition to the methodological proposal, we highlight common pitfalls and offer recommendations to mitigate them.
AB - Purpose: The integration of artificial intelligence into dermatological research has underscored the need for robust and well-structured dermatological datasets. However, these datasets vary widely in their development processes, and there is currently no standard methodology for creating them. This work identifies three pressing needs in building dermatological datasets focused on skin tumor classification: the need for multimodal datasets, the definition of minimum metadata requirements, and the inclusion of underrepresented populations to address the scarcity of health data. Methods: We propose a practical methodology to create dermatological datasets from clinical records, incorporating both images and patient metadata. The process consists of four key stages: obtaining institutional review board approval and analyzing clinical information sources, data recording and structuring, processing of clinical data and images, and quality assessment. This methodology was derived from hands-on experience in building two datasets, one from a Chilean population and one from a Mexican population. Results: The methodology allows the creation of well-structured datasets by simplifying data organization and enabling replication. Each step includes practical guidance for dealing with typical challenges, such as image metadata categorization and technical validation by dermatologists and computer scientists. Conclusion: Our contribution offers a reproducible, scalable, and interdisciplinary framework for creating dermatological datasets, especially useful for countries initiating dataset creation. In addition to the methodological proposal, we highlight common pitfalls and offer recommendations to mitigate them.
KW - Clinical metadata
KW - Dataset methodology
KW - Dermatology
KW - Skin cancer
UR - https://www.scopus.com/pages/publications/105020993676
U2 - 10.1186/s12874-025-02706-y
DO - 10.1186/s12874-025-02706-y
M3 - Article
C2 - 41193978
AN - SCOPUS:105020993676
SN - 1471-2288
VL - 25
JO - BMC Medical Research Methodology
JF - BMC Medical Research Methodology
IS - 1
M1 - 251
ER -