TY - JOUR
T1 - Prediction of eye, hair and skin colour in Latin Americans
AU - Palmal, Sagnik
AU - Adhikari, Kaustubh
AU - Mendoza-Revilla, Javier
AU - Fuentes-Guajardo, Macarena
AU - Silva de Cerqueira, Caio Cesar
AU - Bonfante, Betty
AU - Chacón-Duque, Juan Camilo
AU - Sohail, Anood
AU - Hurtado, Malena
AU - Villegas, Valeria
AU - Granja, Vanessa
AU - Jaramillo, Claudia
AU - Arias, William
AU - Lozano, Rodrigo Barquera
AU - Everardo-Martínez, Paola
AU - Gómez-Valdés, Jorge
AU - Villamil-Ramírez, Hugo
AU - Hünemeier, Tábita
AU - Ramallo, Virginia
AU - Parolin, Maria Laura
AU - Gonzalez-José, Rolando
AU - Schüler-Faccini, Lavinia
AU - Bortolini, Maria Cátira
AU - Acuña-Alonzo, Victor
AU - Canizales-Quinteros, Samuel
AU - Gallo, Carla
AU - Poletti, Giovanni
AU - Bedoya, Gabriel
AU - Rothhammer, Francisco
AU - Balding, David
AU - Faux, Pierre
AU - Ruiz-Linares, Andrés
N1 - Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2021/7
Y1 - 2021/7
AB - Here we evaluate the accuracy of prediction for eye, hair and skin pigmentation in a dataset of > 6500 individuals from Mexico, Colombia, Peru, Chile and Brazil (including genome-wide SNP data and quantitative/categorical pigmentation phenotypes - the CANDELA dataset CAN). We evaluated accuracy in relation to different analytical methods and various phenotypic predictors. As expected from statistical principles, we observe that quantitative traits are more sensitive to changes in the prediction models than categorical traits. We find that Random Forest or Linear Regression are generally the best performing methods. We also compare the prediction accuracy of SNP sets defined in the CAN dataset (including 56, 101 and 120 SNPs for eye, hair and skin colour prediction, respectively) to the well-established HIrisPlex-S SNP set (including 6, 22 and 36 SNPs for eye, hair and skin colour prediction respectively). When training prediction models on the CAN data, we observe remarkably similar performances for HIrisPlex-S and the larger CAN SNP sets for the prediction of hair (categorical) and eye (both categorical and quantitative), while the CAN sets outperform HIrisPlex-S for quantitative, but not for categorical skin pigmentation prediction. The performance of HIrisPlex-S, when models are trained in a world-wide sample (although consisting of 80% Europeans, https://hirisplex.erasmusmc.nl), is lower relative to training in the CAN data (particularly for hair and skin colour). Altogether, our observations are consistent with common variation of eye and hair colour having a relatively simple genetic architecture, which is well captured by HIrisPlex-S, even in admixed Latin Americans (with partial European ancestry). By contrast, since skin pigmentation is a more polygenic trait, accuracy is more sensitive to prediction SNP set size, although here this effect was only apparent for a quantitative measure of skin pigmentation. 
Our results support the use of HIrisPlex-S in the prediction of categorical pigmentation traits for forensic purposes in Latin America, while illustrating the impact of training datasets on its accuracy.
KW - Admixture
KW - DNA phenotyping
KW - Eye-colour
KW - Hair-colour
KW - Latin Americans
KW - Pigmentation prediction
KW - Skin-colour
UR - https://www.scopus.com/pages/publications/85104129267
U2 - 10.1016/j.fsigen.2021.102517
DO - 10.1016/j.fsigen.2021.102517
M3 - Article
C2 - 33865096
AN - SCOPUS:85104129267
SN - 1872-4973
VL - 53
JO - Forensic Science International: Genetics
JF - Forensic Science International: Genetics
M1 - 102517
ER -