Shashwat Dhayade / Earth & Environmental Sciences / Faculty Mentor: Yike Shen

Bioconcentration of contaminants in aquatic organisms poses environmental and public health risks because toxins accumulate in fish and transfer through the food chain. Bioconcentration depends on both chemical properties and biological interactions, yet existing models often overlook key mechanisms, leading to inaccurate risk estimates. QSAR-based classification tree models have been applied to this problem, but they lack the predictive power of modern machine learning techniques. We developed a Gradient Boosted Decision Tree (GBDT) model and evaluated it with different chemical feature representations (physicochemical properties, ECFP and MACCS molecular fingerprints, and RDKit-generated molecular descriptors) to classify fish bioconcentration mechanisms into three categories: (1) inert chemicals that accumulate in lipids, (2) chemicals that interact with tissues, and (3) chemicals that undergo metabolism or elimination. The GBDT model performs best with the physicochemical feature set (accuracy 0.895, recall 0.826), exceeding the accuracy of both the Random Forest model (accuracy 0.784, recall 0.631) and the Logistic Regression model (accuracy 0.849, recall 0.861), although Logistic Regression attains a slightly higher recall. The other feature sets show lower performance: RDKit descriptors (accuracy 0.682, recall 0.554), ECFP (accuracy 0.644, recall 0.472), and MACCS (accuracy 0.632, recall 0.469), highlighting the effectiveness of physicochemical features. Our results demonstrate the utility of machine learning models, paired with representative features, for classifying bioconcentration mechanisms.
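The accuracy and recall figures reported above can move in different directions for a three-class problem, which is why one model can lead on accuracy while another leads on recall. The following is a minimal sketch of how the two metrics might be computed, assuming recall is macro-averaged over the three mechanism classes (the abstract does not state the averaging scheme). The function names, class labels, and toy predictions are hypothetical and are not the study's data or code.

```python
from collections import Counter

# Hypothetical integer labels for the three mechanism classes in the abstract:
# 1 = inert/lipid accumulation, 2 = tissue interaction, 3 = metabolism/elimination.
CLASSES = (1, 2, 3)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_recall(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class recall (an assumed averaging scheme)."""
    support = Counter(y_true)  # number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / support[c] for c in classes if support[c] > 0]
    return sum(recalls) / len(recalls)


# Made-up, imbalanced toy data: six samples of class 1, two each of classes 2 and 3.
y_true = [1] * 6 + [2] * 2 + [3] * 2

# Toy model A favors the majority class; toy model B favors the minority classes.
model_a = [1, 1, 1, 1, 1, 1, 2, 1, 3, 1]
model_b = [1, 1, 1, 2, 3, 2, 2, 2, 3, 3]

for name, y_pred in (("A", model_a), ("B", model_b)):
    print(name, round(accuracy(y_true, y_pred), 3), round(macro_recall(y_true, y_pred), 3))
```

In this toy case model A has the higher accuracy while model B has the higher macro-averaged recall, because macro averaging weights each class equally regardless of how many samples it contains.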
