Authors
Muhammad Arslan Masood, Tianyu Cui, Samuel Kaski
Publication date
2024/9/19
Book
International Workshop on AI in Drug Discovery
Pages
149-159
Publisher
Springer Nature Switzerland
Description
In drug discovery, prioritizing compounds for testing is an important task. Active learning can assist in this endeavor by prioritizing molecules for label acquisition based on their estimated potential to improve in-silico models. However, in specialized cases such as toxicity modeling, limited dataset sizes can hinder both the effective training of modern neural networks for representation learning and active learning itself. In this study, we leverage a transformer-based BERT model pretrained on millions of SMILES to perform active learning. Additionally, we explore different acquisition functions to assess their compatibility with the pretrained BERT model. Our results demonstrate that pretrained models enhance active learning outcomes. Furthermore, we observe that active learning selects a higher proportion of positive compounds than random acquisition, an important advantage, especially in dealing with …
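The abstract does not say which acquisition functions were studied, so the following is only an illustrative sketch of the general idea: one pool-based selection step that ranks unlabeled molecules by predictive entropy, alongside a random baseline. The names `binary_entropy`, `select_by_uncertainty`, `select_at_random`, and the `pool_probs` values are hypothetical; in practice the probabilities would presumably come from a classifier built on the pretrained representation.

```python
import math
import random


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli prediction with P(positive) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def select_by_uncertainty(pool_probs, batch_size):
    """Indices of the batch_size pool items with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: binary_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:batch_size]


def select_at_random(pool_probs, batch_size, seed=0):
    """Random baseline: sample batch_size indices without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(len(pool_probs)), batch_size)


# Hypothetical predicted probabilities of the positive (e.g. toxic) class
# for five unlabeled molecules; these are made-up values for illustration.
pool_probs = [0.02, 0.48, 0.91, 0.55, 0.10]

# The two predictions closest to 0.5 are the most uncertain.
picked = select_by_uncertainty(pool_probs, batch_size=2)
```

In a full loop, the selected molecules would be labeled, moved from the pool to the training set, and the model refit before the next round; entropy is just one common choice of acquisition function and is an assumption here.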
Scholar articles
MA Masood, T Cui, S Kaski - International Workshop on AI in Drug Discovery, 2024