
Using a Chemical Language Transformer Model for Molecular Property Prediction (Regression): Part 2

Abhik Seal
5 min read · Aug 11, 2023


In the previous blog we used the Transformers library to predict continuous variables. This post extends that code to incorporate additional feature descriptors, such as fingerprints, into your model: it shows how to combine a pre-trained RoBERTa transformer (specifically, ChemBERTa) with Extended-Connectivity Fingerprints (ECFP6) to predict a continuous target variable related to a molecule's properties.
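Before looking at the model, it helps to see how the ECFP6 branch's input is produced. A minimal sketch using RDKit: ECFP6 corresponds to a Morgan fingerprint with radius 3; the 2048-bit width and the helper name `ecfp6_fingerprint` are illustrative choices, not fixed by the post.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def ecfp6_fingerprint(smiles: str, n_bits: int = 2048) -> list[int]:
    """Compute an ECFP6 bit vector (Morgan fingerprint, radius 3)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=3, nBits=n_bits)
    return list(fp)

bits = ecfp6_fingerprint("CCO")  # ethanol
print(len(bits))  # 2048
```

The resulting 0/1 vector can be converted to a float tensor and fed to the model alongside the tokenized SMILES.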

ChemBERTaWithECFP6 class: This is the main model class that combines ChemBERTa with ECFP6 features. Let’s break down its components:

a. __init__ method: Initializes the model components, including the ChemBERTa encoder, dropout layers, linear projection layers (among them ecfp6_projection_layer), and the dense and output layers.

b. forward method: Defines the forward pass of the model, which includes:

  • RoBERTa encoding: Takes the input token IDs and attention masks and passes them through the RoBERTa model, producing a sequence of hidden states.
  • Pooling and projection: Averages the hidden states across the sequence-length dimension, then applies linear projections to both the averaged hidden states and the ECFP6 features, bringing them to the same size (512 in this case; one can extend this to 1024, 2048, etc.). The pooled hidden states and the ECFP6 features are passed through separate linear layers, reducing or transforming their dimensions to a common size (in this…
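The structure described above can be sketched as a single PyTorch module. This is a minimal illustration, not the author's exact code: the pretrained encoder is passed in as an argument (in practice it would be loaded with something like `RobertaModel.from_pretrained`), the layer names and the 512-dimensional projection size follow the description in the text, and a single-output head is assumed for regression.

```python
import torch
import torch.nn as nn

class ChemBERTaWithECFP6(nn.Module):
    """Combine a RoBERTa-style encoder with an ECFP6 fingerprint branch.

    `encoder` is any module mapping (input_ids, attention_mask) to hidden
    states of shape (batch, seq_len, hidden_size), e.g. pretrained ChemBERTa.
    All sizes and names here are illustrative.
    """
    def __init__(self, encoder, hidden_size=768, ecfp6_size=2048,
                 proj_size=512, dropout=0.1):
        super().__init__()
        self.encoder = encoder
        self.dropout = nn.Dropout(dropout)
        # project both branches to the same size before combining
        self.hidden_projection_layer = nn.Linear(hidden_size, proj_size)
        self.ecfp6_projection_layer = nn.Linear(ecfp6_size, proj_size)
        self.dense = nn.Linear(proj_size * 2, proj_size)
        self.output = nn.Linear(proj_size, 1)  # single regression target

    def forward(self, input_ids, attention_mask, ecfp6):
        hidden = self.encoder(input_ids, attention_mask)  # (B, L, H)
        pooled = hidden.mean(dim=1)                       # average over sequence
        h = self.hidden_projection_layer(self.dropout(pooled))
        e = self.ecfp6_projection_layer(self.dropout(ecfp6))
        combined = torch.cat([h, e], dim=-1)              # (B, 2 * proj_size)
        x = torch.relu(self.dense(combined))
        return self.output(x)                             # (B, 1)
```

With a batch of tokenized SMILES plus their fingerprint tensors, `model(input_ids, attention_mask, ecfp6)` returns one predicted value per molecule; concatenation is the simplest way to fuse the two branches, though a sum or gated combination would also fit this skeleton.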
