Genpact
Data Scientist
Jul. 2021 - May 2023, India
Worked on integrating OCR, automation, NLP, and ML technologies to extract and code data from source documents for multiple clients. Built the MHRA COVID vaccine monitoring PVAI solution used to track vaccine adverse reactions. Nominated as part of the team for the GROW2022 award for this work.
Built the module for Bayer that predicts three key fields, using classification and NER models. Improved the model's F1 score and accuracy over time across all source types.
Built an encoder for different source types using AWS Textract, which improved scores across the board for all fields.
Built a multi-entity NER model that extracts three primary fields using SpanBERT and Hugging Face.
Migrated native data preprocessing, training, and deployment workflows to AWS using SageMaker and Lambda.
Tech Stack: Python, TensorFlow, PyTorch, MLflow, Keras, OpenCV, Hugging Face, SQL, Docker, AWS SageMaker, linear models, tree-based models, LSTM, BERT (BioBERT, SpanBERT), Transformers, Bitbucket, Jira, Linux.