AiCore, Software/Data Engineer
Sep. 2022 - Feb. 2023
Pinterest-Data-Pipeline-Project
• Developed an end-to-end data processing pipeline in Python, modelled on Pinterest’s experiment processing pipeline
• Built it on the Lambda architecture to take advantage of both batch and stream processing
• Technologies used: Kafka, HBase, Presto, FastAPI, Spark, PostgreSQL, Prometheus and Grafana
Formula1-Data-Processing-with-Azure-Databricks
• Developed a robust and scalable Formula 1 data processing pipeline, sourced from the Ergast API, using the Azure data engineering stack
• Performed ELT operations scripted in Databricks notebooks and run on Databricks compute clusters
• Technologies used: Azure Databricks (notebooks and clusters), PySpark, Azure Data Lake Storage Gen2, Azure Data Factory, Power BI
Data Collection Pipeline
• Developed a module that scraped data from various sources using Selenium
• Wrote unit and integration tests to ensure that the package published to PyPI works as expected
• Used Docker to containerise the application and deployed it to an EC2 instance
• Set up a CI/CD pipeline using GitHub Actions to push a new Docker image