Distributed Analytics Solutions
Data Engineer Intern
May 2022, London Area, United Kingdom

Teckton-2: Extending data pipelines and workflows for real-time data
- Designed and implemented high-quality Python solutions on a data pipeline framework for real-time data, working with critical stakeholders and facilitators. Optimized the persistence and ingestion layers through a CI/CD pipeline framework, following Agile and DevOps methodologies for workflow and automation.
- Developed transformational solutions and delivered automated Python scripts, versioning the code with the Git source control management system on GitHub.
- Explored the data engineering infrastructure: implemented a data ingestion workflow using Apache Airflow, built data storage metrics and visualizations, implemented a data audit and metrics suite, handled data orchestration, and deployed continuous data ingestion workflow layers.
- Extracted slowly changing dimension data from web pages in JSON and XML formats, parsed and preprocessed the files, and pushed large, complex datasets to the database using Airflow, scheduling jobs and monitoring and tuning performance throughout the development cycle.
- Extracted and processed data from different APIs with strong attention to detail, deployed ETL data pipelines loading into SQL and NoSQL databases, and performed exploratory data analysis and visualization on the datasets with Power BI. Working understanding of Test-Driven Development.
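The XML parse-and-preprocess step described above can be sketched as follows. This is an illustrative example only: the element names, attributes, and schema are hypothetical and not taken from the Teckton-2 project.

```python
# Hedged sketch of parsing an XML feed into rows ready for database
# loading. The <item>/<price> structure is a made-up illustration.
import xml.etree.ElementTree as ET


def parse_items(xml_text: str) -> list[dict]:
    """Flatten <item> elements into dicts ready for database loading."""
    root = ET.fromstring(xml_text)
    return [
        {"sku": item.get("sku"), "price": float(item.findtext("price"))}
        for item in root.iter("item")
    ]


feed = '<catalog><item sku="A1"><price>9.99</price></item></catalog>'
rows = parse_items(feed)
```

In a scheduled Airflow job, a function like this would typically run as one task, with the database load as a downstream task.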
- Persistent problem solver with a proven track record of investigating and fixing issues in the environment; writes clean, effective, reusable code versioned with Git.
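The JSON ingest-and-load step mentioned above can be sketched in minimal form. This is an assumption-laden illustration, not the project's actual code: the `events` table, its columns, and the `load_json_records` helper are hypothetical, and SQLite stands in for whatever SQL database the pipeline used.

```python
# Hedged sketch: parse a JSON payload and upsert its records into a SQL
# table. Table name, columns, and helper name are hypothetical.
import json
import sqlite3


def load_json_records(raw_json: str, conn: sqlite3.Connection) -> int:
    """Parse a JSON array of records and upsert them into an events table."""
    records = json.loads(raw_json)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO events (id, payload) VALUES (?, ?)",
        [(r["id"], json.dumps(r)) for r in records],
    )
    conn.commit()
    return len(records)


raw = '[{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]'
conn = sqlite3.connect(":memory:")
count = load_json_records(raw, conn)
```

The `INSERT OR REPLACE` makes the load idempotent, which matters when a scheduler such as Airflow retries a failed run.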