Donald YAKAM - Senior Data Engineer / Cloud Architect, Kaizen Analytix


Work Background

Kaizen Analytix - Senior Data Engineer

Feb. 2025
United States
• Engineered scalable data pipelines on GCP that reduced processing costs by 20% while enhancing system reliability and maintainability.
• Developed robust ETL workflows using Dataflow and BigQuery to manage high-volume, high-velocity datasets with efficient transformation logic.
• Orchestrated end-to-end data workflows with Cloud Composer (Apache Airflow) to manage dependencies, automate pipeline execution, and ensure data integrity.
• Applied data validation and quality rules using Dataplex to enforce schema consistency, detect anomalies, and uphold data governance across zones.
• Integrated and automated secure SFTP data transfers to ingest external datasets into GCS, establishing structured landing and staging zones.
• Implemented data standardization logic to clean and promote data from Level 1 (raw) to Level 2 (validated) using defined business rules and transformation layers.
• Optimized SQL performance by leveraging Common Table Expressions (CTEs) in BigQuery, resulting in modular and efficient query logic.
• Used watermark columns to implement incremental data loads, tracking changes to reduce processing time during daily and hourly ingestions.
• Migrated data pipelines from on-premises Teradata to Google BigQuery, optimizing queries and schema design to reduce costs and enhance performance.
• Designed ETL processes to extract data from Teradata, apply business rules, and load data into GCP using Dataflow and Cloud Composer.
• Enhanced Teradata SQL query performance and workload management through effective use of indexes, statistics, and partitioning strategies, achieving a 20% reduction in processing time.
• Collaborated with cross-functional teams to define system architecture, improve operational efficiency, and ensure compliance with data governance policies.
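The watermark-based incremental load mentioned above can be illustrated with a minimal Python sketch. It assumes in-memory lists of row dictionaries and a hypothetical `updated_at` watermark column; the function `incremental_load` is illustrative only and is not code from these projects. A production pipeline would typically apply the same filter as a query against the source table and persist the returned watermark between runs.

```python
from datetime import datetime
from typing import Iterable

Row = dict  # a source record, e.g. {"id": 1, "updated_at": datetime(...)}


def incremental_load(source: Iterable[Row], target: list[Row],
                     watermark: datetime, column: str = "updated_at") -> datetime:
    """Append only the rows changed since the last run; return the new watermark."""
    new_rows = [row for row in source if row[column] > watermark]
    target.extend(new_rows)
    # Advance to the largest value seen; keep the old watermark if nothing changed.
    return max((row[column] for row in new_rows), default=watermark)


source = [
    {"id": 1, "updated_at": datetime(2025, 2, 1, 8, 0)},
    {"id": 2, "updated_at": datetime(2025, 2, 1, 9, 0)},
    {"id": 3, "updated_at": datetime(2025, 2, 1, 10, 0)},
]
target: list[Row] = []
# Pretend the previous run stopped at 08:30, so only rows 2 and 3 are new.
watermark = incremental_load(source, target, datetime(2025, 2, 1, 8, 30))
print([row["id"] for row in target], watermark)
```

The strict `>` comparison skips rows that share the previous watermark's exact timestamp; a common variant uses `>=` together with deduplication on a key.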

The Home Depot - Staff Data Engineer

Sep. 2024 - Jan. 2025
Atlanta, Georgia, United States
• Updated the existing data pipeline.
• Implemented a GCP-based data analytics platform that processed terabytes of data daily, providing actionable insights that increased overall business efficiency by 20%.
• Performed ETL on Dataflow and stored the results in BigQuery.
• Performed ETL on Dataproc and Databricks.
• Added and deployed a new column to the existing data pipeline architecture.
• Deployed data pipelines from GitHub.
• Designed a cloud data pipeline architecture and deployed it using GitHub and Jenkins.
• Exported data from BigQuery to Google Cloud Storage.
• Triggered Jenkins jobs from GCP Cloud Scheduler.
• Optimized data workloads at the software level by improving processing efficiency.
• Independently tackled problem statements and brought solutions to life.
• Developed new data processing routes to remove redundancy or reduce transformation overhead.
• Monitored and maintained existing data workflows.
• Applied monitoring and observability best practices to keep pipeline performance high.
• Performed complex transformations on both real-time and batch data assets.
• Reviewed DLF task migration errors.
• Pushed Jenkins job configurations using Jenkins DSL.
• Designed and developed efficient ETL processes for data ingestion, integration, and analytics.
• Designed and implemented a data pipeline to ingest and process 10 TB of data per day, resulting in a 30% reduction in processing time and enabling real-time data analysis.
• Developed and implemented data quality metrics and automated data quality checks, resulting in a 50% reduction in data errors and improved data accuracy.
• Collaborated with cross-functional teams to develop and maintain data governance policies and data security protocols, ensuring compliance with industry regulations and protecting sensitive data.
• Used MongoDB Atlas and GCP to deploy ETL processes from anywhere.
• Integrated a suite of cloud database and data services to accelerate and simplify building with data.
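Automated data quality checks of the kind described above can be sketched as a list of named rules applied to every row, with failing rows routed to a rejected set and an error rate computed from the split. The orders schema, the rule names, and `apply_quality_checks` are hypothetical; this is a plain-Python illustration, not Dataplex or any specific library's API.

```python
from typing import Callable

Row = dict
# A rule is a (name, predicate) pair; the predicate returns True when the row passes.
Rule = tuple[str, Callable[[Row], bool]]


def apply_quality_checks(rows: list[Row],
                         rules: list[Rule]) -> tuple[list[Row], list[tuple[Row, list[str]]]]:
    """Split rows into those passing every rule and those failing at least one."""
    passed, failed = [], []
    for row in rows:
        broken = [name for name, check in rules if not check(row)]
        if broken:
            failed.append((row, broken))  # keep the failed rule names for reporting
        else:
            passed.append(row)
    return passed, failed


# Hypothetical rules for an orders feed.
RULES: list[Rule] = [
    ("order_id_present", lambda r: r.get("order_id") is not None),
    ("amount_non_negative",
     lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0),
    ("currency_is_iso", lambda r: isinstance(r.get("currency"), str) and len(r["currency"]) == 3),
]

raw = [
    {"order_id": 1, "amount": 19.99, "currency": "USD"},
    {"order_id": None, "amount": 5.0, "currency": "USD"},
    {"order_id": 3, "amount": -2, "currency": "US"},
]
valid, rejected = apply_quality_checks(raw, RULES)
error_rate = len(rejected) / len(raw)
print(len(valid), [names for _, names in rejected], round(error_rate, 3))
```

Keeping the failed rule names alongside each rejected row makes it straightforward to aggregate error counts per rule over time.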

The Home Depot - Senior Data Engineer

Jul. 2023 - Sep. 2024
United States
• Architected and migrated Home Depot's Business-to-Retail data domains in BigQuery, ensuring a seamless project transition with zero impact on production analytics.
• Designed and maintained over 50 incremental ETL pipelines in Dataflow and Dataform, processing 5 TB of daily transactions with a 25% improvement in job runtime.
• Leveraged Pub/Sub and Dataflow for reliable messaging and batch processing, enabling real-time error handling and throughput monitoring.
• Implemented advanced BigQuery optimizations, including partitioning, clustering, and materialized views, to reduce query costs by 30% and accelerate reporting.
• Centralized metadata management and enforced data quality using Dataplex, automating schema validations and anomaly alerts.
• Used Terraform alongside Dataplex to enforce data governance, lineage, and schema validations, enhancing overall data management practices.
• Led the migration from BigQuery Business to BigQuery Retail, coordinating cross-functional testing to ensure 100% data fidelity.
• Automated deployment pipelines using Terraform and Cloud Composer, reducing manual intervention in ETL releases by 80%.
• Partnered with product and analytics teams to define SLA-driven data requirements and translate them into technical specifications and tests critical for reliable data pipeline operations.
• Enforced rigorous data governance and security policies using IAM roles, VPC Service Controls, and encryption best practices.
• Enhanced Dataform-based ELT pipelines with modular dbt models to separate staging, intermediate, and mart layers, increasing maintainability and traceability across 50+ pipelines.
• Provided technical guidance during the Teradata sunset initiative, ensuring a successful migration of legacy dashboards and KPIs to GCP.
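Partitioning and clustering in BigQuery are declared in the table DDL. The sketch below builds such a `CREATE TABLE` statement; the table `my-project.retail.daily_transactions`, its columns, and the helper `partitioned_table_ddl` are hypothetical. `require_partition_filter` is a real BigQuery table option that rejects queries lacking a filter on the partition column, which helps keep bytes scanned, and therefore cost, bounded.

```python
def partitioned_table_ddl(table: str, columns: dict[str, str], partition_col: str,
                          cluster_cols: list[str],
                          require_partition_filter: bool = True) -> str:
    """Build a BigQuery CREATE TABLE statement with partitioning and clustering."""
    col_type = columns.get(partition_col)
    # DATE columns partition directly; TIMESTAMP columns are partitioned by their date.
    if col_type == "DATE":
        partition_expr = partition_col
    elif col_type == "TIMESTAMP":
        partition_expr = f"DATE({partition_col})"
    else:
        raise ValueError(f"{partition_col!r} must be a DATE or TIMESTAMP column")
    # BigQuery accepts at most four clustering columns.
    if not 1 <= len(cluster_cols) <= 4:
        raise ValueError("expected one to four clustering columns")
    unknown = [c for c in cluster_cols if c not in columns]
    if unknown:
        raise ValueError(f"unknown clustering columns: {unknown}")

    column_sql = ",\n".join(f"  {name} {typ}" for name, typ in columns.items())
    parts = [
        f"CREATE TABLE `{table}` (",
        column_sql,
        ")",
        f"PARTITION BY {partition_expr}",
        f"CLUSTER BY {', '.join(cluster_cols)}",
    ]
    if require_partition_filter:
        # Queries must then filter on the partition column, limiting bytes scanned.
        parts.append("OPTIONS (require_partition_filter = TRUE)")
    return "\n".join(parts)


ddl = partitioned_table_ddl(
    "my-project.retail.daily_transactions",  # hypothetical table
    {
        "txn_id": "STRING",
        "store_id": "INT64",
        "sku": "STRING",
        "amount": "NUMERIC",
        "txn_ts": "TIMESTAMP",
    },
    partition_col="txn_ts",
    cluster_cols=["store_id", "sku"],
)
print(ddl)
```

Clustering on columns that appear most often in filters and joins (here the assumed `store_id` and `sku`) is what lets BigQuery prune blocks within each daily partition.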

Epsilon - Data Engineer

Mar. 2022 - Mar. 2023
United States
• Designed and implemented scalable data architectures on AWS and Azure, leveraging Amazon Redshift and DynamoDB for optimized data retrieval and storage.
• Collaborated with data scientists to build and integrate machine learning models using Python (pandas, NumPy, Matplotlib), improving data analysis and prediction capabilities.
• Automated data workflows using Apache Airflow and shell scripting, reducing manual intervention by 50% and increasing pipeline reliability.
• Implemented robust data security measures and ensured compliance with regulations such as GDPR and HIPAA, using encryption protocols and access controls in AWS.
• Used Apache Hadoop and Spark for big data processing, handling large-scale datasets with HDFS and MapReduce, which enhanced data processing capabilities by 40%.
• Conducted performance tuning and optimization of data storage solutions, including Apache Cassandra and Amazon Redshift, resulting in a 30% improvement in query performance.
• Mentored junior data engineers in data processing and cloud technologies, fostering a culture of continuous improvement and technical excellence within the team.
• Designed and implemented data models, databases, and data warehouses for data storage and analysis in Azure Synapse.
• Established and maintained the technical environment for data analysis, such as databases and data warehouses, in Azure.
• Built ETL/ELT pipelines in Azure using Azure Data Factory, Azure Data Lake Storage Gen2, and Azure SQL.
• Performed incremental loads using ADF, stored procedures, and Synapse.
• Used triggers to run the pipelines on a daily basis.
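Workflow schedulers such as Apache Airflow run each task only after its upstream tasks finish, following the dependency graph (DAG). To illustrate that ordering without Airflow itself, the sketch below uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+); the task names are hypothetical and this is not Airflow's API.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_orders": {"extract_orders"},
    "join_customers": {"transform_orders", "extract_customers"},
    "load_warehouse": {"join_customers"},
}

# static_order() yields every task after all of its dependencies;
# it raises graphlib.CycleError if the graph contains a cycle.
run_order = list(TopologicalSorter(pipeline).static_order())
print(run_order)
```

Independent tasks (here the two extracts) may appear in either order, which is exactly the freedom a scheduler uses to run them in parallel.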

Albertsons Companies - Data Engineer / Application Developer

Mar. 2021 - Mar. 2022
United States
• Designed and maintained ETL pipelines using Python, Google Cloud Dataflow, and Apache Spark, reducing data processing time by 30%. Used SQL for data manipulation and optimized performance with Apache Cassandra and Amazon Redshift.
• Migrated over 50 on-premises data pipelines to Google Cloud Platform (GCP), leveraging Apache Hadoop and Hive for data storage and processing, achieving a 25% cost reduction.
• Developed and deployed microservices using Spring Boot and Docker on GCP, ensuring a scalable and reliable application architecture.
• Implemented RESTful and SOAP web services for data integration and exchange, using Java, J2EE, and XML technologies to enhance system interoperability.
• Used Apache Kafka and Google Cloud Pub/Sub for efficient real-time data streaming and processing, ensuring seamless data flow and integration across systems.
• Applied test-driven development (TDD) with JUnit and Mockito, ensuring high code quality and reliability through comprehensive unit and integration tests.
• Tuned BigQuery SQL and Apache Hive queries, improving query performance by up to 50%, and used Matplotlib and D3.js for data visualization.
• Created and managed CI/CD pipelines using Google Cloud Build and Jenkins, accelerating deployment cycles by 15%. Implemented automated testing with JUnit and integration testing with Postman.
• Used Google Cloud Pub/Sub for real-time data ingestion, handling over 1 TB of data daily, and processed data streams with Spark Streaming and Kafka for efficient data flow management.
• Trained and mentored a team of 5 junior data engineers, improving team productivity by 25%, and led workshops on GCP, Python, and machine learning frameworks such as scikit-learn and TensorFlow.

Dell Technologies - Big Data Engineer

Nov. 2018 - Nov. 2020
• Designed and implemented robust data pipelines using Apache Spark and Kafka to process over 10 TB of data daily, significantly improving data flow efficiency.
• Optimized complex SQL queries and used Apache Cassandra for data storage, achieving a 30% reduction in data retrieval times and enhanced system performance.
• Integrated diverse data sources into a unified data warehouse using Apache Hive and Apache Flume, streamlining data access for business intelligence.
• Designed and deployed RESTful web services with J2EE and Spring Boot to facilitate seamless data exchange between systems.
• Developed interactive dashboards and visualizations using Tableau and D3.js, providing stakeholders with real-time insights into key business metrics.
• Implemented robust data security measures using encryption and access controls on Amazon Redshift and DynamoDB to safeguard sensitive information.
• Automated data extraction and transformation processes with Apache Oozie and Python scripting, improving workflow efficiency and reducing manual intervention.
• Developed and maintained ETL processes with Python and Django, ensuring data integrity across Oracle 11g, MySQL, and Amazon Redshift databases.
• Collaborated with cross-functional teams using Agile methodologies to gather business requirements and translate them into scalable data solutions using J2EE and Hibernate.
• Implemented data warehouse solutions on Amazon Redshift and leveraged Apache HBase for improved data accessibility and query performance, reducing storage costs by 25%.
• Conducted data validation and cleaning using Python libraries such as pandas and NumPy, resulting in a 40% improvement in data quality and reliability.
• Deployed machine learning models with Python to automate anomaly detection, using Power BI for visualization and reducing manual monitoring efforts by 50%.

Kani Solutions Inc - Python Developer

Aug. 2016 - Nov. 2018
Atlanta, Georgia, United States
• Wrote effective, scalable code.
• Developed back-end components to improve responsiveness and overall performance.
• Integrated user-facing elements into applications.
• Tested and debugged programs, and integrated applications with third-party web services.
• Improved the functionality of existing systems.
• Implemented security and data protection solutions.
• Assessed and prioritized feature requests.
• Coordinated with internal teams to understand user requirements and provide technical solutions.
• Conducted scrum meetings and generated custom dashboards. Knowledge of testing and issue-tracking tools such as Bugzilla and Jira.
• Experience with version control systems (Git, CVS, GitHub) and hosting platforms (Heroku, Amazon EC2).
• Knowledge of AWS Lambda, Auto Scaling, CloudFront, RDS, Route 53, SNS, SQS, and SES.
• Reviewed Python code when running troubleshooting test cases and investigating bugs.
• Reviewed Python files in an OpenStack environment and made changes as needed.
• Developed applications using Python 3.3, HTML5, CSS3, AJAX, JSON, and jQuery.
• Analyzed data with Python libraries such as pandas and NumPy.
• Built and consumed APIs.
• Designed and developed complex Python functions.
• Used Python libraries such as SQLAlchemy and Beautiful Soup to build web applications.
• Designed, developed, implemented, and maintained enterprise GIS custom applications and data integration solutions.
• Developed data loading, processing, and management automation with FME or Python scripts.
• Implemented, administered, and supported ArcGIS Enterprise solutions.
• Built and maintained three batch frameworks using AutoSys and Unix Korn shell scripting.
• Prepared a basic Unix workshop for users.
• Coded, created, and implemented statistical reports spanning multiple databases.
• Streamlined data fixes, database programming, and SQL query tuning, raising operational efficiency in the organization by 35%.

NATCOM - Full Stack Developer

Sep. 2013 - Feb. 2015
• Performed or directed website updates.
• Developed, tested, and maintained accessible HTML5-based mobile applications to deliver client content to devices without Flash support.
• Developed a web-based application with HTML, JSP, Spring MVC, and Hibernate.
• Used XML-based configuration to wire dependency components together and define bean classes.
• Used Maven as the build tool and declared the dependencies for the JARs that needed to be migrated.
• Used JSP, HTML5, CSS3, the Ajax toolkit, and JavaScript to design the UI.
• Developed range bars and checkboxes for filtering records using jQuery.
• Implemented stored procedures and dynamic SQL on SQL Server.
• Wrote basic SQL for CRUD operations and advanced SQL for stored procedures.
• Designed and developed large, complex dynamic web pages using HTML, CSS, jQuery, and Bootstrap.

ITGStore - Database Administrator

Jul. 2011 - Aug. 2013
• Tested programs and databases, corrected errors, and made necessary modifications.
• Modified existing databases and database management systems, or directed programmers and analysts to make changes.
• Wrote SQL queries to retrieve data from the database using JDBC.
• Wrote and coded logical and physical database descriptions and specified database identifiers to the management system, or directed others in coding descriptions.
• Trained users and answered questions.
• Specified users and user access levels for each segment of the database.
• Selected and entered code to monitor database performance and to create the production database.
• Established and calculated optimum values for database parameters using manuals and a calculator.
• Revised the company definition of data as defined in the data dictionary.

Multitech Center - Java Developer

Jun. 2010 - Jul. 2011
• Wrote, analyzed, reviewed, and rewrote programs using workflow charts and diagrams, applying knowledge of computer capabilities, subject matter, and symbolic logic.
• Corrected errors by making appropriate changes and rechecking the program to ensure that it produced the desired results.
• Used Spring and the Hibernate Criteria API to query the database and perform other CRUD operations.
• Wrote JSPs and servlets to add functionality to the web application based on customer requirements.
• Used J2EE design patterns to build the application, including EJBs for business logic.
• Compiled and wrote documentation of program development and subsequent revisions, inserting comments in the code so others could understand the program.
• Assigned, coordinated, and reviewed the work and activities of programming personnel.
• Wrote or contributed to instructions and manuals to guide end users.
