Barclays: Big Data Developer
Aug. 2015 - Mar. 2017, Prague, The Capital, Czech Republic
Roles: Big Data Engineer, Spark developer and mentor
Responsibilities: Creating and documenting best practices for big data development across Corporate and Investment Banking, driving cluster and technology requirements, and developing mission-critical projects
Achievements:
• Designed and developed a fraud and money-laundering detection system (currently in UAT). Because training data for machine learning models was scarce, designed a rule-based engine (which uses statistical models to detect suspicious transactions), an ML model, and a Jira-based feedback loop that feeds verified suspicious transactions (detected by either the ML model or the rules engine) back into the training data.
• Solely responsible for end-to-end development of the Smart Collections application, including data ETL from a mainframe system, statistical analysis jobs in Spark, and export of outputs to an RDBMS.
• Designed and developed Metis, the consolidated Spark Data Library, which provides data discovery across multiple Hadoop/Spark clusters, simple data access and schema validation. It is currently deployed across numerous Dev, UAT and Prod clusters.
• Developed Data Navigator, a web application for exploring and documenting the schemas of data stored in Hadoop clusters.
• Organised and ran interactive Spark and machine learning training sessions based on the edX Spark courses.
• Designed and built a lab environment (pure Apache-stack Hadoop and the latest Apache Spark, an R + Python playground, OpenNebula virtualisation) for the data science initiative.