Sensory, Inc.
Data Engineer
Apr. 2022 - Oct. 2023
Took ownership of the development, maintenance, and operation of the components of the automated data pipeline.
Upgraded a Python Flask app from Python 2.5 to Python 3.10 and migrated it from the host server to a Docker deployment on a Kubernetes cluster, with the assistance of the lead system administrators. Increased the customizability, reliability, and usability of the web application by designing a bespoke, object-oriented system for loading collection requirements in a redesigned front-end. Updated the web application’s back-end to meet modern security standards and to work with current versions of JavaScript, audio worklet nodes, and AWS S3. Improved the web-scraper’s query-generation functionality, enabling the scraping of audio data and transcriptions in previously unavailable languages and yielding a 100-500% increase in high-quality data for other languages.
Expanded the web-scraper’s daily volume capacity by 400% by implementing AWS EC2 and S3, which also added the option to scale when required. Ensured continual uptime of the automated data vetting tool and verified its results.
Designed and developed solutions for cataloging data in a Postgres database, including architectures, custom functions, Python scripts for extracting data for specific needs, and tools for importing large quantities of data. Notified stakeholders of delivery quantities, important considerations, and licensing when pertinent. Ensured data was delivered to the proper location for internal use, with updated metadata, by developing automation that reduced the need for manual intervention. Converted legacy data to meet new standards for various projects. Worked closely with management and, at the request of leadership, contributed to projects beyond the data domain, including the development of demos and the integration of our cloud technologies.