Earlham Institute
Researcher
Apr. 2022 - Oct. 2022
Norwich, England, United Kingdom
Project on bioinformatic analysis of WGS omics data: transferring the metagenomic analysis pipeline from Nextflow to Galaxy and developing a machine learning model to advance the analysis.
— Statistical analysis of large, gigabyte-scale datasets.
— Use of Python to clean, analyse and visualize the data.
— Implementation of both unsupervised and supervised machine learning algorithms to extract insights from the data.
— Python libraries used:
— — For data cleaning: Pandas, NumPy
— — For data visualization: Matplotlib, Seaborn
— — For machine learning: Scikit-learn
— R libraries used:
— — For data analysis: MaAsLin2
— Analysis of the Nextflow pipeline and its implementation on the Galaxy web platform.
— Basic use of HPC to submit jobs.
— Linux environment was used for all tasks.