General Assembly
Data Science Fellow
Oct 2022 – Jan 2023 | Remote
Built robust predictive models using statistics and Python programming, gaining the confidence and credibility to tackle complex machine learning problems on the job.

Dec 2022 – Jan 2023
Cash Flow: Linear Regression and Classification to Identify Profitable Residential Real Estate Properties in California (Capstone Project for Data Science Bootcamp)
- Unlike most real estate data science projects, targeted cash flow rather than sale price.
- Personally built a large dataset of over 20k rows and 200+ columns by combining more than 6 sources.
- Modeling: Ran both regression and classification models. The classes were imbalanced, so SMOTE and SMOTEENN were used for resampling.

Dec 2022
Binary Classification Forecasting – Georgia State Recidivism
- Used the Brier score as the primary metric for optimizing models. Precision was also tracked to minimize false positives, but the Brier score took priority because of limited scoring capabilities.
- Modeling: Logistic Regression with BayesSearchCV, plus BayesSearchCV over a Random Forest Classifier and over XGBoost. XGBoost achieved the best overall Brier score, while Logistic Regression was used to show feature importance.

Nov 2022
NLP with Sentiment Analysis – Classification Project Using Subreddits
- Tokenized text and used a Sentiment Intensity Analyzer for EDA.
- Modeling: Lemmatized the text and applied CountVectorizer before fitting Logistic Regression, Random Forest Classifier, and Multinomial Naïve Bayes models. All models performed well, with Logistic Regression slightly ahead.