County of San Diego
Sr. Data Architect
Dec. 2019 - Jan. 2024
Greater San Diego Area
• SRSP (Shallow Rent Subsidy Program): use predictive analytics (see below) to determine which customers are most at risk of becoming homeless, so that their monthly rent can be subsidized.
• Bring Data Quality Profile activity into PowerBI for analytics against Data Quality Profile KPIs.
• Produce Informatica Data Quality (IDQ) scorecards that send notifications when profile metrics exceed thresholds.
• Connect to various public web services from PowerBI to profile USPS address validation, email sending capability, etc.
• Introduce Data Governance (DG) to a newly formed DG Committee, establishing a Vision Statement, Measurable Goals, Structure (Working Groups, Data Stewards, etc.), and initial DG Policies in the following DMBOK Wheel areas:
  o Data Quality (Dimensions: Completeness, Consistency, Integrity) using DQAF (Data Quality Assessment Framework)
  o Master Data (IBM MDM tool)
  o Data Modeling (Mega Hopex)
  o Data Analytics
Predictive Analytics
• Use SPSS Modeler and SPSS Statistics to build and test various predictive models (Random Trees, CHAID, Tree-AS, Bayes Network, etc.) that predict homelessness in our customer base, using demographic and program-participation independent variables as well as aggregate features and bins built from base features.
• Use SPSS Statistics and SPSS Modeler to combine the results of the above models into a weighted-mean Risk Rating for homelessness, then feed this feature and various demographic features into K-Means and TwoStep-AS clustering models to identify important demographic clusters of customers aged 55+ at high risk of homelessness (Risk Rating above the mean + 1 standard deviation).
• Push the above SPSS results into PowerBI to offer “What If” prescriptive analytics that estimate the probability of homelessness by demographic category (location, mean median monthly income, etc.).
• Design SharePoint lists for Project Closure documents from DocVault, create a PowerApp to enter and validate data mined from the documents, and visualize the generated KPIs in PowerBI.
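
Illustrative sketch of the Risk Rating step listed under Predictive Analytics: the actual work was done in SPSS, so the Python below only mirrors the idea. The function names, model weights, scores, and the use of the population standard deviation are all assumptions made for the example, not details of the original project.

```python
from statistics import mean, pstdev

def weighted_risk(scores, weights):
    """Weighted mean of one customer's per-model risk scores."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

def high_risk_flags(ratings):
    """Flag ratings above mean + 1 standard deviation (population SD assumed)."""
    cutoff = mean(ratings) + pstdev(ratings)
    return [r > cutoff for r in ratings]

# Hypothetical scores from three models for four customers.
model_scores = [
    [0.10, 0.20, 0.15],
    [0.80, 0.90, 0.85],
    [0.30, 0.25, 0.35],
    [0.20, 0.10, 0.15],
]
weights = [0.5, 0.3, 0.2]  # hypothetical per-model weights

ratings = [weighted_risk(s, weights) for s in model_scores]
flags = high_risk_flags(ratings)
print(list(zip(ratings, flags)))
```

Customers flagged here would correspond to the high-risk group that is then passed, with demographic features, into the clustering models.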