My Projects
1. SAT vs. ACT Participation
This project was my introduction to data science and data visualization. I looked at aggregate SAT and ACT scores, along with participation rates, for 2016 and 2017. Based on exploratory data analysis (EDA) and modeling, I found a negative relationship between test participation and test scores: where participation rates were higher, scores tended to be lower. Please feel free to take a look at my GitHub here.
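As a rough illustration of what a negative relationship means here, the sketch below computes a Pearson correlation coefficient in plain Python. The participation rates and scores are made-up numbers, not data from the project. A coefficient close to -1 means higher participation goes with lower average scores.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for illustration only; NOT the project's data.
participation_rates = [0.10, 0.35, 0.60, 0.85, 1.00]  # share of students tested
average_scores = [1250, 1180, 1100, 1040, 1010]        # hypothetical mean SAT totals

r = pearson_r(participation_rates, average_scores)
print(f"Pearson r = {r:.2f}")
```

Correlation only describes the association; it does not say why higher participation goes with lower scores.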
2. House Price Predictor
My task was to predict house sale prices in Ames, Iowa, using a large dataset with over 80 features, a mix of categorical and numerical variables. I used EDA, feature interactions, and multiple regression models to produce the best predictions of each house's sale price. Please feel free to review the project on GitHub here.
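One common way to build a feature interaction is to multiply two columns, so that a linear model can learn an effect that depends on both at once (for example, extra living area possibly counting for more in a higher-quality house). The helper below is a hypothetical sketch, not the project's code, and the column names are illustrative stand-ins for Ames dataset fields.

```python
def add_interactions(row, pairs):
    """Return a copy of `row` with one product feature per (a, b) column pair.
    Assumes both columns in each pair are numeric."""
    out = dict(row)
    for a, b in pairs:
        out[f"{a}*{b}"] = row[a] * row[b]
    return out

# Hypothetical house record; column names are illustrative, not the project's schema.
house = {"overall_qual": 7, "gr_liv_area": 1500, "year_built": 2004}
features = add_interactions(house, [("overall_qual", "gr_liv_area")])
print(features["overall_qual*gr_liv_area"])  # 10500
```

The new column is then passed to the regression model alongside the original features. Returning a copy keeps the raw record unchanged.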
3. Political Subreddit Classifier
My goal was to build a model that could predict whether a Reddit post came from a liberal or a conservative subreddit. I used natural language processing, sentiment analysis, and multiple classifier models. My best model was a support vector classifier (SVC), with an accuracy of around 70%. Please feel free to look at the project here.
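The project's exact preprocessing, sentiment features, and SVC settings are not described here, so the sketch below swaps in a from-scratch linear SVM, trained by subgradient descent on the hinge loss over simple bag-of-words counts. The toy posts and the +1/-1 label mapping are invented for illustration. In practice a library implementation such as scikit-learn's SVC would normally be used instead.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts; a bare-bones stand-in for real NLP preprocessing."""
    return Counter(text.lower().split())

def decision_value(w, b, x):
    """Linear score w.x + b for a sparse feature Counter x."""
    return sum(w.get(t, 0.0) * c for t, c in x.items()) + b

def train_linear_svm(docs, labels, epochs=20, lr=0.1, lam=0.01):
    """Fit a linear SVM by subgradient descent on the L2-regularized hinge loss.
    Labels must be +1 or -1."""
    w, b = {}, 0.0
    feats = [bag_of_words(d) for d in docs]
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            margin = y * decision_value(w, b, x)
            for t in w:  # shrink weights: gradient step on the L2 penalty
                w[t] *= 1 - lr * lam
            if margin < 1:  # inside the margin, so the hinge loss is active
                for t, c in x.items():
                    w[t] = w.get(t, 0.0) + lr * y * c
                b += lr * y
    return w, b

def predict(w, b, text):
    return 1 if decision_value(w, b, bag_of_words(text)) >= 0 else -1

# Invented toy posts: +1 stands in for "liberal", -1 for "conservative".
docs = ["green energy jobs", "climate action now", "cut income tax", "secure the border"]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(docs, labels)
print(predict(w, b, "green climate"), predict(w, b, "tax border"))
```

A linear kernel keeps each word's learned weight interpretable; accuracy on real posts depends heavily on preprocessing and on held-out evaluation, which this toy example does not attempt.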
4. Disaster Situational Awareness
For this project, my team and I were tasked with creating a webpage that served as a one-stop shop for all news related to a specific disaster event. We used APIs to scrape tens of thousands of articles from the web, built a search engine using Word2Vec, and created a locally hosted website using Flask. Please feel free to view the project here.
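A common way to build search on top of Word2Vec is to average the word vectors of the query and of each article, then rank articles by cosine similarity. The sketch below assumes that approach; it is not necessarily how the team's engine worked. The tiny 3-dimensional vectors and article titles are hypothetical stand-ins for embeddings learned by a trained Word2Vec model.

```python
import math

# Hypothetical 3-d "embeddings"; a real system would load vectors learned by Word2Vec.
EMBEDDINGS = {
    "flood": [0.9, 0.1, 0.0],
    "water": [0.8, 0.2, 0.1],
    "rescue": [0.3, 0.9, 0.1],
    "evacuate": [0.4, 0.8, 0.0],
    "stocks": [0.0, 0.1, 0.9],
    "market": [0.1, 0.0, 0.95],
}

def embed(text):
    """Average the vectors of known words; None if no word is in the vocabulary."""
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return None
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query, articles, top_k=2):
    """Return up to top_k article titles ranked by similarity to the query."""
    q = embed(query)
    if q is None:
        return []
    scored = [(cosine(q, v), title) for title in articles if (v := embed(title)) is not None]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]

articles = ["Flood water rising", "Rescue teams evacuate residents", "Stocks rally as market opens"]
print(search("flood", articles, top_k=1))
```

Averaging ignores word order but lets a query match articles that use related words rather than the exact query terms. A Flask route could call a function like `search` to serve results, though that wiring is not shown here.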
5. Exit Poll Clustering
This project was my most experimental endeavor, and while I didn’t get the result I wanted, I gained valuable insights into handling missing data and into unsupervised clustering models. The goal was to group participants in an exit poll survey conducted after the 2018 midterms by running a K-Means or DBSCAN clustering algorithm on their responses. Please feel free to view my GitHub to see more.
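Clustering algorithms such as K-Means need a complete numeric matrix, so missing survey answers have to be handled first. The sketch below assumes the simplest option, filling each gap with its column mean, and then runs a plain K-Means. The responses (three questions on a hypothetical 1 to 5 scale) are invented, and the naive "first k rows" initialization is a simplification. DBSCAN, the other algorithm mentioned, would replace the `kmeans` step.

```python
import math

def impute_column_means(rows):
    """Replace None with the mean of the observed values in that column.
    Assumes every column has at least one observed value."""
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def kmeans(points, k, iterations=10):
    """Plain K-Means: alternate assigning points and recomputing centroids."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    labels = []
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Invented survey answers; None marks a question the participant skipped.
responses = [
    [1, 2, None],
    [1, 1, 2],
    [2, 1, 1],
    [5, 4, 5],
    [4, None, 5],
    [5, 5, 4],
]
labels, centroids = kmeans(impute_column_means(responses), k=2)
print(labels)
```

Mean imputation pulls incomplete rows toward the middle of the data, which can blur cluster boundaries when many answers are missing.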