My Projects

1. SAT vs. ACT Participation

This project was my introduction to data science and data visualization. I analyzed aggregate SAT and ACT scores alongside participation rates for 2016 and 2017. Through EDA and modeling, I found a negative relationship between participation rate and score: where more students took a given test, its average score tended to be lower. Please feel free to take a look at my GitHub here.

ACT Composite Score Map
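One common way to put a number on a relationship like this is Pearson's correlation coefficient. The minimal sketch below computes it with the Python standard library; the participation rates and average scores are made-up placeholders, not data from the project.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy values (NOT the project's data): participation rate vs. average score.
participation = [0.10, 0.35, 0.60, 0.85, 1.00]
avg_score = [1250, 1180, 1100, 1040, 1010]

r = pearson_r(participation, avg_score)
print(f"r = {r:.3f}")  # a negative r means higher participation goes with lower scores
```

A value near -1 would indicate a strong negative linear relationship; correlation alone does not say why participation and scores move in opposite directions.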

2. House Price Predictor

My task was to predict housing prices in Ames, Iowa, using a large dataset with over 80 features, a mix of categorical and numerical variables. I used EDA, feature interactions, and multiple regression models to produce the best predictions I could for each house's sale price. Please feel free to review the project on GitHub here.
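As a rough illustration of a feature interaction in a regression model, the sketch below appends the product of two features as an extra column and fits ordinary least squares through the normal equations. It uses only the standard library; the two features (standing in for something like living area and overall quality), the prices, and the helper names are all hypothetical, and this is not the project's actual pipeline.

```python
def add_interaction(rows, i, j):
    """Return copies of the feature rows with the product of columns i and j appended."""
    return [row + [row[i] * row[j]] for row in rows]

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """Ordinary least squares with an intercept via (X^T X) beta = X^T y."""
    X = [[1.0] + row for row in rows]
    p = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * t for r, t in zip(X, y)) for a in range(p)]
    return solve(xtx, xty)

# Hypothetical toy data (NOT from the Ames dataset): [area in 100s of sq ft, quality 1-10]
features = [[10, 5], [15, 6], [20, 7], [12, 8], [25, 4], [18, 9]]
# Made-up prices in $1000s, generated from 20 + 3*area + 5*quality + 0.5*area*quality
prices = [20 + 3 * a + 5 * q + 0.5 * a * q for a, q in features]

rows = add_interaction(features, 0, 1)  # columns: area, quality, area*quality
coef = fit_ols(rows, prices)            # [intercept, area, quality, interaction]
print([round(c, 3) for c in coef])
```

Because the toy prices were generated from a known formula, the fitted coefficients should come back close to 20, 3, 5 and 0.5. The last one is the weight on the interaction term: how much the effect of one extra unit of area changes per unit of quality.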

3. Political Subreddit Classifier

My goal was to build a model that could predict whether a Reddit post came from a liberal or a conservative subreddit. I used natural language processing, sentiment analysis, and multiple classification models. My best model was an SVC (support vector classifier), with an accuracy of around 70%. Please feel free to look at the project here.
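The project's best model was an SVC. To keep this sketch dependency-free, it swaps in a linear SVM (L2-regularized hinge loss) trained with stochastic subgradient descent on bag-of-words counts, and it omits whatever text preprocessing the project may have used. The posts, labels, and hyperparameters (`epochs`, `lr`, `lam`) are invented placeholders.

```python
from collections import Counter

def vectorize(text):
    """Bag-of-words features: lowercase, split on whitespace, count each token."""
    return Counter(text.lower().split())

def score(w, b, x):
    """Linear decision function w . x + b over sparse dict features."""
    return sum(w.get(tok, 0.0) * cnt for tok, cnt in x.items()) + b

def train_linear_svm(docs, labels, epochs=50, lr=0.1, lam=0.01):
    """Linear SVM: L2-regularized hinge loss minimized by per-example subgradient steps.
    Labels must be +1 or -1."""
    w, b = {}, 0.0
    vecs = [vectorize(d) for d in docs]
    for _ in range(epochs):
        for x, y in zip(vecs, labels):
            margin = y * score(w, b, x)
            # Gradient of the L2 penalty shrinks every weight a little.
            for tok in w:
                w[tok] *= 1 - lr * lam
            # The hinge-loss subgradient is nonzero only inside the margin.
            if margin < 1:
                for tok, cnt in x.items():
                    w[tok] = w.get(tok, 0.0) + lr * y * cnt
                b += lr * y
    return w, b

def predict(w, b, text):
    """Return +1 or -1 depending on which side of the hyperplane the text falls."""
    return 1 if score(w, b, vectorize(text)) >= 0 else -1

# Invented placeholder posts; +1 stands for one subreddit, -1 for the other.
docs = ["alpha news today", "alpha policy today", "beta news today", "beta policy today"]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(docs, labels)
print([predict(w, b, d) for d in docs])
```

A kernel SVC can learn non-linear boundaries that this linear version cannot, but the margin-maximizing hinge-loss objective is the same idea.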

Negative Conservative Word Count

I ran sentiment analysis on the combined title and body text of each Reddit post. This graph shows, for each word on the x-axis, how often that word appeared in conservative posts that the analyzer classified as negative.
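The write-up doesn't name the sentiment analyzer, so the sketch below uses a tiny made-up word lexicon as a stand-in. It scores the combined title and body of each post, keeps the posts that score below zero, and counts word occurrences in them, which is the kind of tally a word-count bar chart plots. The lexicon and posts are hypothetical.

```python
from collections import Counter

# Hypothetical mini lexicon; a real analyzer uses a much larger, weighted vocabulary.
LEXICON = {"good": 1, "great": 2, "win": 1, "bad": -1, "terrible": -2, "lose": -1}

def sentiment(text):
    """Sum of lexicon scores for the tokens in text (unknown words score 0)."""
    return sum(LEXICON.get(tok, 0) for tok in text.lower().split())

def negative_word_counts(posts):
    """Count word occurrences across posts whose title + body scores below zero."""
    counts = Counter()
    for title, body in posts:
        text = f"{title} {body}"
        if sentiment(text) < 0:
            counts.update(text.lower().split())
    return counts

# Invented (title, body) pairs standing in for scraped posts.
posts = [
    ("Terrible bill", "this bill is bad news"),
    ("Great turnout", "a good day"),
    ("Vote today", "we could lose the bill"),
]
counts = negative_word_counts(posts)
print(counts.most_common(3))
```

In practice, common stop words such as "the" would usually be filtered out before plotting so that the chart highlights more informative words.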

4. Disaster Situational Awareness

For this project, my team and I were tasked with building a webpage that served as a one-stop shop for news about a specific disaster event. We collected tens of thousands of articles through APIs, built a search engine using Word2Vec, and served the site on localhost with Flask. Please feel free to view the project here.
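The description doesn't spell out how Word2Vec powered the search engine. One common approach, sketched here under that assumption, is to average the word vectors of the query and of each article, then rank articles by cosine similarity. The three-dimensional vectors are hand-made stand-ins for vectors a trained Word2Vec model would produce, the article titles are invented, and the Flask front end is not shown.

```python
from math import sqrt

# Hypothetical 3-d word vectors; in practice these would come from a trained Word2Vec model.
VECTORS = {
    "flood":   [0.9, 0.1, 0.0],
    "rain":    [0.8, 0.2, 0.1],
    "river":   [0.7, 0.0, 0.2],
    "fire":    [0.0, 0.9, 0.1],
    "smoke":   [0.1, 0.8, 0.0],
    "shelter": [0.2, 0.2, 0.9],
}

def embed(text):
    """Average the vectors of in-vocabulary words; None if no word is known."""
    vecs = [VECTORS[t] for t in text.lower().split() if t in VECTORS]
    if not vecs:
        return None
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def search(query, articles, top_k=2):
    """Return up to top_k article titles ranked by similarity to the query."""
    q = embed(query)
    if q is None:
        return []
    scored = []
    for title in articles:
        a = embed(title)
        if a is not None:
            scored.append((cosine(q, a), title))
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]

articles = [
    "River flood warning after heavy rain",
    "Smoke from the fire reaches the city",
    "Shelter opens for displaced residents",
]
print(search("flood", articles, top_k=1))
```

Averaging vectors lets related words match even when the query word itself does not appear in an article, which is the main advantage over plain keyword search.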

5. Political Subgroup Classification

This project was my most experimental endeavor. While I didn't get the result I wanted, I gained valuable insights about missing data and unsupervised clustering models. The goal was to group respondents to an exit poll conducted after the 2018 midterm elections into clusters using K-Means or DBSCAN. Please feel free to view my GitHub to see more.

Political Parties
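For context on the clustering step, here is a minimal standard-library sketch of K-Means (Lloyd's algorithm) on survey responses with gaps. It fills missing answers with column means, which is only one of several ways to handle missing data and is an assumption here, not necessarily what the project did. The responses are hypothetical, and DBSCAN is not shown.

```python
def impute_column_means(rows):
    """Fill each None with the mean of the observed values in its column (mean imputation)."""
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm; centroids start at the first k points so results are reproducible."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical answers to two questions on a 1-5 scale; None marks a skipped question.
responses = [[1, 2], [1, None], [2, 1], [5, 4], [None, 5], [4, 5]]
points = impute_column_means(responses)
labels, centroids = kmeans(points, k=2)
print(labels)
```

Mean imputation pulls incomplete respondents toward the middle of the data, which can blur cluster boundaries; that is one way missing data can undermine a clustering result.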