Dirk Van Curan II

Data Scientist · (970) 376-3522 · dirkvancuran@gmail.com

I am a Data Scientist with a passion for solving big problems through data.


Projects

Football Player Prediction

Determining football player archetypes through machine learning.

The mission of this project is to empower coaches and players to develop better team play by understanding archetypes of playing styles and skills, rather than taking a physical-attribute approach to positional play. A skill-first approach will help coaches put together better teams and help players model their game and work on both their strengths and their less developed skills. By providing a full-scale solution that covers data collection, modeling, visuals, and drills based on the findings, we can help push football in America into a new era of analytics and compete on the world stage.

This initial version of the model uses FIFA 19 data to build player archetypes based on skill rather than traditional positional play, then assigns a player to an archetype based on the skill-category inputs. Further steps will make the app more approachable and develop it with youth programs in mind, bringing data analysis to youth clubs.

King County Housing: Price Predictions

Determining housing prices by discovering the key contributing factors.

Our approach was to discover the key variables that help accurately determine the sale price of a home in King County. First, we explored which business questions the raw data could answer. Second, we cleaned the data, eliminated features that did not help address those questions, and added features that gave better insight into what affects price. Once we had an initial model, we tuned its features, fitting on the training split and evaluating on the test split for more accurate predictions. Our end goal was to minimize error as much as possible, provide insight into what affects price, and develop a strategy for answering the business questions posed.

Our model, though not perfect, helps show the various factors that can affect house prices in King County. Its root mean square error is just over $130k for both the linear regression and the lasso regression, meaning a predicted sale price can be expected to miss the actual price by roughly that amount. The model could likely be improved by using data that spans more years, including more sales from the more rural King County zipcodes, and/or grouping zipcodes into bins so that price reflects an area rather than arbitrary zipcode boundaries. Our presentation mentions some next steps we would take, given the time and tools, to improve the model, but we feel that our current iteration already supports some business decisions based on our findings.
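
For readers less familiar with the metric, here is a minimal sketch of how a root mean square error is computed. The prices below are hypothetical values chosen for illustration, not the King County data, and the `rmse` helper is our own name rather than anything from the project code.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical sale prices and model predictions in dollars (illustrative only).
actual_prices = [450_000, 620_000, 310_000, 890_000]
predicted_prices = [500_000, 560_000, 330_000, 1_000_000]

error = rmse(actual_prices, predicted_prices)
print(f"RMSE: ${error:,.0f}")
```

Because the errors are squared before averaging, a single large miss (the last home here) raises the RMSE more than several small ones would.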

Zillow Housing Market Analysis

Investigating real estate investment markets using time series analysis.

This project used historical Zillow housing data from zipcodes across the US to analyze markets and their investment potential. Using Facebook Prophet, we predicted the ROI range after five years for every zipcode in the data set, then selected our top five zipcodes for highest potential ROI, expected ROI, and lowest-risk ROI based on the results.
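
The ranking step can be sketched roughly as follows. This is an illustration under stated assumptions, not the project code: the zipcodes and values are made up, the `Forecast` class and helper names are hypothetical, and we assume that "highest potential", "expected", and "lowest risk" correspond to the upper bound, point forecast, and lower bound of a forecast interval (such as the `yhat_upper`, `yhat`, and `yhat_lower` columns Prophet returns).

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    zipcode: str
    current: float   # latest observed home value
    lower: float     # low end of the five-year forecast interval
    expected: float  # point forecast after five years
    upper: float     # high end of the five-year forecast interval


def roi(current, future):
    """Return on investment as a fraction of the current value."""
    return (future - current) / current


def top_n(forecasts, key, n=5):
    """Zipcodes of the n forecasts with the largest key value."""
    ranked = sorted(forecasts, key=key, reverse=True)
    return [f.zipcode for f in ranked[:n]]


# Made-up forecasts for three placeholder zipcodes (illustrative only).
forecasts = [
    Forecast("00001", 200_000, 180_000, 260_000, 340_000),
    Forecast("00002", 300_000, 310_000, 345_000, 380_000),
    Forecast("00003", 150_000, 100_000, 210_000, 330_000),
]

highest_potential = top_n(forecasts, key=lambda f: roi(f.current, f.upper), n=2)
expected_roi = top_n(forecasts, key=lambda f: roi(f.current, f.expected), n=2)
lowest_risk = top_n(forecasts, key=lambda f: roi(f.current, f.lower), n=2)

print(highest_potential, expected_roi, lowest_risk)
```

Ranking by the lower bound favors markets whose worst-case forecast still holds its value, which is one simple way to read "lowest risk"; other definitions, such as the narrowest forecast interval, would be equally reasonable.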

Our model produced several unexpected results but, after cross-referencing with actual current data, proved to be making reasonable predictions based on sales data from hundreds of thousands of US entries. With our business problem of suggesting markets to banks or property investors in mind, we categorized our recommendations to offer a varied portfolio of properties and help ensure a recouped investment.

Writings

Radar Plots