Data Analysis Projects

Exploratory Analysis of US Data Careers

This project is a guide to exploratory data analysis of a dataset on data-related careers, using the R programming language. The analysis follows my own career interests and is intended to give readers ideas for running a similar analysis based on their own career preferences.

 

Predicting Seminal Quality with Random Forest Classification

Dealing with Unbalanced Data

This project builds on the work on identifying male fertility problems in Gil et al. (2013) by using a random forest classifier to predict seminal quality. It compares sampling methods for the training and testing datasets, with the aim of increasing sensitivity and specificity in classifying seminal quality. A down-sampled random forest may be the most appropriate choice in real-world settings where the goal is to flag possibly abnormal sperm quality for further examination and testing, because this approach gave the lowest rate of false negatives.
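
As a rough illustration of the down-sampling idea and of the two metrics mentioned above, here is a minimal sketch in plain Python. It is not the project's code, which this summary does not show. The function names `downsample` and `sensitivity_specificity`, the class labels, and the toy data are all assumptions made for the example. Treating "abnormal" as the positive class is also an assumption, chosen so that a false negative means an abnormal sample that was missed.

```python
import random
from collections import Counter


def downsample(rows, labels, seed=0):
    """Randomly drop rows from the larger classes so that every class
    ends up with as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):  # sample without replacement
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def sensitivity_specificity(y_true, y_pred, positive):
    """Return (sensitivity, specificity) for a binary problem.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up toy data (hypothetical, not the study's dataset):
# 9 "normal" rows and 3 "abnormal" rows.
rows = list(range(12))
labels = ["normal"] * 9 + ["abnormal"] * 3

balanced_rows, balanced_labels = downsample(rows, labels, seed=1)
print(Counter(balanced_labels))  # class counts after down-sampling
```

In this kind of setup, down-sampling is usually applied only to the training split, so that the test set keeps the original class balance and the reported sensitivity and specificity reflect realistic conditions. A lower false negative rate corresponds to a higher sensitivity for the class being flagged.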