Portfolio

Reddit Data Project (In Progress)

An exploration of behavioral and textual data from Reddit, carried out as a personal data science research project.

Students Performance Prediction

In this project, I analyzed student exam performance using demographic and socioeconomic features such as gender, race/ethnicity, parental level of education, lunch type, and test preparation. I preprocessed the data by encoding the categorical variables and applying Principal Component Analysis (PCA) for dimensionality reduction. Using cross-validation, I evaluated a range of regression models: Linear Regression, Ridge, Lasso, Support Vector Regression (SVR), Random Forest, Gradient Boosting, and XGBoost. Finally, I tuned the hyperparameters of Random Forest and XGBoost to improve their performance.

Video Games Business Analysis

I analyzed global video game sales using Python. The project covered data preparation and cleaning, exploratory data analysis (EDA), and business-driven questions, for example: Which game would be best to release in Europe, based on data from recent years? Does a higher number of releases on a platform lead to higher sales? I used p-values to assess statistical significance and visualized trends across regions and over time to support business decisions in the gaming industry.

Telco Customer Churn Prediction

I analyzed a telecom customer dataset to predict customer churn using machine learning. I performed data cleaning, feature engineering, and exploratory analysis before training classification models such as Logistic Regression, Random Forest, and XGBoost. I then evaluated their performance using metrics such as ROC-AUC and F1-score, and interpreted feature importance to uncover the key drivers of churn. The final model helps identify high-risk customers and offers insights for targeted retention strategies.
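
To show what the two evaluation metrics measure, here is a from-scratch sketch on four invented churn predictions. It is illustrative only; on real data, sklearn.metrics.roc_auc_score and sklearn.metrics.f1_score are the usual tools. The helper names predict_at, f1, and roc_auc are hypothetical.

```python
# Sketch: what ROC-AUC and F1 measure, written from scratch on toy churn labels.
# In practice sklearn.metrics.roc_auc_score and sklearn.metrics.f1_score do this.


def predict_at(scores, threshold=0.5):
    """Turn predicted churn probabilities into 0/1 labels at a chosen threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def f1(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (churn = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random churner outscores a random non-churner (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels (1 = churned) and model scores, invented for illustration.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print("ROC-AUC:", roc_auc(y_true, scores))          # threshold-free ranking quality
print("F1 @ 0.5:", f1(y_true, predict_at(scores)))  # depends on the chosen threshold
```

On these four points ROC-AUC is 0.75 and F1 at a 0.5 threshold is about 0.667; lowering the threshold to 0.3 raises F1 to 0.8 while ROC-AUC stays the same. That difference is why ROC-AUC suits model comparison, whereas F1 reflects the operating threshold chosen for a retention campaign.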

Coffee EDA

Exploratory data analysis of a coffee quality dataset. The project includes data cleaning, preprocessing, and a variety of visualizations to uncover patterns and trends in the data. Multiple plotting techniques are used to illustrate relationships between variables, such as origin, processing method, and cup score.

Airbnb Florence Data Analysis

Data analysis of Airbnb listings in Florence, Italy. The project uses SQL queries that join multiple tables to extract insights, and Python (Seaborn) for data visualization. Key findings are presented through visualizations exploring listing prices.
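
A minimal sketch of the kind of join-and-aggregate query described above, run against an in-memory SQLite database from Python's standard library. The table names, columns, price unit, and rows are all hypothetical and do not reflect the project's actual schema or data.

```python
# Sketch: a SQL join + aggregation of the kind used to summarize listing prices.
# Table names, columns, and rows are hypothetical, not the project's actual schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE neighbourhoods (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE listings (
        id INTEGER PRIMARY KEY,
        neighbourhood_id INTEGER REFERENCES neighbourhoods(id),
        price REAL  -- nightly price in EUR (assumed unit)
    );
    INSERT INTO neighbourhoods VALUES (1, 'Centro Storico'), (2, 'Campo di Marte');
    INSERT INTO listings VALUES (1, 1, 120.0), (2, 1, 180.0), (3, 2, 70.0), (4, 2, 90.0);
    """
)

# Join each listing to its neighbourhood and compare average prices.
query = """
    SELECT n.name, COUNT(*) AS n_listings, ROUND(AVG(l.price), 2) AS avg_price
    FROM listings AS l
    JOIN neighbourhoods AS n ON n.id = l.neighbourhood_id
    GROUP BY n.name
    ORDER BY avg_price DESC
"""
rows = conn.execute(query).fetchall()
for name, n_listings, avg_price in rows:
    print(f"{name}: {n_listings} listings, average price {avg_price}")
conn.close()
```

In a workflow like the one described, the same query could be loaded with pandas.read_sql_query and passed to a Seaborn plotting function such as seaborn.barplot.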

Sample PhD Data Visualizations

A selection of data visualizations from my PhD research, created using Wolfram Mathematica. These plots illustrate theoretical model behavior, the time evolution of scalar fields, and parameter space exploration in early-Universe cosmology.

PAULINA MICHALAK
PORTFOLIO

Reddit Data Project (In Progress)

Students Performance Prediction

Video Games Business Analysis

Telco Customer Churn Prediction

Coffee EDA

Airbnb Florence Data Analysis

Sample PhD Data Visualizations

Get in touch

Email

Phone

GitHub

LinkedIn