Reddit Data Project (In Progress)
Exploring Reddit behavioral and textual data for personal data science research.
PhD Student at the University of Warsaw (Cosmology and Particle Physics)
I am currently looking to transition into the IT industry, ideally into a role related
to data analysis. My academic work has given me strong analytical skills and experience
in advanced mathematical modeling, programming, and interpreting and visualizing
data.
In this project, I analyzed student exam performance using demographic and socioeconomic features such as gender, parental level of education, lunch type, test preparation, and race/ethnicity. I preprocessed the data by encoding the categorical variables and applying Principal Component Analysis (PCA) for dimensionality reduction. I then used cross-validation to evaluate a range of regression models: Linear Regression, Ridge, Lasso, Support Vector Regression (SVR), Random Forest, Gradient Boosting, and XGBoost. Finally, I tuned the hyperparameters of Random Forest and XGBoost to improve model performance.
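The workflow above (encoding, PCA, cross-validated model comparison) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the project's actual code: the data is synthetic, and the column choices, `build_pipeline` helper, number of PCA components, and model settings are all assumptions made for the example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the exam dataset (generated values, not real data).
rng = np.random.default_rng(0)
n = 300
test_prep = rng.choice(["none", "completed"], size=n)
lunch = rng.choice(["standard", "free/reduced"], size=n)
parent_edu = rng.choice(["high school", "bachelor", "master"], size=n)
X = np.column_stack([test_prep, lunch, parent_edu])
y = 60 + 8 * (test_prep == "completed") + 6 * (lunch == "standard") + rng.normal(0, 5, size=n)


def build_pipeline(model):
    """One-hot encode the categorical columns, reduce with PCA, then fit the model."""
    encode = ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), [0, 1, 2])],
        sparse_threshold=0,  # always return a dense array so PCA can consume it
    )
    return Pipeline([("encode", encode), ("pca", PCA(n_components=4)), ("model", model)])


# A subset of the model families named above, with illustrative settings.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Mean R^2 over 5 folds for each model; preprocessing is refit inside every fold.
cv_r2 = {
    name: cross_val_score(build_pipeline(model), X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}

for name, score in cv_r2.items():
    print(f"{name}: mean R^2 = {score:.3f}")
```

Keeping the encoder and PCA inside the pipeline means they are fitted only on the training part of each fold, which avoids leaking information from the validation part into the preprocessing.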
I analyzed global video game sales using Python. The project included data preparation and cleaning, EDA, and answering business-driven questions, for example: Which game would be best to release in Europe, based on data from recent years? Does a higher number of releases on a platform lead to higher sales? I used p-values to assess statistical significance and visualized trends across regions and time to support business decisions in the gaming industry.
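As an illustration of how a p-value can address the releases-versus-sales question, here is a small sketch using a Spearman rank correlation from SciPy. The numbers are randomly generated placeholders, the choice of test is an assumption (the project may have used a different one), and the 0.05 threshold is only a common convention.

```python
import numpy as np
from scipy.stats import spearmanr

# Synthetic placeholder data: releases and total sales for 25 hypothetical platforms.
rng = np.random.default_rng(42)
releases = rng.integers(20, 500, size=25)
sales = 0.3 * releases + rng.normal(0, 40, size=25)

# Spearman's rho measures a monotonic association; the p-value tests
# the null hypothesis that there is no association at all.
rho, p_value = spearmanr(releases, sales)

alpha = 0.05  # conventional significance level, chosen for illustration
significant = p_value < alpha

print(f"rho = {rho:.2f}, p = {p_value:.4f}, significant at {alpha}: {significant}")
```

A small p-value here would only indicate an association between release counts and sales, not that releasing more games causes higher sales.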
I analyzed a telecom customer dataset to predict customer churn using machine learning. I performed data cleaning, feature engineering, and exploratory analysis before training classification models like Logistic Regression, Random Forest, and XGBoost. I then evaluated their performance using metrics such as ROC-AUC and F1-score, and interpreted feature importance to uncover key drivers of churn. The final model helps identify high-risk customers and offers insights for targeted retention strategies.
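The train-and-evaluate step described above might look roughly like this. It is a sketch under stated assumptions: the data comes from scikit-learn's `make_classification` rather than the telecom dataset, only Logistic Regression is shown, and coefficient magnitude is used as a simple proxy for feature importance.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary problem standing in for churn (1 = churned).
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.75, 0.25], random_state=0,
)

# Stratify so both splits keep roughly the same churn rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# ROC-AUC is computed from predicted probabilities, F1 from hard class labels.
churn_prob = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, churn_prob)
f1 = f1_score(y_test, model.predict(X_test))

# Rank features by absolute coefficient as a rough importance measure.
ranked = sorted(enumerate(model.coef_[0]), key=lambda item: abs(item[1]), reverse=True)

print(f"ROC-AUC = {auc:.3f}, F1 = {f1:.3f}")
print("Top features (index, coefficient):", ranked[:3])
```

Sorting customers by `churn_prob` is one simple way to produce the kind of high-risk list mentioned above.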
Exploratory data analysis of a coffee quality dataset. The project includes data cleaning, preprocessing, and a variety of visualizations to uncover patterns and trends in the data. Multiple plotting techniques are used to illustrate relationships between variables, such as origin, processing method, and cup score.
Data analysis of Airbnb listings in Florence, Italy. The project uses SQL queries to extract insights from multiple joined tables and Python (Seaborn) for data visualization. Key findings are presented through visualizations of listing prices.
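A query over joined tables of the kind described above can be sketched with Python's built-in `sqlite3` module. The schema (`listings`, `neighbourhoods`), the rows, and the prices are hypothetical and exist only to make the example self-contained; the project's real tables and database engine may differ.

```python
import sqlite3

# In-memory database with a hypothetical two-table schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE neighbourhoods (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE listings (
        id INTEGER PRIMARY KEY,
        neighbourhood_id INTEGER REFERENCES neighbourhoods(id),
        price REAL
    );
""")
conn.executemany(
    "INSERT INTO neighbourhoods VALUES (?, ?)",
    [(1, "Centro Storico"), (2, "Campo di Marte")],
)
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [(1, 1, 120.0), (2, 1, 80.0), (3, 2, 60.0), (4, 2, 100.0), (5, 2, 50.0)],
)

# Join listings to their neighbourhood and aggregate prices per neighbourhood.
rows = conn.execute("""
    SELECT n.name, COUNT(*) AS n_listings, AVG(l.price) AS avg_price
    FROM listings AS l
    JOIN neighbourhoods AS n ON n.id = l.neighbourhood_id
    GROUP BY n.name
    ORDER BY avg_price DESC
""").fetchall()
conn.close()

for name, n_listings, avg_price in rows:
    print(f"{name}: {n_listings} listings, average price {avg_price:.2f}")
```

The resulting rows could then be loaded into a DataFrame and passed to a Seaborn plot such as a bar chart of average price per neighbourhood.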
A selection of data visualizations from my PhD research, created using Wolfram Mathematica. These plots illustrate theoretical model behavior, time evolution of scalar fields, and parameter space exploration in early Universe cosmology.