Getting Stuck on the Kaggle Disaster Tweets Project (and How I’m Shipping V1 Anyway)

I’ve been working on the Kaggle Disaster Tweets classification project, and for a while, progress felt good. I built a baseline model using TF-IDF and Logistic Regression and managed to get an F1 score of 0.82 without using a pipeline. Then I decided to “do things properly” and refactor everything into a scikit-learn pipeline — and … Read more
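
As a quick preview of what that refactor involves, here is a minimal sketch of a TF-IDF + Logistic Regression baseline wrapped in a scikit-learn Pipeline. The tiny example tweets and the hyperparameters below are illustrative stand-ins, not the actual competition data or my notebook's settings (the competition's train.csv uses `text` and `target` columns).

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for the competition's `text` / `target` columns.
texts = [
    "Forest fire near La Ronge Sask. Canada",
    "What a lovely sunny afternoon",
    "13,000 people receive #wildfires evacuation orders in California",
    "My new phone is fire, love it",
]
labels = [1, 0, 1, 0]  # 1 = real disaster, 0 = not

# Vectorizer and classifier live in one object, so the vocabulary
# fitted on the training text is reused automatically at predict time.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])

pipe.fit(texts, labels)
print(pipe.predict(["Evacuation ordered after forest fire"]))
```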

From Baseline to Submission: My Gradient Boosting Pipeline on Spaceship Titanic

After completing my first Kaggle competition on House Prices, I decided to tackle the Spaceship Titanic dataset. The goal is to predict whether passengers were transported to another dimension during the voyage. This competition has been a great opportunity to improve my workflow, learn about pipelines, and practice model evaluation. In this post, I’ll walk through … Read more
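
As a preview, here is a minimal sketch of that kind of pipeline: a ColumnTransformer handling imputation and one-hot encoding, feeding a gradient boosting classifier. The column names follow the Spaceship Titanic schema, but the feature subset, the toy rows, and the hyperparameters are placeholders rather than my final configuration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Illustrative subset of the Spaceship Titanic columns.
num_cols = ["Age", "RoomService", "Spa"]
cat_cols = ["HomePlanet", "Destination"]

preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), num_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), cat_cols),
])

model = Pipeline([
    ("prep", preprocess),
    ("gbm", GradientBoostingClassifier(random_state=0)),
])

# Toy rows standing in for the competition's train.csv.
train = pd.DataFrame({
    "Age": [39.0, None, 58.0, 16.0],
    "RoomService": [0.0, 109.0, 43.0, 303.0],
    "Spa": [0.0, 549.0, 6715.0, 565.0],
    "HomePlanet": ["Europa", "Earth", "Europa", None],
    "Destination": ["TRAPPIST-1e"] * 4,
    "Transported": [False, True, False, True],
})
model.fit(train[num_cols + cat_cols], train["Transported"])
```

Keeping the imputers and encoder inside the pipeline means the same fitted preprocessing is applied to the test set at submission time, with no manual bookkeeping.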

Improving My Baseline Model: From Simple Linear Regression to a Proper Pipeline

In my previous post, I built a simple baseline model for the House Prices Kaggle competition using only numerical features, scaling, and a linear model. Since then, I’ve iterated on that baseline by introducing a proper preprocessing pipeline, adding categorical features through one-hot encoding, and applying feature engineering, cross-validation, and hyperparameter tuning. The goal wasn’t … Read more
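
For concreteness, the overall shape of that upgrade looks roughly like the sketch below: numeric imputation and scaling plus categorical one-hot encoding in a ColumnTransformer, wrapped in a Pipeline and tuned with cross-validated grid search. I use Ridge here purely as an illustrative linear model, since it gives the search a regularization knob to tune; the column names and the alpha grid are placeholders, not the post's actual choices.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative subset of the House Prices columns.
num_cols = ["GrLivArea", "OverallQual", "YearBuilt"]
cat_cols = ["Neighborhood", "MSZoning"]

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), num_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), cat_cols),
])

pipe = Pipeline([("prep", preprocess), ("model", Ridge())])

# Cross-validated search over the regularization strength; each fold
# re-fits the preprocessing, so nothing leaks from the held-out data.
search = GridSearchCV(
    pipe,
    param_grid={"model__alpha": [0.1, 1.0, 10.0]},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
# search.fit(X_train, y_train)  # X_train / y_train come from the competition data
```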