Projects

Weather Forecast Data ETL with Dockerized Microservices

Weather Forecast Data ETL & Visualization with Dockerized Microservices

This project demonstrates an end-to-end weather data pipeline built with modern data engineering tools. Forecast data for New York (Timestamp, Temperature in °C, and Precipitation Probability) covering the previous 7 days and the upcoming 7 days is fetched from the Open-Meteo API. Apache Airflow orchestrates the workflow: a Kafka Producer extracts the data and a Kafka Consumer inserts it into a PostgreSQL database. The data is then served through a FastAPI web application for real-time visualization. The entire system is containerized with Docker and runs across 8 containers, which keeps the pipeline reproducible and scalable. The initial run of the workflow takes approximately 4 minutes, and data storage is the most time-consuming step.
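The extract step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the coordinates, the choice of hourly variables, and the helper names `build_forecast_url` and `to_rows` are assumptions made for the example.

```python
from urllib.parse import urlencode

# Assumed coordinates for New York; the project description names only the city.
NEW_YORK = {"latitude": 40.71, "longitude": -74.01}


def build_forecast_url(past_days=7, forecast_days=7):
    """Build an Open-Meteo forecast URL (hypothetical helper).

    Assumes hourly resolution; the project may use a different granularity.
    """
    params = {
        **NEW_YORK,
        "hourly": "temperature_2m,precipitation_probability",
        "past_days": past_days,
        "forecast_days": forecast_days,
        "timezone": "America/New_York",
    }
    # Keep "," and "/" readable in the query string.
    return "https://api.open-meteo.com/v1/forecast?" + urlencode(params, safe=",/")


def to_rows(payload):
    """Flatten the column-oriented "hourly" section of a response into
    (timestamp, temperature_c, precipitation_probability) tuples,
    the shape a consumer could insert into a database table."""
    hourly = payload["hourly"]
    return list(
        zip(
            hourly["time"],
            hourly["temperature_2m"],
            hourly["precipitation_probability"],
        )
    )
```

In a pipeline like the one described, a producer task would request the URL, pass the decoded JSON to `to_rows`, and publish each row as a message for the consumer to store.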

Skills: Docker, Apache Airflow, Kafka, PostgreSQL, FastAPI, Jinja2

YouTube Song Analysis

Case Study: YouTube Song Analysis

This case study explores how song characteristics (such as mood, popularity, and licensing) relate to one another on YouTube. The analysis shows that happy songs tend to receive more engagement and that not all official music videos are properly licensed. These insights can help content creators, marketers, and platforms adjust their strategies for better audience reach and compliance.

Skills: Python (pandas, matplotlib, seaborn)

Case Study

Predicting Spotify Song Popularity

This project predicts Spotify song popularity from audio features and metadata, focusing on genre trends, song duration, and track name characteristics. After data cleaning, exploratory analysis, and a comparison of multiple machine learning models, the Random Forest Regressor emerged as the most accurate predictor. Genre and duration had a stronger impact on popularity than track title length, which offers useful direction for music marketing and curation.
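A model comparison of this kind might look roughly like the sketch below. It is illustrative only: the data is synthetic placeholder data (not the Spotify dataset), the baseline model and the mean absolute error metric are assumptions, and `compare_models` is a hypothetical helper name.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split


def compare_models(X, y, seed=0):
    """Fit each candidate on a training split and return its
    mean absolute error on the held-out split (lower is better)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "linear_regression": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in for numeric features and a popularity target.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = 50 * X[:, 0] + 20 * np.sin(6 * X[:, 1]) + rng.normal(0, 2, 200)

scores = compare_models(X, y)
```

Scoring every model on the same held-out split keeps the comparison fair; with real data, categorical fields such as genre would need to be encoded before fitting.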

Skills: Python (pandas, matplotlib, seaborn, sklearn), Tableau