Data Engineering

Real-Time Ad Analytics Platform

A distributed analytics platform designed for ingesting, processing, and reporting user and advertisement event data with near real-time visibility.

System Architecture

[Figure: Real-Time Ad Analytics Platform architecture]

Problem Statement

Traditional batch reporting creates delays in campaign visibility and makes it difficult to monitor user interaction and ad performance in real time. This project was built to support scalable streaming ingestion and fast analytical processing for event-driven reporting.

Tech Stack

Kafka, PySpark, Docker, PostgreSQL, BigQuery

Key Contributions

  • Built a streaming data pipeline for ingesting user and advertisement event streams
  • Implemented PySpark-based processing for scalable transformations across streaming and batch workflows
  • Designed analytics-ready storage patterns to support performance reporting and campaign analysis
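To make the kind of transformation described above concrete, here is a minimal sketch in plain Python of a tumbling-window aggregation over ad events. It is an illustration only, not the project's PySpark code: the `AdEvent` schema, the 60-second window size, and the `aggregate` function are all assumptions made for this example. In a streaming engine the same grouping would typically be expressed as a windowed group-by on the event timestamp.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_SECONDS = 60  # assumed tumbling-window size, chosen for illustration


@dataclass(frozen=True)
class AdEvent:
    """Hypothetical event schema; the real project's fields may differ."""
    campaign_id: str
    event_type: str  # "impression" or "click"
    timestamp: int   # epoch seconds


def window_start(ts: int, size: int = WINDOW_SECONDS) -> int:
    """Return the start of the tumbling window that contains ts."""
    return ts - ts % size


def aggregate(events, size: int = WINDOW_SECONDS):
    """Count impressions and clicks per (campaign, window) and derive CTR."""
    counts = defaultdict(lambda: {"impression": 0, "click": 0})
    for event in events:
        if event.event_type not in ("impression", "click"):
            continue  # ignore event types this sketch does not model
        key = (event.campaign_id, window_start(event.timestamp, size))
        counts[key][event.event_type] += 1

    rows = []
    for (campaign_id, start), c in sorted(counts.items()):
        # CTR is undefined when a window has clicks but no impressions
        ctr = c["click"] / c["impression"] if c["impression"] else None
        rows.append({
            "campaign_id": campaign_id,
            "window_start": start,
            "impressions": c["impression"],
            "clicks": c["click"],
            "ctr": ctr,
        })
    return rows


if __name__ == "__main__":
    sample = [
        AdEvent("c1", "impression", 0),
        AdEvent("c1", "impression", 30),
        AdEvent("c1", "click", 45),
        AdEvent("c1", "impression", 61),
    ]
    for row in aggregate(sample):
        print(row)
```

Each output row is keyed by campaign and window start, which is one possible analytics-ready shape: rows like these can be appended to a reporting table and queried by campaign and time range without reprocessing raw events.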

Results

  • Enabled near real-time processing of event data
  • Structured the pipeline for both streaming and batch analytics use cases
  • Improved reporting readiness through optimized transformation and storage design

Engineering Decisions

Challenges Faced

View GitHub Repository