Gaurav Acharya

Building data systems that scale.

MS in Computer Science candidate focused on data engineering, AI systems, and backend infrastructure.

I design real-time pipelines, machine learning workflows, and production-oriented backend systems using Python, Kafka, Spark, Docker, SQL, and modern data tooling.

Python • Kafka • PySpark • Docker • SQL • TensorFlow

Expertise

Data Engineering & System Design

Focused on building scalable data pipelines, real-time systems, and production-grade backend infrastructure.

Data Engineering

Apache Kafka • Apache Spark (PySpark) • Airflow • ETL / ELT Pipelines • Batch + Streaming Systems

Backend Systems

Python • Java • Node.js • REST APIs • Flask

Databases & Storage

PostgreSQL • MongoDB • Snowflake • BigQuery • Elasticsearch

Cloud & DevOps

AWS • Docker • Kubernetes • GitHub Actions • CI/CD Pipelines

AI / ML Systems

TensorFlow • PyTorch • LangChain • LLM Pipelines • Feature Engineering

Background

Experience

Data Analyst

Jun 2022 — Jul 2024

Merkle Inc.

  • Built automated data pipelines and reporting workflows using Python and SQL, reducing manual reporting effort by 70%
  • Analyzed 10,000+ survey responses using Pandas and advanced Excel to generate actionable insights, improving decision-making by 15%
  • Developed interactive dashboards in Tableau and Power BI for real-time tracking of campaign performance
  • Integrated APIs (Qualtrics, Decipher, Confirmit) to streamline data collection and processing workflows
  • Implemented version-controlled workflows using Git and Docker, reducing production errors by 40%

Master's in Computer Science

2024 — 2026

Illinois Institute of Technology

  • Focused on Data Engineering, Distributed Systems, and Machine Learning
  • Built real-time data pipelines using Kafka and PySpark for scalable event processing
  • Developed machine learning systems for forecasting and classification using TensorFlow and Scikit-learn
  • Worked on backend systems and APIs using Flask and modern cloud tooling

Credentials

Certifications

Certifications reinforcing my foundation in data analytics, engineering systems, and applied problem solving.

GitHub

Selected Projects

real-time-ad-analytics

Kafka + PySpark pipeline for real-time event ingestion, processing, and scalable analytics.

Kafka • PySpark • Docker
Python • Updated Feb 2026

Online-Voting-System

Backend-driven voting system with authentication, vote handling, and result processing.

Flask • Auth • SQL
ASP.NET • Updated Mar 2025

Lie-Detection-System

ML classification pipeline using feature engineering and behavioral data for prediction.

ML • Feature Engineering
C++ • Updated Sep 2024

Dog-Breed-Prediction-System

CNN-based image classification system with Flask API for real-time predictions.

TensorFlow • CNN • Flask
Jupyter Notebook • Updated Sep 2024

Contact

Let’s Connect

I’m actively interested in opportunities across data engineering, backend systems, AI infrastructure, and real-time analytics.