Aman Kumar Sahu
Building intelligent data platforms that power next-gen AI applications. MS Data Science @ UMD (3.73 GPA) · 2+ years experience · $1.2M+ in cost savings · 3× Hackathon Winner
About Me
Passionate Data Engineer building scalable data pipelines and intelligent AI systems
Who I Am
I'm a Data Engineer and AI/ML Specialist with a Master's degree in Data Science from the University of Maryland (3.73 GPA) and over two years of hands-on experience building scalable data pipelines and intelligent systems.
My expertise spans the entire data engineering lifecycle, from architecting robust ETL pipelines that process billions of records to implementing machine learning solutions. I've delivered measurable impact, including $1.2M+ in cost savings, while maintaining 99.9% system uptime.
I'm driven by complex technical challenges and by using AI to build innovative solutions. Whether optimizing data infrastructure at scale or building next-generation ML applications, I thrive on pushing the boundaries of what's possible with data and artificial intelligence.
Professional Impact
Built 50+ production Hadoop data pipelines processing 10TB+ daily data with 99.9% uptime at Tata Consultancy Services (PNC Bank)
Education
MS in Data Science from University of Maryland (3.73 GPA). BTech in Mechatronics Engineering with Robotics specialization from SRM Institute (8.91 CGPA)
Achievements
3× Hackathon Winner ($2,500 First Place at AI & Food Insecurity Competition, featured on CBS News)
Experience & Education
My professional journey building enterprise-scale AI/ML systems
LLM/AI Engineer Intern
Connyct
Architecting AWS-native event-driven pipelines and multi-agent RAG systems
- Architected an AWS-native event-driven pipeline (EventBridge, SQS, Step Functions) with Lambda processing, achieving a 38% P95 latency reduction and a 99.5% uptime SLA
- Designed a DynamoDB single-table schema with GSIs, achieving 72% duplicate elimination, saving 40+ hours/month and cutting costs by 65%
- Migrated scrapers to auto-scaling EC2 with CloudWatch monitoring, raising the success rate from 78% to 96% at 3x throughput
- Built a multi-agent RAG chatbot using LangChain, Pinecone vector DB, and GPT-4, achieving a 25% accuracy boost with <2s response times
Data Engineer
Tata Consultancy Services (PNC Bank)
Built enterprise-scale data pipelines and MLOps infrastructure for banking systems
- Built 50+ production Hadoop data pipelines in PySpark, processing 10TB+ of data daily with 99.9% uptime and full regulatory compliance
- Automated 100+ ETL workflows with Airflow DAGs, cutting manual effort by 70% and speeding incident response by 85%
- Optimized PySpark jobs with broadcast joins and Jenkins CI/CD, delivering a 40% performance improvement and $50K+ in cost savings
- Embedded end-to-end data lineage tracking with Apache Atlas, achieving 35% faster reporting cycles with a 100% audit trail
- Mentored 4 junior engineers, cutting onboarding time by 50%
Master of Science in Data Science
University of Maryland, College Park
GPA: 3.73/4.0
- Advanced Machine Learning & Deep Learning
- Big Data Systems & Cloud Computing Architecture
- Natural Language Processing & MLOps
- Data Engineering at Scale & Model Deployment
Bachelor of Technology in Mechatronics Engineering (Robotics Specialization)
SRM Institute of Science and Technology
CGPA: 8.91/10.0
- Robotics & Control: Theory, Practice, Applied Robotics
- AI for Robotics & Vision, Computer Vision & Advanced CV
- Embedded Systems: Raspberry Pi, Microcontrollers, Digital Systems
- Machine Learning, Numerical Methods, Programming for Problem Solving
Featured Projects
Production-ready AI/ML systems across Technology, Banking, and Retail domains
Enterprise Multi-Tenant AI Platform
Production-grade SaaS platform with Kubernetes orchestration, tenant isolation via PostgreSQL schemas, real-time Kafka event streaming (100K+ events/sec), ML-driven resource optimization ($100K annual savings), and GPT-4 powered support assistant (40% ticket deflection). Features auto-scaling microservices (FastAPI), Snowflake analytics, Apache Flink real-time processing, MLflow model tracking, and comprehensive observability with Grafana. Reduces tenant onboarding from 4 hours to 10 minutes through automated provisioning.
Intelligent MLOps Platform
End-to-end ML infrastructure reducing model deployment from 3 weeks to 2 days (90% improvement) supporting 40+ production models. Built with Kubeflow Pipelines for orchestration, MLflow experiment tracking, Feast feature store for real-time serving, TensorFlow Serving for inference, and GPT-4 powered code generation (85% accuracy). Features automated CI/CD with A/B testing, drift detection, model versioning, and comprehensive monitoring via Prometheus/Grafana. Includes LLM-assisted debugging and training cost optimization saving $80K annually.
Real-Time Observability Intelligence Platform
High-throughput monitoring platform processing 500K+ events/minute with <1s latency, reducing infrastructure costs by 65% vs Datadog ($180K → $63K annually). Built on ClickHouse for hot data storage, Vector log aggregator for ingestion, Kafka buffering, and Apache Flink stream processing. Features LSTM-based anomaly detection (92% precision), GPT-4 powered root cause analysis (<30s), automated log summarization via RAG, and LangChain alert explanation. Includes pattern mining, forecasting engine, custom Grafana dashboards, and PagerDuty integration. Reduces MTTR by 40%.
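The production detector pairs an LSTM with GPT-4 root cause analysis; as a simplified sketch of the underlying idea (flag events that deviate sharply from a rolling baseline), here is a minimal pure-Python rolling z-score detector. The class name, window size, and threshold are illustrative, not the production configuration:

```python
from collections import deque
from statistics import mean, stdev

class RollingZScoreDetector:
    """Flag values more than `threshold` standard deviations from a rolling baseline."""

    def __init__(self, window=30, threshold=3.0):
        self.window = deque(maxlen=window)  # most recent `window` observations
        self.threshold = threshold

    def observe(self, value):
        """Return True if `value` is anomalous relative to the current window."""
        is_anomaly = False
        if len(self.window) >= 2:  # stdev needs at least two points
            mu, sigma = mean(self.window), stdev(self.window)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                is_anomaly = True
        self.window.append(value)
        return is_anomaly
```

A streaming job would run one detector per metric key; an LSTM replaces the naive baseline with a learned forecast, but the deviation test works the same way.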
Real-Time Fraud Detection System
Advanced fraud detection system combining graph neural networks with ensemble machine learning processing 10M+ daily transactions with <100ms latency. Built on Neo4j graph database modeling entity relationships (accounts, merchants, devices) with GNN detecting suspicious patterns and community structures. Ensemble architecture combines XGBoost gradient boosting, Random Forest, and LightGBM with GPT-4 analyzing transaction narratives for anomaly detection. Real-time Kafka streaming with Apache Flink CEP (Complex Event Processing) for pattern matching and rule engines. Redis-backed feature store serving 200+ engineered features with sub-10ms lookup. ML-powered alert prioritization reducing false positives by 78% and analyst workload by 60%. Adaptive models retrained daily on labeled fraud cases achieving 95% precision, 89% recall, preventing $2M+ annual fraud losses.
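The deployed system ensembles XGBoost, Random Forest, and LightGBM; the score-combination step itself can be sketched in plain Python with stand-in rule-based scorers. All function names, fields, and thresholds below are illustrative, not the production models:

```python
def ensemble_fraud_score(transaction, scorers, weights=None):
    """Weighted average of independent model scores, each in [0, 1]."""
    weights = weights or [1.0] * len(scorers)
    total = sum(weights)
    return sum(w * s(transaction) for w, s in zip(weights, scorers)) / total

# Hypothetical stand-ins for the real trained models:
def amount_rule(tx):
    return 1.0 if tx["amount"] > 10_000 else 0.1

def velocity_rule(tx):
    return min(tx["txns_last_hour"] / 10, 1.0)

def geo_rule(tx):
    return 0.9 if tx["country"] != tx["home_country"] else 0.05
```

In production each scorer would be a model's predicted fraud probability, and the weights would come from validation performance rather than being uniform.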
Responsible AI Credit Platform
EU AI Act compliant credit scoring platform serving 15K+ customers with full explainability and fairness guarantees achieving <3% default rate. Core LightGBM model trained on 80+ features (credit history, income, debt ratios, behavioral patterns) with SHAP TreeExplainer generating individual-level and global feature importance explanations for regulatory compliance. Fairlearn integration enforces demographic parity constraints across protected attributes (gender, ethnicity, age) ensuring equitable outcomes. GPT-4 powered natural language explanation system translating SHAP values into customer-friendly justifications. Automated fairness auditing pipeline detecting bias drift with Aequitas toolkit, generating compliance reports for EU AI Act Article 13-15 requirements. Aurora PostgreSQL storing audit trails with complete model lineage and decision provenance. Real-time scoring API with 250ms P95 latency, A/B testing framework for champion/challenger models, and automated retraining triggering on AUC degradation >2%. Reduces manual review time by 70% while maintaining strict fairness standards.
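Fairlearn enforces the parity constraints inside the platform; the metric being enforced, demographic parity, reduces to comparing approval rates across protected groups. A minimal sketch (the data shape is assumed for illustration):

```python
from collections import defaultdict

def demographic_parity_gap(decisions):
    """decisions: iterable of (group, approved: bool) pairs.

    Returns the largest difference in approval rate between any two
    groups; 0.0 means perfect demographic parity.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for group, approved in decisions:
        counts[group][0] += int(approved)
        counts[group][1] += 1
    rates = [approved / total for approved, total in counts.values()]
    return max(rates) - min(rates)
```

An auditing pipeline would compute this gap per protected attribute on each scoring batch and alert when it drifts past a tolerance.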
ML Transaction Reconciliation
Intelligent transaction reconciliation system processing 50M+ daily transactions across multiple payment rails (ACH, wire, card networks) achieving 99.2% automated match rate. Three-tier matching architecture: Level 1 exact matching on transaction IDs and amounts (85% match), Level 2 fuzzy matching using Levenshtein distance and phonetic algorithms for name/reference variations (12% match), Level 3 ML-based matching with XGBoost trained on 150+ engineered features including temporal patterns, amount clustering, merchant fingerprints (2% match). Apache Flink streaming processes real-time transaction feeds from Kafka topics with stateful windowing aggregations and CEP for pattern detection. GPT-4 powered exception handler analyzes remaining 0.8% unmatched cases, reasoning about data quality issues, missing information, and potential fraud, generating natural language explanations for manual review. Aurora PostgreSQL storing transaction states with optimistic locking for concurrent reconciliation workflows. Automated break analysis identifying systematic issues (missing feeds, format changes, timing shifts) with proactive alerting. Reduces reconciliation cycle from T+3 to T+0 (same-day), eliminates 95% manual effort, prevents $5M+ annual revenue leakage from unreconciled transactions.
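Level-2 fuzzy matching rests on edit distance. Here is a self-contained sketch of Levenshtein distance plus a relative-distance acceptance rule; the 20% threshold is illustrative, not the production setting:

```python
def levenshtein(a, b):
    """Edit distance between strings a and b via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,            # deletion
                curr[j - 1] + 1,        # insertion
                prev[j - 1] + (ca != cb)  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]

def fuzzy_match(ref_a, ref_b, max_relative_distance=0.2):
    """Level-2 style match: accept if edit distance is small relative to length."""
    if not ref_a or not ref_b:
        return False
    dist = levenshtein(ref_a.lower(), ref_b.lower())
    return dist / max(len(ref_a), len(ref_b)) <= max_relative_distance
```

The real tier also applies phonetic algorithms (e.g. Soundex-style codes) for name variations, but the accept/reject shape is the same.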
Intelligent Customer 360 Platform
Unified customer data platform consolidating 50M+ profiles from 15+ data sources (web, mobile, POS, call center, email) achieving 85% identity resolution accuracy with real-time personalization. Probabilistic entity resolution using XGBoost trained on fuzzy matching features (name similarity, email patterns, address proximity, phone variations, device fingerprints) linking fragmented customer records across channels. Kafka streaming ingests 5M+ daily events with Apache Flink enrichment pipeline joining behavioral data (browsing, purchases, support tickets) in real-time. Snowflake data warehouse storing complete customer journey history with type-2 slowly changing dimensions for temporal analysis. Redis-backed profile cache serving unified customer views with <50ms latency to downstream systems (CRM, marketing automation, personalization engines). GPT-4 powered behavioral insights generating natural language customer summaries. ML-driven segmentation using K-means and RFM analysis, propensity scoring for cross-sell/upsell opportunities, and next-best-action recommendations. Increases marketing ROI by 45%, reduces customer service handle time by 30%, and improves personalization relevance by 62%.
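The segmentation layer uses K-means alongside RFM analysis; the RFM scoring itself is simple enough to sketch, with illustrative cut-offs standing in for the quantile-derived ones a real pipeline would use:

```python
from datetime import date

def rfm_score(orders, today, r_days=(30, 90), f_cuts=(2, 5), m_cuts=(100.0, 500.0)):
    """Score a customer 1-3 on Recency, Frequency, Monetary.

    orders: list of (order_date, amount) pairs. Cut-offs here are
    illustrative; in practice they come from per-segment quantiles.
    """
    recency = (today - max(d for d, _ in orders)).days
    frequency = len(orders)
    monetary = sum(amount for _, amount in orders)
    r = 3 if recency <= r_days[0] else 2 if recency <= r_days[1] else 1
    f = 3 if frequency >= f_cuts[1] else 2 if frequency >= f_cuts[0] else 1
    m = 3 if monetary >= m_cuts[1] else 2 if monetary >= m_cuts[0] else 1
    return r, f, m
```

The resulting (R, F, M) triple feeds segmentation and propensity models downstream.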
AI Supply Chain Optimization
End-to-end supply chain optimization platform combining demand forecasting with inventory planning reducing stockouts by 62% and saving $1.5M annually. Hybrid forecasting ensemble blending Facebook Prophet (capturing seasonality, holidays, trends) with LSTM neural networks (learning complex non-linear patterns) across 50K+ SKUs and 200+ store locations. Features engineered from historical sales, promotions, weather data, local events, and competitor pricing with external data enrichment via APIs. Google OR-Tools constraint optimization solving multi-echelon inventory allocation problem balancing service levels (98% target), working capital constraints ($50M limit), and warehouse capacity (500K units). Kafka streaming real-time sales data triggering dynamic reforecasting when actuals deviate >15% from predictions. GPT-4 powered root cause analysis explaining forecast errors with automated alert generation. Multi-objective optimization considering trade-offs: minimize stockouts vs holding costs vs expedited shipping. Simulation engine testing what-if scenarios for promotional events, supply disruptions, and demand shocks. Reduces excess inventory by 35%, improves forecast accuracy from MAPE 28% → 12%, and increases inventory turnover ratio by 40%.
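The forecast-accuracy gain is stated in MAPE (28% → 12%); for reference, the metric is straightforward to compute:

```python
def mape(actuals, forecasts):
    """Mean Absolute Percentage Error, in percent; skips zero actuals."""
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]
    return 100 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)
```

Lower is better; a drop from 28% to 12% means forecast errors shrank to well under half their previous relative size.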
Dynamic Pricing & Optimization
Reinforcement learning pricing engine processing 10K+ pricing decisions hourly across 25K+ products increasing revenue by 12% ($3M annually) while maintaining brand positioning. Q-Learning agent trained on 2+ years historical data learning optimal pricing strategies balancing revenue maximization, inventory clearance, and competitive positioning. State space captures 80+ features: demand elasticity, competitor prices, inventory levels, seasonality, customer segments, and margin constraints. XGBoost surrogate model predicting demand response curves for fast policy evaluation during online serving. Scrapy-based competitive intelligence platform monitoring 50+ competitor websites hourly, extracting prices, promotions, stock availability with GPT-4 NLP analyzing promotional language and value propositions. Real-time pricing API with Redis caching serving personalized prices based on customer segment, browsing history, and cart abandonment propensity. Multi-armed bandit testing for exploration-exploitation trade-off avoiding local optima. Constraint satisfaction ensuring prices respect MAP (Minimum Advertised Price) agreements, margin floors (20% minimum), and psychological pricing rules (ending in .99). A/B testing framework measuring causal impact with difference-in-differences methodology. Reduces manual pricing effort by 90%, improves price competitiveness index by 25%, and increases conversion rate by 8%.
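The production engine uses Q-learning over an 80-feature state space; the exploration-exploitation mechanics it relies on can be illustrated with a minimal epsilon-greedy bandit over candidate price points. The class and parameter names are hypothetical:

```python
import random

class EpsilonGreedyPricer:
    """Pick a price point, balancing exploration and exploitation."""

    def __init__(self, prices, epsilon=0.1, seed=None):
        self.prices = prices
        self.epsilon = epsilon
        self.counts = [0] * len(prices)     # pulls per price arm
        self.revenue = [0.0] * len(prices)  # cumulative observed revenue per arm
        self.rng = random.Random(seed)

    def choose(self):
        """Return the index of the price to offer next."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.prices))  # explore
        avg = [r / c if c else 0.0 for r, c in zip(self.revenue, self.counts)]
        return max(range(len(avg)), key=avg.__getitem__)  # exploit best average

    def update(self, arm, observed_revenue):
        """Record the revenue observed after offering price `arm`."""
        self.counts[arm] += 1
        self.revenue[arm] += observed_revenue
```

A full Q-learning agent extends this by conditioning the choice on state (inventory, competitor prices, seasonality) rather than a single global average per arm.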
Hackathon Wins
3× Winner with $2,500+ in prizes building impactful AI solutions
AI & Food Insecurity Case Competition
$2,500 First Place
Featured on CBS News & UMD AI Media Day
Voice-First Multilingual AI Platform
Built an AI assistant supporting multiple languages, broadening tech access for the 56%+ of users who are non-English speakers. Featured on CBS News and showcased at UMD AI Media Day.
Achievement
Breaking barriers for 56%+ underserved populations
ServiceNow Knowledge Gap Challenge
$700 + Sony WH-1000XM4 Headphones (Winner)
Winner among 642 participants
Synapse: Multi-Agent AI Collaboration System
Neural network of specialized AI agents working together to solve complex problems through intelligent collaboration and orchestration.
Achievement
Neural network of specialized AI agents
T. Rowe Price Investor Education Challenge
Portable Monitor (Winner)
Winner among 434 participants
GenAI Financial Literacy Platform
AI-powered financial education platform with personalized learning paths. Addressed 64% financial illiteracy gap among young adults.
Achievement
Addressing 64% financial illiteracy gap
Technical Arsenal
Proficiency across 100+ technologies powering enterprise AI and data platforms
Core Languages
AI & Machine Learning
MLOps & Experimentation
Big Data & Streaming
Workflow Orchestration
Data Formats & Lakes
Cloud Platforms
Databases
DevOps & CI/CD
EXTENDED TOOLKIT
Additional frameworks, libraries, and tools in active use
Get In Touch
Open to opportunities in Data Engineering, MLOps, and AI/ML Platform roles
Let's Build Something Amazing
Whether you have a project in mind, want to discuss opportunities, or just want to connect, I'd love to hear from you. Feel free to reach out through any of the channels below.