Currently @ Lam Research • IIT Madras '25

Hi, I'm Shreyan

Data Engineer & AI Enthusiast

@ Lam Research via KPI Partners

I turn messy data into production-grade pipelines. Currently processing semiconductor data at scale using Azure, Databricks & Spark.

About Me

Data Engineer with a passion for building scalable solutions

I'm a Data Science undergraduate at IIT Madras with hands-on experience building production data pipelines. Currently working as a Data Engineer at Lam Research via KPI Partners, I've gained deep expertise in Azure Data Factory, Databricks, Apache Spark, Delta Lake, and DBT — processing terabytes of semiconductor manufacturing data daily.

I'm a fast learner with a solid technical foundation, eager to apply analytical thinking and engineering principles to solve meaningful data-driven problems. With strong communication and mentoring abilities, I thrive in collaborative environments and am passionate about leveraging AI and ML to create innovative solutions.

What I'm Looking For

Roles

Data Engineer • Analytics Engineer • ML Engineer

Preferences

Full-time • Open to relocation • Remote-friendly
Currently employed, open to connect
2TB+ Data Processed Daily
15+ Pipelines Built
25+ Technologies
6 Certifications

Currently Learning

Azure Data Engineer Associate Certification • dbt for Analytics Engineering • Data Mesh Architecture • Advanced Airflow Patterns

Work Experience

Building data infrastructure that scales — from pipelines to insights

Data Engineer Intern

KPI Partners / Lam Research

Nov 2025 - Present
Remote

Building and optimizing enterprise-scale data pipelines for semiconductor manufacturing analytics at Lam Research

2TB+ Daily Data Processed
15+ Pipeline Count
85% Processing Time Reduction

Key Achievements

  • Architected 15+ data pipelines processing 2TB+ daily using Azure Data Factory and Databricks, reducing data latency by 40%
  • Optimized PySpark jobs handling 500M+ records, cutting processing time from 4 hours to 45 minutes through partition tuning
  • Implemented Delta Lake ACID transactions ensuring 99.9% data reliability across 50+ production tables
  • Built reusable DBT models that standardized transformations across 8 data domains, reducing development time by 60%
  • Collaborated with cross-functional teams to translate business requirements into scalable data solutions
Azure Data Factory • Azure Databricks • Apache Spark • PySpark • Delta Lake • DBT • SQL • Python

Business Data Management Project Mentor

IIT Madras

2023 - 2024
Chennai, India

Guided 100+ students through complete Business Data Management projects, achieving 100% project completion rate

100+ Students Mentored
100% Completion Rate
3 Batches Impacted

Key Achievements

  • Mentored 100+ students through end-to-end data projects, from collection to visualization, achieving 100% completion rate
  • Developed standardized SQL and Excel templates that reduced student onboarding time by 50%
  • Created comprehensive documentation that became the reference guide for 3 subsequent batches
  • Translated complex business requirements into actionable data workflows for non-technical stakeholders
SQL • Data Analysis • Excel • Database Design • Mentoring • Technical Writing

Subject Matter Expert - Statistics

Chegg

2022 - 2023
Remote

Provided expert solutions for complex statistics problems, maintaining 4.8/5 average rating across 200+ solutions

200+ Problems Solved
4.8/5 Avg Rating
95% Acceptance Rate

Key Achievements

  • Solved 200+ complex statistics problems with detailed explanations, maintaining 4.8/5 average rating
  • Achieved 95% first-attempt acceptance rate through rigorous quality standards
  • Specialized in regression analysis, hypothesis testing, and probability distributions
Statistics • Problem Solving • Communication • Teaching • Technical Writing

Technical Skills

From data ingestion to ML deployment — here's my end-to-end toolkit

<My Data Pipeline Stack/>

Languages

Python • SQL (MySQL) • C • C++ • JavaScript • HTML/CSS

Data Engineering

Azure Data Factory • Azure Databricks • Apache Spark (PySpark) • Delta Lake • DBT • SQLAlchemy • Psycopg2

Machine Learning & AI

Scikit-learn • PyTorch • NumPy • Pandas • Matplotlib • Seaborn • XGBoost

LLMs & AI Agents

RAG Pipelines • LLM Integration (Gemma, Ollama) • Embedding Models • Vector Databases (FAISS, ChromaDB) • Query Embedding & Similarity Search

Developer Tools

Git • GitHub • VS Code • PyCharm • Jupyter • Postman

Web Development

Flask • REST APIs • SQLite • Jinja2

Featured Projects

From AI-powered applications to production ML pipelines

AI-Enhanced Learning Portal

AI & Full Stack

Designed and implemented AI-driven features for the IIT Madras learning portal using LLMs and RAG pipelines

Tech Stack

Python • Flask • Ollama/Gemma LLMs • RAG • Embedding Models • +2 more
  • Integrated LLMs (Ollama) for lecture summaries, student queries, and context-aware explanations
  • Built RAG-style workflows using embedding models and chunked lecture content
  • Developed AI-driven assignment assistance with subjective answer evaluation
  • Created structured AI endpoints and prompt pipelines for smooth integration
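
As an illustration of the RAG-style retrieval described above, here is a minimal sketch. It is not the portal's actual code. The bag-of-words `embed` function is a stand-in for a real embedding model, the lecture text is made up, and every name (`chunk_text`, `embed`, `cosine_similarity`, `retrieve`) is hypothetical.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size=8):
    """Split text into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


def embed(text):
    """Toy embedding: word counts (assumption: stands in for a real embedding model)."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, top_k=1):
    """Return the `top_k` chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks,
                    key=lambda chunk: cosine_similarity(query_vec, embed(chunk)),
                    reverse=True)
    return ranked[:top_k]


# Made-up lecture content, purely for illustration.
lecture = ("Gradient descent updates model weights step by step. "
           "A hash table stores key value pairs for fast lookup.")
chunks = chunk_text(lecture)

question = "How does gradient descent update weights?"
context = retrieve(question, chunks)[0]

# The retrieved chunk is placed in the prompt that would be sent to the LLM.
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

In a real workflow the chunk embeddings would typically be computed once and stored in a vector index rather than recomputed on every query.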

Library Management System

Full Stack Development

End-to-end web-based Library Management System with role-based access control and automated workflows

Tech Stack

Python • Flask • SQLite • SQLAlchemy • HTML • +2 more
  • Implemented role-based access control (RBAC) for admins and users
  • Developed extensive book and user management module with real-time updates
  • Built automated borrow-return workflow with notification reminders
  • Created clean and functional interface for seamless navigation

Bank Telemarketing Success Prediction

Machine Learning
Top 12% on Kaggle

Machine learning model that predicts whether a customer will subscribe to a term deposit. Achieved an F1 score of 0.76908 and ranked 149/1256 on the leaderboard

Tech Stack

Python • Pandas • NumPy • Scikit-learn • Matplotlib • +3 more
  • Performed comprehensive data cleaning, preprocessing, and exploratory analysis
  • Engineered meaningful features and derived variables to improve accuracy
  • Trained and evaluated multiple models (Logistic Regression, Random Forest, XGBoost)
  • Designed structured ML workflow with robust validation and iterative tuning
Private Repository
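
For readers unfamiliar with the metric cited above, F1 is the harmonic mean of precision and recall. The sketch below assumes the plain binary definition (the competition's exact averaging mode is not stated here) and uses made-up toy labels, not the Kaggle data. The helper name `f1_score_binary` is hypothetical.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (1 = subscribed, 0 = did not subscribe).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
score = f1_score_binary(y_true, y_pred)
print(round(score, 5))
```

F1 is a common choice when the positive class is rare, because accuracy alone can look high for a model that rarely predicts a subscription.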

Business Data Management - WriteWing Advertising

Data Analysis

Data analysis project covering customer, marketing, and website performance data to surface business insights

Tech Stack

Python • Pandas • NumPy • Matplotlib • Seaborn • +2 more
  • Performed end-to-end data preprocessing and exploratory analysis
  • Conducted cluster analysis and client segmentation for targeted strategies
  • Analyzed marketing campaign performance and ROI
  • Evaluated website traffic behavior and conversion optimization
Private Repository

💡 Some projects are private. Code samples available upon request.

Education

Building a strong foundation in Data Science and Computer Systems

BS in Data Science and Applications

Minor in Computer Systems

Indian Institute of Technology Madras

📍 Chennai, Tamil Nadu • 📅 2021 - 2025

Pursuing a comprehensive degree in Data Science with a minor in Computer Systems, focusing on machine learning, data engineering, and AI applications

12th Grade

South Point High School

📍 Kolkata, West Bengal • 📅 2020

Score: 95.00%

Certifications

Continuous learning and skill validation

Databricks Certified Data Engineer Associate

Databricks

Validated expertise in data engineering using Databricks, including ETL pipelines, Delta Lake, Spark, and data lakehouse architecture

Databricks • Delta Lake • Apache Spark • ETL • Data Lakehouse

Apache Airflow 3 Certified

Astronomer

Demonstrated proficiency in workflow orchestration, DAG development, and pipeline scheduling using Apache Airflow

Apache Airflow • Workflow Orchestration • DAGs • Pipeline Scheduling

AWS Academy Graduate – Cloud Foundations

AWS Academy

Gained strong understanding of cloud computing fundamentals, AWS global infrastructure, and core services (EC2, S3, RDS, IAM)

AWS • Cloud Computing • EC2 • S3 • RDS • IAM

Data Mining

NPTEL

Acquired knowledge of clustering, classification, and association rule learning for extracting patterns from large datasets

Data Mining • Clustering • Classification

Introduction to Programming Using Python

HackerRank

SQL (Basic)

HackerRank

Get In Touch

I'm always open to discussing new opportunities, interesting projects, or just having a chat about data engineering.

Quick Info

India
Currently employed, open to connect

Open to: Data Engineer, Analytics Engineer, ML Engineer