Hi, I'm Shreyan
Data Engineer & AI Enthusiast
@ Lam Research via KPI Partners
I turn messy data into production-grade pipelines. Currently processing semiconductor data at scale using Azure, Databricks & Spark.
About Me
Data Engineer with a passion for building scalable solutions
I'm a Data Science undergraduate at IIT Madras with hands-on experience building production data pipelines. Currently a Data Engineer at Lam Research via KPI Partners, I've built deep expertise in Azure Data Factory, Databricks, Apache Spark, Delta Lake, and dbt while processing terabytes of semiconductor manufacturing data daily.
I'm a fast learner with a solid technical foundation, eager to apply analytical thinking and engineering principles to solve meaningful data-driven problems. With strong communication and mentoring abilities, I thrive in collaborative environments and am passionate about leveraging AI and ML to create innovative solutions.
What I'm Looking For
Roles
Preferences
Currently Learning
Work Experience
Building data infrastructure that scales — from pipelines to insights
Data Engineer Intern
KPI Partners / Lam Research
Building and optimizing enterprise-scale data pipelines for semiconductor manufacturing analytics at Lam Research
Key Achievements
- Architected 15+ data pipelines processing 2TB+ daily using Azure Data Factory and Databricks, reducing data latency by 40%
- Optimized PySpark jobs handling 500M+ records, cutting processing time from 4 hours to 45 minutes through partition tuning
- Implemented Delta Lake ACID transactions ensuring 99.9% data reliability across 50+ production tables
- Built reusable dbt models that standardized transformations across 8 data domains, reducing development time by 60%
- Collaborated with cross-functional teams to translate business requirements into scalable data solutions
Business Data Management Project Mentor
IIT Madras
Guided 100+ students through complete Business Data Management projects, achieving 100% project completion rate
Key Achievements
- Mentored 100+ students through end-to-end data projects, from collection to visualization, achieving 100% completion rate
- Developed standardized SQL and Excel templates that reduced student onboarding time by 50%
- Created comprehensive documentation that became the reference guide for 3 subsequent batches
- Translated complex business requirements into actionable data workflows for non-technical stakeholders
Subject Matter Expert - Statistics
Chegg
Provided expert solutions for complex statistics problems, maintaining 4.8/5 average rating across 200+ solutions
Key Achievements
- Solved 200+ complex statistics problems with detailed explanations, maintaining 4.8/5 average rating
- Achieved 95% first-attempt acceptance rate through rigorous quality standards
- Specialized in regression analysis, hypothesis testing, and probability distributions
Technical Skills
From data ingestion to ML deployment — here's my end-to-end toolkit
<My Data Pipeline Stack/>
Languages
Data Engineering
Machine Learning & AI
LLMs & AI Agents
Developer Tools
Web Development
Featured Projects
From AI-powered applications to production ML pipelines
AI-Enhanced Learning Portal
AI & Full Stack
Designed and implemented AI-driven features for the IIT Madras learning portal using LLMs and RAG pipelines
Tech Stack
- Integrated LLMs (Ollama) for lecture summaries, student queries, and context-aware explanations
- Built RAG-style workflows using embedding models and chunked lecture content
- Developed AI-driven assignment assistance with subjective answer evaluation
- Created structured AI endpoints and prompt pipelines for smooth integration
Library Management System
Full Stack Development
End-to-end web-based Library Management System with role-based access control and automated workflows
Tech Stack
- Implemented role-based access control (RBAC) for admins and users
- Developed extensive book and user management module with real-time updates
- Built automated borrow-return workflow with notification reminders
- Created clean and functional interface for seamless navigation
Bank Telemarketing Success Prediction
Machine Learning
Machine learning model to predict customer subscription to term deposits. Achieved an F1 score of 0.76908 and ranked 149/1256 on the leaderboard
Tech Stack
- Performed comprehensive data cleaning, preprocessing, and exploratory analysis
- Engineered meaningful features and derived variables to improve accuracy
- Trained and evaluated multiple models (Logistic Regression, Random Forest, XGBoost)
- Designed structured ML workflow with robust validation and iterative tuning
Business Data Management - WriteWing Advertising
Data Analysis
Comprehensive analysis of customer, marketing, and website performance data for business insights
Tech Stack
- Performed end-to-end data preprocessing and exploratory analysis
- Conducted cluster analysis and client segmentation for targeted strategies
- Analyzed marketing campaign performance and ROI
- Evaluated website traffic behavior and conversion optimization
💡 Some projects are private. Code samples available upon request.
Education
Building a strong foundation in Data Science and Computer Systems
BS in Data Science and Applications
Minor in Computer Systems
Indian Institute of Technology Madras
📍 Chennai, Tamil Nadu • 📅 2021 - 2025
Pursuing a comprehensive degree in Data Science with a minor in Computer Systems, focusing on machine learning, data engineering, and AI applications
12th Grade
South Point High School
📍 Kolkata, West Bengal • 📅 2020
Score: 95.00%
Certifications
Continuous learning and skill validation
Databricks Certified Data Engineer Associate
Databricks
Validated expertise in data engineering using Databricks, including ETL pipelines, Delta Lake, Spark, and data lakehouse architecture
Apache Airflow 3 Certified
Astronomer
Demonstrated proficiency in workflow orchestration, DAG development, and pipeline scheduling using Apache Airflow
AWS Academy Graduate – Cloud Foundations
AWS Academy
Gained strong understanding of cloud computing fundamentals, AWS global infrastructure, and core services (EC2, S3, RDS, IAM)
Data Mining
NPTEL
Acquired knowledge of clustering, classification, and association rule learning for extracting patterns from large datasets
Introduction to Programming Using Python
HackerRank
SQL (Basic)
HackerRank
Get In Touch
I'm always open to discussing new opportunities, interesting projects, or just having a chat about data engineering.
Quick Info
Open to: Data Engineer, Analytics Engineer, ML Engineer