Work Experience
Senior Data Engineer - CVS Health (Healthcare Data Lakehouse and AI-Ready Pipelines)
Duration: Jun 2025 – Present
Summary:
- Architected and developed scalable batch and near-real-time data pipelines processing multi-terabyte healthcare datasets daily
- Delivered ML-ready and embedding-ready datasets supporting clinical risk modeling, operational forecasting, and AI retrieval workflows
- Integrated ML workflows and automated deployment across cloud environments
Responsibilities:
- Architected scalable data pipelines using PySpark, Databricks, Delta Live Tables, and Kafka.
- Built AI-ready feature engineering pipelines for ML training and inference.
- Designed and optimized Snowflake pipelines with Snowpark, Streams & Tasks.
- Implemented ingestion pipelines for structured and semi-structured healthcare data supporting AI retrieval workflows.
- Developed embedding-ready datasets for AI experimentation.
- Integrated ML workflows with MLflow, Databricks Feature Store, and model versioning.
- Designed REST-based ingestion services using FastAPI.
- Orchestrated pipelines using Apache Airflow and Terraform for deployment automation on AWS and Azure.
- Implemented dbt transformation layers for modular data contracts and semantic models.
- Built DataOps validation frameworks for schema drift detection, anomaly checks, and observability monitoring.
- Enforced HIPAA-compliant governance with RBAC policies and audit-ready lineage.
- Optimized compute costs via Spark resource tuning and cluster auto-scaling.
Technologies: Databricks, Delta Lake, Snowflake (Snowpark, Streams, Tasks), PySpark, Kafka, dbt, MLflow, FastAPI, AWS (S3, Glue, EMR), Azure Databricks, Terraform, Docker, Kubernetes
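The schema-drift detection mentioned under the DataOps validation work can be sketched in plain Python. This is a minimal illustration, not the production framework; the function name, the healthcare field names, and the drift categories are all illustrative assumptions:

```python
def detect_schema_drift(expected: dict, observed: dict) -> dict:
    """Compare an expected column->type mapping against the schema
    observed on an incoming batch, and report any drift."""
    return {
        "missing_columns": sorted(set(expected) - set(observed)),
        "unexpected_columns": sorted(set(observed) - set(expected)),
        "type_changes": {
            col: (expected[col], observed[col])
            for col in set(expected) & set(observed)
            if expected[col] != observed[col]
        },
    }

# Illustrative claims feed: one column dropped, one retyped upstream
expected = {"member_id": "string", "claim_amount": "decimal", "service_date": "date"}
observed = {"member_id": "string", "claim_amount": "string", "admit_type": "string"}
drift = detect_schema_drift(expected, observed)
# drift["missing_columns"] -> ["service_date"]
# drift["type_changes"]    -> {"claim_amount": ("decimal", "string")}
```

In a pipeline, a non-empty drift report would typically fail the batch or route it to quarantine before it reaches downstream consumers.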
Data Engineer, ML & Analytics - Dr. Reddy’s Laboratories (Supply Chain and Forecasting Data Pipelines)
Duration: Mar 2021 – Jul 2023
Summary:
- Designed and implemented Spark-based ELT pipelines supporting enterprise analytics and machine learning initiatives across supply chain and forecasting systems
- Migrated on-premises workflows to AWS S3 Data Lake architecture to improve scalability and reduce costs
- Developed AI-ready datasets and optimized Snowflake performance for secure data sharing and efficient transformations
Responsibilities:
- Designed Spark-based ELT pipelines for analytics and ML initiatives.
- Migrated on-prem workflows to AWS S3 Data Lake architecture.
- Built scalable feature engineering pipelines for ML training.
- Designed curated AI-ready datasets using dimensional modeling and Snowflake transformations.
- Implemented Snowflake performance tuning, partition optimization, and secure data sharing.
- Developed data ingestion workflows from REST APIs and third-party sources.
- Integrated batch and streaming ingestion using Kafka to reduce reporting latency.
- Implemented data validation pipelines with profiling and reconciliation logic.
- Orchestrated workflows with Apache Airflow, maintaining high data-freshness SLAs.
- Collaborated with data scientists on feature refresh schedules and schema evolution for model retraining.
Technologies: AWS (S3, EMR, Lambda), Apache Spark, PySpark, Snowflake, Airflow, PostgreSQL, Tableau, Python, Kafka
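The profiling and reconciliation logic from the validation pipelines above can be sketched in plain Python. This is a simplified sketch; the `order_id`/`qty` field names and the `reconcile` helper are hypothetical, and a real run would operate on Spark or Snowflake query results rather than in-memory lists:

```python
def reconcile(source_rows, target_rows, key, amount_field):
    """Reconcile a loaded target against its source: check row counts,
    find keys missing on either side, and compare amount totals."""
    src = {r[key]: r for r in source_rows}
    tgt = {r[key]: r for r in target_rows}
    return {
        "row_count_match": len(src) == len(tgt),
        "missing_in_target": sorted(set(src) - set(tgt)),
        "extra_in_target": sorted(set(tgt) - set(src)),
        "amount_delta": round(
            sum(r[amount_field] for r in source_rows)
            - sum(r[amount_field] for r in target_rows),
            2,
        ),
    }

# Illustrative load where one source row never reached the target
source = [{"order_id": 1, "qty": 10.0}, {"order_id": 2, "qty": 5.5}]
target = [{"order_id": 1, "qty": 10.0}]
report = reconcile(source, target, key="order_id", amount_field="qty")
# report["missing_in_target"] -> [2]; report["amount_delta"] -> 5.5
```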
Data Engineer - Hexaware Technologies (Financial Data ETL and Fraud Analytics Support)
Duration: Mar 2019 – Feb 2021
Summary:
- Developed SQL-based ETL pipelines ingesting financial datasets for risk and fraud analytics in regulatory environments
- Automated data cleansing workflows and designed relational data models to support reporting and risk analysis
- Delivered curated datasets for predictive risk scoring and built dashboards tracking operational KPIs and compliance metrics
Responsibilities:
- Developed SQL-based ETL pipelines for financial data ingestion.
- Automated data cleansing workflows using Python.
- Designed relational data models in Oracle and PostgreSQL.
- Supported fraud analytics teams with curated datasets.
- Implemented validation and reconciliation checks for data accuracy.
- Optimized SQL queries and ETL jobs to improve reporting performance.
- Built Power BI dashboards for operational KPIs and compliance metrics.
Technologies: SQL, Python, Oracle, PostgreSQL, Power BI
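The Python-based data cleansing mentioned above can be sketched as a small record normalizer. A minimal sketch under stated assumptions: the null-token list, the `_amount` suffix convention, and the sample fields are illustrative, not the actual Hexaware workflow:

```python
NULL_TOKENS = {"", "na", "n/a", "null", "none"}

def cleanse_record(record: dict) -> dict:
    """Normalize one raw record: trim whitespace, map null-like
    tokens to None, and coerce amount strings to floats."""
    cleaned = {}
    for field, value in record.items():
        if isinstance(value, str):
            value = value.strip()
            if value.lower() in NULL_TOKENS:
                value = None
        if field.endswith("_amount") and value is not None:
            # Financial feeds often carry thousands separators
            value = float(str(value).replace(",", ""))
        cleaned[field] = value
    return cleaned

row = cleanse_record({"txn_id": " T-100 ", "txn_amount": "1,250.75", "memo": "N/A"})
# row -> {"txn_id": "T-100", "txn_amount": 1250.75, "memo": None}
```

In batch use, a function like this would be mapped over each extracted file before loading into the Oracle or PostgreSQL models.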
Education
- Master of Science in Computer Science
University of Bridgeport — Bridgeport, Connecticut, USA
Sep 2023 – May 2025