Work Experience
Senior Data Engineer - CVS Health (Healthcare Data Lakehouse and AI-Ready Pipelines)
Duration: Jun 2025 – Present
Summary:
- Architected and developed scalable batch and near-real-time data pipelines processing multi-terabyte healthcare datasets daily
- Delivered ML-ready and embedding-ready datasets supporting clinical risk modeling, operational forecasting, and AI retrieval workflows
- Integrated ML workflows and automated deployment across cloud environments
Responsibilities:
- Architected scalable data pipelines using PySpark, Databricks, Delta Live Tables, and Kafka.
- Built AI-ready feature engineering pipelines for ML training and inference.
- Designed and optimized Snowflake pipelines with Snowpark, Streams & Tasks.
- Implemented ingestion pipelines for structured and semi-structured healthcare data supporting AI retrieval workflows.
- Developed embedding-ready datasets for AI experimentation.
- Integrated ML workflows with MLflow, Databricks Feature Store, and model versioning.
- Designed REST-based ingestion services using FastAPI.
- Orchestrated pipelines using Apache Airflow and Terraform for deployment automation on AWS and Azure.
- Implemented dbt transformation layers for modular data contracts and semantic models.
- Built DataOps validation frameworks for schema drift detection, anomaly checks, and observability monitoring.
- Enforced HIPAA-compliant governance with RBAC policies and audit-ready lineage.
- Optimized compute costs via Spark resource tuning and cluster auto-scaling.
Technologies: Databricks, Delta Lake, Snowflake (Snowpark, Streams, Tasks), PySpark, Kafka, dbt, MLflow, FastAPI, AWS (S3, Glue, EMR), Azure Databricks, Terraform, Docker, Kubernetes
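The schema-drift detection mentioned under the DataOps validation work can be sketched in plain Python. This is a minimal illustration, not the production framework; the function name, the healthcare field names, and the drift categories are all illustrative assumptions:

```python
def detect_schema_drift(expected: dict, observed: dict) -> dict:
    """Compare an expected column->type mapping against the schema
    observed on an incoming batch, and report any drift."""
    return {
        "missing_columns": sorted(set(expected) - set(observed)),
        "unexpected_columns": sorted(set(observed) - set(expected)),
        "type_changes": {
            col: (expected[col], observed[col])
            for col in set(expected) & set(observed)
            if expected[col] != observed[col]
        },
    }

# Illustrative claims feed: one column dropped, one retyped upstream
expected = {"member_id": "string", "claim_amount": "decimal", "service_date": "date"}
observed = {"member_id": "string", "claim_amount": "string", "admit_type": "string"}
drift = detect_schema_drift(expected, observed)
# drift["missing_columns"] -> ["service_date"]
# drift["type_changes"]    -> {"claim_amount": ("decimal", "string")}
```

In a pipeline, a non-empty drift report would typically fail the batch or route it to quarantine before it reaches downstream consumers.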
Data Engineer, ML & Analytics - Dr. Reddy’s Laboratories (Supply Chain and Forecasting Data Pipelines)
Duration: Mar 2021 – Jul 2023
Summary:
- Designed and implemented Spark-based ELT pipelines supporting enterprise analytics and machine learning initiatives across supply chain and forecasting systems
- Migrated on-premises workflows to AWS S3 Data Lake architecture to improve scalability and reduce costs
- Developed AI-ready datasets and optimized Snowflake performance for secure data sharing and efficient transformations
Responsibilities:
- Designed Spark-based ELT pipelines for analytics and ML initiatives.
- Migrated on-prem workflows to AWS S3 Data Lake architecture.
- Built scalable feature engineering pipelines for ML training.
- Designed curated AI-ready datasets using dimensional modeling and Snowflake transformations.
- Implemented Snowflake performance tuning, partition optimization, and secure data sharing.
- Developed data ingestion workflows from REST APIs and third-party sources.
- Integrated batch and streaming ingestion using Kafka to reduce reporting latency.
- Implemented data validation pipelines with profiling and reconciliation logic.
- Orchestrated workflows with Apache Airflow, maintaining high data-freshness SLAs.
- Collaborated with data scientists on feature refresh schedules and schema evolution for model retraining.
Technologies: AWS (S3, EMR, Lambda), Apache Spark, PySpark, Snowflake, Airflow, PostgreSQL, Tableau, Python, Kafka
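The profiling and reconciliation logic from the validation pipelines above can be sketched in plain Python. This is a simplified sketch; the `order_id`/`qty` field names and the `reconcile` helper are hypothetical, and a real run would operate on Spark or Snowflake query results rather than in-memory lists:

```python
def reconcile(source_rows, target_rows, key, amount_field):
    """Reconcile a loaded target against its source: check row counts,
    find keys missing on either side, and compare amount totals."""
    src = {r[key]: r for r in source_rows}
    tgt = {r[key]: r for r in target_rows}
    return {
        "row_count_match": len(src) == len(tgt),
        "missing_in_target": sorted(set(src) - set(tgt)),
        "extra_in_target": sorted(set(tgt) - set(src)),
        "amount_delta": round(
            sum(r[amount_field] for r in source_rows)
            - sum(r[amount_field] for r in target_rows),
            2,
        ),
    }

# Illustrative load where one source row never reached the target
source = [{"order_id": 1, "qty": 10.0}, {"order_id": 2, "qty": 5.5}]
target = [{"order_id": 1, "qty": 10.0}]
report = reconcile(source, target, key="order_id", amount_field="qty")
# report["missing_in_target"] -> [2]; report["amount_delta"] -> 5.5
```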
Data Engineer - Hexaware Technologies (Financial Data ETL and Fraud Analytics Support)
Duration: Mar 2019 – Feb 2021
Summary:
- Developed SQL-based ETL pipelines ingesting financial datasets for risk and fraud analytics in regulatory environments
- Automated data cleansing workflows and designed relational data models to support reporting and risk analysis
- Delivered curated datasets for predictive risk scoring and built dashboards tracking operational KPIs and compliance metrics
Responsibilities:
- Developed SQL-based ETL pipelines for financial data ingestion.
- Automated data cleansing workflows using Python.
- Designed relational data models in Oracle and PostgreSQL.
- Supported fraud analytics teams with curated datasets.
- Implemented validation and reconciliation checks for data accuracy.
- Optimized SQL queries and ETL jobs to improve reporting performance.
- Built Power BI dashboards for operational KPIs and compliance metrics.
Technologies: SQL, Python, Oracle, PostgreSQL, Power BI
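The Python-based data cleansing mentioned above can be sketched as a small record normalizer. A minimal sketch under stated assumptions: the null-token list, the `_amount` suffix convention, and the sample fields are illustrative, not the actual Hexaware workflow:

```python
NULL_TOKENS = {"", "na", "n/a", "null", "none"}

def cleanse_record(record: dict) -> dict:
    """Normalize one raw record: trim whitespace, map null-like
    tokens to None, and coerce amount strings to floats."""
    cleaned = {}
    for field, value in record.items():
        if isinstance(value, str):
            value = value.strip()
            if value.lower() in NULL_TOKENS:
                value = None
        if field.endswith("_amount") and value is not None:
            # Financial feeds often carry thousands separators
            value = float(str(value).replace(",", ""))
        cleaned[field] = value
    return cleaned

row = cleanse_record({"txn_id": " T-100 ", "txn_amount": "1,250.75", "memo": "N/A"})
# row -> {"txn_id": "T-100", "txn_amount": 1250.75, "memo": None}
```

In batch use, a function like this would be mapped over each extracted file before loading into the Oracle or PostgreSQL models.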
Education
- Master of Science in Computer Science
University of Bridgeport — Bridgeport, Connecticut, USA
Sep 2023 – May 2025