Work Experience
Data Engineer, Data Mesh Implementation for Healthcare Data
Duration: 08.2021 - present
Summary:
- A data-driven startup leveraging the data mesh paradigm to process and transform large volumes of healthcare data
- Promoting data democratization within the organization
Responsibilities: Implement secure storage solutions, collect and preprocess data for ML models, develop data integrity checks, configure Kafka brokers, develop data breach response procedures, establish audit trail mechanisms, build data pipelines with Kafka, configure AWS DMS, optimize Kafka configurations, enforce data integrity constraints, optimize SQL queries and workflows, integrate Amazon MQ, implement security controls, develop logical data models, integrate Spark with AWS Glue, monitor data quality, collaborate with data scientists, architect data streaming pipelines, maintain Tableau dashboards, configure Pub/Sub access controls, set up monitoring and logging for pipelines, utilize CDKTF for Terraform configurations, create Bash scripts, configure GitHub Actions workflows (YAML), design CI/CD pipelines, perform code reviews.
Technologies: Python, SQL, Apache Spark, PySpark, Apache Airflow, AWS, Kafka, Redis, Oracle, Pandas, NumPy, Tableau, Bash scripting, CI/CD, Docker, Docker Compose, Kubernetes, GitHub Actions, GitHub
Data Engineer, Financial Data Management and Analytics Platform
Duration: 07.2019 - 07.2021
Summary:
- Revolutionizing financial data management and analysis with a unified platform that brings together disparate data sources and facilitates advanced analytics
- Combines traditional DWH and data lake capabilities to overcome data silos, incomplete insights, and inefficient analysis methods
Responsibilities: Implement data governance policies, patch and update GCP Cloud SQL databases, design real-time streaming data pipelines with Apache Spark Streaming, normalize data models, identify data quality issues, migrate data with GCP Dataflow, build a DWH on GCP BigQuery, integrate data with Google Cloud Bigtable, manage workflows with Airflow, monitor Tableau dashboards, fine-tune ML models with Scikit-learn, prepare datasets with LookML, create dashboards with Looker, maintain metadata repositories, clean data with GCP Dataprep, orchestrate workflows with GCP Composer, transfer data with GCP Transfer Service, develop CI/CD pipelines with Jenkins, perform code reviews and refactoring.
Technologies: Python, SQL, Apache Airflow, Apache Spark, PySpark, GCP, Redis, MongoDB, PostgreSQL, Pandas, NumPy, Tableau, Matplotlib, Scikit-learn, Jenkins, CI/CD, Bash scripting, Docker, Docker Compose, GitHub
Data Engineer, Sales Analysis and Business Performance Improvement
Duration: 12.2018 - 07.2019
Summary:
- Sales analysis project to improve business performance through insights into customer behavior, using data visualization and statistical analysis to identify trends and opportunities for growth
- Optimized pricing, promotions, and marketing campaigns
Responsibilities: Develop AWS Lambda functions to execute business logic, maintain Kafka configurations, manage Amazon RDS databases, process large-scale data with Databricks on AWS, monitor Kafka clusters, implement data ingestion pipelines, integrate Spark Streaming, perform data analysis with Databricks notebooks, design ETL pipelines, monitor Lambda functions, implement data models in Power BI, provision EC2 instances, analyze data with Apache Hive, apply DWH design principles, perform code reviews.
Technologies: Python, SQL, Kafka, Apache Spark, PySpark, Apache Hadoop, AWS, MS SQL, Redis, MongoDB, Power BI, Pandas, NumPy, Bash scripting, CI/CD, Docker, Docker Compose, Kubernetes, Bitbucket
Data Engineer, E-commerce Platform for Healthy Lifestyle Products
Duration: 01.2018 - 12.2018
Summary: E-commerce platform for a healthy-lifestyle online store, providing product selection, ordering, delivery, and advice on sports and nutrition, with a focus on convenience and customer engagement.
Responsibilities: Provide cloud solutions with AWS, resolve RabbitMQ issues, secure Spark Streaming applications, automate data processes in Power BI, develop RabbitMQ messaging components, tune EC2 instances for performance, orchestrate containers with Kubernetes, implement serverless components with AWS Lambda, write SQL queries, monitor RabbitMQ clusters, implement data backup and recovery strategies, troubleshoot Apache Spark, assist with data modeling, optimize SQL queries.
Technologies: Python, SQL, RabbitMQ, Apache Spark, PySpark, AWS, MongoDB, PostgreSQL, Power BI, Pandas, NumPy, Docker, Docker Compose, Kubernetes, CI/CD, Bash scripting, GitLab
Education
- Computer Science and Software Engineering
Certification
- Google Cloud Certified – Professional Data Engineer, 2023