Ihor K

Big Data & Data Science Engineer with BI & DevOps skills

Data Engineer, Data Extraction and ETL, Data Science

Summary

- Data Engineer with a Ph.D. in Measurement Methods and a Master's degree in Industrial Automation
- 16+ years of experience with data-driven projects
- Strong background in statistics, machine learning, AI, and predictive modeling of big data sets.
- AWS Certified Data Analytics. AWS Certified Cloud Practitioner. Microsoft Azure services.
- Experience in ETL operations and data curation
- PostgreSQL, SQL, Microsoft SQL, MySQL, Snowflake
- Big Data Fundamentals via PySpark, Google Cloud, AWS.
- Python, Scala, C#, C++
- Skills and knowledge to design and build analytics reports, from data preparation to visualization in BI systems.

WORK EXPERIENCE

Data Engineer

Apr-2011 to present

Project: AWS ELT data pipeline and AWS cloud deployment architecture 

(2022-06 – current)

Project Description: Creation of ELT pipelines deployed on AWS to collect data from e-commerce platforms

Responsibilities:

  • Designing the architecture of an ELT pipeline that gathers data from e-commerce clients into a single data warehouse
  • Using DBT to process customer data and identify similar attributes
  • Setting up Airbyte connections, developing custom Airbyte connectors, deploying AWS architecture, and writing Terraform scripts
  • Building custom data management tools and creating data flow security solutions

Tools & Technologies: Python, Airbyte, Kubernetes, AWS EC2, CI/CD, OpenVPN server, AWS Lambda, AWS SQS, Fargate, BigQuery, DBT, Airflow, AWS CloudWatch, REST API, AWS ECR

Project: Batch and Streaming Data Ingest into DataLake

(2021-10 – 2022-05)

Responsibilities:

  • Design of data processing pipelines for medical, marketing, and e-commerce applications
  • Data modelling
  • Database design
  • Database development
  • Processing patient data with DBT
  • Big data processing using Spark with Scala
  • Distributed platform development
  • ETL data transformation
  • ETL architecture and solution design

Tools & Technologies: Python, Scala, DB (SQL, PostgreSQL), DBT, Spark, Hadoop, Terraform, Kubernetes, Helm, GitLab CI/CD, AWS, Keycloak, Swagger, Airflow.

Project: Audience Segmentation

(2018 - 2021)

Build a custom customer data platform for a marketing company. Build an ETL pipeline that retrieves data from multiple sources and stores it in a private data warehouse on Hadoop. Create a CloudFormation "infrastructure as code" description of the pipeline and a CI/CD process to deploy it into the desired environment. Work with streaming data in Amazon Kinesis. Design sources for BI reports in AWS.

Responsibilities:

  • Design and implement batch and event-driven workflows for big data processing
  • Automated tests for distributed applications
  • Data analysis and visualization
  • Develop applications for data ingestion and selection
  • Develop a recommendation system
  • Build reporting dashboards in QuickSight from Athena sources

Tools & Technologies: Python, Scala, SQL, Kubernetes, Spark, Hadoop framework, Docker, AWS (Storage, Database, DocumentDB, Athena, Lambda, Glue, API Gateway, Kinesis, QuickSight, CI/CD with AWS CloudFormation and CodePipeline), Grafana, Git.

Data Scientist and Data/Software Engineer

(Jan-2011 to 2018)

  • Data analysis
  • Applying machine learning algorithms
  • Image analysis
  • Image recognition
  • Neural network development and tuning
  • Database development
  • ETL operations engineering
  • Development of backend services for data curation
  • Automated tests for CI/CD workflows

Tools & Technologies: C#, Python, Keras, TensorFlow, Theano, OpenCV, Pandas, Microsoft SQL Server, SQL, .NET Framework

Associate Professor

(09/1999–Present)

Department of Industrial Automation

Taught courses:

  • Database development
  • Database management systems
  • Object-oriented programming
  • Parallel programming
  • System programming
  • Development of .NET applications

EDUCATION AND TRAINING

  • Ph.D. in Measurement Methods and Devices, EQF level 8
  • Master of Industrial Automation, EQF level 7

COMMUNICATION SKILLS 

  • Oral and written communication skills gained as a university professor and R&D project participant
  • Presentation skills gained as a scientific conference speaker

COURSES & CERTIFICATES:

  • AWS Certified Data Analytics
  • HDP Overview: Apache Hadoop Essentials (SPLL)
  • Feature Engineering with PySpark
  • Big Data Fundamentals via PySpark
  • Deep Learning in Python
  • Intermediate Python for Data Science
  • Linear Classifiers in Python
  • Machine Learning with the Experts
  • Python Data Science Toolbox