Oleh O AI/ML Engineer and Data Scientist

Data Science (7.0 yr.), AI and Machine Learning (4.0 yr.)

Summary

* Data Scientist with a Master’s Degree in Computer Science and extensive experience in machine learning, deep learning, and cloud services (Azure, AWS, and GCP).
* 7 years of experience, proficient in building ML pipelines and deploying scalable solutions.
* Developed and deployed a voice-to-voice pipeline for a call center VoIP system using Whisper, Llama, and MMS-TTS, including API integrations and Docker deployment.
* Implemented a Google Doc AI Wrapper, improving document recognition accuracy through advanced preprocessing techniques.
* Designed a deforestation detection system using clustering and Azure-based services to monitor forest areas.

Work Experience

Caller - VoIP server for call center

Implemented a Hebrew voice-to-voice pipeline for a call center VoIP server.

Responsibilities:

  • Implemented a voice-to-voice pipeline: Whisper (STT) -> Llama (text-only) -> MMS-TTS (TTS) -> OpenVoice (voice cloning),
  • Implemented an API to interact with the models and integrate them with the VoIP server,
  • Prepared datasets for training the text-to-speech and speech-to-text models,
  • Trained the Whisper model,
  • Created Dockerfiles for the models and deployed them using Docker,
  • Engineered prompts for Llama,
  • Configured the Asterisk VoIP server.

Tools and technologies: Docker, AWS, Ollama, vLLM, Llama, Whisper, MMS-TTS, OpenVoice, Prompt Engineering, Asterisk

Google Doc AI Wrapper

Implemented a wrapper for the Google Doc AI service to improve its accuracy in recognizing freeform documents.

Responsibilities:

  • Used image preprocessing techniques to improve document recognition by the Google service,
  • Parsed the recognized document data and searched it for the desired form fields, tables, and other valuable data,
  • Implemented processing and filtering of the recognized table data,
  • Implemented an optimization function that combines different image preprocessing methods to find the combination that yields the most recognized target data.

Tools and technologies: Google Document AI, OpenCV, Pillow, Pandas

Asset tracking with camera (POC)

Tracking of asset locations: real-time positioning and collection of information about cargo and handling equipment using machine vision.

Responsibilities:

  • Trained a container segmentation model and added OCR,
  • Created a pipeline for localizing an object on a map using a depth camera,
  • Implemented an optimization function that combines different image preprocessing methods.

Tools and technologies: Google Vision API, OAK-D (OpenCV AI Kit), TensorFlow, Keras, Unity3D (synthetic data generation, visualization)

LiveScetchScanner

Implemented a camera image and mask processing module and a voice command interface.

Responsibilities:

  • Extracted image regions based on a neural-network mask and a Hough line detector,
  • Performed image transformation, processing, and filtering,
  • Implemented a voice command interface using the Amazon Lex bot API and the Rhino model.

Tools and technologies: OpenCV, PyAudio, pvporcupine, Rhino, Amazon Lex, boto3

Meowtalk

App that translates cats’ meows

Responsibilities:

  • Created Docker containers with the code components needed for model training,
  • Set up a training pipeline for the Meowtalk model using Kubeflow Pipelines on GCP,
  • Modified data preprocessing to improve model accuracy.

Tools and technologies: Docker, Kubeflow Pipelines, GCP, Keras

Model to predict the playoff results of the World Cup

Responsibilities:

  • Data analysis, preprocessing, data cleaning
  • Search for additional data
  • Feature development based on all available data
  • Dataset formation for training models
  • Selection of models (Linear models, Decision trees, MLP) for prediction and analysis of results

Tools and technologies used: scikit-learn, PCA, Linear models, Decision trees, MLP

Meeting summarizer

Implemented a model for creating summaries of audio meetings, based on an article.

Responsibilities and achievements:

  • Researched existing abstractive text summarization methods
  • Implemented a tree-based method

Tools and technologies used: Python, Azure, spaCy, NLP (abstractive text summarization), MIP

Deforestation detection

Developed a model for deforestation detection on the territory of Ukraine.

Achievements:

  • Collected a dataset of map fragments: wrote scripts to work with the API, search for the required fragments by coordinates and date, filter them by cloud cover and fragment type, and download them
  • Created a dataset for network training by clustering images (k-means, image clustering), searching for outliers and anomalies, and then selecting images from different clusters
  • Implemented a forest watch service based on Azure Functions
  • Implemented a model for deforestation detection.

Tools and technologies used: Rasterio, MS Azure, image clustering

Education

  • Master’s Degree in Computer Science, National University