Oleh O AI/ML Engineer and Data Scientist
Summary
* Data Scientist with a Master’s Degree in Computer Science and extensive experience in machine learning, deep learning, and cloud services (Azure, AWS, and GCP).
* 7 years of experience, proficient in building ML pipelines and deploying scalable solutions.
* Developed and deployed a voice-to-voice pipeline for a call center VoIP system using Whisper, Llama, and MMS-TTS, including API integrations and Docker deployment.
* Implemented a Google Doc AI Wrapper, improving document recognition accuracy through advanced preprocessing techniques.
* Designed a deforestation detection system using clustering and Azure-based services to monitor forest areas.
Work Experience
Caller - VoIP server for call center
Implemented a Hebrew voice-to-voice pipeline for a call center VoIP server.
Responsibilities:
- Implemented a voice-to-voice pipeline: Whisper (STT) -> Llama (text-only) -> MMS-TTS (TTS) -> OpenVoice (voice cloning),
- Implemented API to interact with models and to integrate them with the VoIP server,
- Prepared dataset for training text-to-speech and speech-to-text models,
- Trained the Whisper model,
- Deployed models using Docker, created Dockerfiles for models,
- Prompt-engineered Llama,
- Configured the Asterisk VoIP server.
Tools and technologies: Docker, AWS, Ollama, vLLM, Llama, Whisper, MMS-TTS, OpenVoice, Prompt Engineering, Asterisk
Google Doc AI Wrapper
Implemented a wrapper for the Google Doc AI service to improve its accuracy in recognizing freeform documents.
Responsibilities:
- Used image preprocessing techniques to improve document recognition by Google service,
- Parsed recognized document data and searched for desired form fields, tables and other valuable data,
- Implemented processing and filtering of recognized table data,
- Implemented an optimization function that combines different image preprocessing methods to find the combination that yields the most recognized data among the fields being searched for.
Tools and technologies: Google Document AI, OpenCV, Pillow, Pandas
Asset tracking with camera (POC)
Tracked asset locations: real-time positioning and collection of information about cargo and handling equipment using machine vision.
Responsibilities:
- Trained container segmentation model, added OCR,
- Created a pipeline for localizing an object on a map using a depth camera,
- Implemented an optimization function that combines different image preprocessing methods.
Tools and technologies: Google Vision API, OAK-D (OpenCV AI Kit), TensorFlow, Keras, Unity3D (synthetic data generation, visualization)
LiveScetchScanner
Implemented a camera image and mask processing module and a voice command interface.
Responsibilities:
- Extracted image regions based on a mask from a neural network and a Hough line detector,
- Image transformation, processing, filtering,
- Implemented a voice command interface using the Amazon Lex bot API and the Rhino model.
Tools and technologies: OpenCV, PyAudio, pvporcupine, Rhino, Amazon Lex, boto3
Meowtalk
App that translates cat meows
Responsibilities:
- Created Docker containers with code parts needed for model training,
- Set up a training pipeline for the Meowtalk model using Kubeflow Pipelines on GCP,
- Modified data preprocessing to achieve better accuracy of the model.
Tools and technologies: Docker, Kubeflow Pipelines, GCP, Keras
Model to predict the playoff results of the World Cup
Responsibilities:
- Data analysis, preprocessing, data cleaning
- Search for additional data
- Feature development based on all available data
- Dataset formation for training models
- Selection of models (Linear models, Decision trees, MLP) for prediction and analysis of results
Tools and technologies used: scikit-learn, PCA, Linear models, Decision trees, MLP
Meeting summarizer
Implemented a model for creating summaries of audio meetings, based on an article.
Responsibilities and achievements:
- Researched existing abstractive text summarization methods
- Implemented a tree-based method
Tools and technologies used: Python, Azure, spaCy, NLP-abstractive text summarization, MIP
Deforestation detection
Developed a model for detecting deforestation on the territory of Ukraine
Achievements:
- Collected a dataset of map fragments: wrote scripts to work with the API, search for the required fragments by coordinates and date, filter them by cloud cover and fragment type, and download them
- Created a dataset for network training by clustering images (k-means, image clustering), searching for outliers / anomalies, and then selecting images from different clusters
- Implemented a forest watch service based on Azure Functions
- Implemented a model for deforestation detection.
Tools and technologies used: Rasterio, MS Azure, image clustering
Education
- Master’s Degree in Computer Science, National University