Henry A., Python engineer with automation, data quality and scientist skills

Data Mining and Management (5.0 yr.), Data Engineer, Data Extraction and ETL, Data Science, Data Scraping

Summary

- 8 years experience with various data disciplines: Data Engineer, Data Quality Engineer, Data Analyst, Data Management, ETL Engineer
- Automated Web scraping (Beautiful Soup and Scrapy, CAPTCHAs and User agent management)
- Data QA, SQL, Pipelines, ETL,
- Data Analytics/Engineering with Cloud Service Providers (AWS, GCP)
- Extensive experience with Spark and Hadoop, Databricks
- 6 years of experience working with MySQL, SQL, and PostgreSQL;
- 5 years of experience with Amazon Web Services (AWS), Google Cloud Platform (GCP) including Data Analytics/Engineering services, Kubernetes (K8s)
- 5 years of experience with PowerBI
- 4 years of experience with Tableau and other visualization tools like Spotfire and Sisense;
- 3+ years of experience with AI/ML projects, background with TensorFlow, Scikit-learn and PyTorch;
- Extensive hands-on expertise with Reltio MDM, including configuration, workflows, match rules, survivorship rules, troubleshooting, and integration using APIs and connectors (Databricks, Reltio Integration Hub), Data Modeling, Data Integration, Data Analyses, Data Validation, and Data Cleansing)
- Upper-intermediate to advanced English,
- Henry is comfortable and has proven track record working with North American timezones (4hour+ overlap)

Work Experience

Senior QA Automation Engineer, Streaming Media

(October 2024 – Present)

Overview: Responsible for end-to-end QA automation and data integrity validation across multiple services, while building intelligent web scraping solutions for test data generation, API monitoring, and content verification.

Key Responsibilities:

Designed and implemented automated test frameworks for API and UI validation using Python, Pytest, and Selenium.
Developed robust web scraping tools with BeautifulSoup, Scrapy, and Selenium to automate extraction of structured and unstructured data for QA validation and regression testing.
Integrated anti-scraping evasion techniques, including user-agent rotation, and CAPTCHA handling (using 2Captcha and headless browsers).
Created reusable scraping modules to dynamically pull and validate data from third-party platforms, supporting system integration testing.
Built scraping-based test suites for monitoring content drift, checking for broken data links, and verifying real-time pricing or listing updates.
Employed CI/CD pipelines with GitLab and Docker to schedule and run automated scraping & QA jobs.
Collaborated closely with developers and product owners in an agile team to ensure test coverage, scalability, and data reliability across sprints.
Maintained QA dashboards using tools like Allure, and contributed to performance benchmarking using scraped metrics.

Tech Stack: Python, Pytest, Selenium, Scrapy, BeautifulSoup, 2Captcha API, Docker, GitLab CI/CD, AWS (EC2/S3), REST APIs, PostgreSQL, Allure

Data Engineer, MDM

May 2023 - October 2024 (Industry: IT Services, Digital Marketing)

Overview: AI-driven digital marketing that assists companies with creating high customer engagement by providing marketers with the ability to offer real-time, targeted, personalized customer experience in the context of the moment.

Reltio MDM Expertise:

Configured and optimized key Reltio MDM features, including Data Loader, workflows, Integration Hub (RIH), APIs, match rules, and survivorship rules.
Designed and managed data structures and workflows within Reltio MDM, ensuring high data integrity and performance.
Developed and implemented connectors, such as the Databricks connector, to facilitate seamless data integration across platforms.
Troubleshot and resolved complex MDM issues, delivering reliable and scalable solutions tailored to business needs.
Data: Modeling, Integration, Analyses, Validation, Transcoding, Cleansing, Unification, Workflows

BI Tableau Experience Highlights for Real Estate Project

Successfully completed a full end-to-end Reltio MDM implementation to ensure consistent and accurate master data for property datasets, significantly improving the quality of predictive analytics.
Developed and maintained predictive algorithms for US house prices using machine learning techniques, enriched with data governed by Reltio MDM.
Created interactive data visualizations for real estate agents and investors using Tableau, leveraging clean and reliable master data.
Analyzed various data points on comparables for single-family homes and condos, including location, property age, and amenities, using MDM-optimized workflows.

Market Research Project

Led another full Reltio MDM implementation to integrate and harmonize customer demographics and sales data across multiple sources.
Utilized Tableau for comprehensive daily and historical data reporting and visualization, underpinned by MDM-enriched datasets.
Configured match rules and survivorship rules in Reltio to ensure the integrity and accuracy of customer and product data used for analytics.
Configured and optimized key Reltio MDM features, including Data Loader, workflows, Integration Hub (RIH), APIs, match rules, and survivorship rules.
Designed and managed data structures and workflows within Reltio MDM, ensuring high data integrity and performance.
Developed and implemented connectors, such as the Databricks connector, to facilitate seamless data integration across platforms.
Implement AI image and text generation features from open-source and 3rd party APIs;
Own and manage data generated and consumed on the front end with Database management best practices in Postgres and Databricks.
Implement security features using, JWT, OAuth2 Authentication, SSO (Okta provider), Roles, and Permissions (RBAC)
Perform code reviews for compliance with the best engineering practices, coding standards, and quality criteria set for the projects;
Provide suggestions to improve the architecture, and coding practices, build/verification toolset, and solve other technical challenges.

Technologies:

Reltio MDM, ElasticSearch AI, LangChain, LLM, ChatGPT, Dall-E 3, HuggingFace, Stable Diffusion,
Python, FastAPI, Asyncio, PostgreSQL, Databricks
AWS, Lamda, GCP, Pandas, PySpark
Postman, Docker, Git, Load Test, Locust

AI/ML, Scraping Engineer, Talents HR Platform

February 2023 - May 2023

Overview:

Technologies: AWS, Restful API, Python, Pytest, Allure, R, Shiny, JavaScript, Docker, Kubernetes, ChatGPT, Dall-E 3, Stable Diffusion, AWS Bedrock, Face Recognition Dlib, KNN (K-Nearest Neighbors), SVM (Support Vector Machine), Linear Regression, G-Boost (Gradient Boosting), JWT (JSON Web Token), OAuth2 Authentication, PyTorch, RBAC (Role-Based Access Control), Scikit-learn, FastAPI, Asyncio, PostgreSQL, Databricks, LinkedIn API, Glassdoor-like platform integration, Google Docs API, PDF parser.

Data Engineer and Data Analyst, Power BI - home equity investments

January 2023 - February 2023

Overview: Startup that revolutionizes the home equity market in the US. Our team is working on providing outstanding BI services with accessible data to decision-makers as well as streamlining the current services and their effectiveness.

Responsibilities

Design and develop Tableau dashboards;
Utilized Spotfire to design and implement interactive dashboards that provided real-time insights into key business metrics;
Produce well-designed, efficient code by using the best software development practices;
Perform code reviews for compliance with the best engineering practices, coding standards, and quality criteria set for the projects;
Use TensorFlow to automate and optimize code reviews, ensuring compliance with best engineering practices through AI-driven code quality assessments;
Provide suggestions to improve the architecture, and coding practices, build/verification toolset, and solve customer problems.

Technologies: Tableau, SQL, Snowflake, TensorFlow.

Data Engineer, Data Quality in Data management platform / Amazon E-Commerce Aggregator

2020-August 2022

Overview: Next-generation consumer goods company reimagining how the world’s most-loved products become accessible to everyone. We use a deep understanding of rankings, ratings, and reviews to identify and acquire quality brands and use world-class expertise and data science to make their products better or create new ones to meet changing customer demand.

Responsibilities:

Use Sisense to build dashboards for tracking updates to selected Amazon store brands for determined time periods. I used the interactive SQL palette for querying the tables to filter the needed information (columns) to be displayed in the dashboard. This dashboard provides the data engineering manager with the necessary information to make decisions on store brands.
Create and support ELT data pipelines built on Snowflake and DBT while ensuring high-quality data
Develop and deploy data warehousing models, and support existing processes/ETLs (extract/transform/load), and functions (in Python/SQL/DBT) in a cloud data warehouse environment using Snowflake, AWS services
SQL statements and developing in Python
Design and develop data pipelines (DAGs). Automation tests.

Technologies: Sisense BI, AirFlow, ETL, ElasticSearch, Snowflake, Python, SQL, DBT, Pandas, AWS S3, Medallion Architecture, MySQL, Hadoop, Spark, GitLab CI/CD, Kubernetes, LDAP, Automation Test, Pytest, Snowflake Schema, Dimensional Modeling, ER Diagrams.

AI Engineer, AI Project

September 2020 – July 2021

Responsibilities:

Develop automation workflows with RPA (UiPath).
Set up and manage web-based cloud services on AWS EC2.
Utilize TensorFlow to build and deploy models that predict key business metric

Python Developer, IoT-leveraged agricultural tech company

A project on monitoring and reporting sample data from agricultural plants on a field of land.

May 2020 – August 2020

Responsibilities:

Hands-on setting up, maintaining, and deploying services to AWS EC2.
Automated web scraping of data from webpages (Selenium) to scrape data from sensor-related data sources, public weather sites, and agricultural platform database.
Carried out multi-processing and parallelizing of code with PySpark.
Used Spark for 2 cases of data processing in an ELT phase:

1. Data was collected from drones and other specialized bots were used to physically survey the land area and take samples from the soil and air for properties such as soil pH, moisture content, specific gravity, etc for different types of crops planted on the field. This data was received gotten in real-time, and placed on a queue to be loaded into AWS DynamoDB. The transformation involves converting some data properties from the queue such as temperature from degree celsius to the kelvin scale, moisture content from cubic centimeters to cubic meters, etc. The transformed data is then loaded into AWS s3.

2. Process large batch data averaging 10 million rows with spark: There were cases where I had to transform data on a different database containing historical data to consolidate the currently maintained tables in another database. The historical data contains millions of rows of IoT-generated values. To optimize speed and memory usage for transformation, I used python’s implementation of Spark (Pyspark) to carry out the same transformation technique on the batch data to backfill the current table in the database.

IT Analyst, FieldworkAfrica

July 2016 – 2019

Responsibilities:

Developed data visualizations on PowerBI and Tableau to track areas of high and low drink consumption to establish which areas are potentially viable to push a new drink to.
Provided daily and historical data reports and visualizations to the technical director. Daily and historical reports included tracking the coverage of data collection in geographical areas, and providing updates on data quality checks and target data samples.
Developed and maintained cloud services on the Google cloud platform.
Developed questionnaire scripts on ODK for market research.
Led a data collection team of 10 people.
Performed data analysis using data tools, visualizations, and dashboards.
Used PowerBI and Tableau to provide daily and historical data reports and
visualizations to the technical director. Daily and historical reports included tracking the
coverage of data collection in geographical areas, providing updates on data quality checks and
target data samples.

Python Developer, NDA

Jan 2019 – April 2019

Responsibilities:

Working on websites back-end with flask and Django.
Maintaining SQL databases for proper scaling.
Ensuring proper test units are integrated to promote clean codes.

Data Science Trainee, DATA SCIENCE

2017

Responsibilities:

Implemented optimization algorithms.
Carried out analytics with Microsoft Azure for prediction models.
Generated various visualization models for data analytics with Power Bi and Seaborn.

Campus Ambassador, NDA

July 2016 - December 2016

Responsibilities:

Promoted the ScholarX mobile App on designated campuses and social platforms for the company achieving 1000 downloads on the Google Play Store.

Engineering Intern, NDA

April 2015 - July 2015

Responsibilities:

Assisted in a supervisory management role and design engineering in various structural steel processes.

BI Tableau Experience Highlights:

1. Real Estate Project

Developed and maintained predictive algorithms for US house prices using machine learning techniques such as regression and classification
Created interactive data visualizations for real estate agents and investors using Tableau
Analyzed a variety of data points on comparables for single-family homes and condos, including location, property age, and amenities
Assessed factors like ARV (After Repair Value), square footage, year built, number of beds and baths, garages, and local market conditions
Developed user-friendly dashboards to display real-time market trends and property values, enabling investors to make informed decisions quickly
Collaborated with a team of data scientists and engineers to continuously improve algorithms and visualizations

2. Tableau Specialist in Market Research Project:

Utilized Tableau for comprehensive daily and historical data reporting and visualization to support decision-making processes
Provided data insights and visualizations to the technical director, enabling a better understanding of market dynamics and trends
Created a range of custom dashboards for daily and historical reports that covered various aspects such as sales, customer demographics, and product performance
Monitored and analyzed data collection coverage in target geographical areas to ensure accurate representation of the market
Conducted regular data quality checks, including data validation and cleaning, to maintain high data accuracy and reliability
Collaborated with data engineers and analysts to optimize data collection methods and improve overall data quality

Education

College of Technology, 2018 - 2019, Higher National Diploma (HND), Mechanical Engineering
College of Technology, 2013 - 2016, National Diploma (ND), Mechanical Engineering

Certifications

Python Developer Certificate (Sensegrass), 2020
Google Scholarship Android, Basics - 2018
Certificate of Completion (DSN 2nd Data Science Boot Camp), 2017
Certificate of Proficiency in Human Resources and Skill Acquisition, 2014
Certificate of Participation ACM (Association for Computing Machinery), 2017
Big Data Foundations (Level 1), 2017
Data Science Foundations (Level 1), 2016

Not your tech stack?

Join the Upstaff community and we are looking for the best project for you. Be ready for the next steps: Create your profile on our website (import from LinkedIn)

20-30-minute screening call
Technical interview
Feedback
Project Selection (we are looking for the best project for you).

We work with developers from 50+ countries in different regions: Europe, LATAM, the U.S. (W-9 form owners), Canada, Asia (Philippines, Indonesia), Oceania (Australia, New Zealand, Papua New Guinea), and the the UK.

We don’t have a legal and ethical basis to accept applicants from the following countries: Russia, Belarus, Iran, North Korea
We do not provide visa assistance, and our cooperation model does not include the benefits typically offered with direct hire.