Simon K.
🇩🇪Germany (UTC-05:00)
Upstaffer since July 2023

Simon K. — Python Software Engineer with data engineering skills

Expertise in data analysis, data engineering, AI, and machine learning.

Last verified in April 2024

Bio Summary

- 2+ years of experience with Python as a Data Engineer and Deep/Machine Learning Intern
- Experience with Data Vault modeling and AWS cloud services (S3, Lambda, and Batch)
- Cloud Services: SageMaker, Google BigQuery, Google Data Studio, MS Azure Databricks, IBM Spectrum LSF, Slurm
- Data Science Frameworks: PyTorch, TensorFlow, PySpark, NumPy, SciPy, scikit-learn, Pandas, Matplotlib, NLTK, OpenCV
- Proficient in SQL, Python, Linux, Git, and Bash scripting
- Led a BI development team and served as Scrum Master
- Native English
- Native German

Technical Skills

Programming Languages: C++, Java, Python
Java Frameworks: Apache Spark
Scala Frameworks: Apache Spark
AI & Machine Learning: AWS SageMaker (Amazon SageMaker), NumPy, OpenCV, PyTorch, Scikit-learn, TensorFlow
Python Libraries and Tools: Matplotlib, NLTK, NumPy, Pandas, PySpark, PyTorch, Scikit-learn, SciPy, TensorFlow
Data Analysis and Visualization Technologies: Apache Spark, Databricks, Jupyter Notebook, MapReduce, Pandas
Databases & Management Systems / ORM: Apache Hadoop, Apache Spark, Greenplum, MongoDB, MySQL, NoSQL, PostgreSQL, SQL
Cloud Platforms, Services & Computing: AWS, IBM Spectrum LSF, Slurm
Amazon Web Services: AWS Batch, AWS Lambda, AWS S3, AWS SageMaker (Amazon SageMaker)
Azure Cloud Services: Databricks
Google Cloud Platform: Google BigQuery
Virtualization, Containers and Orchestration: Docker
Version Control: Git
Operating Systems: Linux
Third Party Tools / IDEs / SDK / Services: PyCharm
Scripting and Command Line Interfaces: Shell Scripts
Other Technical Skills: Multi-threading, YAML

Experience

Deep Learning Intern – Multi-task learning, Bosch Center for Artificial Intelligence (BCAI)

03/2022 – 09/2022 (Renningen, Germany)

  • Developed novel loss weighting methods for multi-task learning that outperformed state-of-the-art methods
  • Compared loss balancing methods from the literature on tasks such as semantic segmentation, depth estimation, and surface normal estimation on scene understanding datasets such as Cityscapes and NYUv2
  • Registered two novel loss weighting methods as patents
  • Documented the results in the master's thesis

Technologies: Python, PyTorch, MTL, Git, IBM Spectrum LSF

 

Technical Solutions Specialist/Data Engineer, Scalefree International GmbH

10/2020 – 01/2022 (Hanover, Germany)

  • Led the internal BI development team as a Scrum Master
  • Established connection between different source systems and the enterprise data warehouse
  • Developed processes for loading the staging area and the raw data vault by employing AWS services such as S3, Lambda, and Batch
  • Created XML documents using T-SQL and XQuery for an external customer project
  • Containerized jobs using Docker and YAML to load the enterprise data warehouse and deployed them using AWS Batch

Technologies: SQL, Python, Linux, Git, Bash scripting, Data Vault, AWS, YAML

Machine Learning Intern – Cloud ML Services, Novatec Consulting GmbH

11/2019 – 06/2020 (Hanover, Germany)

  • Developed a prototype churn prediction application and evaluated machine learning algorithms such as Random Forest, SVM, Gradient Boosted Decision Trees, and Logistic Regression using scikit-learn
  • Compared the cloud machine learning services MS Azure Databricks with PySpark, AWS SageMaker, and Google Cloud BigQuery
  • Documented the results of the comparison in the bachelor's thesis

Technologies: Python, scikit-learn, PySpark, MS Azure, AWS, GCP

Academic Projects

Student Research Project, University of Hildesheim

12/2020 – 03/2022 (Hildesheim, Germany)

  • Conducted image-to-image translation between the domains of regular images and artworks with deep Generative Adversarial Networks using TensorFlow
  • Enhanced CycleGAN by introducing a two-objective discriminator as regularization, incorporating adversarial self-defense for better cycle consistency, and applying differentiable augmentation on the target domain, which had less data
  • Employed agile intercultural project management techniques to manage the project successfully

Technologies: Python, TensorFlow, GANs, Git, Slurm

Coursework

Machine Learning, University of Hildesheim

04/2020 – 09/2021 (Hildesheim, Germany)

  • Implemented machine learning models from scratch in Python and NumPy on real-world datasets such as Rossmann sales and Wine quality data: ridge regression with SGD, LASSO with coordinate descent, least-angle regression, logistic regression with Newton's method, gradient-boosted decision trees, and AdaBoost
  • Employed data preprocessing techniques such as one-hot encoding, stratified sampling, PCA, and KNN data imputation
  • Compared the performance of the implemented models with their scikit-learn counterparts
  • Performed exploratory data analysis on various real-world datasets using Pandas and Matplotlib
  • Developed a recommender system by applying matrix factorization with SGD to the MovieLens 100K dataset

Technologies: Python, NumPy, Pandas, scikit-learn, Matplotlib
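
For readers unfamiliar with the recommender technique named above, matrix factorization with SGD can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the coursework code: the function names, the hyperparameters (`k`, `lr`, `reg`, `epochs`), and the toy ratings are all hypothetical, and it uses plain Python lists rather than NumPy to stay dependency-free.

```python
import math
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user factors P[u] and item factors Q[i]."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def rmse(P, Q, ratings):
    """Root mean squared error over (user, item, rating) triples."""
    se = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Fit rank-k user/item factors with SGD on the L2-regularized squared error.

    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the observed ratings in random order
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on both factors using the pre-update values
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Toy data: (user, item, rating) triples, invented for this example
toy_ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
               (2, 1, 2), (2, 2, 5), (3, 0, 1), (3, 2, 4)]
P0, Q0 = factorize(toy_ratings, n_users=4, n_items=3, epochs=0)
P1, Q1 = factorize(toy_ratings, n_users=4, n_items=3)
print(rmse(P0, Q0, toy_ratings), rmse(P1, Q1, toy_ratings))
```

The training error after fitting should be well below the error of the random initialization; on a real dataset such as MovieLens one would evaluate on held-out ratings instead.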

 

Deep Learning/Computer Vision, University of Hildesheim

04/2021 – 09/2021 (Hildesheim, Germany)

  • Trained a CNN end-to-end on a self-driving dataset (camera view from the car) using regularization techniques such as cutout and mixup and implemented a custom batch normalization layer and residual connections to predict the steering angle in PyTorch
  • Computed the saliency map for an input image using an ImageNet pre-trained model
  • Compared metric learning techniques such as learned embedding of a simple classification model, contrastive loss, and triplet loss with an embedding layer for MNIST data using TensorFlow
  • Implemented transfer learning for training a U-Net model on a real-world weed field image dataset with a custom categorical cross-entropy loss. Pretrained the first half of the model on the classification dataset DeepWeeds using TensorFlow, improving the test accuracy by 1.5% compared to a vanilla U-Net model, and visualized the predicted segmentation map
  • Generated adversarial examples using the Carlini-Wagner attack against a CNN trained on MNIST data and created sparse perturbations with the Hoyer-Square regularizer using PyTorch

Technologies: Python, PyTorch, TensorFlow

 

Distributed Computing, University of Hildesheim

04/2020 – 03/2021 (Hildesheim, Germany)

  • Performed exploratory data analysis using PySpark on the MovieLens 10M dataset and used the Hadoop MapReduce framework on BTS flight data
  • Conducted distributed K-means clustering and distributed linear regression using SGD on the KDD Cup 1998 dataset and VirusShare executables with OpenMPI, including a performance analysis of the speed-up with different numbers of cores
  • Implemented Naive Bayes and SVM classifiers from scratch to categorize news items in the 20 Newsgroups text dataset using preprocessing techniques such as bag-of-words and TF-IDF feature representation and the Hadoop MapReduce framework
  • Employed distributed matrix factorization using coordinate descent with the Hadoop MapReduce framework on the MovieLens 10M dataset

Technologies: Python, Hadoop, MapReduce, PySpark, OpenMPI, mpi4py
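
As an illustration of the bag-of-words and TF-IDF feature representation mentioned above, here is a small single-machine sketch. It assumes one common TF-IDF variant (relative term frequency times `log(N / df)`) and whitespace tokenization; the profile does not say which variant was used, and the function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple assumption)."""
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # bag-of-words counts for this document
        total = len(tokens)
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


docs = [
    "the game was great",
    "the election was close",
    "the game went to overtime",
]
vectors = tf_idf(docs)
print(vectors[0])
```

A term that occurs in every document, such as "the" here, gets weight zero, while terms unique to one document get the highest weight. In a MapReduce setting the document-frequency count would typically be its own job.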

 

Reinforcement Learning, University of Hildesheim

10/2022 – 03/2023 (Hildesheim, Germany)

  • Utilized PyTorch to develop both the Deep Q-Learning model and the REINFORCE algorithm with policy gradients from scratch to solve the Gym environment Mountain Car

Technologies: Python, PyTorch
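
One building block of REINFORCE is the discounted return-to-go that weights each action's log-probability in the policy-gradient loss. The sketch below shows only that step in plain Python; the function name and the `gamma` default are assumptions for illustration, and the policy network and optimizer are omitted.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each return reuses the one after it
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


# Mountain Car gives a reward of -1 per step, so with gamma = 1
# the return-to-go is minus the number of remaining steps.
episode_returns = discounted_returns([-1, -1, -1], gamma=1.0)
print(episode_returns)
```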

Education

M.S. Data Analytics, University of Hildesheim

04/2020 – 01/2023 Hildesheim, Germany
GPA: 3.5/4.0

B.S. Business Information Systems, University of Applied Sciences and Arts Hanover

03/2016 – 06/2020 Hanover, Germany
GPA: 3.5/4.0

Certificates

  • Certified Data Vault 2.0 Practitioner (CDVP2)
  • Professional Scrum Master (PSM I)
