Research Scientist, Math & Physics background
Are you a talented developer looking for a remote job that lets you show your skills and get decent compensation? Join Upstaff.com, a platform that connects you with hand-picked startups and scale-ups in the US and Europe.
Summary
We are looking for a Research Scientist / ML Engineer with a strong foundation in mathematics, physics-inspired modeling, and graph-based methods.
* The main goal is to build and train ML models from scratch for a metadata platform with federated learning features and privacy-preserving techniques.
In this role, you will:
* Design and train models that integrate semantic structures, ontologies, and knowledge graphs with advanced ML techniques (graph neural networks, embeddings, transformers).
* Build federated, privacy-preserving AI pipelines for industrial ecosystems, enabling compliance, digital product passports, and large-scale supply chain traceability.
* This is a unique opportunity to combine ML engineering, formal semantics, and computational science to create scalable AI solutions that go beyond traditional data science.
* Full-time, remote, long-term.
* 100% European timezone overlap, B2 or C1 English (occasional, optional business trips).
Required Skills
LLM
NLP
Python
Machine Learning 3.0 yr.
Project Description
We are looking for an ML engineer (DS / Semantic / AI Data Engineer) who will develop and fine-tune models for a new data platform project.
What you'll bring:
- Strong Math & Physics background.
- Expertise in developing, training, and tuning ML models (supervised, unsupervised, and semi-supervised learning, model optimization, and evaluation).
What we offer:
- A long-term, full-time role with significant influence, ownership, and room to grow.
- A team of top engineers and researchers building a next-generation AI platform for cross-industry metadata knowledge exchange and collaboration.
Must-Have Skills
Machine Learning Fundamentals:
- Understanding of supervised, unsupervised, and semi-supervised learning techniques.
- Knowledge of algorithms relevant to semantic data, such as graph neural networks (GNNs), embeddings (e.g., Word2Vec, BERT), or clustering for entity resolution.
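As a purely illustrative sketch of the clustering-for-entity-resolution point above: toy company names, character n-gram TF-IDF standing in for learned embeddings, and an arbitrary distance threshold (none of these choices come from the project itself).

```python
# Minimal sketch: clustering string vectors for entity resolution (toy data only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

names = ["Siemens AG", "SIEMENS", "Bosch GmbH", "Robert Bosch", "siemens ag"]

# Character n-gram TF-IDF as a cheap stand-in for learned embeddings.
X = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3)).fit_transform(names).toarray()

# Note: the `metric` argument is called `affinity` on scikit-learn versions before 1.2.
clusterer = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0.7, metric="cosine", linkage="average"
)
labels = clusterer.fit_predict(X)
for name, label in zip(names, labels):
    print(label, name)
```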
Training AI/ML models (not only using APIs), including exposure to LLMs or custom models:
- Knowledge of training machine learning models, including hyperparameter tuning, cross-validation, and optimization.
- Ability to evaluate models using metrics like precision, recall, F1-score, or Mean Reciprocal Rank (MRR) for knowledge graph tasks.
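To make the evaluation expectations above concrete, a minimal sketch using scikit-learn for precision/recall/F1 plus a hand-rolled MRR helper; all the data below is made up.

```python
# Minimal sketch of the evaluation metrics named above (illustrative data only).
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical binary predictions from an entity-resolution classifier.
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1])

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))

def mean_reciprocal_rank(ranks):
    """MRR over 1-based ranks of the correct entity in each candidate list."""
    return float(np.mean([1.0 / r for r in ranks]))

# Hypothetical ranks of the true tail entity in a link-prediction task.
print("MRR:", mean_reciprocal_rank([1, 3, 2, 10]))
```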
Mathematics & Physics-Inspired Modeling:
- Graph theory and formal models.
- Differential equations and quantum-mechanical equations.
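Purely as a toy illustration of the differential-equations background, a SciPy sketch integrating a damped harmonic oscillator (the equation and parameters are arbitrary examples, not project models).

```python
# Minimal sketch: numerically integrating a damped harmonic oscillator with SciPy.
from scipy.integrate import solve_ivp

def damped_oscillator(t, y, omega=2.0, gamma=0.1):
    x, v = y
    return [v, -2 * gamma * v - omega**2 * x]

sol = solve_ivp(damped_oscillator, t_span=(0.0, 20.0), y0=[1.0, 0.0], max_step=0.05)
print("displacement at t=20:", sol.y[0, -1])
```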
Natural Language Processing (NLP):
- Skills in NLP techniques for semantic tasks such as named entity recognition (NER), entity linking, and text-to-triple extraction.
- Familiarity with transformer models (e.g., BERT, RoBERTa) for semantic understanding.
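A minimal sketch of transformer-based NER via the Hugging Face pipeline API; the checkpoint it downloads is the library's default English NER model, not a project-specific choice, and the sentence is invented.

```python
# Minimal sketch: named entity recognition with a pretrained transformer pipeline.
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")  # downloads a default English NER model
text = "Siemens ships gearboxes from Nuremberg to a plant operated by Bosch."
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```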
Semantic Querying and Reasoning:
- Proficiency in SPARQL for querying RDF datasets to prepare training data.
- Understanding of reasoning techniques to augment training datasets with inferred knowledge.
- Experience with semantic embeddings (TransE, DistMult, ComplEx, GraphSAGE).
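A minimal sketch of the SPARQL side using rdflib, with a tiny in-memory Turtle graph; the `ex:` vocabulary and instances are invented purely for illustration.

```python
# Minimal sketch: querying an RDF graph with SPARQL via rdflib (toy vocabulary).
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex:   <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:pump1 a ex:Pump ; rdfs:label "Centrifugal pump" .
ex:pump2 a ex:Pump ; rdfs:label "Dosing pump" .
""", format="turtle")

query = """
PREFIX ex:   <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?label WHERE { ?item a ex:Pump ; rdfs:label ?label . }
"""
for item, label in g.query(query):
    print(item, label)
```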
Graph Machine Learning (GNNs):
- Hands-on experience with PyTorch Geometric or DGL for tasks like link prediction, node classification, or knowledge graph completion.
- Understanding of graph algorithms and embeddings for semantic reasoning.
Knowledge Graphs & Ontologies:
- RDF, OWL, SPARQL, Protégé, Neo4j/GraphDB.
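To ground the Graph Machine Learning requirement, a minimal node-classification sketch with PyTorch Geometric, following its standard two-layer GCN recipe on the public Cora citation dataset; the dataset and hyperparameters are placeholders, not project choices.

```python
# Minimal sketch: two-layer GCN for node classification with PyTorch Geometric.
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

dataset = Planetoid(root="/tmp/Cora", name="Cora")  # public citation graph as a stand-in
data = dataset[0]

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

model.eval()
pred = model(data.x, data.edge_index).argmax(dim=1)
acc = (pred[data.test_mask] == data.y[data.test_mask]).float().mean()
print(f"test accuracy: {acc:.3f}")
```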
Programming and Data Manipulation:
- Strong programming skills in Python (or similar languages like R or Java) for model development and data preprocessing.
- Experience with libraries for data manipulation (e.g., Pandas, NumPy) and semantic data handling (e.g., RDFLib, OWLReady2).
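A small sketch of the kind of data preparation this implies: turning raw triples into integer index arrays with pandas before feeding a knowledge-graph embedding model (the triples below are toy examples).

```python
# Minimal sketch: mapping toy triples to integer indices for embedding training.
import pandas as pd

triples = pd.DataFrame(
    [("pump1", "locatedIn", "plantA"),
     ("pump2", "locatedIn", "plantB"),
     ("plantA", "operatedBy", "acme")],
    columns=["head", "relation", "tail"],
)

entities = pd.Index(pd.unique(triples[["head", "tail"]].values.ravel()))
relations = pd.Index(triples["relation"].unique())

h = entities.get_indexer(triples["head"])
r = relations.get_indexer(triples["relation"])
t = entities.get_indexer(triples["tail"])
print(h, r, t)
```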
Federated Learning & Privacy-Preserving AI (TensorFlow Federated, Flower, differential privacy).
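To make the federated-learning and differential-privacy line concrete without committing to a framework, here is a toy sketch of one FedAvg-style round with update clipping and Gaussian noise; TensorFlow Federated or Flower would provide the production equivalent, and the linear-model "training step", clip, and noise scale are invented for illustration.

```python
# Minimal, framework-free sketch of one FedAvg round with clipped, noised client updates.
import numpy as np

rng = np.random.default_rng(0)

def client_update(global_weights, local_data, lr=0.1):
    """Toy 'training': one gradient step of a linear model on local (X, y)."""
    X, y = local_data
    grad = X.T @ (X @ global_weights - y) / len(y)
    return global_weights - lr * grad

def dp_federated_average(global_weights, client_datasets, clip=1.0, noise_std=0.05):
    updates = []
    for data in client_datasets:
        delta = client_update(global_weights, data) - global_weights
        delta *= min(1.0, clip / (np.linalg.norm(delta) + 1e-12))  # clip update norm
        updates.append(delta + rng.normal(0.0, noise_std, size=delta.shape))  # add noise
    return global_weights + np.mean(updates, axis=0)

# Two synthetic clients, each holding private (X, y) data generated from w_true.
w_true = np.array([2.0, -1.0])
clients = []
for _ in range(2):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ w_true + 0.01 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(100):
    w = dp_federated_average(w, clients)
print("learned weights:", w)  # should land near w_true, up to the injected noise
```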
Nice-to-Have Skills
- Data pipelines (Apache Spark, Kafka, Airflow).
- Cloud experience: AWS (SageMaker, Neptune, S3), Azure, or GCP.
- Backend programming experience in Go, Scala, or Java.
- Awareness of Industrial Data Standards: RAMI 4.0, Catena-X, Digital Product Passports, IT/OT integration.
- Knowledge of MLOps tools: MLflow, Kubeflow, containerization (Docker, Kubernetes), CI/CD.
- Strong collaboration skills, ability to work with cross-disciplinary R&D teams and domain experts.
- English proficiency B2+/C1, with clear technical communication skills.
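Tying into the MLOps bullet above, a minimal MLflow tracking sketch; the experiment name, parameters, and metric values are all made-up placeholders.

```python
# Minimal sketch: logging a hypothetical KG link-prediction run with MLflow.
import mlflow

mlflow.set_experiment("kg-link-prediction")
with mlflow.start_run(run_name="transe-baseline"):
    mlflow.log_param("embedding_dim", 128)
    mlflow.log_param("negative_samples", 32)
    mlflow.log_metric("mrr", 0.41)
    mlflow.log_metric("hits_at_10", 0.63)
```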