Research Scientist, Math & Physics background
Are you a talented developer looking for a remote job that lets you show your skills and get decent compensation? Join Upstaff.com, a platform that connects you with hand-picked startups and scale-ups in the US and Europe.
Summary
We are looking for a Research Scientist / ML Engineer with a strong foundation in mathematics, physics-inspired modeling, and graph-based methods.
* The main goal is to build and train ML models from scratch for a metadata platform with federated learning features and privacy-preserving techniques.
In this role, you will:
* Design and train models that integrate semantic structures, ontologies, and knowledge graphs with advanced ML techniques (graph neural networks, embeddings, transformers).
* Build federated, privacy-preserving AI pipelines for industrial ecosystems, enabling compliance, digital product passports, and large-scale supply chain traceability.
* This is a unique opportunity to combine ML engineering, formal semantics, and computational science to create scalable AI solutions that go beyond traditional data science.
* Full-time, remote, long-term.
* 100% European timezone overlap, B2 or C1 English (occasional, optional business trips).
Required Skills
LLM
NLP
Python
Machine Learning 3.0 yr.
Project Description
We are looking for an ML engineer (DS / Semantic / AI Data Engineer) who will develop and fine-tune models for a new data platform project.
What you'll bring:
- Strong Math & Physics background.
- Expertise in developing, training, and tuning ML models (supervised, unsupervised, and semi-supervised learning, model optimization, and evaluation).
What we offer:
- A long-term, full-time role with significant influence, ownership, and room to grow.
- A team of top engineers and researchers building a next-generation AI platform for cross-industry metadata knowledge exchange and collaboration.
Must-Have Skills
Machine Learning Fundamentals:
- Understanding of supervised, unsupervised, and semi-supervised learning techniques.
- Knowledge of algorithms relevant to semantic data, such as graph neural networks (GNNs), embeddings (e.g., Word2Vec, BERT), or clustering for entity resolution.
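As a purely illustrative sketch of the clustering-for-entity-resolution point above: toy company names, character n-gram TF-IDF standing in for learned embeddings, and an arbitrary distance threshold (none of these choices come from the project itself).

```python
# Minimal sketch: clustering string vectors for entity resolution (toy data only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

names = ["Siemens AG", "SIEMENS", "Bosch GmbH", "Robert Bosch", "siemens ag"]

# Character n-gram TF-IDF as a cheap stand-in for learned embeddings.
X = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3)).fit_transform(names).toarray()

# Note: the `metric` argument is called `affinity` on scikit-learn versions before 1.2.
clusterer = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0.7, metric="cosine", linkage="average"
)
labels = clusterer.fit_predict(X)
for name, label in zip(names, labels):
    print(label, name)
```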
Training AI/ML models (not only using APIs), including exposure to LLMs or custom models:
- Knowledge of training machine learning models, including hyperparameter tuning, cross-validation, and optimization.
- Ability to evaluate models using metrics like precision, recall, F1-score, or Mean Reciprocal Rank (MRR) for knowledge graph tasks.
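To make the evaluation expectations above concrete, a minimal sketch using scikit-learn for precision/recall/F1 plus a hand-rolled MRR helper; all the data below is made up.

```python
# Minimal sketch of the evaluation metrics named above (illustrative data only).
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical binary predictions from an entity-resolution classifier.
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1])

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))

def mean_reciprocal_rank(ranks):
    """MRR over 1-based ranks of the correct entity in each candidate list."""
    return float(np.mean([1.0 / r for r in ranks]))

# Hypothetical ranks of the true tail entity in a link-prediction task.
print("MRR:", mean_reciprocal_rank([1, 3, 2, 10]))
```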
Mathematics & Physics-Inspired Modeling:
- Graph theory and formal models.
- Differential equations and quantum-mechanical equations.
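Purely as a toy illustration of the differential-equations background, a SciPy sketch integrating a damped harmonic oscillator (the equation and parameters are arbitrary examples, not project models).

```python
# Minimal sketch: numerically integrating a damped harmonic oscillator with SciPy.
from scipy.integrate import solve_ivp

def damped_oscillator(t, y, omega=2.0, gamma=0.1):
    x, v = y
    return [v, -2 * gamma * v - omega**2 * x]

sol = solve_ivp(damped_oscillator, t_span=(0.0, 20.0), y0=[1.0, 0.0], max_step=0.05)
print("displacement at t=20:", sol.y[0, -1])
```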
Natural Language Processing (NLP):
- Skills in NLP techniques for semantic tasks such as named entity recognition (NER), entity linking, and text-to-triple extraction.
- Familiarity with transformer models (e.g., BERT, RoBERTa) for semantic understanding.
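A minimal sketch of transformer-based NER via the Hugging Face pipeline API; the checkpoint it downloads is the library's default English NER model, not a project-specific choice, and the sentence is invented.

```python
# Minimal sketch: named entity recognition with a pretrained transformer pipeline.
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")  # downloads a default English NER model
text = "Siemens ships gearboxes from Nuremberg to a plant operated by Bosch."
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```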
Semantic Querying and Reasoning:
- Proficiency in SPARQL for querying RDF datasets to prepare training data.
- Understanding of reasoning techniques to augment training datasets with inferred knowledge.
- Experience with semantic embeddings (TransE, DistMult, ComplEx, GraphSAGE).
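A minimal sketch of the SPARQL side using rdflib, with a tiny in-memory Turtle graph; the `ex:` vocabulary and instances are invented purely for illustration.

```python
# Minimal sketch: querying an RDF graph with SPARQL via rdflib (toy vocabulary).
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex:   <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:pump1 a ex:Pump ; rdfs:label "Centrifugal pump" .
ex:pump2 a ex:Pump ; rdfs:label "Dosing pump" .
""", format="turtle")

query = """
PREFIX ex:   <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?label WHERE { ?item a ex:Pump ; rdfs:label ?label . }
"""
for item, label in g.query(query):
    print(item, label)
```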
Graph Machine Learning (GNNs):
- Hands-on experience with PyTorch Geometric or DGL for tasks like link prediction, node classification, or knowledge graph completion.
- Understanding of graph algorithms and embeddings for semantic reasoning.
Knowledge Graphs & Ontologies:
- RDF, OWL, SPARQL, Protégé, Neo4j/GraphDB.
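To ground the Graph Machine Learning requirement, a minimal node-classification sketch with PyTorch Geometric, following its standard two-layer GCN recipe on the public Cora citation dataset; the dataset and hyperparameters are placeholders, not project choices.

```python
# Minimal sketch: two-layer GCN for node classification with PyTorch Geometric.
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

dataset = Planetoid(root="/tmp/Cora", name="Cora")  # public citation graph as a stand-in
data = dataset[0]

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

model.eval()
pred = model(data.x, data.edge_index).argmax(dim=1)
acc = (pred[data.test_mask] == data.y[data.test_mask]).float().mean()
print(f"test accuracy: {acc:.3f}")
```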
Programming and Data Manipulation:
- Strong programming skills in Python (or similar languages like R or Java) for model development and data preprocessing.
- Experience with libraries for data manipulation (e.g., Pandas, NumPy) and semantic data handling (e.g., RDFLib, OWLReady2).
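A small sketch of the kind of data preparation this implies: turning raw triples into integer index arrays with pandas before feeding a knowledge-graph embedding model (the triples below are toy examples).

```python
# Minimal sketch: mapping toy triples to integer indices for embedding training.
import pandas as pd

triples = pd.DataFrame(
    [("pump1", "locatedIn", "plantA"),
     ("pump2", "locatedIn", "plantB"),
     ("plantA", "operatedBy", "acme")],
    columns=["head", "relation", "tail"],
)

entities = pd.Index(pd.unique(triples[["head", "tail"]].values.ravel()))
relations = pd.Index(triples["relation"].unique())

h = entities.get_indexer(triples["head"])
r = relations.get_indexer(triples["relation"])
t = entities.get_indexer(triples["tail"])
print(h, r, t)
```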
Federated Learning & Privacy-Preserving AI (TensorFlow Federated, Flower, differential privacy).
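To make the federated-learning and differential-privacy line concrete without committing to a framework, here is a toy sketch of one FedAvg-style round with update clipping and Gaussian noise; TensorFlow Federated or Flower would provide the production equivalent, and the linear-model "training step", clip, and noise scale are invented for illustration.

```python
# Minimal, framework-free sketch of one FedAvg round with clipped, noised client updates.
import numpy as np

rng = np.random.default_rng(0)

def client_update(global_weights, local_data, lr=0.1):
    """Toy 'training': one gradient step of a linear model on local (X, y)."""
    X, y = local_data
    grad = X.T @ (X @ global_weights - y) / len(y)
    return global_weights - lr * grad

def dp_federated_average(global_weights, client_datasets, clip=1.0, noise_std=0.05):
    updates = []
    for data in client_datasets:
        delta = client_update(global_weights, data) - global_weights
        delta *= min(1.0, clip / (np.linalg.norm(delta) + 1e-12))  # clip update norm
        updates.append(delta + rng.normal(0.0, noise_std, size=delta.shape))  # add noise
    return global_weights + np.mean(updates, axis=0)

# Two synthetic clients, each holding private (X, y) data generated from w_true.
w_true = np.array([2.0, -1.0])
clients = []
for _ in range(2):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ w_true + 0.01 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(100):
    w = dp_federated_average(w, clients)
print("learned weights:", w)  # should land near w_true, up to the injected noise
```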
Nice-to-Have Skills
- Data pipelines (Apache Spark, Kafka, Airflow).
- Cloud experience: AWS (SageMaker, Neptune, S3), Azure, or GCP.
- Backend programming experience in Go, Scala, or Java.
- Awareness of Industrial Data Standards: RAMI 4.0, Catena-X, Digital Product Passports, IT/OT integration.
- Knowledge of MLOps tools: MLflow, Kubeflow, containerization (Docker, Kubernetes), CI/CD.
- Strong collaboration skills, ability to work with cross-disciplinary R&D teams and domain experts.
- English proficiency B2+/C1, with clear technical communication skills.
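Tying into the MLOps bullet above, a minimal MLflow tracking sketch; the experiment name, parameters, and metric values are all made-up placeholders.

```python
# Minimal sketch: logging a hypothetical KG link-prediction run with MLflow.
import mlflow

mlflow.set_experiment("kg-link-prediction")
with mlflow.start_run(run_name="transe-baseline"):
    mlflow.log_param("embedding_dim", 128)
    mlflow.log_param("negative_samples", 32)
    mlflow.log_metric("mrr", 0.41)
    mlflow.log_metric("hits_at_10", 0.63)
```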