Database/Big Data Engineer (PostgreSQL, MySQL, MongoDB) on Azure
Summary
The team's goal is to build a full-cycle data management platform covering data ingestion, ETL, data quality, data enrichment, and data processing pipelines orchestrated into an "elastic data fabric", and, most importantly, federated learning (ML across multiple data sources with managed models and data interchange).
The client is a new venture founded by a well-established international business with expertise in the industrial manufacturing domain. The project's ambition is to build an innovative data space platform for the industry that will take the leading position through its usefulness and performance.
Required skills for our ideal candidate:
- Experience with MongoDB / Cosmos DB
- Strong knowledge of SQL databases (ideally Azure SQL Database or similar), including PostgreSQL, MySQL, and their managed and unmanaged derivative services
Nice to have:
- Federated data; knowledge of metadata management, data lineage, and governance tools (e.g., Microsoft Purview or similar)
- Familiarity with ML and data pipelines
- Familiarity with the Azure ecosystem: ADLS Gen2 (or equivalent) for the data lake, and Databricks
Project and Client:
The project is about building a data spaces platform for industrial manufacturing.
A data space is a shared, distributed data management system that combines multiple data sources, applying ML models and managed data exchange.
The platform we're building aims to automate data ingestion, processing, and sharing with user-friendly, privacy-preserving, and scalable solutions for industrial manufacturing.
The platform will incorporate scalable, dynamic tools for creating and managing data spaces, handling complex data workflows, and ensuring modularity and privacy compliance.
Project’s Technical Stack:
- Backend: Python, Flask/FastAPI, Go
- Frontend: ReactJS, Angular
- AI/ML: Azure Machine Learning, Azure Databricks, TensorFlow Federated, PyTorch, and privacy-enhancing techniques.
- Cloud and DevOps: Kubernetes, Docker, Azure DevOps, CI/CD, data pipelines on Azure
- Data Engineering: Apache NiFi, Kafka Connect, Databricks (on Azure)
- Database: Cosmos DB, PostgreSQL/Hyperscale or MySQL/HeatWave
*The stack may change as qualified specialists in each area are hired.