Position Overview
We are seeking an experienced AI Data Engineer to lead the development and scaling of synthetic data generation systems that support multiple enterprise platforms. This role combines deep technical expertise in AI/ML data pipelines with practical experience in system integration, orchestration frameworks, and Kubernetes infrastructure management.

Key Responsibilities

Synthetic Data Generation & Quality Assurance
- Design and implement scalable synthetic data generation systems to support model training
- Develop and maintain data quality validation pipelines that ensure synthetic data meets training requirements
- Build automated testing frameworks for synthetic data generation workflows
- Collaborate with ML teams to optimize synthetic data for model performance

APIs & Integration
- Develop and maintain REST API integrations across multiple enterprise platforms
- Implement robust data exchange, transformation, and synchronisation logic between systems
- Ensure error handling, retries, and monitoring for all integration workflows

Data Quality & Testing
- Implement automated data validation and testing frameworks for ETL and synthetic data workflows
- Translate data quality feedback from stakeholders into pipeline or generation process improvements
- Proactively monitor and maintain data consistency across systems

Multi-System Integration & MCP Development
- Build and maintain tool registries for Model Context Protocol (MCP) integration across multiple enterprise systems
- Develop robust APIs for multi-system communication through MCP frameworks
- Design and implement workflows that coordinate multi-system interactions
- Ensure reliable data flow and error handling across distributed system architectures

Cross-Functional Collaboration & Production Integration
- Partner with domain specialists to translate plan execution feedback into actionable insights
- Work closely with Product Managers to align synthetic data generation with business requirements
- Collaborate with Core Engineering teams to ensure seamless production deployment
- Establish feedback mechanisms between synthetic data systems and production environments

Required Qualifications

Technical Skills
- Programming: Proficiency in Python; TypeScript is optional
- Data Engineering: Experience with data engineering frameworks and libraries (Pandas, Apache Airflow, Prefect)
- APIs & Integration: Strong background in REST APIs and system integration
- Databases: Experience with relational and NoSQL databases (PostgreSQL, MongoDB)
- Cloud Platforms: Hands-on experience with AWS, GCP, or Azure

Experience Requirements
- 2+ years of experience building production-scale data pipelines and orchestration systems
- Demonstrated success in cross-functional collaboration in technical environments

Preferred Qualifications
- Familiarity with managing Kubernetes-based production workloads and workflow orchestration (Argo)
- Familiarity with containerisation and orchestration tools such as Docker and Kubernetes
- Familiarity with synthetic or large-scale data generation
- Background in enterprise software integration
- Experience with Model Context Protocol (MCP) or similar orchestration frameworks
- Knowledge of automated testing frameworks for data pipelines

What We Offer
- Lots of learning: many systems are being built from the ground up, with no existing references or open-source projects to rely on. This will be a first not just for you, but for the industry as well.
- Opportunity to work at the forefront of enterprise-scale synthetic data generation
- Collaborative environment with product teams, engineering, and domain specialists
- Competitive compensation and comprehensive benefits
- Professional development opportunities in cutting-edge data engineering and Kubernetes orchestration

Team Structure
You'll report to the AI Engineering Lead and work closely with:
- ML Engineers developing foundation models
- Product Managers defining business requirements
- Product Specialists providing domain expertise
- Backend Engineers handling production infrastructure

This role offers significant impact on our data capabilities and the opportunity to shape how we generate and utilize synthetic data for training enterprise systems.
Job Title
AI Engineer - Synthetic Data Generation