What we do:
GMG is a global well-being company retailing, distributing, and manufacturing a portfolio of leading international and home-grown brands across the sport, everyday goods, health and beauty, property, and logistics sectors. Under the ownership and management of the Baker family for over 45 years, GMG is a valued partner of choice for the world's most successful and respected brands in the well-being sector. Working across the Middle East, North Africa, and Asia, GMG has introduced more than 120 brands to 12 countries. These include notable home-grown brands such as Sun & Sand Sports, Dropkick, Supercare Pharmacy, Farm Fresh, and Klassic, and international brands such as Nike, Columbia, Converse, Timberland, Vans, Mama Sita's, and McCain.

What you will do:
We are hiring a Data Architect to own the end-to-end architecture and engineering standards of our data and AI platform. This is a hands-on individual contributor role with leadership responsibility for two engineers. You will design, implement, and operate scalable, secure, and cost-effective data infrastructure on Databricks on AWS, enabling analytics/BI, classical ML, and GenAI/agentic AI workloads.

Role summary:
- Own the data platform architecture (ingestion → lake/warehouse → serving) and its operating model.
- Lead implementation of infrastructure, orchestration, CI/CD, observability, quality, lineage, and governance.
- Architect and enable BI, MLOps, and agentic AI platform capabilities.
- Evaluate and introduce fit-for-purpose tools (open-source preferred) to solve team challenges.
- Set engineering best practices and manage delivery through a small team.

Responsibilities:

Data platform & infrastructure ownership:
- Own platform architecture on AWS + Databricks, ensuring scalability, security, reliability, and cost efficiency.
- Define the target architecture across batch pipelines, streaming patterns, storage formats, and compute policies.
- Implement infrastructure-as-code using Terraform, including environments, networking dependencies (as needed), and platform configuration.

Architecture for BI, ML, and agentic AI:
- Design architecture patterns for:
  - BI data serving and exports to downstream BI stacks (e.g., Fabric) through governed, performant datasets.
  - MLOps foundations: training/inference patterns (batch-first), a model registry/versioning approach, and monitoring integration.
  - Agentic AI infrastructure: secure retrieval patterns, tool access boundaries, prompt/tool governance, and audit logs (platform-level enablers, not use-case specifics).
- Ensure architectural decisions support both experimentation and production-grade operation.

Data engineering best practices & SDLC:
- Establish engineering standards: branching strategy, PR reviews, release/versioning, code quality gates, and documentation.
- Implement CI/CD for data pipelines and infrastructure; enforce Git-based workflows and environment promotion.
- Promote modular, reusable pipeline patterns and templates for the team.

Data quality, lineage, and governance:
- Implement quality frameworks: freshness/completeness/validity checks and anomaly detection on key measures.
- Establish lineage and metadata management; define how datasets are documented and made discoverable.
- Own data classification (PII/sensitive), retention policies, and secure access patterns (RBAC/ABAC).

Tooling strategy (open-source preferred):
- Evaluate and introduce fit-for-purpose tools in areas such as:
  - Observability/monitoring
  - Data quality and testing
  - Lineage/catalog
  - Orchestration enhancements
  - Secrets management and policy enforcement
- Make pragmatic build-vs-buy decisions with clear TCO and operational fit.

Data modeling (added advantage):
- Guide and review modeling patterns (dimensional/entity models) to ensure consistent, reusable datasets for reporting, analytics, and ML.
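To make the quality-framework responsibility concrete, here is a minimal Python sketch of the freshness and completeness checks described above. All names (`check_freshness`, `check_completeness`, the sample rows) are illustrative assumptions, not part of GMG's actual stack; a production platform would typically back this with a dedicated tool.

```python
from datetime import datetime, timedelta, timezone

def check_freshness(last_loaded_at: datetime, max_age: timedelta) -> bool:
    """Freshness: was the dataset updated within the allowed window?"""
    return datetime.now(timezone.utc) - last_loaded_at <= max_age

def check_completeness(rows: list, required_fields: list) -> float:
    """Completeness: fraction of rows with all required fields populated."""
    if not rows:
        return 0.0
    ok = sum(1 for r in rows if all(r.get(f) is not None for f in required_fields))
    return ok / len(rows)

# Hypothetical sample data: one valid row, one with a NULL amount.
rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
]
completeness = check_completeness(rows, ["order_id", "amount"])
fresh = check_freshness(
    datetime.now(timezone.utc) - timedelta(minutes=5),
    max_age=timedelta(hours=1),
)
```

In practice, checks like these run as pipeline gates: a completeness ratio below a threshold, or a stale load timestamp, fails the job before bad data reaches downstream consumers.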
What does success look like:
- A stable, scalable platform with clear architectural standards and high engineering quality.
- Pipelines are reliable, with defined SLAs/SLOs, strong observability, and reduced incident frequency.
- CI/CD and a Git-based SDLC are adopted; changes are predictable, versioned, and easy to roll back.
- BI/ML/GenAI platform foundations are in place and enabling faster delivery across teams.
- Measurable cost/performance improvements (job runtimes, compute spend, data freshness reliability).
- The two engineers operate with clarity, quality, and autonomy under your guidance.

Technical competencies:
- 10+ years in data engineering / data platform / data architecture roles with hands-on delivery.
- Proven ownership of end-to-end data platforms (lake/warehouse + orchestration + governance).
- Experience leading small teams and driving engineering standards and change management.
- Strong stakeholder management and the ability to balance speed, quality, and control.

Required technical skills:

Mandatory:
- Databricks on AWS platform understanding (workloads, jobs, cluster policies, Delta/Lakehouse concepts).
- Strong Terraform (IaC) skills for cloud/platform infrastructure.
- Containerization & runtime: Docker, Kubernetes (deployment patterns, environment management).
- Orchestration: Airflow (DAG design, retries, backfills, SLAs).
- Data transformation practices (dbt familiarity preferred; tool-agnostic standards accepted).
- CI/CD implementation, Git workflows, branching/release strategy.
- Strong understanding of data platform concerns: ingestion, streaming concepts, outbound patterns, quality, lineage, retention, and classification.
- Security fundamentals: IAM/RBAC, secrets management, auditability, PII handling.

Good to have:
- Deep dbt experience (macros, tests, docs, environment promotion).
- Lakeflow Jobs experience / Databricks Workflows depth.
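Airflow handles retries, backfills, and SLAs natively through task and DAG configuration; as a plain-Python illustration of the retry-with-backoff pattern the orchestration requirement refers to, here is a small sketch. The function names and the simulated flaky task are hypothetical, used only to show the mechanics.

```python
import time

def run_with_retries(task, max_retries: int = 3, base_delay: float = 0.01):
    """Run a task, retrying with exponential backoff on failure,
    roughly as an orchestrator's retry policy would."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the failure to the scheduler
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff

# Hypothetical extract task that fails twice, then succeeds.
calls = {"n": 0}
def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient source error")
    return "loaded"

result = run_with_retries(flaky_extract)
```

In Airflow itself this maps to operator arguments such as `retries` and `retry_delay`, while backfills re-run the same idempotent task logic over historical schedule intervals.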
- Experience with open-source tools in:
  - Data quality (e.g., Great Expectations, Soda)
  - Lineage/catalog (e.g., OpenLineage, DataHub, Amundsen)
  - Observability (e.g., the Prometheus/Grafana stack)
- Strong data modeling background (dimensional + metrics layer thinking).
- Experience with ML platform patterns and LLM/RAG platform guardrails.

Qualification & experience:
- Bachelor's or Master's degree in Statistics, Mathematics, Computer Science, or equivalent.
- 10+ years in data engineering / data platform / data architecture roles with hands-on delivery.
Job Title: Data Architect