Job Description

Senior Machine Learning Engineer - Evaluations (Design Generation)

Company Description

Canva's Design Generation systems use AI to create complete designs from text descriptions, turning user intent into layouts, images, typography, and color palettes. At scale, we generate millions of designs monthly, making reliable quality evaluation critical. The Design Generation Platform team (8 engineers) supports Design Generation infrastructure with a focus on developer experience tooling, platform orchestration, and self-service capabilities. We own the plumbing that makes Design Generation systems observable, debuggable, and improvable.

About the Role

As the Design Generation Evaluation owner, you'll build the infrastructure that enables quality monitoring across Design Generation. You'll guide evaluation strategy, build scalable infrastructure, and integrate evaluation tools for a coherent system.

What You'll Do

Understand and optimize existing evaluation systems (including LLM-as-Judge frameworks, visual quality models, and multidimensional scoring approaches), analyzing strengths, limitations, and tradeoffs to identify gaps.
Design and implement automated evaluation pipelines that score generated designs at scale, balancing accuracy with computational cost.
Define evaluation strategies for pre-deployment validation, continuous monitoring, A/B experiment analysis, and model comparison.
Curate high-quality evaluation datasets and benchmark suites representing diverse use cases, edge cases, and quality dimensions.
Integrate evaluation systems into continuous deployment pipelines, creating automated quality gates that catch regressions before production.
Reduce evaluation cycle time to enable teams to iterate faster on model improvements and launch experiments earlier.
Partner with research teams to understand evaluation needs for new model architectures and capabilities.
Define the evaluation ecosystem strategy: how different evaluation tools and methods compose together for Design Generation.
Guide teams on evaluation best practices, appropriate methodologies for their use cases, and interpretation of results.

What We're Looking For

Strong ML engineering fundamentals with experience building and maintaining production ML systems at scale.
Proven ability to build robust, scalable infrastructure: platform engineering with an ML focus.
Deep understanding of distributed systems, observability patterns, and monitoring best practices.
Python proficiency with production-quality coding standards, code reviews, and testing practices.
Experience with data pipelines, time-series data, and statistical analysis for detecting anomalies.
SQL fluency for querying and analyzing large datasets across data warehouse and analytics systems.
Track record of building self-service platforms or developer tooling that gets adoption.
Excellent collaboration skills: cross-team communication and solution delivery.
Evaluation of Gen AI systems at scale (even better if that's evaluation of systems with creative outputs!).

Additional Information

We encourage individuals from diverse backgrounds to apply; passion, curiosity, and willingness to learn matter.

Benefits

Equity packages.
Inclusive parental leave policy.
An annual Vibe & Thrive allowance.
Flexible leave options.

Seniority level : Mid-Senior level

Employment type : Full-time

Job function : Information Technology

Job Title

Company : Canva

Location : Sydney, New South Wales

Created : 2026-01-31

Job Type : Full Time