Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.

Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

Responsibilities
Lead the design and implementation of system-level debugging, validation, and observability platforms.
Develop automated systems for collecting and analyzing numerical and execution anomalies.
Create visualization and analysis tools to enable efficient root-cause investigation.
Build frameworks for failure classification, regression detection, and anomaly monitoring.
Extend compilers, runtimes, and programming interfaces to support advanced profiling and instrumentation.
Improve system bring-up, low-level debug, and validation workflows.
Partner cross-functionally with compiler, hardware, firmware, runtime, and infrastructure teams.
Establish best practices for debuggability, reliability, and operational excellence.
Lead high-impact initiatives.
Support incident response and drive long-term corrective actions.

Qualifications
Strong proficiency in C++ and Python, with a track record of building reliable, high-performance systems and tooling.
Demonstrated experience debugging complex hardware/software systems and driving issues to root cause.
Experience analyzing system-level data structures, execution graphs, or dependency networks for diagnostics and validation.
Proven ability to design and build intuitive visualization and analysis tools for complex technical data.
Experience with compiler internals, custom hardware interfaces, or low-level protocol design.
Strong written and verbal communication skills, with the ability to explain technical concepts to diverse stakeholders.
Ability to work independently and lead complex technical projects end-to-end.

Preferred Skills & Qualifications
Familiarity with machine learning training and inference pipelines, especially distributed training and large-model scaling.
Prior work on high-performance clusters, HPC systems, or custom hardware/software co-design.

Why Join Cerebras
Build a breakthrough AI platform beyond the constraints of the GPU.
Publish and open-source your cutting-edge AI research.
Work on one of the fastest AI supercomputers in the world.
Enjoy job stability with startup vitality.
A simple, non-corporate work culture that respects individual beliefs.

Apply today and become part of the forefront of groundbreaking advancements in AI!

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.
Job Title
ML Software Tool Development Engineer