Job Title


Principal Architect - Large Model and Training System Performance Optimization


Company : Huawei


Location : Vancouver, Metro Vancouver Regional Distr


Created : 2025-12-11


Job Type : Full Time


Job Description

Huawei Canada has an immediate permanent opening for a Principal Architect.

About the team:

The Computing Data Application Acceleration Lab, organized into three specialized teams, aims to create a leading global data analytics platform using innovative programming technologies. This team focuses on full-stack innovations, including software-hardware co-design and optimizing data efficiency at both the storage and runtime layers. This team also develops next-generation GPU architecture for gaming, cloud rendering, VR/AR, and Metaverse applications. One of the goals of this lab is to enhance algorithm performance and training efficiency across industries, fostering long-term competitiveness.

About the job:

- Lead the architecture design of Ascend training products, driving the continuous evolution of architectural competitiveness.
- Analyze mainstream scenario requirements and industry technology trends for Ascend, introducing innovative technologies to ensure sustained leadership in architectural competitiveness.
- Identify requirements for MindX, AI frameworks, acceleration libraries, and chip hardware, building a robust software-hardware architecture for Ascend training to achieve ongoing commercial success.
- Collaborate with other departments and teams from Huawei's global research centers to align on strategic goals.
- Spearhead project planning and define the technology/product development roadmap to guide long-term innovation.

The base salary for this position ranges from $121,000 to $230,000, depending on education, experience, and demonstrated expertise.

About the ideal candidate:

- Master's or PhD in Computer Science or Math/Statistics, with a focus on AI & Deep Learning.
- 5+ years of experience in architecting large-scale AI training systems or similar complex software-hardware integrated solutions.
- Excellent documentation skills for writing internal reports and/or publishing research papers.
- Effective communication skills for presentations to internal and external audiences.
- A proactive attitude with a strong ability to tackle challenges and adapt to evolving requirements and a dynamic work environment.
- Working knowledge of AI accelerators or full-stack AI acceleration systems and Deep Reinforcement Learning.
- Hands-on experience with veRL or Ray for large-scale model training.
- Familiarity with processor architectures and relevant work experience, with hands-on expertise in designing and developing complex system software architectures, and experience in performance optimization on GPU or similar hardware platforms.
- Solid understanding of deep learning fundamentals, proficiency with the PyTorch framework, and practical experience in performance optimization using upper-layer distributed frameworks such as Megatron or DeepSpeed.
- Strong programming skills with proficiency in C/C++ and Python.
- Experience using performance analysis tools such as Nsight Systems, Nsight Compute, and DLProf.