Overview
Huawei Canada has an immediate 12-month contract opening for a Machine Learning Software Engineer.

About the team:
The Software-Hardware System Optimization Lab continuously improves the power efficiency and performance of smartphone products through software-hardware system optimization and architectural innovation. We track trends in cutting-edge technologies to build competitive strength in mobile AI, graphics, multimedia, and software architecture for mobile phone products.

About the job
The Software-Hardware System Optimization Lab focuses on optimization across software and hardware to improve the efficiency and performance of mobile devices.

Responsibilities
Profile and optimize end-to-end ML workloads and kernels to improve latency, throughput, and efficiency across GPUs, NPUs, and CPUs.
Identify bottlenecks (compute, memory, bandwidth) and land fixes such as tiling, fusion, vectorization, quantization, mixed precision, and layout changes.
Build or extend tooling for benchmarking, tracing, and automated regression and performance testing.
Collaborate with compiler and runtime teams to land graph- and kernel-level improvements.
Apply ML/RL-based techniques (e.g., cost models, schedulers, autotuners) to search for better execution plans.
Translate promising research and prototypes into reliable, scalable production features and services.

Qualifications
Master's or PhD degree in Computer Science or a related field.
Solid experience in ML systems or performance engineering (industry, OSS, or research).
Fluency in Python and C++.
Hands-on experience with at least one compute stack: CUDA/ROCm, OpenCL, Metal/Vulkan compute, Triton, or vendor or open-source NPUs.
Practical knowledge of PyTorch or TensorFlow/JAX and of inference/training performance basics (mixed precision, graph optimizations, quantization).
Ability to turn ambiguous performance problems into measurable, repeatable experiments.
AI compiler exposure: TVM, IREE, XLA/MLIR, TensorRT, or similar.
Profiling skills (Nsight, perf, VTune, CUPTI/ROCm tools) and comfort reading roofline and memory-hierarchy signals.
Experience with kernel scheduling and auto-tuning (RL, Bayesian/EA search) and with hardware counters.
Background in custom accelerators/NPUs, DMA/tiling/SRAM management, or quantization (INT8/FP8).
Contributions to relevant OSS (links welcome).

Job details
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Telecommunications
Job Title
Machine Learning Software Engineer - GPU/NPU