Role Overview
LLM Serving Engineer (Cloud AI Engineering), Senior / Staff Engineer at Qualcomm Technologies, Inc.
We are building a scalable LLM inference platform that spans from research to commercial deployment. The role spans the full product lifecycle and requires strategic thinking, strong execution, and excellent communication skills.

Responsibilities
- Build a scalable LLM inference platform using techniques such as disaggregated serving, KV cache management, advanced parallelism, speculative algorithms, model optimization, and specialized kernels.
- Contribute to the development of LLM serving packages (e.g. vLLM, SGLang, TGI, Triton Inference Server, Dynamo, llm-d).
- Collaborate closely with customers to drive solutions by working with internal compiler, firmware, and platform teams.
- Drive efficient serving through smart autoscaling, load balancing, and routing.
- Engage with open-source serving communities to evolve the framework.

Qualifications
- Hands-on experience with one or more LLM serving/orchestration packages (Triton Inference Server, vLLM, SGLang, Ollama, llm-d, KServe, LMCache, Mooncake).
- Deep understanding of foundational LLMs, VLMs, SLMs, and transformer-based architectures.
- Strong experience developing language models using PyTorch.
- Strong computer science fundamentals: algorithms, data structures, parallel and distributed programming.
- Understanding of computer architecture, ML accelerators, in-memory processing, and distributed systems.
- Strong Python development skills for large-scale projects.
- Experience analyzing, profiling, and optimizing deep learning workloads.
- Proactive learning about the latest inference optimization techniques.
- Excellent communication and problem-solving skills in a fast-paced environment.
- MS in Computer Science, Machine Learning, Computer Engineering, or Electrical Engineering.

Bonus Skills
- Open-source contribution to any GenAI package.
- Experience architecting and developing large-scale distributed systems.
- High-level kernel design experience (PyTorch, CUDA, Triton).
- Knowledge of torch.compile or TorchDynamo.
- PhD in Computer Science, Computer Engineering, or Machine Learning.

Minimum Qualifications
- Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 4+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience; or
- Master's degree in Computer Science, Engineering, Information Systems, or related field and 3+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience; or
- PhD in Computer Science, Engineering, Information Systems, or related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.

Benefits & Compensation
Pay Range: $158,400.00 - $237,600.00. We also offer a competitive annual discretionary bonus program and RSU grants. For full details, review our US benefits.

Equal Opportunity & Accessibility
Qualcomm is an equal opportunity employer. We are committed to providing an accessible process for individuals with disabilities. For accommodations, contact or call our toll-free number. Qualified applicants will receive consideration for employment without regard to protected classification.

Location
Toronto, Ontario, Canada

Job Title
LLM Serving Engineer (Cloud AI Engineering), Senior / Staff Engineer