About Us

Most AI is frozen in place - it doesn't adapt to the world. We think that's backwards. Our mandate is to build efficient intelligence that evolves in real time. Our vision is AI systems that are flexible, personalized, and accessible to everyone. We believe efficiency is what makes this possible: it's how we expand access and ensure innovation benefits the many, not the few. We believe in talent density: bringing together the best and most driven individuals to push the boundaries of continual adaptation. We're looking for builders and creative thinkers ready to shape the next era of intelligence.

The Role

You'll work directly with our founders to design and build the inference and optimization systems that power our core product. This role bridges research and production, combining deep exploration of inference techniques with hands-on ownership of scalable, high-performance serving infrastructure. You'll own the full lifecycle of LLM inference, from experimentation and performance analysis to deployment and iteration in production, thriving in a zero-to-one environment and helping define the technical foundations of our inference stack.

Responsibilities

- Inference Research & Systems: design and build our LLM inference stack from zero to one, exploring and implementing advanced techniques for low-latency, high-throughput serving of language and multimodal models.
- Frameworks & Optimization: develop and optimize inference using modern frameworks (e.g., vLLM, SGLang, TensorRT-LLM), experimenting with batching strategies, KV-cache management, parallelism, and GPU utilization to push performance and cost efficiency.
- Software-Hardware Co-Design: collaborate closely with founders and model developers to analyze bottlenecks across the stack, co-optimizing model execution, infrastructure, and deployment pipelines.

Qualifications

- Strong experience building and optimizing LLM inference systems in production or research environments.
- Hands-on expertise with inference frameworks such as vLLM, SGLang, TensorRT-LLM, or similar.
- Deep performance mindset, with experience in GPU-backed systems, latency/throughput optimization, and resource efficiency.
- Solid understanding of transformer inference, serving architectures, and KV-cache-based execution.
- Strong programming skills in Python; experience with CUDA, Triton, or C++ is a plus.
- Comfort working in ambiguous, zero-to-one environments and driving research ideas into production systems.

Nice to have: experience with model quantization or pruning, speculative decoding, multimodal inference, open-source contributions, or prior work in systems or ML research labs.

Above all, we're looking for great teammates who make work feel lighter and aren't afraid to go out on a limb with bold ideas. You don't need to be perfect, but you do need to be adaptable. We encourage you to apply even if you don't check every box.

Benefits

- Flexible work: in-person collaboration in the Bay Area, a distributed global-first team, and quarterly offsites.
- Adaption Passport: annual travel stipend to explore a country you've never visited. We're building intelligence that evolves alongside you, so we encourage you to keep expanding your horizons.
- Lunch Stipend: weekly meal allowance for takeout or grocery delivery.
- Well-Being: comprehensive medical benefits and generous paid time off.
Job Title: AI Systems & Inference Frameworks Engineer