Role Overview

We are seeking a skilled GenAI Engineer to develop, optimize, and deploy advanced LLMs, VLMs, and multimodal AI systems. You will work on fine-tuning foundation models, designing retrieval architectures, and building production-ready inference pipelines for scalable AI solutions.

Key Roles

- Develop and enhance LLMs, VLMs, RAG systems, and multimodal generation pipelines for production use cases.
- Understand business requirements and convert them into scalable, high-performance AI model architectures and workflows.
- Fine-tune and customize Transformer-based models using proprietary datasets, advanced training strategies, and evaluation frameworks.
- Optimize tokenization, embedding generation, vector search, and retrieval flows for high-throughput applications.
- Develop high-performance inference pipelines using ONNX, TensorRT, quantization, batching, streaming, and GPU/accelerator optimizations.
- Ensure all models are production-grade: robust, scalable, monitored, and integrated into backend systems.
- Research and evaluate cutting-edge architectures in multimodal models, generative AI, and retrieval-augmented techniques.

Responsibilities

- Design end-to-end GenAI systems including training, fine-tuning, inference serving, and continuous model improvements.
- Work with backend teams to integrate models into scalable APIs using Triton, TensorRT, ONNX Runtime, vLLM, or custom inference engines.
- Build model evaluation pipelines: BLEU, ROUGE, alignment tests, hallucination checks, safety filters, and latency/throughput benchmarks.
- Experiment with new architectures (Mixture-of-Experts, diffusion-based multimodal, etc.) and contribute to LLM/VLM improvements.
- Collaborate with product, backend, ML, and DevOps teams to deliver end-to-end GenAI features.
- Maintain documentation, ensure reproducibility, and follow best practices in model governance, versioning, and monitoring.

Required Qualifications

- 4-6 years of experience in applied machine learning, deep learning, GenAI, or multimodal systems.
- Proven expertise with Transformers, LLMs, VLMs, diffusion models, and retrieval-augmented systems.
- Hands-on experience with Python, PyTorch, TensorFlow, Hugging Face, LangChain, and modern training pipelines.
- Strong knowledge of vector databases (FAISS, Pinecone, Milvus, Chroma).
- Solid experience with ONNX, TensorRT, quantization, model optimization, and inference engines (vLLM, FasterTransformer, Triton).
- Understanding of distributed training, GPU utilization, mixed precision, and large-scale model serving.
- Strong problem-solving skills and ability to deliver production-quality AI systems.

Job Title
Gen AI Engineer II