Job Description:
We are looking for a Lead Generative AI Engineer with 3–5 years of experience to spearhead development of cutting-edge AI systems involving Large Language Models (LLMs), Vision-Language Models (VLMs), and Computer Vision (CV). You will lead model development, fine-tuning, and optimization for text, image, and multi-modal use cases. This is a hands-on leadership role that requires a deep understanding of transformer architectures, generative model fine-tuning, prompt engineering, and deployment in production environments.

Roles and Responsibilities:
- Lead the design, development, and fine-tuning of LLMs for tasks such as text generation, summarization, classification, Q&A, and dialogue systems.
- Develop and apply Vision-Language Models (VLMs) for tasks like image captioning, VQA, multi-modal retrieval, and grounding.
- Work on Computer Vision tasks including image generation, detection, segmentation, and manipulation using state-of-the-art deep learning techniques.
- Leverage transformer architectures, diffusion models, and contrastive vision-language models such as CLIP to build and fine-tune multi-modal models.
- Fine-tune open-source LLMs and VLMs (e.g., LLaMA, Mistral, Gemma, Qwen, MiniGPT, Kosmos) on task-specific or domain-specific datasets (an illustrative fine-tuning sketch follows the Requirements list below).
- Design data pipelines, model training loops, and evaluation metrics for generative and multi-modal AI tasks.
- Optimize models for inference using techniques such as quantization, LoRA, and efficient transformer variants (see the quantized-loading sketch at the end of this posting).
- Collaborate cross-functionally with product, backend, and MLOps teams to ship models to production.
- Stay current with the latest research and incorporate emerging techniques into product pipelines.

Requirements:
- Bachelor's or Master's degree in Computer Science, Artificial Intelligence, Machine Learning, or a related field.
- 3–5 years of hands-on experience building, training, and deploying deep learning models, especially in the LLM, VLM, and/or CV domains.
- Strong proficiency with Python, PyTorch (or TensorFlow), and libraries such as Hugging Face Transformers, Datasets, OpenCV, and LangChain.
- Deep understanding of transformer architectures, self-attention mechanisms, tokenization, embeddings, and diffusion models.
- Experience with LoRA, PEFT, RLHF, prompt tuning, and transfer learning techniques.
- Experience with multi-modal datasets and fine-tuning vision-language models (e.g., BLIP, Flamingo, MiniGPT, Kosmos).
- Familiarity with MLOps tools, containerization (Docker), and model deployment workflows (e.g., Triton Inference Server, TorchServe).
- Strong problem-solving, architectural thinking, and team mentorship skills.
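To make the fine-tuning responsibilities concrete, here is a minimal sketch of parameter-efficient LoRA fine-tuning of an open-source causal LLM with Hugging Face Transformers and PEFT. The model name, dataset, and hyperparameters are placeholder assumptions for illustration only, not part of the role definition.

```python
# Minimal LoRA fine-tuning sketch (Transformers + PEFT + Datasets).
# Model and dataset below are illustrative placeholders; assumes a CUDA GPU.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA trains small low-rank adapter matrices instead of the full weights.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total weights

# Tokenize a task-specific corpus (public text dataset used as a stand-in).
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
tokenized = tokenized.filter(lambda row: len(row["input_ids"]) > 0)  # drop blanks

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="lora-out",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        fp16=True,  # mixed precision; requires a GPU
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lora-out")  # saves only the small adapter weights
```

Saving just the adapter (a few MB) rather than the full model is what makes LoRA practical for the per-domain fine-tunes this role describes.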
Job Title:
Lead Generative AI Engineer
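Finally, the inference-optimization bullet mentions quantization. Below is a minimal sketch of 4-bit quantized model loading with Transformers and bitsandbytes; the model name is a placeholder, and production serving would typically sit behind Triton Inference Server or TorchServe as noted in the requirements.

```python
# Minimal 4-bit quantized inference sketch (Transformers + bitsandbytes).
# Model name is a placeholder assumption; requires a CUDA GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder; any causal LM on the Hub

# NF4 4-bit quantization cuts memory roughly 4x vs. fp16 at modest quality cost.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",  # place layers across available GPUs/CPU
)

inputs = tokenizer(
    "Summarize: LoRA adds low-rank adapters to attention layers.",
    return_tensors="pt",
).to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```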