Job Description

We are seeking an experienced ML Engineer to join our ML Platform Team. You will design, build, and optimize machine learning systems that power all four Zoé products, including LLM inference pipelines, RAG systems, speech-to-text/text-to-speech engines, and Arabic dialect models. This is a high-impact role where your work will directly serve millions of API calls across our enterprise AI platform.

If you’ve ever wanted to make AI feel more human, this is that opportunity.

Key Responsibilities

- Design and implement production-grade LLM inference pipelines using vLLM, TensorRT, or similar frameworks
- Build and optimize Retrieval-Augmented Generation (RAG) systems with vector databases (Qdrant, Pinecone, Milvus)
- Develop and fine-tune Arabic language models for Emirati dialect speech recognition and NLP tasks
- Implement embedding pipelines for semantic search and document retrieval across products
- Optimize model inference for latency and throughput on GPU clusters (NVIDIA A100/H100)
- Collaborate with product squads to integrate ML services into Call Center, Pulse, Cortex, and Spark
- Implement model evaluation frameworks, A/B testing, and continuous improvement processes
- Stay current with the latest advances in LLMs, speech AI, and NLP research

Must-have Qualifications

- Bachelor's or Master's degree in Computer Science, Machine Learning, or a related field
- 4+ years of experience in machine learning engineering or applied ML research
- Strong proficiency in Python and ML frameworks (PyTorch, TensorFlow, Hugging Face Transformers)
- Experience deploying and optimizing LLMs in production environments
- Hands-on experience with vector databases and semantic search systems
- Understanding of transformer architectures, attention mechanisms, and modern NLP techniques
- Experience with GPU computing and optimization (CUDA, TensorRT)

Must-have Technical Skills

- Languages: Python, C++ (for optimization), SQL
- ML Frameworks: PyTorch, TensorFlow, Hugging Face, vLLM, TensorRT
- Vector DBs: Qdrant, Pinecone, Milvus, pgvector
- Speech: Whisper, Pyannote, ElevenLabs, Soniox
- LLMs: Qwen, Llama, GPT-4, Claude, fine-tuning techniques
- Infrastructure: Docker, Kubernetes, NVIDIA GPU clusters
- Cloud: Azure ML, Azure OpenAI, AWS SageMaker

Good-to-have Qualifications

- Experience with speech processing (ASR, TTS, speaker diarisation)
- Background in Arabic NLP or multilingual language models
- Experience with model quantization, pruning, and efficient inference techniques
- Familiarity with Azure OpenAI, Anthropic Claude, or similar LLM APIs
- Publications in ML/NLP conferences (NeurIPS, ICML, ACL, EMNLP)
- Experience with Kubernetes and containerized ML workloads

Ideal Candidate

You’re someone who can listen to a synthetic voice and tell exactly what’s missing: the warmth, the pause, the subtle inflection that makes it sound real. You love crafting conversations that connect with people, not just reply to them. You think in both Arabic and English, switching seamlessly between structure and emotion. You’re a bridge between language and technology, helping AI speak like a person, not a program.

Job Title

Company : The Future of Voice

Location : Thrissur, Kerala

Created : 2025-12-19

Job Type : Full Time