Scribie is an AI-powered, Human Verified audio and video transcription service, trusted globally since 2008. We specialize in delivering accurate and reliable transcription solutions by blending advanced AI technology with human expertise. Headquartered in the US, we operate with a hybrid model in our Bangalore office, combining the flexibility of remote work with the collaboration of in-person engagement. This approach offers our team both autonomy and growth opportunities in a dynamic and supportive environment.

We’re building production-grade audio foundation models for high-stakes legal and enterprise transcription: real customer data, messy audio, real consequences.

This is not a paper-only research role.

You’ll own the full ML lifecycle:
- Fine-tuning large audio / multimodal models using SFT, LoRA, and RL-based preference optimization (DPO / PPO / ORPO)
- Beating strong baselines like Whisper-large, GPT-4o, Gemini, Claude on domain-specific data
- Designing WER, diarization, and alignment-driven evaluation stacks
- Taking models from research notebooks → production inference services
- Running daily experiments that directly impact quality, cost, and customer satisfaction

You’ll be our first ML hire, with real ownership over the audio ML roadmap: not a side project, not a support role.

Bangalore (primarily onsite)

Compensation: ₹25L – ₹30L

This role is a great fit if you:
- Have shipped fine-tuned ASR / LLM / multimodal models into production
- Are comfortable running large training jobs and debugging failures
- Care about real-world impact, not just benchmarks

Not a fit if you’re looking for an academic or paper-only research role.

Apply here, or DM me with your LinkedIn/GitHub and a short note on the coolest audio or LLM system you’ve shipped.

If turning messy real-world audio into models that make humans 5–10× more efficient excites you, let’s talk.
Job Title
Scribie - Senior Applied ML Engineer – Audio and Foundation Models