Sony Research India is seeking a dynamic and motivated Speech Recognition Intern to join our innovative research team. As an intern, you will work on real-world problems in automatic speech recognition (ASR), focusing on improving noise robustness and reducing hallucinations in transcription outputs. You'll gain hands-on experience with state-of-the-art tools and datasets, and contribute to impactful projects alongside experienced researchers and engineers.

Key Responsibilities:
Explore and develop techniques to enhance ASR robustness under noisy, low-resource, and domain-shifted conditions.
Investigate hallucination phenomena in end-to-end ASR models (e.g., Whisper, Wav2Vec2) and propose mitigation strategies.
Conduct experiments using large-scale speech datasets and evaluate ASR performance across varying noise levels and linguistic diversity.
Contribute to publications, technical reports, or open-source tools as outcomes of the research.

Work Location:
Remote

Duration of the paid Internship:
This paid internship will run for 6 months, starting in the first week of June 2025.

Working hours:
9:00 to 18:00 (Monday to Friday).

Qualification:
Currently pursuing or have completed a Master's (Research) or Ph.D. in deep learning/machine learning, with hands-on experience with Transformer models applied to audio/speech.

Must Have Skills:
Strong programming skills in Python, and familiarity with PyTorch or TensorFlow.
Experience with speech processing libraries (e.g., Torchaudio, ESPnet, Hugging Face Transformers).
Prior experience with ASR models like Wav2Vec2, Whisper, or RNN-T is a plus.
Ability to read and implement academic papers.
Strong foundation in machine learning and signal processing.

Good to have skills:
Familiarity with prompt tuning, contrastive learning, or multi-modal architectures.
Experience with evaluating hallucinations or generating synthetic speech/audio perturbations.

Job Title
Speech Recognition Intern