
Job Title: Data Scientist - Clinical Data Extraction & AI Integration
Company: Invent Health
Location: Chennai, Tamil Nadu
Created: 2025-07-26
Job Type: Full Time

Job Description

Experience Level: 3-6 Years
Location: Chennai/Hybrid
Employment Type: Full-time

About the Role

We are seeking an experienced Data Scientist to join our healthcare technology team, focusing on medical document processing and data extraction systems. You will work with cutting-edge AI technologies to build robust solutions that extract critical information from clinical documents, improving healthcare data workflows and patient care outcomes.

Key Responsibilities

Data Science & Analytics
- Design and implement statistical models for medical data quality assessment
- Develop predictive algorithms for encounter classification and validation
- Build machine learning pipelines for document pattern recognition
- Derive data-driven insights from clinical document structures
- Engineer features for medical terminology extraction

Advanced Analytics & ML
- Apply natural language processing (NLP) techniques to clinical text
- Develop statistical validation frameworks for extracted medical data
- Build anomaly detection systems for medical document processing
- Create predictive models for estimating discharge dates and encounter durations
- Implement clustering algorithms for provider and encounter classification

AI & LLM Integration
- Integrate and optimize large language models (LLMs) via AWS Bedrock and API services
- Design and refine prompts for high-accuracy clinical content extraction
- Implement fallback logic and error handling for AI-powered extraction systems
- Develop pattern matching algorithms for medical terminology
- Create validation layers for AI-extracted medical information

Healthcare Domain Expertise
- Work with medical document structures
- Implement healthcare-specific validation rules
- Handle medical terminology extraction and clinical context analysis
- Ensure HIPAA compliance and follow data security best practices

Technologies & Tools
- Languages and formats: Python 3.8+, R, SQL, JSON
- Data Science Stack: pandas, numpy, scipy, scikit-learn, spaCy, NLTK
- ML Frameworks: TensorFlow, PyTorch, Hugging Face Transformers
- Visualization: matplotlib, seaborn, plotly, Tableau, Power BI
- AI Platforms: AWS Bedrock, Anthropic Claude, OpenAI APIs
- Cloud Services: AWS (SageMaker, S3, Lambda, Bedrock)
- Research Tools: Jupyter notebooks, Git, Docker, MLflow