Position Overview
We are seeking an accomplished Senior Manager / Lead Data Scientist to lead a high-performing team of data scientists and engineers focused on clinical data standardization, ETL workflows, and regulatory-ready data products. This leadership role requires deep expertise in CDISC standards (SDTM, ADaM, TLFs), the OMOP Common Data Model, and genomic variant data, combined with a proven ability to guide technical teams, architect scalable ETL pipelines, and ensure regulatory compliance across real-world data (RWD), EHR systems, and clinical trial datasets.

The ideal candidate will drive the strategic direction of our clinical data operations, mentor a diverse team of data professionals, and serve as the primary technical authority for OMOP/SDTM transformations and regulatory submissions to agencies such as the FDA and PMDA.

Key Responsibilities

Leadership & Strategy
· Lead, mentor, and develop a team of data scientists, data engineers, and analysts working on clinical data standardization and ETL workflows
· Define and execute the technical roadmap for OMOP and CDISC-compliant data pipelines, ensuring alignment with business objectives and regulatory requirements
· Foster a culture of technical excellence, continuous improvement, and collaborative problem-solving across multidisciplinary teams
· Partner with senior leadership to shape data strategy for precision medicine, regulatory submissions, and real-world evidence generation
· Drive adoption of best practices in metadata-driven automation, reproducible workflows, and quality assurance frameworks

Technical Architecture & Delivery
· Design and oversee end-to-end ETL architectures for converting heterogeneous clinical, EHR, and real-world data sources into OMOP CDM, SDTM, ADaM, and TLF formats
· Establish and maintain production-grade pipelines using open-source workflow orchestration tools (Airflow, Prefect, Nextflow, Luigi) and proprietary systems (SAS DI, Informatica, cloud-native platforms)
· Champion the use of OHDSI tools (WhiteRabbit, Rabbit-in-a-Hat, ETL-CDMBuilder, Achilles, DataQualityDashboard) for OMOP transformations and quality validation
· Ensure adherence to CDISC 360 metadata standards, Define.xml generation, controlled terminology management, and SDTM/ADaM conformance
· Implement robust data quality, validation, and reconciliation processes across all stages of ETL, leveraging Pinnacle 21 and custom QC frameworks

Regulatory & Compliance
· Serve as the subject matter expert for regulatory submission-ready datasets, ensuring timely and accurate delivery of SDTM/ADaM/TLFs to FDA, EMA, and PMDA
· Collaborate with biostatistics, clinical operations, regulatory affairs, and quality assurance teams to meet submission timelines and compliance standards
· Provide expert guidance on data privacy, security, and governance in alignment with HIPAA, GDPR, ICH GCP, and ISO 27001/27701 standards
· Review and approve Define.xml, Reviewer's Guides, aCRFs, and other submission documentation for regulatory packages

Genomic & Variant Data Specialization
· Lead initiatives for curating, harmonizing, and annotating genomic variant datasets from public and proprietary sources (ClinVar, ClinGen, HGMD, CADD, gnomAD, dbSNP, COSMIC, RefSeq, REVEL)
· Oversee ETL pipelines for mapping VCF annotation files to OMOP genomic tables and CDISC submission formats
· Ensure quality control of variant annotations, reference genome build consistency (GRCh37/38), and adherence to HGVS nomenclature
· Stay current with emerging variant annotation standards, genomic data formats (VCF, BED, GFF), and translational research methodologies

Stakeholder Engagement
· Act as the primary liaison between technical teams, clinical operations, statistical programming, and external partners on data standards and interoperability
· Translate complex technical challenges into business-friendly solutions and communicate risks, trade-offs, and opportunities to senior stakeholders
· Represent the organization in industry forums, CDISC working groups, OHDSI community events, and regulatory interactions

Required Qualifications

Education
· Ph.D. in Bioinformatics, Health Informatics, Computational Biology, Genomics, Biomedical Engineering, Clinical Data Science, or related quantitative field
· M.S. with exceptional leadership track record and 7+ years of relevant experience may be considered

Experience
· 7+ years of progressive experience in clinical data science, bioinformatics, or health data engineering roles
· 3+ years in leadership or team lead capacity, managing cross-functional technical teams (data scientists, engineers, analysts)
· Proven track record of delivering regulatory-ready SDTM/ADaM datasets for FDA/EMA/PMDA submissions
· Deep hands-on experience with OMOP CDM and OHDSI ecosystem (WhiteRabbit, Rabbit-in-a-Hat, ETL-CDMBuilder, ATLAS, Achilles)
· Extensive experience building and maintaining production ETL pipelines for clinical trials, RWD, EHR, and genomic data
· Demonstrated expertise in CDISC standards (SDTM, ADaM) and associated documentation (Define.xml, Reviewer's Guides, aCRF)

Technical Skills (Core)
· Programming & Scripting: Expert-level proficiency in Python, R, SQL; strong working knowledge of SAS (Base, Macro, Studio)
· ETL & Workflow Orchestration: Hands-on experience with Airflow, Nextflow, Prefect, Luigi, dbt, or equivalent platforms
· Clinical Data Standards: OMOP CDM, CDISC SDTM, ADaM, controlled terminologies (MedDRA, SNOMED CT, LOINC, RxNorm, ICD-10)
· OHDSI Tools: WhiteRabbit, Rabbit-in-a-Hat, ETL-CDMBuilder, ATLAS, Achilles, DataQualityDashboard
· Genomic Data: VCF, BED, GFF formats; reference genomes (GRCh37/38); HGVS nomenclature; variant annotation databases
· Data Quality & Validation: Pinnacle 21, custom QC frameworks, automated testing, Define.xml validation
· Cloud & Databases: SQL (PostgreSQL, MySQL, SQL Server), cloud platforms (AWS, GCP, Azure), data warehousing concepts
· Version Control & DevOps: Git/GitHub/GitLab, CI/CD pipelines, Docker, Kubernetes (basic understanding)

Domain Knowledge
· In-depth understanding of clinical trials, real-world evidence studies, precision medicine, and translational research
· Knowledge of ontologies and controlled vocabularies (ClinVar terms, Sequence Ontology, HPO, OMIM)
· Familiarity with cohort-building tools (ATLAS, i2b2, TriNetX) and EHR/claims data structures
· Understanding of data harmonization, linkage, and interoperability across heterogeneous sources
· Awareness of HL7 FHIR, DICOM, and other health data exchange standards

Leadership & Soft Skills
· Proven ability to lead, mentor, and develop technical teams, with emphasis on coaching junior and mid-level data scientists
· Strong strategic thinking and ability to translate business needs into technical solutions
· Excellent communication and presentation skills, with experience presenting to executive leadership and regulatory authorities
· Collaborative mindset, capable of working across functions (clinical, biostatistics, IT, regulatory, quality)
· Problem-solving mentality, detail-oriented, and committed to data integrity and quality excellence
· Fluent in English; additional languages a plus

Preferred Qualifications
· Certifications: CDISC SDTM/ADaM training certification; HL7 FHIR Proficiency; AWS Certified Solutions Architect / GCP Professional Data Engineer / Azure Data Engineer Associate
· Statistical Programming: Experience with SAS statistical procedures, double programming workflows, TLF shell development
· NLP & AI: Exposure to natural language processing applications on clinical narratives, adverse event coding, or generative AI for SDTM/ADaM automation
· Data Visualization: Proficiency in Tableau, Power BI, or custom dashboards (Plotly, Shiny) for stakeholder reporting

Keywords
OMOP, SDTM, ADaM, TLFs, CDISC, OHDSI, ETL, EHR, RWD, Clinical Trials, Regulatory Submissions, FDA, PMDA, WhiteRabbit, Rabbit-in-a-Hat, Pinnacle 21, Genomic Variants, VCF, HGVS, ClinVar, Python, R, SAS, SQL, Airflow, Nextflow, Define.xml, Data Quality, Team Leadership, Bioinformatics, Precision Medicine

Job Title
Senior Manager / Lead Data Scientist (Clinical Data Standardization & ETL Operations)