Overview
New Position: This position is open due to an existing vacancy to support our evolving business needs.
Document understanding is a foundational intelligence layer that powers every major capability across our legal AI platform, from search and information extraction to agentic reasoning in products like Westlaw, PracticalLaw, and CoCounsel. You'll build state-of-the-art semantic chunking, document enrichment, and knowledge graph construction systems that serve as the cognitive foundation multiple product teams depend on, working across authoritative legal, tax and accounting content and extraordinarily diverse customer data. This is a rare opportunity to solve publishing-quality research problems with immediate production impact: your innovations will directly shape how millions of legal professionals research, analyze, and reason over complex legal documents while advancing the capabilities that enable the next generation of intelligent legal AI agents.
Responsibilities
Design, build, test, and deploy end-to-end AI solutions for complex document understanding tasks in the legal domain.
Develop advanced models for semantic chunking of lengthy, non-uniformly structured legal documents with adjustable granularity levels for different use cases.
Build document enrichment systems that classify documents according to legal and customer-defined taxonomies and extract rich metadata.
Create LLM-based knowledge graph construction pipelines that extract and link heterogeneous legal knowledge including citations, entities, and legal concepts across diverse legal content.
Develop scalable synthetic data generation systems to support model training, simulate complex legal research queries and generate hallucination-free answers.
Work in collaboration with engineering to ensure well-managed software delivery and reliability at scale.
Develop comprehensive data and evaluation strategies for both component-level and end-to-end quality, leveraging expert human annotation and synthetic data generation.
Apply robust training and evaluation methodologies that balance model performance with latency requirements, particularly for SLM-based solutions.
Apply knowledge distillation techniques to compress large models into efficient SLMs suitable for production deployment.
Independently determine appropriate architectures for challenging document understanding problems, including: semantic chunking strategies that handle diverse document formats, preserve legal document structure, and adapt to different granularity needs; document classification approaches that work across varying legal taxonomies and generalize to customer-defined schemas; LLM-based knowledge extraction methods that handle challenges like citation recognition errors and contextual references; and multi-document reasoning architectures for generating synthetic multi-hop queries that reflect complex legal research patterns.
Balance accuracy, efficiency, and scalability while solving real-world challenges like handling diverse document formats and content types.
Partner closely with Engineering and Product teams to translate complex legal document understanding challenges into scalable, production-ready solutions.
Engage stakeholders across multiple product lines to deeply understand use case requirements, shaping objectives that align document understanding capabilities with diverse business needs including next-generation search and deep legal research.
Maintain scientific and technical expertise in one or more relevant areas as demonstrated through product deliverables, published research at top venues (e.g., ACL, EMNLP, ICLR, NeurIPS, SIGIR, KDD), and intellectual property.
About You
PhD in Computer Science, AI, NLP, or a related field, or a Master's with equivalent research/industry experience.
5+ years of hands-on experience building and deploying document understanding systems, information extraction pipelines, or knowledge graph construction using deep learning, LLMs, and NLP methods.
Proven ability to translate complex document understanding problems into innovative AI applications that balance accuracy and efficiency.
Professional experience scaling yourself and leading through others, in an applied research setting.
Strong programming skills (e.g., Python) and experience with modern deep learning frameworks (e.g., PyTorch, Hugging Face Transformers, DeepSpeed).
Publications at relevant venues such as ACL, EMNLP, ICLR, NeurIPS, SIGIR, KDD.
Technical Qualifications
Deep understanding of document understanding fundamentals: document layout analysis, semantic chunking approaches beyond fixed-size or paragraph-based methods, document classification handling hierarchical taxonomies, imbalanced multi-label classification, and adapting to domain-specific schemas.
Expertise in knowledge extraction and knowledge graph construction: entity recognition and linking, relation extraction, citation parsing, and building graph representations from unstructured text.
Expertise in LLM-based information extraction, few-shot and multi-task learning, post-training and knowledge distillation.
Solid understanding of synthetic data generation techniques for NLP, including query-answer generation with verification and scalable data augmentation for training specialized models.
Solid understanding of efficiency optimization including knowledge distillation, model compression, and designing SLM-based solutions that balance performance with computational constraints.
Solid understanding of DL/ML approaches used for NLP tasks.
Experience designing annotation workflows, creating high-quality labeled datasets with clear guidelines, and developing evaluation frameworks for document understanding tasks.
Preferred Qualifications
Prior work on legal document understanding, legal information extraction, knowledge representation including legal citations and legal domain concepts, or legal AI applications.
Prior work handling complex document structures common in legal documents: non-uniform formatting, nested hierarchies, cross-references, and embedded elements.
Experience with building systems that perform analysis, question answering or retrieval across large document collections.
Experience with knowledge graph frameworks and methodologies for legal or enterprise applications.
Understanding of RAG and agentic workflows for enterprise knowledge.
Publications at relevant venues such as ACL, EMNLP, ICLR, NeurIPS, SIGIR, KDD.
Experience working with AzureML or AWS SageMaker.
What's in it For You
Hybrid Work Model: Flexible hybrid working environment (2-3 days a week in the office depending on the role).
Flexibility & Work-Life Balance: Work from anywhere for up to 8 weeks per year.
Career Development and Growth: Continuous learning and skill development programming.
Industry Competitive Benefits: Flexible vacation, mental health days, Headspace app, retirement savings, tuition reimbursement, incentive programs, and wellbeing resources.
Culture: Inclusion and belonging, flexibility, work-life balance, and core values.
Social Impact: Two paid volunteer days annually and ESG initiatives.
Making a Real-World Impact: Helping customers pursue justice, truth, and transparency through trusted information.
Compensation
For eligible US locations, the base compensation range is $126,000 USD to $234,000 USD. For Ontario, Canada, the range is $100,000 CAD to $145,000 CAD. Additional components include annual bonus and comprehensive benefits.
Equal Opportunity
As a global business, we seek talented, qualified employees in all our operations around the world regardless of race, color, sex/gender, pregnancy, gender identity and expression, national origin, religion, sexual orientation, disability, age, marital status, citizen status, veteran status, or any other protected classification. We are an Equal Employment Opportunity Employer providing a drug-free workplace.
Disability Accommodations
We make reasonable accommodations for applicants with disabilities, including veterans with disabilities. If you reside in the United States and require an accommodation in the recruiting process, contact our Human Resources Department at [email protected].
Job Title
Senior Applied Scientist, NLP/GenAI