About Artha

Artha Group is a performance-first investment house managing ₹2,300 crores across domestic and international investment vehicles, including Category I and II AIFs, LLPs, and private limited companies. We hold active investments in 130+ startups and 10+ renewable energy projects, and have completed 32+ successful exits. We operate at the convergence of capital precision and operational depth.

Our Technology Division is building the Unified Intelligence Platform (UIP), an AI-first portfolio intelligence system powered by multi-agent orchestration, knowledge graphs, and large language models.

Location: Mumbai (onsite)
Employment Type: Internship (6 months)
Reporting To: CTO, Artha Group
Team: Technology Division – AI & Data Science
Experience Level: Final-year students or recent graduates (0–1 year)

Role Overview

This is a hands-on data science internship focused on fine-tuning language models, building financial data pipelines, and supporting AI workflows for a production-grade intelligence platform. You will work directly with the CTO and the AI team, gaining exposure to real VC data, deal intelligence, and advanced ML systems.

This is not a research-only role.
You will be expected to ship working components, handle messy real-world data, and contribute to production workflows.

You will:
Fine-tune small language models (SLMs) on proprietary VC and portfolio datasets
Build and clean structured and unstructured financial data pipelines
Develop embeddings for semantic search over deal memos and financials
Support multi-agent AI workflows with ML components
Design evaluation frameworks for LLM outputs in financial contexts
Perform exploratory data analysis (EDA) on portfolio metrics and market trends
Enrich knowledge graphs with ML-derived signals

Key Responsibilities
Implement LoRA/QLoRA fine-tuning workflows with Hugging Face tooling
Work with SLMs (Phi-3, Mistral, Gemma, LLaMA) and understand tokenization and context windows
Handle financial datasets: P&L statements, balance sheets, MIS reports, and time-series metrics
Build and maintain Python-based ML pipelines (NumPy, Pandas, Scikit-learn, PyTorch/TensorFlow)
Integrate vector databases (ChromaDB, Qdrant) for semantic search
Contribute to evaluation and monitoring of model performance

What Success Looks Like in 6 Months
Delivered at least one fine-tuned model integrated into UIP workflows
Built robust data pipelines for financial datasets
Demonstrated the ability to work independently on assigned ML tasks
Produced clear documentation and reproducible experiments
Received positive feedback from the CTO and AI team on ownership and execution

Candidate Profile
Education: Final-year student or recent graduate in CS, ECE, Statistics, or Data Science, or an MBA with a strong quantitative background
Experience: 0–1 year; prior projects in NLP, ML, or financial data preferred
Mindset: Ownership-driven, curious, comfortable with ambiguity, with strong execution discipline
Portfolio: GitHub repos, Kaggle notebooks, fine-tuning experiments, or research papers are a strong plus

Required Skills
Strong foundations in statistics, probability, and ML theory
Hands-on experience fine-tuning language models (LoRA, PEFT)
Proficiency in Python and the ML stack (NumPy, Pandas, Scikit-learn, PyTorch/TensorFlow)
Familiarity with vector databases and semantic search
Understanding of transformer architectures and attention mechanisms

Good to Have
Exposure to VC/FinTech datasets
Experience with LangChain/LangGraph, Neo4j, or MLOps tools
Knowledge of RAG pipelines and LLM evaluation frameworks

Compensation Structure
Stipend: ₹25,000 per month, with the possibility of conversion to a full-time position
Duration: 6 months
Start Date: Immediate
PPO (pre-placement offer): High performers will be considered for a full-time role

What This Role Is NOT
This is not a pure research internship: you will work on production-grade systems.
This is not a remote-only role: full-time presence in Mumbai is expected.
This is not a short-term project: a full 6-month commitment is required.
Job Title: Data Scientist Intern