BE/MTech in Computer Science or equivalent professional experience
9+ years of design, architecture, and development experience, tackling complex problems in large-scale data pipelines
Solid foundation in Data Structures, Algorithms, Object-Oriented Programming, and Software Design
Architectural expertise in data modeling for production-grade batch and streaming processing systems
Deep understanding of Spark-based processing, with a focus on resource optimization
Practical understanding of Airflow for orchestration and Kafka for streaming
Solid foundation in distributed systems: consistency, reliability, fault tolerance, retries, circuit breakers, and timeouts
Production experience with CI/CD (e.g., GitHub Actions/Jenkins), containers (Docker), Kubernetes, and infrastructure-as-code (Helm/Terraform)
Hands-on experience integrating LLM calls in data pipelines: prompt orchestration, batching, rate limiting, guardrails, and output validation
Exposure to embedding generation and vector indexing as part of data processing pipelines
Programming experience in Python (Spark)
Strong SQL skills and exposure to at least one cloud platform

Develop batch and streaming ETL/ELT pipelines across APIs, databases, files, and event streams
Use SQL and optimized Spark pipelines to transform raw data into clean, standardized, query-ready datasets
Build reusable data marts and feature sets for downstream teams (analytics, ML, product)
Tune queries, partitioning, clustering, indexing, and storage formats (Parquet/ORC)
Optimize compute and storage costs; manage scaling strategies and right-size resources
Implement CI/CD for data code and pipelines; manage environments and releases
Translate business needs into technical specifications; document datasets, SLAs, and usage guidelines
Support incident response and root-cause analysis for data quality issues
Partner with analytics, ML, engineering, and product teams to define data requirements
Mentor junior engineers and contribute to engineering best practices
Drive architectural decisions and influence long-term data strategy
Job Title
Lead Big Data Engineer