About Us
HG Insights is the global leader in technology intelligence, delivering actionable, AI-driven insights through advanced data science and scalable big data solutions.

What You’ll Do
Design, build, and optimize large-scale distributed data pipelines for processing billions of unstructured documents using Databricks, Apache Spark, and cloud-native big data tools.
Architect and scale enterprise-grade big data systems, including data lakes, ETL/ELT workflows, and syndication platforms for customer-facing Insights-as-a-Service (InaaS) products.
Collaborate with product teams to develop features across databases, backend services, and frontend UIs that expose actionable intelligence from complex datasets.
Implement cutting-edge solutions for data ingestion, transformation, and analytics using Hadoop/Spark ecosystems, Elasticsearch, and cloud services (AWS EC2, S3, EMR).
Drive system reliability through automation, CI/CD pipelines, containerization (Docker, Kubernetes), and infrastructure-as-code practices (Terraform).

What You’ll Be Responsible For
Leading the development of our Big Data Insights Platform, ensuring scalability, performance, and cost-efficiency across distributed systems.
Mentoring engineers, conducting code reviews, and establishing best practices for Spark optimization, data modeling, and cluster resource management.
Building data pipelines and troubleshooting complex pipeline issues, including performance tuning of Spark jobs, query optimization, and data quality enforcement.
Collaborating in agile workflows (daily stand-ups, sprint planning) to deliver features rapidly while maintaining system stability.
Ensuring security and compliance across data workflows, including access controls, encryption, and governance policies.

What You’ll Need
BS/MS/Ph.D. in Computer Science or a related field, with 7+ years of experience building production-grade big data systems.
Expertise in Scala/Java for Spark development, including optimizing batch/streaming jobs and debugging distributed workflows.
Proven track record with:
  Databricks, Hadoop/Spark ecosystems, and SQL/NoSQL databases (MySQL, Elasticsearch).
  Cloud platforms (AWS EC2, S3, EMR), infrastructure-as-code (Terraform), and container orchestration (Kubernetes).
  RESTful APIs, microservices architectures, and CI/CD automation.
Leadership experience as a technical lead, including mentoring engineers and driving architectural decisions.
Strong understanding of agile practices, distributed computing principles, and data lake architectures.
Experience with Airflow orchestration (DAGs, operators, sensors) and its integration with Spark/Databricks.
7+ years of designing, modeling, and building big data pipelines in an enterprise setting.

Nice-to-Haves
Experience with machine learning pipelines (Spark MLlib, Databricks ML) for predictive analytics.
Knowledge of data governance frameworks and compliance standards (GDPR, CCPA).
Contributions to open-source big data projects or published technical blogs/papers.
DevOps proficiency with monitoring tools (Prometheus, Grafana) and serverless architectures.

Job Title
Senior Software Data Engineer