Job Title : Lead Data Engineer


Company : Verticalmove, Inc


Location : Pune, Maharashtra


Created : 2025-05-03


Job Type : Full Time


Job Description

PLEASE READ BEFORE APPLYING:

• WE WILL ONLY CONSIDER CANDIDATES COMING FROM B2B SAAS OR CONSUMER INTERNET COMPANIES (THINK SIMILAR TO SALESFORCE, WORKDAY, INTUIT, ATLASSIAN, WALMART, AMAZON).
• WE WILL NOT CONSIDER ANY EXPERIENCE FROM IT OR DIGITAL TRANSFORMATION CONSULTANCIES (EXAMPLES OF THE WRONG COMPANIES WOULD BE TATA, INFOSYS, COGNIZANT, IBM GLOBAL SERVICES).
• THIS IS A HYBRID ROLE (3 DAYS A WEEK); NO REMOTE AT THIS TIME.

We are disrupting the supply chain planning industry with our AI-driven demand planning and inventory replenishment software. The total addressable market (TAM) for supply chain management (SCM) software is substantial and growing rapidly. In 2023, the global SCM software market was valued at approximately $28.9 billion and is projected to reach $45.2 billion by 2027, reflecting a compound annual growth rate (CAGR) of around 9.4% during the forecast period.

We specifically target a rapidly growing segment of the SCM industry: small and medium-sized businesses (SMBs). This focus has become even more pertinent in light of the supply chain disruptions caused by COVID-19. SMBs now have access to technological advantages previously exclusive to large enterprises like Walmart, Amazon, Lowe's, and Home Depot, which have invested hundreds of millions into these technologies.

In this role, you will be responsible for designing, implementing, and deploying the next-generation platform, tightly integrated with AWS services such as EMR, Athena, and Glue, as well as Apache Spark.
You will develop an ultra-real-time, AI-driven demand planning engine to help Blue Ridge serve industries that manage perishable items, including food manufacturing, restaurants, grocery stores, pharmaceuticals, and more.

If you are passionate about leveraging machine learning and AI technologies to solve complex supply chain challenges, this is the perfect opportunity for you.

Fast Facts About Our Company:

• Employee Count: 500+, 58 in Engineering (6 in the US and 52 in Pune, India)
• Customer Count: 200+
• Revenue: $25.0M in ARR (and growing fast)
• Profitable: Yes

Job Description:

As a Lead Data Engineer, you will play a crucial role in building a scalable data platform that powers advanced supply chain solutions and next-generation AI applications. The role involves working on both legacy systems and modern cloud-based platforms to help scale the data infrastructure. You will collaborate with a team of data engineers and work closely with data scientists, machine learning engineers, and software developers to optimize data performance and build a platform that supports the adoption of Generative AI (GenAI) technologies.
You will be responsible for building and optimizing distributed computing frameworks that incorporate functional business logic, with a focus on distributing jobs to the lowest unit possible to maximize scalability.

Job Responsibilities:

• Design and build a scalable, fault-tolerant data platform optimized for distributed computing and large-scale data processing using AWS, Databricks, and Apache Spark.
• Implement data pipelines and ETL/ELT processes to efficiently ingest, transform, and load massive datasets from various sources.
• Leverage cloud data platforms to enable seamless data sharing, near-zero maintenance, and fast analytics on both structured and semi-structured data.
• Build and optimize distributed computing jobs and queries to ensure maximum performance, cost efficiency, and scalability by distributing tasks to the lowest level possible.
• Collaborate with data scientists, machine learning engineers, and software developers to build solutions that power GenAI applications.
• Provide guidance on distributed computing architecture and mentor junior data engineers.
• Implement data governance, security, and compliance best practices.
• Drive innovation by partnering with global teams to enhance supply chain technology.
• Develop solutions that incorporate functional business logic into distributed computing frameworks, ensuring the scalability and efficiency of the data platform.

Experience:

• Experience with reinforcement learning and optimization techniques for supply chain use cases.
• Familiarity with Generative AI (GenAI) and its application to predictive analytics and decision support.
• Experience with big data technologies such as Apache Spark and data orchestration tools like Apache Airflow.
• AWS, GCP, or Azure certifications (e.g., AWS Certified Machine Learning - Specialty).

Required Experience:

• 5+ years of experience as a Data Engineer, with strong expertise in big data technologies.
• 7+ years of experience with cloud architecture, with extensive expertise in AWS.
• Strong proficiency in SQL, Python, and data modeling techniques.
• Deep knowledge of distributed computing principles and frameworks (e.g., Apache Spark, Apache Airflow), including experience with data streaming (Kafka).
• Hands-on experience developing and deploying distributed computing applications on cloud-based platforms (e.g., AWS EMR, Azure HDInsight).
• Experience with cloud data platform architectures and best practices for ETL/ELT, data sharing, and query optimization.
• Experience in building and optimizing distributed computing frameworks to handle functional business logic and achieve scalability.
• Nice to have: familiarity with .NET code, project structure, and typical application development processes and technologies.
• Excellent problem-solving and communication skills.
• AWS, GCP, or Azure certifications are highly valued (e.g., AWS Certified Solutions Architect, AWS Certified Big Data - Specialty).

Required Education:

• Bachelor's or Master's degree in Data Science, Computer Science, Operations Research, Statistics, or a related field.