Overview

We are seeking skilled Data Engineers to join our Data & Digital Twin Foundation team. You will design, build, and maintain data pipelines that power digital twin platforms, real-time operational systems, and AI/ML workloads. Working closely with data architects, simulation engineers, and ML teams, you will transform raw operational data into high-quality, governed datasets that drive intelligent decision-making.

Our core data platform stack includes:

Data Platform & Lakehouse
- Databricks (PySpark, Databricks SQL) for unified analytics and data engineering
- Delta Lake for ACID-compliant lakehouse architecture
- Unity Catalog for data governance, lineage, and access control

Stream & Event Processing
- Apache Kafka for real-time event ingestion
- Structured Streaming for continuous data processing
- Delta Live Tables for declarative, quality-enforced pipelines

Specialized Data Stores & Languages
- Neo4j for graph data modeling and network topology
- Python and SQL for data transformation

Data Quality
- Delta Live Tables expectations for data validation
- Data profiling and anomaly detection

Key Responsibilities
- Design, develop, and maintain scalable data pipelines using Databricks, PySpark, and Delta Lake
- Build real-time and batch data ingestion pipelines from diverse operational systems
- Implement data transformations that serve digital twin platforms and operational analytics
- Develop and maintain graph data models in Neo4j for network topology and relationship modeling
- Integrate Kafka event streams with Databricks for real-time operational state updates
- Implement data quality checks using Delta Live Tables expectations
- Ensure data governance compliance through Unity Catalog (lineage, access control, and metadata management)
- Optimize pipeline performance, reliability, and cost efficiency
- Write clean, well-documented, and testable code following engineering best practices
- Collaborate with ML engineers to deliver feature-engineered datasets
- Participate in code reviews, knowledge sharing, and continuous improvement initiatives
- Support production data systems through monitoring, troubleshooting, and incident resolution

Preferred Qualifications
- 7+ years of hands-on data engineering experience
- Track record of building and maintaining production-grade data pipelines
- Experience with Delta Live Tables for declarative pipeline development
- Experience working in agile, cross-functional teams
- Familiarity with time-series data patterns and operational data modeling

Highly Desirable
- Experience building data pipelines for digital twin or simulation platforms
- Familiarity with operational state modeling for real-time systems
- Exposure to physics-informed or time-series ML feature engineering
- Experience working with distributed, multidisciplinary teams
- Exposure to industrial domains such as Manufacturing, Logistics, or Transportation is a plus
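To give candidates a concrete sense of the data-quality work described above, the sketch below illustrates in plain Python the row-level expectation pattern that Delta Live Tables enforces (keep rows that satisfy every predicate, quarantine the rest). This is an illustrative sketch only, not the Databricks `dlt` API; the field names (`sensor_id`, `temp_c`) and the `expectations` dictionary are hypothetical examples.

```python
# Plain-Python sketch of DLT-style row expectations (not the real `dlt` API).
# Each expectation is a named predicate; a row passes only if all predicates hold.

def validate(rows, expectations):
    """Split rows into (passed, failed), similar in spirit to DLT expect/drop."""
    passed, failed = [], []
    for row in rows:
        if all(check(row) for check in expectations.values()):
            passed.append(row)
        else:
            failed.append(row)
    return passed, failed

# Hypothetical expectations for a sensor-readings feed.
expectations = {
    "non_null_id": lambda r: r.get("sensor_id") is not None,
    "temp_in_range": lambda r: -40 <= r.get("temp_c", 0) <= 150,
}

rows = [
    {"sensor_id": "a1", "temp_c": 21.5},   # passes both checks
    {"sensor_id": None, "temp_c": 19.0},   # fails non_null_id
    {"sensor_id": "a2", "temp_c": 900.0},  # fails temp_in_range
]

good, bad = validate(rows, expectations)
```

In a real Delta Live Tables pipeline the same predicates would be attached declaratively to a table definition, and the platform would track pass/fail metrics per expectation automatically.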
Job Title
Data Engineer