Job Title: Data Engineer – AI & Real-Time Data Infrastructure

Location: Noida, Full Time / On Site

About Us:
We are an early-stage SaaS / AI startup building what we call an Enterprise Revenue Fabric platform.

Job Description:
As a Data Engineer, you will play a critical role in building and maintaining the data infrastructure that powers our platform. You will work alongside a team of engineers, data scientists, and product managers to create scalable, real-time data pipelines while ensuring data governance, integrity, and compliance. You will help drive the evolution of our Cognitive Data Fabric and ensure that data is accessible, reliable, and ready for AI and analytics.

Key Responsibilities:
- Design, implement, and optimize real-time and batch data pipelines for data ingestion, transformation, and storage across multiple sources (CRM, billing, etc.).
- Build and manage data governance frameworks, including Row-Level Security (RLS), data encryption, and auditability, to ensure compliance with industry standards.
- Work with AI and machine learning teams to integrate AI-driven models into data pipelines, ensuring that the platform remains AI-ready.
- Develop and maintain schema registries, metadata management, and data catalogs using tools like OpenMetadata and Neo4j.
- Monitor and ensure data quality across multiple modules and maintain real-time data freshness (≤1 hour) through automated data workflows.
- Use Azure Fabric, Kafka, and Event Hubs to build scalable, high-performance data streams.
- Collaborate with cross-functional teams to integrate data sources and ensure seamless data flow and governance across the platform.
- Ensure data security and compliance, including PII masking, GDPR/DPDP compliance, and multi-tenancy isolation.
- Develop tooling for automated data pipelines using dbt, Airflow, and other DataOps tools.
- Optimize data processes and workflows to improve performance, reduce latency, and minimize cost.

Skills & Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related field.
- Proven experience in data engineering, with expertise in data pipeline architecture, ETL processes, and data governance.
- Strong experience with cloud platforms (Azure preferred) and technologies such as Azure Data Fabric, PostgreSQL, Kafka, Event Hubs, and Redis.
- Proficiency in data modeling, schema management, and working with canonical schemas.
- Solid understanding of AI/ML pipeline integration and of tools like pgvector for semantic search and vector databases.
- Experience with DataOps practices, CI/CD, and tools like Airflow, dbt, and Terraform for automation.
- Hands-on experience managing data security, including RLS, BYOK/CMK encryption, and compliance (e.g., GDPR, SOC 2).
- Familiarity with metadata management and tools like Neo4j or OpenMetadata.
- Ability to work with large-scale distributed systems and handle high-volume, low-latency data workflows.
- Strong programming skills in Python and SQL; familiarity with frameworks like FastAPI or Node.js is a plus.