
Job Title


Azure Data Ingestion Engineer


Company : Insight Global


Location : Patna, Bihar


Created : 2025-05-03


Job Type : Full Time


Job Description

To apply, you must have a notice period of 30 days or less. Pay is 25-35 lakh and is based on experience.

Must Haves:
- 7+ years of experience with Azure Data Services: strong experience with Azure Data Factory (ADF) for orchestrating data pipelines; hands-on experience with ADLS Gen 2, Databricks, and various data formats (e.g., Parquet, JSON, CSV); solid understanding of Azure SQL Database, Azure Logic Apps, Azure Function Apps, and Azure Container Apps.
- 5+ years of experience in Python for data ingestion, automation, and transformation tasks.
- Data ingestion techniques: solid understanding of relational and non-relational data models and their ingestion techniques; experience with file-based data ingestion, API-based data ingestion, and integrating data from various third-party systems.

Pluses:
- Experience with version control systems like Git.
- Experience with other Azure services such as Azure Synapse Analytics and Azure Data Share.
- Familiarity with cloud security best practices and data privacy regulations.

Day to Day:
A Fortune 100 organization is seeking a Data Ingestion Engineer to work fully remote in India. We are seeking an experienced and highly motivated Data Ingestion Engineer to join our dynamic team. The ideal candidate will have strong hands-on experience with Azure Data Factory (ADF), a deep understanding of relational and non-relational data ingestion techniques, and proficiency in Python programming. You will be responsible for designing and implementing scalable data ingestion solutions that interface with Azure Data Lake Storage Gen 2 (ADLS Gen 2), Databricks, and various other Azure ecosystem services. The Data Ingestion Engineer will work closely with stakeholders to gather data ingestion requirements, create modularized ingestion solutions, and define best practices to ensure efficient, robust, and scalable data pipelines. This role requires effective communication skills, ownership, and accountability for the delivery of high-quality data solutions.

Key Responsibilities:

Data Ingestion Strategy & Development:
- Design, develop, and deploy scalable and efficient data pipelines in Azure Data Factory (ADF) to move data from multiple sources (relational, non-relational, files, APIs, etc.) into Azure Data Lake Storage Gen 2 (ADLS Gen 2), Azure SQL Database, and other target systems.
- Implement ADF activities (copy, lookup, execute pipeline, etc.) to integrate data from on-premises and cloud-based systems.
- Build parameterized and reusable pipeline templates in ADF to standardize the data ingestion process, ensuring maintainability and scalability of ingestion workflows.
- Integrate custom data transformation activities within ADF pipelines, utilizing Python, Databricks, or Azure Functions when required.

ADF Data Flows Design & Development:
- Leverage Azure Data Factory Data Flows to visually design and orchestrate data transformation tasks, enabling complex ETL (Extract, Transform, Load) logic to process large datasets at scale.
- Design data flow transformations such as filtering, aggregation, joins, lookups, and sorting to process and transform data before loading it into target systems like ADLS Gen 2 or Azure SQL Database.
- Implement incremental loading strategies in Data Flows to ensure efficient and optimized ingestion of large data volumes while minimizing resource consumption (see the watermark sketch after this section).
- Develop reusable data flow components to streamline transformation processes, ensuring consistency and reducing development time for new data ingestion pipelines.
- Utilize debugging tools in Data Flows to troubleshoot, test, and optimize data transformations, ensuring accurate results and performance.
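Incremental loading usually follows a high-watermark pattern: persist the latest change timestamp that has been ingested, then pull only rows newer than it on the next run. Below is a minimal sketch of that pattern in plain Python, using sqlite3 from the standard library as a stand-in source; the orders table, column names, and watermark file are illustrative assumptions, not details from this posting. In ADF itself the same logic is typically expressed with a Lookup activity that reads the watermark, a parameterized source query, and a final activity that advances the watermark.

import json
import sqlite3
from pathlib import Path

WATERMARK_FILE = Path("watermark.json")  # stand-in for a watermark table / ADF Lookup

def read_watermark() -> str:
    # Default to the epoch on the first run so everything is ingested once.
    if WATERMARK_FILE.exists():
        return json.loads(WATERMARK_FILE.read_text())["last_modified"]
    return "1970-01-01 00:00:00"

def save_watermark(value: str) -> None:
    WATERMARK_FILE.write_text(json.dumps({"last_modified": value}))

def incremental_load(conn: sqlite3.Connection) -> list:
    """Pull only rows changed since the last successful run (high-watermark pattern)."""
    watermark = read_watermark()
    rows = conn.execute(
        "SELECT id, payload, modified_at FROM orders "  # 'orders' is illustrative
        "WHERE modified_at > ? ORDER BY modified_at",
        (watermark,),
    ).fetchall()
    if rows:
        # Advance the watermark only after the batch has landed successfully,
        # so a failed run is simply retried from the old watermark.
        save_watermark(rows[-1][2])
    return rows

The key design point is the same regardless of tooling: the watermark is updated last, which makes the load idempotent and safe to retry.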
ADF Orchestration & Automation:
- Use ADF triggers and scheduling to automate pipeline execution based on time or events, ensuring timely and efficient data ingestion.
- Configure ADF monitoring and alerting capabilities to proactively track pipeline performance, handle failures, and address issues in a timely manner.
- Implement ADF version control practices using Git to manage code changes, collaborate effectively with other team members, and ensure code integrity.

Data Integration with Various Sources:
- Ingest data from diverse sources such as on-premises SQL Server instances, REST APIs, cloud databases (e.g., Azure SQL Database, Cosmos DB), file-based systems (CSV, Parquet, JSON), and third-party services using ADF.
- Design and implement ADF linked services to securely connect to external data sources (databases, file systems, APIs, etc.).
- Develop and configure ADF datasets and data flows to efficiently transform, clean, and load data into Azure Data Lake or other destinations.

Pipeline Monitoring and Optimization:
- Continuously monitor and optimize ADF pipelines to ensure they run with high performance and minimal cost, applying techniques like data partitioning, parallel processing, and incremental loading where appropriate.
- Implement data quality checks within the pipelines to ensure data integrity and handle data anomalies or errors in a systematic manner.
- Review pipeline execution logs and performance metrics regularly, and apply tuning recommendations to improve execution times and reduce operational costs.

Collaboration and Communication:
- Work closely with business and technical stakeholders to capture and translate data ingestion requirements into ADF pipeline designs.
- Provide ADF-specific technical expertise to both internal and external teams, guiding them in the use of ADF for efficient and cost-effective data pipelines.
- Document ADF pipeline designs, error handling strategies, and best practices so the team can maintain and scale the solutions.
- Conduct training sessions or knowledge transfers with junior engineers or other team members on ADF best practices and architecture.

Security and Compliance:
- Ensure all data ingestion solutions built in ADF follow security and compliance guidelines, including encryption at rest and in transit, data masking, and identity and access management.
- Implement role-based access control (RBAC) and managed identities within ADF to manage access securely and reduce the risk of unauthorized access to sensitive data.

Integration with Azure Ecosystem:
- Leverage other Azure services, such as Azure Logic Apps, Azure Function Apps, and Azure Databricks, to augment the capabilities of ADF pipelines, enabling more advanced data processing, event-driven workflows, and custom transformations.
- Incorporate Azure Key Vault to securely store and manage sensitive data (e.g., connection strings, credentials) used in ADF pipelines (see the sketch after this section).
- Integrate ADF with Azure Data Lake Analytics, Synapse Analytics, or other data warehousing solutions for advanced querying and analytics after ingestion.

Best Practices & Continuous Improvement:
- Develop and enforce best practices for building and maintaining ADF pipelines and data flows, ensuring the solutions are modular, reusable, and follow coding standards.
- Identify opportunities for pipeline automation to reduce manual intervention and improve operational efficiency.
- Regularly review and suggest new tools or services within the Azure ecosystem to enhance ADF pipeline performance and increase the overall efficiency of data ingestion workflows.
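As one illustration of the Key Vault and API-based ingestion points above, the following minimal Python sketch fetches an API credential from Key Vault using DefaultAzureCredential (which resolves to a managed identity when running in Azure) and lands the raw JSON response in ADLS Gen 2. It assumes the azure-identity, azure-keyvault-secrets, azure-storage-file-datalake, and requests packages; the vault URL, secret name, API endpoint, storage account, container, and paths are all placeholders, not details from this posting.

import json
from datetime import datetime, timezone

import requests
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
from azure.storage.filedatalake import DataLakeServiceClient

VAULT_URL = "https://<vault-name>.vault.azure.net"              # placeholder
ACCOUNT_URL = "https://<storage-account>.dfs.core.windows.net"  # placeholder
API_URL = "https://api.example.com/v1/orders"                   # placeholder

# Managed identity in Azure; falls back to az login / env vars for local dev.
credential = DefaultAzureCredential()

# Pull the API credential from Key Vault instead of hard-coding it.
secret_client = SecretClient(vault_url=VAULT_URL, credential=credential)
api_key = secret_client.get_secret("source-api-key").value  # hypothetical secret name

# API-based ingestion: fetch the raw payload from the third-party system.
response = requests.get(
    API_URL, headers={"Authorization": f"Bearer {api_key}"}, timeout=30
)
response.raise_for_status()

# Land the raw JSON in ADLS Gen 2, partitioned by ingestion date.
lake = DataLakeServiceClient(account_url=ACCOUNT_URL, credential=credential)
file_system = lake.get_file_system_client("raw")  # hypothetical container name
path = f"orders/ingest_date={datetime.now(timezone.utc):%Y-%m-%d}/orders.json"
file_system.get_file_client(path).upload_data(json.dumps(response.json()), overwrite=True)

The same separation applies inside ADF: the secret lives in Key Vault, the linked service references it, and no credential ever appears in pipeline JSON or source control.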
Incident and Issue Management:
- Actively monitor the health of the data pipelines, swiftly addressing any failures, data quality issues, or performance bottlenecks (a monitoring sketch follows below).
- Troubleshoot ADF pipeline errors, including issues within Data Flows, and work with other teams to root-cause issues related to data availability, quality, or connectivity.
- Participate in post-mortem analyses for any major incidents, documenting lessons learned and implementing preventative measures for the future.
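For the pipeline-health monitoring described above, ADF's built-in monitoring and alerts are the first line of defence, but run status can also be queried programmatically. Below is a minimal sketch using the azure-mgmt-datafactory and azure-identity packages; the subscription ID, resource group, and factory name are placeholders, and in practice the alerting step would post to a ticketing or chat system rather than print.

from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

SUBSCRIPTION_ID = "<subscription-id>"  # placeholder
RESOURCE_GROUP = "<resource-group>"    # placeholder
FACTORY_NAME = "<data-factory-name>"   # placeholder

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Look back over the last 24 hours of pipeline runs.
now = datetime.now(timezone.utc)
filters = RunFilterParameters(
    last_updated_after=now - timedelta(hours=24),
    last_updated_before=now,
)

runs = adf_client.pipeline_runs.query_by_factory(RESOURCE_GROUP, FACTORY_NAME, filters)
for run in runs.value:
    if run.status == "Failed":
        # In a real setup this would raise an alert (email, Teams, PagerDuty, ...).
        print(f"FAILED: {run.pipeline_name} (run {run.run_id}): {run.message}")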