Job Title: Senior Data Engineer
Company: Mumba Technologies, Inc.
Location: Chennai, Tamil Nadu
Created: 2025-08-05
Job Type: Full Time


Job Description

Data Engineer
Location: Hyderabad / Pune / Chennai

The primary responsibility is extracting, processing, and validating raw technical data from manufacturer websites using Python, with manual methods when necessary. Alongside this, you will design, build, and maintain robust, scalable database architectures optimized for massive concurrent usage (up to 100,000 users) and extendable to billions of records. A key focus is implementing a hybrid data architecture that efficiently handles both structured data (CSV, Excel, SQL tables) and unstructured or semi-structured data (PDFs, images, graphs, text files). This approach enables flexible ingestion, storage, and retrieval, similar to platforms like Databricks or Delta Lake, combining data lakes and data warehouses for optimized performance. Optimizing database I/O for fast data storage, retrieval, and analysis is critical to system responsiveness at scale. This combination of advanced data extraction and hybrid database engineering is essential to the project's success.

Key Responsibilities:
- Develop and maintain Python scripts using BeautifulSoup, Selenium, and Scrapy for web scraping.
- Perform manual data extraction when automation is ineffective or impractical.
- Organize, clean, and validate large datasets into CSV/Excel files.
- Digitize graphical data, such as performance curves, using appropriate tools.
- Design, build, and maintain scalable hybrid data architectures that integrate:
  - Structured data storage in relational databases (e.g., PostgreSQL).
  - Semi-structured and unstructured data management in data lake technologies or NoSQL databases.
  - Data pipelines supporting transformation, normalization, and integration (similar to Databricks' Medallion architecture).
- Optimize database performance and I/O throughput for high concurrency and low latency.
- Leverage AI and automation tools to accelerate data extraction and processing, with manual quality checks where needed.
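To give a flavor of the scraping-and-validation workflow described above, here is a minimal sketch using BeautifulSoup and the standard csv module. The table layout, field names, and file path are hypothetical illustrations, not project specifics; page fetching (e.g., via requests or Selenium) is omitted so the parsing step stands alone.

```python
import csv
from bs4 import BeautifulSoup

def parse_spec_table(html: str) -> list:
    """Extract rows from the first HTML table on a (hypothetical)
    manufacturer spec page into a list of dicts keyed by header."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # keep only rows whose cell count matches the header row
        if headers and len(cells) == len(headers):
            rows.append(dict(zip(headers, cells)))
    return rows

def write_csv(rows: list, path: str) -> None:
    """Persist validated rows to CSV for downstream loading."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

In practice the validation step would be richer (unit checks, deduplication, range checks) before rows are written out for loading into PostgreSQL or a data lake.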
- Collaborate on system design and data workflows; document code and maintain version control.

Qualifications:
- Bachelor's degree in Computer Science, Software Engineering, or a related field from a reputed institute.
- Strong Python programming skills with experience in web-scraping libraries.
- Solid understanding of relational databases (PostgreSQL) and NoSQL databases (e.g., DynamoDB), with scalable architecture design.
- Knowledge of hybrid data architectures that combine data lakes and warehouses for structured and unstructured data management.
- Experience with, or willingness to learn, modern data platforms such as Databricks or Delta Lake is a plus.
- Ability to optimize database I/O operations for concurrency, speed, and efficient data access.
- Comfortable with manual data extraction and detailed data entry.
- Detail-oriented, proactive, and able to work independently.
- Good communication skills; trustworthy with sensitive data.
- Experience or interest in AI/automation tools is a plus.
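The Medallion-style pipeline mentioned in the responsibilities can be sketched in plain Python: a bronze layer of raw scraped records, a silver layer with cleaning and normalization, and a gold layer with aggregates ready for serving. All field names and cleaning rules below are hypothetical examples, not project requirements.

```python
def to_silver(bronze_rows):
    """Silver layer: drop records missing a part number,
    normalize whitespace and numeric fields (illustrative rules)."""
    silver = []
    for row in bronze_rows:
        part = (row.get("part_no") or "").strip()
        if not part:
            continue  # reject unusable records at the silver layer
        try:
            flow = float(row.get("flow_lpm", ""))
        except ValueError:
            continue  # reject rows with unparseable numeric values
        silver.append({"part_no": part.upper(), "flow_lpm": flow})
    return silver

def to_gold(silver_rows):
    """Gold layer: max flow per part number, ready for serving."""
    gold = {}
    for row in silver_rows:
        key = row["part_no"]
        gold[key] = max(gold.get(key, 0.0), row["flow_lpm"])
    return gold

bronze = [
    {"part_no": " p-100 ", "flow_lpm": "50"},
    {"part_no": "p-100", "flow_lpm": "55"},
    {"part_no": "", "flow_lpm": "10"},       # dropped: no part number
    {"part_no": "p-200", "flow_lpm": "bad"}, # dropped: unparseable value
]
print(to_gold(to_silver(bronze)))  # {'P-100': 55.0}
```

On a platform like Databricks or Delta Lake, each layer would typically be a persisted table and the transformations would run as scheduled pipeline jobs rather than in-memory functions.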