
Job Title : DevOps Specialist


Company : OhChat


Location : Chelmsford, East Anglia


Created : 2025-06-19


Job Type : Full Time


Job Description

Job Title: DevOps Specialist & Data Engineer
Location: Remote
Type: Full-time
Experience Level: Senior
Industry: Generative AI / Artificial Intelligence / Machine Learning
Reports To: Head of Engineering / CTO

About Us
Ready to join a cutting-edge AI company? We’re on a mission to become the OpenAI of the spicy content industry, building a full-spectrum ecosystem of revolutionary AI infrastructure and products. Our platform, OhChat, features digital twins of real-world personalities and original AI characters, enabling users to interact with lifelike AI-generated characters through text, voice, and images, with a roadmap that includes agentic superModels, API integrations, and video capabilities.

Role Overview
We are looking for a Senior DevOps Specialist with a strong Python and data engineering background to support our R&D and tech teams by designing, building, and maintaining robust infrastructure and data pipelines across AWS and GCP. You will be instrumental in ensuring our systems are scalable, observable, cost-effective, and secure. This role is hands-on, cross-functional, and central to our product and research success.

Key Responsibilities

DevOps & Infrastructure
- Design, implement, and maintain infrastructure on AWS and Google Cloud Platform (GCP) to support high-performance computing workloads and scalable services.
- Collaborate with R&D teams to provision and manage compute environments for model training and experimentation.
- Maintain and monitor systems, implement observability solutions (e.g., logging, metrics, tracing), and proactively resolve infrastructure issues.
- Manage CI/CD pipelines for rapid, reliable deployment of services and models.
- Ensure high availability, disaster recovery, and robust security practices across environments.

Data Engineering
- Build and maintain data processing pipelines for model training, experimentation, and analytics.
- Work closely with machine learning engineers and researchers to understand data requirements and workflows.
- Design and implement solutions for data ingestion, transformation, and storage using tools such as Scrapy, Playwright, agentic workflows (e.g., crawl4ai), or equivalent.
- Optimize and benchmark AI training, inference, and data workflows to ensure high performance, scalability, cost efficiency, and an exceptional customer experience.
- Maintain data quality, lineage, and compliance across multiple environments.

Key Requirements
- 5+ years of experience in DevOps, Site Reliability Engineering, or Data Engineering roles.
- Deep expertise with AWS and GCP, including services such as EC2, S3, Lambda, IAM, GKE, and BigQuery.
- Strong proficiency with infrastructure-as-code tools (e.g., Terraform, Pulumi, CloudFormation).
- Extensive hands-on experience with Docker, Kubernetes, and CI/CD tools such as GitHub Actions, Bitbucket Pipelines, or Jenkins, with a strong ability to optimize CI/CD workflows as well as AI training and inference pipelines for performance and reliability.
- Exceptional programming skills in Python: you are expected to write clean, efficient, production-ready code and to be highly proficient with modern Python programming paradigms and tooling.
- Proficiency in data-centric programming and scripting languages (e.g., Python, SQL, Bash).
- Proven experience designing and maintaining scalable ETL/ELT pipelines.
- Focused, sharp, and results-oriented: you are decisive, work with a high degree of autonomy, and consistently deliver high-quality results. You are quick to understand and solve the core of a problem and know how to summarize it efficiently for stakeholders.
- Effective communicator, concise in reporting: you can communicate technical insights in a clear and actionable manner, both verbally and in writing. Your reports should be precise, insightful, and aligned with business objectives.

Nice to Have
- Experience supporting AI/ML model training infrastructure (e.g., GPU orchestration, model serving) for both diffusion and LLM pipelines.
- Familiarity with data lake architectures and tools like Delta Lake, LakeFS, or Databricks.
- Knowledge of security and compliance best practices (e.g., SOC 2, ISO 27001).
- Exposure to MLOps platforms or frameworks (e.g., MLflow, Kubeflow, Vertex AI).

What We Offer
- Competitive salary + equity
- Flexible work environment and remote-friendly culture
- Opportunities to work on cutting-edge AI/ML technology
- Fast-paced environment with high impact and visibility
- Professional growth support and resources