Company Description
BugRaid.AI uses advanced AIOps and AI bots for proactive incident management and response. By combining AIOps-based incident analysis with AI bots for immediate response, BugRaid.AI provides automated, intelligent incident handling. Our platform enables organizations to swiftly identify and resolve issues, anticipate potential problems, and obtain valuable insights through detailed analytics. Designed for scalability and adaptability, our solution ensures improved operational efficiency and informed decision-making.

Role Description
This is a full-time, remote position for a Senior SRE Engineer. The Senior SRE Engineer is responsible for the reliability, availability, and performance of BugRaid.AI's platform. Daily responsibilities encompass site reliability engineering, troubleshooting, software development, system administration, and infrastructure management. The engineer will collaborate with cross-functional teams to implement best practices, automate processes, and uphold robust system operations.

Responsibilities
- Architect and sustain scalable infrastructure for hybrid cloud and on-premises deployments.
- Implement internal observability for logs, metrics, traces, and deployment visibility.
- Lead our incident management framework, including alerting, debugging, and post-incident analysis.
- Design, develop, and enhance CI/CD pipelines supporting frequent and safe releases.
- Manage Infrastructure as Code using Terraform, Kubernetes, and Helm.
- Drive infrastructure automation to bolster platform resiliency and uptime.
- Collaborate closely with backend, AI, and product teams to ensure platform reliability.

Requirements
- Over 5 years of experience in SRE, DevOps, or infrastructure engineering.
- Practical experience with AWS, Kubernetes, and Terraform.
- Strong understanding of CI/CD pipelines and deployment strategies.
- Familiarity with observability tools such as Prometheus, Grafana, and OpenTelemetry.
- Extensive experience in debugging production systems and distributed infrastructure.
- Proficiency in scripting and automation using Python, Go, or Bash.
- Proven track record in operating highly available systems and incident response.

Nice to Have
- Experience with AI/ML infrastructure or GenAI-powered observability platforms.
- Background in platform engineering or developer productivity tooling.
- Exposure to agent-based architectures and real-time data workflows.

What We Offer
- Competitive salary aligned with market standards.
- Equity through ESOPs with potential for long-term gains.
- Remote-first working environment with hubs located in Hyderabad and Bangalore.
- Opportunity to work directly with the founding team and influence the product roadmap.
- High levels of ownership, rapid learning opportunities, and the chance to develop a global product from India.
Job Title
Senior SRE Engineer - BugRaid AI