We are seeking a Senior Site Reliability Engineer to lead reliability efforts across our application stack, focusing on high availability, performance, and scalability. This role will own the health and uptime of our mission-critical application, cloud infrastructure, database system, and monitoring infrastructure.

About Us
At BQE, our mission is to transform the operational landscape of professional services firms, empowering them to achieve more and serve their customers better. These firms play a crucial role in building infrastructure that significantly impacts global progress. BQE CORE serves as the operational backbone for these firms, providing an all-in-one SaaS solution. Our platform enables them to efficiently manage projects, improve budget tracking and profitability, and streamline processes through automation. With a robust customer base, we are on a trajectory of continuous growth, constantly innovating to meet the evolving needs of our customers and the industries they influence.

Why Join Us
- Work with a modern tech stack in a high-impact reliability role.
- Be a key part of our CloudOps and App Reliability strategy.
- Join a collaborative and supportive engineering culture.

Responsibilities:
- Ensure application uptime, performance, and scalability.
- Own incident management, including on-call rotations, root cause analysis, and incident reviews.
- Manage and monitor MS SQL Server clusters and high-availability configurations.
- Set up and improve monitoring, alerting, and observability using New Relic, Logz.io, CloudWatch, and other tools.
- Proactively identify system bottlenecks and improve system reliability and automation.
- Define and improve SLOs/SLAs across services.
- Drive disaster recovery testing and availability simulations.
- Collaborate with CloudOps and DevOps on infrastructure automation and enhancements.
- Work with Jira and JSM to manage operational tasks, incidents, and changes.

Qualifications & Experience:
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
- 5-8 years of experience in Site Reliability Engineering, CloudOps, DevOps, or related roles.

Must Have Skills:
- Certifications in AWS, Microsoft, Windows, SQL Server, or SRE disciplines.
- Exposure to New Relic APM and IaC automation is a plus.
- Experience working in a 24x7 on-call rotation.
- Strong knowledge of the Windows OS ecosystem, IIS, and MS SQL Server administration, clustering, performance tuning, and failover.
- Deep experience with monitoring/logging tools like New Relic, Logz.io, and AWS CloudWatch.
- Experience with AWS (EC2, ASG, CloudWatch, CloudTrail, VPC) and infrastructure management.
- Good understanding of networking, DNS, load balancing, and security principles.
- Proficiency in scripting languages such as PowerShell and Python.
- Strong understanding of incident response, change management, and postmortem culture.
- Experience using Jira and Jira Service Management for operational workflows.
- Ability to work independently and drive technical initiatives.
Job Title
Senior Site Reliability Engineer