Role: SRE Architect

Responsibilities
- Deploy, manage, optimize, and troubleshoot large-scale Kubernetes clusters in multi-cloud (AWS, GCP) and hybrid environments (OpenStack, VMware vSphere)
- Implement GitOps workflows using ArgoCD for declarative Kubernetes deployments and continuous synchronization
- Secure the container runtime with pod security admission controls, network policies, and CIS benchmark enforcement
- Implement cluster autoscaling and resource management strategies with tools such as Karpenter
- Architect, implement, and manage infrastructure in multi-cloud (AWS, GCP) and hybrid environments following security-by-design principles
- Implement cloud security posture management (CSPM) using AWS Security Hub and GCP Security Command Center
- Optimize cloud resource usage with AWS Cost Explorer, Savings Plans, and equivalent tools on other cloud providers
- Develop and maintain comprehensive monitoring, logging, tracing, and alerting solutions using Prometheus, Grafana, CloudWatch, Datadog, or similar tools
- Implement runtime security monitoring using Falco and AWS GuardDuty for anomaly detection
- Conduct root cause analysis (RCA) and implement proactive improvements to maximize system uptime, reliability, and performance
- Design, implement, and maintain robust CI/CD pipelines using ArgoCD, Jenkins, GitLab CI/CD, GitHub Actions, or Tekton
- Integrate security scanning (SAST/DAST) and artifact signing (cosign/sigstore) into deployment workflows
- Promote and implement DevSecOps best practices across teams to automate testing, security scanning, and deployments
- Implement infrastructure entitlement management through automated IAM policy reviews and least-privilege enforcement
- Manage secrets securely using Vault, AWS Secrets Manager, or similar tools with automated rotation policies
- Ensure adherence to compliance standards (SOC2, FedRAMP) and regulatory requirements through policy-as-code (OPA/Rego)
- Implement and enforce governance policies and frameworks to optimize infrastructure usage, reduce costs, and enhance operational efficiency
- Regularly review and optimize cloud expenditure, performance, and scaling strategies
- Collaborate closely with architects, developers, QA, product teams, and management stakeholders
- Clearly communicate complex infrastructure concepts and strategies to diverse stakeholders

Required Skills
- 10+ years of experience as a Site Reliability Engineer, DevOps Engineer, Platform Engineer, or in a similar role
- Extensive expertise in Kubernetes, ArgoCD, container orchestration, and the related ecosystem
- Hands-on experience implementing Kubernetes security controls (PSP/PSA, OPA Gatekeeper, network policies)
- Hands-on experience with cloud platforms (AWS, GCP), OpenStack, VMware vSphere, and hybrid environments
- Proficiency in scripting and automation languages (Python, Bash, Go, or similar)
- Solid experience with infrastructure as code (Terraform, CloudFormation, Pulumi)
- Strong knowledge of CI/CD tools and pipeline design (ArgoCD, Jenkins, GitLab CI/CD, GitHub Actions, Tekton)
- Experience with cloud-native security tools (AWS GuardDuty, GCP Security Command Center, Prisma Cloud)
- Exceptional troubleshooting and problem-solving skills, coupled with a proactive and continuous learning mindset

Certifications/Qualifications
- Certifications in Kubernetes (CKA/CKAD/CKS), AWS (Solutions Architect, DevOps Engineer, Security Specialty), or GCP
- Graduate
Job Title
SRE Architect