Experience: 10+ years

Minimum Required:
● Good verbal and written communication skills in English, with a friendly and helpful attitude
● Familiarity with ticket-based case management
● Ability to notify and escalate on back channels [L3 Support and developers] to local and remote management while staying technically engaged with the customer
● Ability to rapidly research both internal and external knowledge bases while remaining engaged with the customer
● Configure and troubleshoot RAID configurations using tools like mdadm and smartctl
● Knowledge of PCI and PCIe, and ability to troubleshoot PCI issues using tools like lspci and lshw
● Accessing the host via the Service Processor and from the host itself, using tools like RKVM and ipmitool
● Knowledge of booting a system in order to run a rescue process
● Experienced in Linux system administration
● System performance monitoring using tools such as top, mem, strace, iostat, vmstat, htop, iotop
● Experienced in network administration
● Understanding of the difference between containers and VMs
● Data Center Interaction: configure and troubleshoot NFS storage, LDAP, and DNS configurations
● Knowledge of standard networking protocols like Spanning Tree (STP, different types), LAG, VLAN (tagged vs untagged)
● Manage a ‘managed switch’: simple troubleshooting of port-down, firewall and NAT knowledge, accessing the ‘mgmt interface’ and ‘serial console’
● Experienced in shell scripting
● Experienced in Python scripting
● Experienced in troubleshooting client-side API issues

Nice to Have:
● Strong understanding of ITIL Service Management
● Utilize and analyze output of tools such as eBPF and Linux perf
● Use GDB to analyze application and operating system core files
● Knowledge of Docker configuration, Kubernetes, and Podman
● Demonstrated experience with load balancers
● VM configuration and control systems (VMware or similar)
● Juniper (JunOS) specific working knowledge
● VPN endpoint-to-endpoint knowledge [setup, troubleshooting, handling of pre-shared keys (sensitive)]
Job Title
Site Reliability Engineer