EPAM Systems is seeking a Site Reliability Engineer. Are you focused on continuous service and performance improvement across an ecosystem of products and applications? Can you evaluate application performance and proactively identify and resolve issues through increased observability, data insights and automation?
We are looking for someone who can collaborate with development, testing, application support and infrastructure teams to:
- drive operational support overhead to a minimum whilst maximizing stability and availability
- develop new tools to solve existing problems in the production environment
- ensure that release criteria are being met and the integrity of the production environment is maintained
- design, implement and automate processes to defend the security and auditability of key risk controls
- interpret and analyze metric data to proactively identify and address potential system issues
- reduce toil (work that tends to be manual, repetitive, automatable, tactical and devoid of value)
You will be part of a central team of empowered SREs within the Technology Operations Center, focusing on cutting-edge solutions that drive continuous performance improvements across the entire technology stack.