Site Reliability Engineer at EPAM Systems Ltd, London, 12 Months, £Contract Rate

  • Contract Spy
  • London, UK
  • Oct 11, 2021
6 Months or more

Contract Description

EPAM Systems are seeking a Site Reliability Engineer. Are you focused on continuous service and performance improvement across an ecosystem of products and applications? Can you evaluate application performance and proactively identify and resolve issues through increased observability, data insights and automation?

We are looking for someone who can collaborate with development, testing, application support and infrastructure teams to:
- drive operational support overhead to a minimum whilst maximizing stability and availability
- develop new tools to solve existing problems in the production environment
- ensure that release criteria are being met and the integrity of the production environment is maintained
- design, implement and automate processes to defend the security and auditability of key risk controls
- interpret and analyze metric data to proactively identify and address potential system issues
- reduce toil (work that tends to be manual, repetitive, automatable, tactical and devoid of enduring value)

You will be part of a central team of empowered SREs within the Technology Operations Center, focusing on cutting-edge solutions that drive continuous performance improvements across the entire technology stack.

REQUIREMENTS

  • Experience and skills:
    • Previous development experience with a high level of scripting or programming capability
    • In-depth understanding of the SDLC, including DevOps principles and culture
    • Expertise in enterprise monitoring, event management, robotics and automation
    • In-depth understanding of complex enterprises, including cloud-native, hybrid and on-prem environments
    • Knowledge or experience of Error Budgets, Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
    • Technical understanding of cloud concepts (Azure, AWS), API frameworks & container technologies as well as
  • Coding and Scripting:
    • Strong collaborator who enjoys working in a team environment and has a passion for continuous learning
    • Comfortable challenging others when necessary to achieve business outcomes and value for our clients
    • Someone who practices site reliability engineering and uses data insights to continuously improve the availability, scalability, performance, security, and efficiency of services
    • Able to juggle multiple demands, deliverables, and priorities
    • Excellent communicator, able to keep dialogue with peers and stakeholders clear and concise