This role is delivered within secure environments. Candidates must be eligible for UK SC clearance (requiring 5 years UK residency).
Opening: Join the Mission
At ByDesign Secure, we believe that world-class security shouldn't be an afterthought; it should be the foundation. We are an independent, outputs-based consultancy dedicated to solving the most complex data assurance challenges in the UK public sector. Currently, we are working on a landmark transformation of a cross-government secure IT system. This is an exciting opportunity to help architect a private cloud environment from the ground up and modernise the end-user services that power national decision-making. We don't believe in "billing by the hour" or rigid hierarchies; we are a lean, expert team focused on delivering high-impact technical outcomes. If you are a self-starter who thrives on autonomy and wants to see your engineering or architectural decisions shape the future of sovereign security, we want to talk to you.
About the Opportunity
- We are seeking a highly skilled Cloud DevOps Engineer to implement automated processes and reliable capabilities within a Google Distributed Cloud (GDC) environment.
- This role is designed for a DevOps professional with a strong GCP background (experience with other cloud providers is also considered) and an active Professional Cloud DevOps Engineer certification, as these skills are essential for balancing delivery speed with the stability required for secure, air-gapped systems.
- You will be a key driver in optimising production systems for both performance and cost within our mission-critical government delivery squads.
What You’ll Be Doing
- Infrastructure as Code (IaC): Designing and managing automated infrastructure using tools such as Terraform and GitLab workflows to ensure consistent environments.
- CI/CD Pipeline Architecture: Building and securing automated deployment pipelines for applications and infrastructure, utilising artifact management and defined approval flows.
- Site Reliability Engineering (SRE): Applying SRE practices by defining Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to balance change velocity with service reliability.
- Advanced Observability: Implementing comprehensive monitoring, logging, and distributed tracing to proactively identify and troubleshoot performance or latency issues.
- Deployment Strategies: Executing sophisticated deployment patterns, such as canary, blue/green, and rolling updates, to mitigate impact on users during service transitions.
- FinOps & Performance Optimisation: Optimising resource utilisation and costs through effective capacity planning and the use of cost-optimisation recommendations.
What You’ll Bring
- Demonstrable experience in managing the full systems development lifecycle using automated, cloud-native methodologies.
- Proficiency in managing complex containerised environments (e.g., GKE fleets) and securing the software supply chain through vulnerability scanning.
Bonus Points For
- Current, non-expired Professional Cloud DevOps Engineer certification.
- Experience with FinOps practices, including infrastructure cost planning and resource rightsizing.
- Background in leveraging AI-assisted operations for log interpretation and code assistance.
- Experience working in air-gapped or disconnected environments with little or no internet connectivity.
Clearance Requirements:
- This role requires either existing Security Check (SC) clearance or for SC clearance to be obtained before commencement. Candidates must also be willing to undergo Developed Vetting (DV).
Work Location: Hybrid remote in London SW1A
Job Types: Temporary, Fixed term contract
Contract length: 12 months
Pay: £600.00 per day
Application question(s):
- Do you have the permanent right to work in the UK?
- Do you currently hold active UK SC clearance?
- Are you eligible and willing to undergo UK SC clearance for this role? (Applications without this cannot be considered)
- Does your delivery approach allow for on-site presence in London (SW1A) when required (typically around 2 days per week)?
- Do you have experience managing Infrastructure as Code (IaC) using Terraform and Helm for large-scale cloud deployments?
- Have you implemented Site Reliability Engineering (SRE) practices, such as defining SLIs, SLOs, and managing error budgets?