Platform Engineer - Observability at Era4, United Kingdom, 6 Months initial, £Competitive Day Rate

  • Contract Spy
  • UK
  • Jun 10, 2026

Contract Description

Era4 develops, owns and operates AI infrastructure across the UK, powered by renewable energy. By converting legacy industrial and energy sites into modern data-centre facilities, Era4 combines brownfield regeneration with cleaner, more efficient, scalable compute capacity for healthcare, research, finance, enterprise, and public-sector organisations.

 

Initial 6-month contract.

June start date.

Competitive day rate.

 

Role Summary:

The Identity & Platform Engineer is responsible for designing, implementing and operating the core platform services that provide:

  • Kubernetes platform services
  • Sovereign identity management
  • Federation and authentication services
  • Privileged access management
  • Secrets management
  • Customer identity integration
  • Platform security and governance

 

The successful candidate will play a key role in delivering a Zero Trust, sovereign cloud platform built around FreeIPA, Teleport, authentik, Bitwarden, and Kubernetes.

 

Key Responsibilities:

 

Observability Platform Implementation:

  • Deliver the implementation of Era4's observability platform based on Grafana Mimir, Loki, Tempo, Grafana Alloy and Grafana Enterprise tooling.
  • Design and implement highly available observability services across multiple co-location and production sites.
  • Configure telemetry ingestion pipelines for metrics, logs, and future distributed tracing workloads.
  • Develop and maintain observability architecture documentation, high-level designs, low-level designs, and operational runbooks.
  • Define platform standards for telemetry collection, labelling, metadata enrichment, retention policies, and data governance.
  • Implement multi-tenant observability controls and tenant isolation strategies.
  • Configure and maintain object-storage-backed telemetry platforms for long-term retention and scalability.

 

Telemetry Collection & Integration:

  • Deploy and manage Grafana Alloy collectors across Kubernetes clusters, Linux hosts, network infrastructure, storage platforms, and hardware management systems.
  • Integrate telemetry from Kubernetes, GPU infrastructure, HPE hardware, storage platforms, network devices, and cloud-native services.
  • Develop and maintain observability integrations using OpenTelemetry standards and protocols.
  • Establish onboarding processes for new platforms, applications, and infrastructure services.
  • Collaborate with application teams to define observability requirements and future tracing adoption strategies.

 

Alerting & Operational Insights:

  • Design and implement alerting frameworks using recording rules, Alertmanager, and operational best practices.
  • Develop operational dashboards and service health views for infrastructure, platform, and application services.
  • Support integration of observability events with ITSM and incident-management platforms.
  • Define SLIs, SLOs, alert thresholds, and operational KPIs.
  • Continuously improve platform observability, incident detection, and root-cause analysis capabilities.

 

Reliability & Automation:

  • Implement Infrastructure-as-Code and GitOps practices for observability platform deployment and configuration management.
  • Develop automation for dashboard provisioning, alert deployment, tenant onboarding, and telemetry configuration.
  • Design and validate disaster recovery, resilience, and failover capabilities across observability services.
  • Contribute to platform security, compliance, and operational governance initiatives.
  • Work with operational teams to ensure observability services remain reliable, scalable, and maintainable.

 

Required Experience & Skills:

  • Significant experience implementing and operating enterprise observability or monitoring platforms.
  • Strong understanding of metrics, logs, traces, OpenTelemetry, and modern observability principles.
  • Experience with Grafana ecosystem technologies including Grafana, Prometheus, Grafana Mimir, Grafana Loki, Grafana Tempo, and Grafana Alloy.
  • Experience designing Kubernetes-native solutions and operating distributed platforms at scale.
  • Knowledge of Linux systems administration and cloud-native infrastructure.
  • Experience implementing Infrastructure-as-Code and GitOps approaches (preferably including Ansible).
  • Skilled in developing automation and operational tooling using Python and/or Go.
  • Previous exposure to creating technical architecture, operational documentation, and deployment designs.
  • Experience with object storage technologies and distributed data platforms.
  • Strong understanding of monitoring, alerting, and operational event management.

 

One or more of the following would be advantageous:

  • Experience implementing OpenTelemetry-based observability solutions.
  • Experience operating observability platforms in service-provider, cloud, or large-scale enterprise environments.
  • Experience supporting GPU, AI/ML, or high-performance computing environments.
  • Experience integrating observability platforms with ITSM solutions.
  • Experience with multi-tenant platform architectures.
  • Knowledge of networking, storage, and data-centre infrastructure monitoring.
  • Understanding of distributed tracing and application performance monitoring.

 

Why Join Era4:

You’ll be joining a mission-driven start-up building critical national infrastructure, where operational excellence directly enables growth. This role offers high visibility with leadership, real autonomy, and the chance to shape how a next-generation company operates at scale.

 

Diversity & Inclusion:

Era4 is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.