This is a strategic principal-level role focused on building scalable, resilient platforms and embedding intelligent operational visibility across distributed cloud environments. You will operate at the intersection of platform engineering, SRE, infrastructure and AI-driven operational intelligence, shaping standards, driving alignment and influencing reliability outcomes across the business.
Job Responsibilities
- Design and implement enterprise-wide observability architecture across cloud and distributed systems
- Establish best practice standards for metrics, logs, traces, dashboards and alerting
- Define and monitor performance indicators including latency, traffic, errors and saturation
- Drive AIOps capability including anomaly detection, predictive monitoring and intelligent incident correlation
- Build and scale telemetry pipelines capable of handling high-volume, real-time data
- Architect automation and auto-remediation workflows to reduce operational toil
- Partner with Engineering, DevOps, SRE, Network and Product teams to uplift reliability maturity
- Provide technical leadership and peer review across scalability, performance and resilience
Knowledge & Experience
- 10+ years’ experience across software engineering, DevOps, SRE or platform operations
- Proven experience designing and operating large-scale distributed systems
- Strong experience with observability tooling and telemetry frameworks
- Deep understanding of cloud infrastructure, Kubernetes, containerisation and microservices
- Experience building scalable monitoring and metrics pipelines
- Exposure to ML or AI-driven anomaly detection within operational environments
- Experience implementing automation and self-healing workflows
- Ability to influence senior stakeholders and articulate technical strategy in commercial terms
You see patterns in complexity.
You care about engineering excellence and measurable impact.
Why This Role?
This is an opportunity to shape reliability and observability strategy within a fast-moving financial technology environment serving customers at national scale. You will have visibility, autonomy and the platform to define how operational intelligence is embedded across the organisation. If you have built resilient distributed systems and are ready to lead observability strategy at principal level, this is a rare opportunity to make a genuine impact. If you are interested in this opportunity, please apply now and quote #269818.
Peoplebank and Leaders IT are committed to creating a diverse and inclusive workplace where everyone belongs. We welcome applications from people of all backgrounds, identities, and experiences. If you need adjustments to the recruitment process due to your circumstances, please let us know—we’re here to support you.