Principal Reliability & Observability Engineer

Location: Sydney
Work Type: Permanent
Industry: Cloud & DevOps
Contact Name: Sarah Kissane
Contact Phone: 02 9409 4717
Date Published: 16-Feb-2026

A high-growth, ASX-listed financial technology organisation is seeking a Principal Reliability & Observability Engineer to lead the evolution of enterprise reliability, observability and AIOps capability across mission-critical systems.
This is a strategic principal-level role focused on building scalable, resilient platforms and embedding intelligent operational visibility across distributed cloud environments. You will operate at the intersection of platform engineering, SRE, infrastructure and AI-driven operational intelligence, shaping standards, driving alignment and influencing reliability outcomes across the business.

Job Responsibilities

  • Design and implement enterprise-wide observability architecture across cloud and distributed systems
  • Establish best practice standards for metrics, logs, traces, dashboards and alerting
  • Define and monitor performance indicators including latency, traffic, errors and saturation
  • Drive AIOps capability including anomaly detection, predictive monitoring and intelligent incident correlation
  • Build and scale telemetry pipelines capable of handling high-volume, real-time data
  • Architect automation and auto-remediation workflows to reduce operational toil
  • Partner with Engineering, DevOps, SRE, Network and Product teams to uplift reliability maturity
  • Provide technical leadership and peer review across scalability, performance and resilience
This is a principal-level engineering position with real influence over system health, customer experience and platform stability.

Knowledge & Experience

  • 10+ years’ experience across software engineering, DevOps, SRE or platform operations
  • Proven experience designing and operating large-scale distributed systems
  • Strong experience with observability tooling and telemetry frameworks
  • Deep understanding of cloud infrastructure, Kubernetes, containerisation and microservices
  • Experience building scalable monitoring and metrics pipelines
  • Exposure to ML or AI-driven anomaly detection within operational environments
  • Experience implementing automation and self-healing workflows
  • Ability to influence senior stakeholders and articulate technical strategy in commercial terms
You are a systems thinker who designs for scale and reliability.
You see patterns in complexity.
You care about engineering excellence and measurable impact.

Why This Role?

This is an opportunity to shape reliability and observability strategy within a fast-moving financial technology environment serving customers at national scale. You will have visibility, autonomy and the platform to define how operational intelligence is embedded across the organisation. If you have built resilient distributed systems and are ready to lead observability strategy at principal level, this is a rare chance to make a genuine impact. If you are interested, please apply now and quote #269818.


Peoplebank and Leaders IT are committed to creating a diverse and inclusive workplace where everyone belongs. We welcome applications from people of all backgrounds, identities and experiences. If you need adjustments to the recruitment process due to your circumstances, please let us know; we’re here to support you.