Principal Reliability & Observability Engineer

Location: Sydney
Work Type: Permanent
Industry: Cloud & DevOps
Contact Name: Sarah Kissane
Contact Phone: 02 9409 4717
Date Published: 16-Feb-2026

A high-growth, ASX-listed financial technology organisation is seeking a Principal Reliability & Observability Engineer to lead the evolution of enterprise reliability, observability and AIOps capability across mission-critical systems.
This is a strategic principal-level role focused on building scalable, resilient platforms and embedding intelligent operational visibility across distributed cloud environments. You will operate at the intersection of platform engineering, SRE, infrastructure and AI-driven operational intelligence, shaping standards, driving alignment and influencing reliability outcomes across the business.

Job Responsibilities

Design and implement enterprise-wide observability architecture across cloud and distributed systems
Establish best practice standards for metrics, logs, traces, dashboards and alerting
Define and monitor performance indicators including latency, traffic, errors and saturation
Drive AIOps capability including anomaly detection, predictive monitoring and intelligent incident correlation
Build and scale telemetry pipelines capable of handling high-volume, real-time data
Architect automation and auto-remediation workflows to reduce operational toil
Partner with Engineering, DevOps, SRE, Network and Product teams to uplift reliability maturity
Provide technical leadership and peer review across scalability, performance and resilience

This is a principal-level engineering position with real influence over system health, customer experience and platform stability.

Knowledge & Experience

10+ years’ experience across software engineering, DevOps, SRE or platform operations
Proven experience designing and operating large-scale distributed systems
Strong experience with observability tooling and telemetry frameworks
Deep understanding of cloud infrastructure, Kubernetes, containerisation and microservices
Experience building scalable monitoring and metrics pipelines
Exposure to ML or AI-driven anomaly detection within operational environments
Experience implementing automation and self-healing workflows
Ability to influence senior stakeholders and articulate technical strategy in commercial terms

You are a systems thinker who designs for scale and reliability.
You see patterns in complexity.
You care about engineering excellence and measurable impact.

Why This Role?

This is an opportunity to shape reliability and observability strategy within a fast-moving financial technology environment serving customers at national scale. You will have visibility, autonomy and the platform to define how operational intelligence is embedded across the organisation. If you have built resilient distributed systems and are ready to lead observability strategy at principal level, this is a rare opportunity to make a genuine impact.

If you are interested in this opportunity, please apply now and quote #269818.

Peoplebank and Leaders IT are committed to creating a diverse and inclusive workplace where everyone belongs. We welcome applications from people of all backgrounds, identities, and experiences. If you need adjustments to the recruitment process due to your circumstances, please let us know—we’re here to support you.
