Fortune 100 Financial Services Company Partners With Traversal to Transform Site Reliability Ops at Scale

Fortune 100 Financial Services Company

32%

Reduction in mean time to resolution (MTTR)

82%

Root Cause Analysis (RCA) accuracy

At a Glance

A Fortune 100 financial services company wanted to improve its technology incident response practice to enhance system availability and recovery times. The company partnered with Traversal to transform how it handles site reliability operations at scale, running a pilot to validate root cause analysis (RCA) capabilities across its applications. Within the defined scope, where the necessary integrations with observability systems are available, Traversal delivered strong results: a 32% reduction in potential mean time to resolution (MTTR), 82% RCA accuracy across in-scope applications, and the ability to autonomously trace root causes in minutes for incidents that would otherwise demand extensive cross-team coordination and manual investigation.

This foundation enables a shift toward agentic incident response, where AI agents work autonomously alongside human engineers to handle investigation and remediation workflows, accelerating time to resolution and reducing the business impact of downtime.

The Challenge

This company operates a technology infrastructure that mirrors its position as a global financial services leader, generating significant data volumes across thousands of applications and processing billions of transactions daily. Given the scale and complexity of this environment, an operational incident often requires several teams to engage, which can lengthen MTTR.

The applications span deployment models from Kubernetes clusters and Lambda functions to mainframe systems and traditional on-premises infrastructure, creating a heterogeneous environment where no single monitoring approach works universally. This setup increases the operational cost of managing high-priority alerts.

The challenge wasn’t getting more data, as this company had invested heavily in observability infrastructure. It was whether existing data could be harnessed quickly and effectively to anticipate, resolve, and even prevent major incidents, while also enabling incident response to scale.

These technical challenges are typical of enterprise-scale operations. Manual alert triage becomes an engineering bottleneck that exceeds human capacity, particularly because alerts require domain expertise and tribal knowledge that are not always easily accessible. Bridged incidents require coordinating many engineers, and resolving them manually across heterogeneous systems lengthens resolution time. Further, this company’s interconnected architecture means failures in one service can manifest as symptoms in completely different domains, making RCA difficult. Together, these factors create an opportunity for AI SRE solutions like Traversal to improve efficiency and reduce resolution time for bridged incidents.

Our Deployment

This company took a phased, security-first approach to deploying Traversal, with clear evaluation criteria at each stage.

Traversal’s on-premises deployment enabled this company to maintain complete control over sensitive financial data without requiring new data pipelines. The observability landscape was also complex, with multiple third-party platforms in addition to proprietary in-house tools. Traversal’s technology was integrated into frontline operations in under 6 months, given the on-premises deployment and senior executive sponsorship.

The pilot phase focused on a large and complex set of applications where incident response was most challenging. Traversal demonstrated consistent performance in both RCA accuracy and potential investigation time reduction, leading to strong organic adoption among the company’s engineers. The success of the pilot resulted in an expanded deployment across this company’s largest applications, with the potential to make Traversal an important component of its frontline operations and daily workflows.

Traversal’s Impact at the F100 Company

Traversal has the potential to fundamentally change how this company’s engineers respond to both alerts and major incidents across their infrastructure. Instead of engineers investigating incidents manually, Traversal can complete comprehensive RCA in minutes, ingesting 250 billion logs of interest every day. In the pilot, Traversal accurately identified root causes 82% of the time for in-scope applications and integrations, leading to a potential MTTR reduction of 32%. In the remaining cases, Traversal helps narrow the investigation scope and identifies which teams need to be involved, reducing the number of engineers who must be engaged in the response.

The value Traversal delivers to customers compounds across shorter incident durations and prevented escalations, with the potential to unlock a fundamental architectural shift in how enterprises maintain and scale their operations.

This foundation supports a shift toward agentic incident response, where AI agents work autonomously in parallel with human engineers to handle the bulk of investigation and remediation workflows. That shift accelerates time to resolution, reduces the business impact of production incidents and downtime, and allows engineers to focus on building rather than fixing.

To learn how Traversal can support agentic incident response in your environment, book a demo today.

By Traversal
