Published February 2, 2026
Knowledge Bank: Closing the Gap Between Data-Driven AI and Tribal Knowledge
At Traversal, we believe that for an AI agent to be truly effective at incident response, it must deeply understand the system it operates within. That belief led us to invest early in a sophisticated AI and data platform—grounded in causal inference and reinforcement learning—to infer as much context as possible directly from telemetry. Our goal has been to deliver highly accurate, out-of-the-box insights with minimal setup, a strategy consistently validated by our customers.
However, we also recognize that some of the most critical context isn't found in data logs or metrics. It lives in the collective experience of engineering teams: the "tribal knowledge" that distinguishes a senior engineer's intuition from a novice's. This includes understanding the unwritten rules of an architecture, the business significance of a particular service, or the specific debugging workflow that has been refined over years of practice.
To bridge this gap, we are excited to share Knowledge Bank, a new feature that lets your team guide Traversal with the knowledge it can’t learn from data alone.
The Philosophy: Last-Mile Optimization, Not a Crutch
Our founding team's research spanning causal inference and synthetic interventions [1] has always been motivated by a core insight: complex systems require human-in-the-loop solutions. Pure algorithmic approaches become cost-prohibitive at scale, and there are failure modes that no amount of training data can anticipate. This is precisely why runbooks exist in the first place.
Traversal was never designed to replace experienced engineers. It was designed to accelerate them: to handle the tedious evidence gathering and pattern matching so that human experts can focus on judgment calls that require institutional knowledge.
Knowledge Bank formalizes this human-in-the-loop capability. It is not a substitute for our mature AI platform's autonomous capabilities, but a last-mile optimization layer that allows customers to refine and direct Traversal's analytical depth, aligning it with their unique operational context, team preferences, and established workflows. This transforms the dynamic from one where users simply consume the AI's output to one where they are active participants in its continuous improvement.
We've seen this pattern validated in developer tools like Claude Code and Cursor, where users can define rules and preferences that shape how the AI behaves in their specific environment.
Knowledge Bank goes significantly further, bringing that capability to reliability work. Instead of shallow preference tuning, it lets teams encode durable operational knowledge: how their systems behave under failure, which signals matter first, and which investigative paths are known dead ends. We're taking user-guided AI beyond stylistic customization and into the core of reliability engineering, where context, causality, and institutional memory actually determine outcomes.
The Problems We're Solving
Knowledge Bank emerged from patterns we observed in customer feedback: Traversal was often technically correct but sometimes missed practical nuances that would make findings immediately actionable.
The Tribal Knowledge Gap
There is critical operational context embedded in how teams reason about their systems—the business importance of a particular cluster, or how tags map to service criticality. While these patterns often emerge from telemetry, they are already well understood by experienced engineers.
Knowledge Bank allows teams to make this understanding explicit, compressing years of implicit learning into immediate, actionable context. The result is not a smarter model in the abstract, but one that is grounded in the realities of how your organization defines impact and urgency.
Procedural Steering
Building software systems means wiring technologies together and carefully constructing rule-based logic. In an ideal world, everything follows those rules. But real-world debugging rarely follows a strict script. Runbooks exist because production issues are resolved through heuristics, experience, and soft constraints, not deterministic logic.
Out-of-the-box, Traversal is like a skilled SRE who has recently joined the company: capable of figuring things out independently. But like any skilled SRE, Traversal does its best work when it can leverage the tried-and-tested debugging procedures your team has already refined for your environment. Different teams have different definitions of a "good" investigation, and that context isn't always inferable from data alone.
Accepting Help
We take pride in Traversal's ability to infer the right answer autonomously. But when customers already have well-documented runbooks or proven workflows representing years of accumulated wisdom, forcing the AI to rediscover them from scratch is inefficient. We should readily accept this guidance when it is offered.
How Knowledge Bank Works
Knowledge Bank provides three distinct channels for capturing knowledge: manual input, in-session feedback, and automated learning.
1. Upload Runbooks and Input Custom Context
Your team has documentation that would help any investigation. Now you can give it directly to Traversal—upload a PDF, paste text, or write instructions in a simple form.
Each knowledge item has:
Title (required)
Instruction (required, free-form text)
Use when (optional, applicability hint)
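For concreteness, a knowledge item might be modeled as a small record like the sketch below. The field names mirror the form fields above; the class itself and the example values are purely illustrative, not Traversal's internal schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class KnowledgeItem:
    """One unit of tribal knowledge; fields mirror the input form above."""
    title: str                      # required: short, human-readable name
    instruction: str                # required: free-form guidance for the agent
    use_when: Optional[str] = None  # optional: hint for when the item applies

checkout_tip = KnowledgeItem(
    title="Checkout latency: check the shared cache first",
    instruction=(
        "If the 'checkout' service reports elevated latency, check the "
        "shared Redis cache's eviction rate before digging into "
        "application code."
    ),
    use_when="Latency alerts from the 'checkout' service",
)
```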
Examples of what you might input:
"When you see alerts from the 'payments' service, always check the third-party payment processor status page first—outages there are the most common cause."
"For database connection errors, our team prefers to start with the connection pool metrics dashboard before diving into application logs."
"Alerts from our staging environment can usually be deprioritized unless they correlate with an upcoming release."
2. In-Session Feedback
After any investigation, you can tell Traversal what it got right, what it missed, and what should happen differently next time. This feedback is actionable and specific to the investigation that just occurred, improving future investigations across your organization.
This approach draws on principles from co-founder Raj’s research in budgeted experimental design [5], which focuses on maximizing learning when feedback opportunities are limited.
3. Automated Learning
Traversal has always learned from how teams interact with it. Knowledge Bank makes that learning explicit and controllable.
When you redirect an investigation—“check the traces instead” or “look at the connection pool”—you’re implicitly teaching Traversal. After each investigation, a background process analyzes the conversation to extract customer-specific insights about your infrastructure, data patterns, and team preferences.
The system is tuned to extract customer-specific causal patterns—insights about your infrastructure that wouldn’t generalize to other environments. This draws on work by co-founders Anish and Raaz on learning distributions from sparse observations [8], allowing Traversal to retain what’s unique while filtering out generic truths.
For example, “Querying logs by request_id is effective for tracing requests through middleware services” may be worth remembering, while “tools work better with specific time ranges” is filtered out—it applies everywhere. These learned insights appear in the Knowledge Bank, where you can review, edit, or remove them. Traversal identifies what’s worth remembering; you stay in control.
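As a rough sketch of what that post-investigation pass might look like: extract_insights and is_customer_specific below are hypothetical stand-ins for LLM-backed steps, reduced here to toy heuristics so the control flow runs end to end. None of this is Traversal's actual code.

```python
# Toy sketch of the post-investigation learning loop. In a real system both
# steps would be LLM-backed; simple placeholder heuristics stand in here.

GENERIC_MARKERS = ("tools work better", "specific time ranges")

def extract_insights(conversation: str) -> list[str]:
    # Placeholder: a real pass would prompt an LLM to propose takeaways.
    return [line.strip() for line in conversation.splitlines() if line.strip()]

def is_customer_specific(insight: str) -> bool:
    # Placeholder: a real pass would ask "would this hold in any environment?"
    return not any(marker in insight.lower() for marker in GENERIC_MARKERS)

def learn_from_investigation(conversation: str) -> list[str]:
    # Keep environment-specific insights; filter out universally true advice.
    return [i for i in extract_insights(conversation)
            if is_customer_specific(i)]
```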
Scaling Knowledge Retrieval: A Research-Driven Approach
Surfacing the right piece of knowledge at the right time is a deceptively complex problem—one that connects directly to our team's research in causal inference and optimal decision-making under uncertainty.
The Combinatorial Challenge
With N incident types and p possible runbook steps, the number of potential combinations grows exponentially: each incident type can, in principle, pair with any of the 2^p subsets of steps. Estimating these interactions directly becomes infeasible at scale; the same challenge applies to learning which knowledge items are relevant to which incidents.
This core challenge is illustrated in Anish’s NeurIPS 2023 paper Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions [3].
When we first built this system internally, we started with a straightforward approach: compare knowledge items pairwise to determine which are most relevant to a given query. This worked well at small scale, but the computational cost grows quadratically with the number of knowledge items, quickly becoming prohibitive as the Knowledge Bank grows.
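A quick back-of-the-envelope calculation (ours, not Traversal's) shows why:

```python
# All-pairs comparison over n knowledge items costs n*(n-1)/2 comparisons,
# so the work grows quadratically with the size of the bank.
for n in (10, 100, 1_000, 10_000):
    pairs = n * (n - 1) // 2
    print(f"{n:>6} items -> {pairs:>13,} pairwise comparisons")
# At 10,000 items, that is already ~50 million comparisons per query.
```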
Challenges at Enterprise Scale
For our enterprise customers, where a single Knowledge Bank can grow to hundreds or thousands of items, we needed something more sophisticated. As Raj observed in his work on scalable causal discovery: "Traditional methods often fail in modern applications, which exhibit a larger number of observed variables than data points" [6]. Several challenges emerge:
Scaling Challenge | Description
--- | ---
Context Window Limitations | Large language models have finite context. It is impossible to include thousands of knowledge items in a single prompt.
Performance Degradation | Processing enormous volumes of context for every query would be prohibitively slow, defeating the purpose of a real-time investigation tool.
The "Needle in a Haystack" Problem | A large, unfiltered set of knowledge items introduces noise. The agent can become confused by conflicting or irrelevant information.
Our Approach: Multi-Stage Retrieval
To address these challenges, we built a multi-stage retrieval architecture that draws on our co-founders’ work in causal matrix completion [4], synthetic interventions [1], and kernel thinning [7].
Rapid Candidate Selection. We use semantic search to identify a smaller subset of potentially relevant knowledge items, reducing the search space from thousands to a manageable set. This approach draws on Raaz’s work on kernel thinning [7], which studies how large distributions can be efficiently represented by smaller, highly informative subsets. We apply these principles when distilling large Knowledge Banks down to the handful of knowledge items most relevant to a given investigation.
Contextual Ranking. The candidate set passes through a ranking model that performs deeper analysis, considering the nuances of the current investigation to select the most relevant knowledge items.
Offline Optimization. Much of the computational work—embedding generation, similarity indexing—happens offline, ensuring real-time retrieval remains fast.
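The sketch below shows the shape of the two online stages under simplifying assumptions: item embeddings are precomputed offline, and embed and rerank_score are stand-ins for whatever embedding and ranking models a real system would use. It illustrates the architecture, not Traversal's implementation.

```python
import numpy as np

def retrieve(query: str, items: list[str], item_vecs: np.ndarray,
             embed, rerank_score, k_candidates: int = 50, k_final: int = 5):
    # Stage 1: rapid candidate selection via cosine similarity against
    # embeddings computed offline (the expensive indexing is already done).
    q = embed(query)
    sims = (item_vecs @ q) / (np.linalg.norm(item_vecs, axis=1)
                              * np.linalg.norm(q))
    candidates = np.argsort(-sims)[:k_candidates]
    # Stage 2: a deeper, context-aware ranking model scores only the survivors.
    ranked = sorted(candidates,
                    key=lambda i: rerank_score(query, items[i]), reverse=True)
    return [items[i] for i in ranked[:k_final]]
```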
This architecture allows Traversal to deliver both high-fidelity context and real-time performance at enterprise scale.
Network-Aware Learning
Production systems are interconnected, and actions on one component often affect others. This reflects the network interference problem studied by Anish and Abhineet Agarwal [2].
When you guide Traversal about one service, that knowledge informs how it reasons about connected services. Our network-aware approach ensures that knowledge items propagate appropriately through your service topology.
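As a toy illustration of the idea, the sketch below lets a knowledge item attached to one service lend decayed relevance to its neighbors in the dependency graph. The topology, decay factor, and hop limit are our own simplifications, not Traversal's mechanism.

```python
from collections import deque

def propagate_relevance(topology: dict[str, list[str]], source: str,
                        decay: float = 0.5,
                        max_hops: int = 2) -> dict[str, float]:
    # Breadth-first walk: each hop away from the item's home service
    # multiplies its relevance by the decay factor.
    scores = {source: 1.0}
    queue = deque([(source, 0)])
    while queue:
        svc, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbor in topology.get(svc, []):
            if neighbor not in scores:
                scores[neighbor] = scores[svc] * decay
                queue.append((neighbor, hops + 1))
    return scores

topology = {"payments": ["checkout", "ledger"], "checkout": ["frontend"]}
print(propagate_relevance(topology, "payments"))
# {'payments': 1.0, 'checkout': 0.5, 'ledger': 0.5, 'frontend': 0.25}
```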
When a knowledge item is used during an investigation, it is explicitly surfaced in the UI—transparency that is crucial for building trust and allowing users to understand exactly what influenced the AI's conclusions.
What's Next
We are already working on the next generation of capabilities for Knowledge Bank:
Bulk upload — Upload hundreds of documents at once and automatically generate knowledge items, making it easy to onboard your entire runbook library
Enterprise governance — Advanced permissioning, approval workflows, and review cycles so organizations can control who can create, edit, and publish knowledge items
Pre-publish feedback — Before a knowledge item goes live, get AI-powered feedback on how it will likely affect accuracy and whether it conflicts with existing knowledge items
Knowledge effectiveness reporting — See how the knowledge items you've created have impacted investigation accuracy over time, with clear metrics on which ones are being used and how they're helping
Getting Started
Your infrastructure is unique. Your team's expertise is irreplaceable.
Access Knowledge Bank in the Settings section of your Traversal workspace and get started by uploading a knowledge item today.
References
Note: These citations feature work by Traversal's co-founders: Anish Agarwal (CEO) [1-4, 8], Raj Agrawal (Chief Scientist) [5-6], and Raaz Dwivedi (CTO) [7-8], as well as AI Researcher Abhineet Agarwal [2].
[1] Agarwal, A., Shah, D., & Shen, D. (2020). Synthetic Interventions. Operations Research. https://arxiv.org/abs/2006.07691
[2] Agarwal, A., Agarwal, A., Masoero, L., & Whitehouse, J. (2024). Multi-Armed Bandits with Network Interference. Advances in Neural Information Processing Systems (NeurIPS). https://arxiv.org/abs/2405.18621
[3] Agarwal, A., Agarwal, A., & Vijaykumar, S. (2023). Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions. Advances in Neural Information Processing Systems (NeurIPS). https://arxiv.org/abs/2303.14226
[4] Agarwal, A., Dahleh, M., Shah, D., & Shen, D. (2023). Causal Matrix Completion. Conference on Learning Theory (COLT). https://arxiv.org/abs/2109.15154
[5] Agrawal, R., Squires, C., Yang, K., Shanmugam, K., & Uhler, C. (2019). ABCD-Strategy: Budgeted Experimental Design for Targeted Causal Structure Discovery. International Conference on Artificial Intelligence and Statistics (AISTATS). https://arxiv.org/abs/1902.10347
[6] Agrawal, R., Uhler, C., & Broderick, T. (2018). Minimal I-MAP MCMC for Scalable Structure Discovery in Causal DAG Models. International Conference on Machine Learning (ICML). https://arxiv.org/abs/1803.05554
[7] Dwivedi, R. & Mackey, L. (2021). Kernel Thinning. Conference on Learning Theory (COLT). https://arxiv.org/abs/2105.05842
[8] Choi, K., Feitelberg, J., Chin, C., Agarwal, A., & Dwivedi, R. (2024). Learning Counterfactual Distributions via Kernel Nearest Neighbors. arXiv preprint. https://arxiv.org/abs/2410.13381