Published February 2, 2026
Knowledge Bank: Closing the Gap Between Data-Driven AI and Tribal Knowledge
At Traversal, we believe that for an AI agent to be truly effective at incident response, it must deeply understand the system it operates within. That belief led us to invest early in a sophisticated AI and data platform—grounded in causal inference and reinforcement learning—to infer as much context as possible directly from telemetry. Our goal has been to deliver highly accurate, out-of-the-box insights with minimal setup, a strategy consistently validated by our customers.
However, we also recognize that some of the most critical context isn't found in data logs or metrics. It lives in the collective experience of engineering teams: the "tribal knowledge" that distinguishes a senior engineer's intuition from a novice's. This includes understanding the unwritten rules of an architecture, the business significance of a particular service, or the specific debugging workflow that has been refined over years of practice.
To bridge this gap, we are excited to share Knowledge Bank, a new feature that lets your team guide Traversal with the knowledge it can’t learn from data alone.
The Philosophy: Last-Mile Optimization, Not a Crutch
Our founding team's research spanning causal inference and synthetic interventions [1] has always been motivated by a core insight: complex systems require human-in-the-loop solutions. Pure algorithmic approaches become cost-prohibitive at scale, and there are failure modes that no amount of training data can anticipate. This is precisely why runbooks exist in the first place.
Traversal was never designed to replace experienced engineers. It was designed to accelerate them: to handle the tedious evidence gathering and pattern matching so that human experts can focus on judgment calls that require institutional knowledge.
Knowledge Bank formalizes this human-in-the-loop capability. It is not a substitute for our mature AI platform's autonomous capabilities, but a last-mile optimization layer that allows customers to refine and direct Traversal's analytical depth, aligning it with their unique operational context, team preferences, and established workflows. This transforms the dynamic from one where users simply consume the AI's output to one where they are active participants in its continuous improvement.
We've seen this pattern validated in developer tools like Claude Code and Cursor, where users can define rules and preferences that shape how the AI behaves in their specific environment.
Knowledge Bank goes significantly further, bringing that capability to reliability work. Instead of shallow preference tuning, it lets teams encode durable operational knowledge: how their systems behave under failure, which signals matter first, and which investigative paths are known dead ends. We're taking user-guided AI beyond stylistic customization and into the core of reliability engineering, where context, causality, and institutional memory actually determine outcomes.
The Problems We're Solving
Knowledge Bank emerged from patterns we observed in customer feedback: Traversal was often technically correct but sometimes missed practical nuances that would make findings immediately actionable.
The Tribal Knowledge Gap
There is critical operational context embedded in how teams reason about their systems—the business importance of a particular cluster, or how tags map to service criticality. While these patterns often emerge from telemetry, they are already well understood by experienced engineers.
Knowledge Bank allows teams to make this understanding explicit, compressing years of implicit learning into immediate, actionable context. The result is not a smarter model in the abstract, but one that is grounded in the realities of how your organization defines impact and urgency.
Procedural Steering
Building software systems means wiring technologies together and carefully constructing rule-based logic. In an ideal world, everything follows those rules. But real-world debugging rarely follows a strict script. Runbooks exist because production issues are resolved through heuristics, experience, and soft constraints, not deterministic logic.
Out-of-the-box, Traversal is like a skilled SRE who has recently joined the company: capable of figuring things out independently. But like any skilled SRE, Traversal does its best work when it can leverage the tried-and-tested debugging procedures your team has already refined for your environment. Different teams have different definitions of a "good" investigation, and that context isn't always inferable from data alone.
Accepting Help
We take pride in Traversal's ability to infer the right answer autonomously. But when customers already have well-documented runbooks or proven workflows representing years of accumulated wisdom, forcing the AI to rediscover them from scratch is inefficient. We should readily accept this guidance when it is offered.
How Knowledge Bank Works
Knowledge Bank provides three distinct channels for capturing knowledge: manual input, in-session feedback, and automated learning.
1. Upload Runbooks and Input Custom Context
Your team has documentation that would help any investigation. Now you can give it directly to Traversal—upload a PDF, paste text, or write instructions in a simple form.
Each knowledge item has:
Title (required)
Instruction (required, free-form text)
Use when (optional, applicability hint)
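For concreteness, a knowledge item might be modeled as a small record like the sketch below. The field names mirror the form fields above; the class itself and the example values are purely illustrative, not Traversal's internal schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class KnowledgeItem:
    """One unit of tribal knowledge; fields mirror the input form above."""
    title: str                      # required: short, human-readable name
    instruction: str                # required: free-form guidance for the agent
    use_when: Optional[str] = None  # optional: hint for when the item applies

checkout_tip = KnowledgeItem(
    title="Checkout latency: check the shared cache first",
    instruction=(
        "If the 'checkout' service reports elevated latency, check the "
        "shared Redis cache's eviction rate before digging into "
        "application code."
    ),
    use_when="Latency alerts from the 'checkout' service",
)
```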
Examples of what you might input:
"When you see alerts from the 'payments' service, always check the third-party payment processor status page first—outages there are the most common cause."
"For database connection errors, our team prefers to start with the connection pool metrics dashboard before diving into application logs."
"Alerts from our staging environment can usually be deprioritized unless they correlate with an upcoming release."
2. In-Session Feedback
After any investigation, you can tell Traversal what it got right, what it missed, and what should happen differently next time. This feedback is actionable and specific to the investigation that just occurred, improving future investigations across your organization.
This approach draws on principles from co-founder Raj’s research in budgeted experimental design [5], which focuses on maximizing learning when feedback opportunities are limited.
3. Automated Learning
Traversal has always learned from how teams interact with it. Knowledge Bank makes that learning explicit and controllable.
When you redirect an investigation—“check the traces instead” or “look at the connection pool”—you’re implicitly teaching Traversal. After each investigation, a background process analyzes the conversation to extract customer-specific insights about your infrastructure, data patterns, and team preferences.
The system is tuned to extract customer-specific causal patterns—insights about your infrastructure that wouldn’t generalize to other environments. This draws on work by co-founders Anish and Raaz on learning distributions from sparse observations [8], allowing Traversal to retain what’s unique while filtering out generic truths.
For example, “Querying logs by request_id is effective for tracing requests through middleware services” may be worth remembering, while “tools work better with specific time ranges” is filtered out—it applies everywhere. These learned insights appear in the Knowledge Bank, where you can review, edit, or remove them. Traversal identifies what’s worth remembering; you stay in control.
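As a rough sketch of what that post-investigation pass might look like: extract_insights and is_customer_specific below are hypothetical stand-ins for LLM-backed steps, reduced here to toy heuristics so the control flow runs end to end. None of this is Traversal's actual code.

```python
# Toy sketch of the post-investigation learning loop. In a real system both
# steps would be LLM-backed; simple placeholder heuristics stand in here.

GENERIC_MARKERS = ("tools work better", "specific time ranges")

def extract_insights(conversation: str) -> list[str]:
    # Placeholder: a real pass would prompt an LLM to propose takeaways.
    return [line.strip() for line in conversation.splitlines() if line.strip()]

def is_customer_specific(insight: str) -> bool:
    # Placeholder: a real pass would ask "would this hold in any environment?"
    return not any(marker in insight.lower() for marker in GENERIC_MARKERS)

def learn_from_investigation(conversation: str) -> list[str]:
    # Keep environment-specific insights; filter out universally true advice.
    return [i for i in extract_insights(conversation)
            if is_customer_specific(i)]
```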
Scaling Knowledge Retrieval: A Research-Driven Approach
Surfacing the right piece of knowledge at the right time is a deceptively complex problem—one that connects directly to our team's research in causal inference and optimal decision-making under uncertainty.
The Combinatorial Challenge
With N incident types and p possible runbook steps, the number of potential combinations grows exponentially: each incident type can, in principle, pair with any of the 2^p subsets of steps. Estimating these interactions directly becomes infeasible at scale; the same challenge applies to learning which knowledge items are relevant to which incidents.
This core challenge is illustrated in Anish’s NeurIPS 2023 paper Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions [3].
When we first built this system internally, we started with a straightforward approach: compare knowledge items pairwise to determine which are most relevant to a given query. This worked well at small scale, but the computational cost grows quadratically with the number of knowledge items, quickly becoming prohibitive as the Knowledge Bank grows.
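A quick back-of-the-envelope calculation (ours, not Traversal's) shows why:

```python
# All-pairs comparison over n knowledge items costs n*(n-1)/2 comparisons,
# so the work grows quadratically with the size of the bank.
for n in (10, 100, 1_000, 10_000):
    pairs = n * (n - 1) // 2
    print(f"{n:>6} items -> {pairs:>13,} pairwise comparisons")
# At 10,000 items, that is already ~50 million comparisons per query.
```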
Challenges at Enterprise Scale
For our enterprise customers, where a single Knowledge Bank can grow to hundreds or thousands of items, we needed something more sophisticated. As Raj observed in his work on scalable causal discovery: "Traditional methods often fail in modern applications, which exhibit a larger number of observed variables than data points" [6]. Several challenges emerge:
Scaling Challenge | Description
--- | ---
Context Window Limitations | Large language models have finite context. It is impossible to include thousands of knowledge items in a single prompt.
Performance Degradation | Processing enormous volumes of context for every query would be prohibitively slow, defeating the purpose of a real-time investigation tool.
The "Needle in a Haystack" Problem | A large, unfiltered set of knowledge items introduces noise. The agent can become confused by conflicting or irrelevant information.
Our Approach: Multi-Stage Retrieval
To address these challenges, we built a multi-stage retrieval architecture that draws on our co-founders’ work in causal matrix completion [4], synthetic interventions [1], and kernel thinning [7].
Rapid Candidate Selection. We use semantic search to identify a smaller subset of potentially relevant knowledge items, reducing the search space from thousands to a manageable set. This approach draws on Raaz’s work on kernel thinning [7], which studies how large distributions can be efficiently represented by smaller, highly informative subsets. We apply these principles when distilling large Knowledge Banks down to the handful of knowledge items most relevant to a given investigation.
Contextual Ranking. The candidate set passes through a ranking model that performs deeper analysis, considering the nuances of the current investigation to select the most relevant knowledge items.
Offline Optimization. Much of the computational work—embedding generation, similarity indexing—happens offline, ensuring real-time retrieval remains fast.
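The sketch below shows the shape of the two online stages under simplifying assumptions: item embeddings are precomputed offline, and embed and rerank_score are stand-ins for whatever embedding and ranking models a real system would use. It illustrates the architecture, not Traversal's implementation.

```python
import numpy as np

def retrieve(query: str, items: list[str], item_vecs: np.ndarray,
             embed, rerank_score, k_candidates: int = 50, k_final: int = 5):
    # Stage 1: rapid candidate selection via cosine similarity against
    # embeddings computed offline (the expensive indexing is already done).
    q = embed(query)
    sims = (item_vecs @ q) / (np.linalg.norm(item_vecs, axis=1)
                              * np.linalg.norm(q))
    candidates = np.argsort(-sims)[:k_candidates]
    # Stage 2: a deeper, context-aware ranking model scores only the survivors.
    ranked = sorted(candidates,
                    key=lambda i: rerank_score(query, items[i]), reverse=True)
    return [items[i] for i in ranked[:k_final]]
```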
This architecture allows Traversal to deliver both high-fidelity context and real-time performance at enterprise scale.
Network-Aware Learning
Production systems are interconnected, and actions on one component often affect others. This reflects the network interference problem studied by Anish and Abhineet Agarwal [2].
When you guide Traversal about one service, that knowledge informs how it reasons about connected services. Our network-aware approach ensures that knowledge items propagate appropriately through your service topology.
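As a toy illustration of the idea, the sketch below lets a knowledge item attached to one service lend decayed relevance to its neighbors in the dependency graph. The topology, decay factor, and hop limit are our own simplifications, not Traversal's mechanism.

```python
from collections import deque

def propagate_relevance(topology: dict[str, list[str]], source: str,
                        decay: float = 0.5,
                        max_hops: int = 2) -> dict[str, float]:
    # Breadth-first walk: each hop away from the item's home service
    # multiplies its relevance by the decay factor.
    scores = {source: 1.0}
    queue = deque([(source, 0)])
    while queue:
        svc, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbor in topology.get(svc, []):
            if neighbor not in scores:
                scores[neighbor] = scores[svc] * decay
                queue.append((neighbor, hops + 1))
    return scores

topology = {"payments": ["checkout", "ledger"], "checkout": ["frontend"]}
print(propagate_relevance(topology, "payments"))
# {'payments': 1.0, 'checkout': 0.5, 'ledger': 0.5, 'frontend': 0.25}
```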
When a knowledge item is used during an investigation, it is explicitly surfaced in the UI—transparency that is crucial for building trust and allowing users to understand exactly what influenced the AI's conclusions.
What's Next
We are already working on the next generation of capabilities for Knowledge Bank:
Bulk upload — Upload hundreds of documents at once and automatically generate knowledge items, making it easy to onboard your entire runbook library
Enterprise governance — Advanced permissioning, approval workflows, and review cycles so organizations can control who can create, edit, and publish knowledge items
Pre-publish feedback — Before a knowledge item goes live, get AI-powered feedback on how it will likely affect accuracy and whether it conflicts with existing knowledge items
Knowledge effectiveness reporting — See how the knowledge items you've created have impacted investigation accuracy over time, with clear metrics on which ones are being used and how they're helping
Getting Started
Your infrastructure is unique. Your team's expertise is irreplaceable.
Access Knowledge Bank in the Settings section of your Traversal workspace and get started by uploading a knowledge item today.
References
Note: These citations feature work by Traversal's co-founders: Anish Agarwal (CEO) [1-4, 8], Raj Agrawal (Chief Scientist) [5-6], and Raaz Dwivedi (CTO) [7-8], as well as AI Researcher Abhineet Agarwal [2].
[1] Agarwal, A., Shah, D., & Shen, D. (2020). Synthetic Interventions. Operations Research. https://arxiv.org/abs/2006.07691
[2] Agarwal, A., Agarwal, A., Masoero, L., & Whitehouse, J. (2024). Multi-Armed Bandits with Network Interference. Advances in Neural Information Processing Systems (NeurIPS). https://arxiv.org/abs/2405.18621
[3] Agarwal, A., Agarwal, A., & Vijaykumar, S. (2023). Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions. Advances in Neural Information Processing Systems (NeurIPS). https://arxiv.org/abs/2303.14226
[4] Agarwal, A., Dahleh, M., Shah, D., & Shen, D. (2023). Causal Matrix Completion. Conference on Learning Theory (COLT). https://arxiv.org/abs/2109.15154
[5] Agrawal, R., Squires, C., Yang, K., Shanmugam, K., & Uhler, C. (2019). ABCD-Strategy: Budgeted Experimental Design for Targeted Causal Structure Discovery. International Conference on Artificial Intelligence and Statistics (AISTATS). https://arxiv.org/abs/1902.10347
[6] Agrawal, R., Uhler, C., & Broderick, T. (2018). Minimal I-MAP MCMC for Scalable Structure Discovery in Causal DAG Models. International Conference on Machine Learning (ICML). https://arxiv.org/abs/1803.05554
[7] Dwivedi, R. & Mackey, L. (2021). Kernel Thinning. Conference on Learning Theory (COLT). https://arxiv.org/abs/2105.05842
[8] Choi, K., Feitelberg, J., Chin, C., Agarwal, A., & Dwivedi, R. (2024). Learning Counterfactual Distributions via Kernel Nearest Neighbors. arXiv preprint. https://arxiv.org/abs/2410.13381