Who Governs the Agents? Why AI Analytics Needs a Context Layer Outside Your Warehouse
At a glance
- The governance gaps you already manage (duplicate definitions, thin documentation, lineage that stops at the warehouse boundary) do not go away with AI agents. They get a new consumer that has no judgment about which context to trust.
- A semantic layer governs the metrics you have modeled, which is a narrow slice. Agents also rely on business knowledge, conflicting definitions, and access policy that the semantic layer rarely holds in full: the breadth your data catalog already documents for people.
- Metric definitions often pass technical review without business sign-off. A definition can clear pull-request review and CI and still have no metric owner who agreed that this is how the company counts the number. That is tolerable when people apply judgment and risky when agents do not.
- The practical decision is a choice between extending your core platform's tooling and building a governance capability alongside it that reaches more sources and stays portable.
- Ownership of this work overlaps data governance, analytics engineering, and data engineering, which is why it tends to fall between teams.
- A governed context layer (reviewable files, human approval, role-based access, evaluation) is how agents end up inside your governance model instead of beside it.
Dashboards came with guardrails built in. Agents don't. Here is why the layer that governs them works best sitting alongside your whole stack, not just inside the warehouse.
A lot of data teams are in the same spot right now. The business has realized it can point Claude, ChatGPT, Gemini, or a BI copilot at the warehouse and ask questions in plain language, and the early results are good enough that people want to roll it out widely. Self-serve analytics without the usual BI backlog is a genuinely appealing idea. Somewhere in that conversation, a governance lead usually raises a quieter question that is easy to skip past: who decides what these agents are allowed to know, and whether the answers they give are the approved ones?
For a lot of teams this is not urgent yet. It is the kind of thing you can see coming. At some point a non-technical user asks an agent for "active customers this quarter" or "revenue from at-risk accounts" and gets a confident number that does not match what the analytics team would have produced. The agent did not malfunction. It answered with whatever definition it could find, because nobody had decided which definition it was supposed to use.
This post is about why that problem does not live where most teams first look for it, and why the layer that fixes it works best when it sits outside the warehouse rather than inside it.
The problems are not new. The consumer is.
Think about the issues you already manage. The same metric defined 3 ways across 3 tools. Documentation written by dozens of people over many years, some of it thorough and some of it a single line. Lineage that goes dark as soon as you step upstream of the warehouse. Sensitive fields classified carefully in one place and exposed casually in another. None of this is new, and you have been managing it for years, partly because the people consuming that context apply judgment. An experienced analyst can tell when a definition looks out of date and quietly works around it.
An agent does not do that. It has no instinct for which number is the trustworthy one, no colleague to check with mid-question, and no sense that a definition looks suspiciously old. It uses the context it can find and produces an answer. And when the context is missing, an agent often does not stop to ask. It invents a plausible definition and answers anyway, which is the more dangerous case, because a made-up definition that looks reasonable is harder to catch than a blank. An out-of-date definition is just as bad: the agent has no reason to doubt it, so it runs with it.
So the gaps do not disappear when you add agents. They become more visible, because a person was absorbing them before and now a literal-minded system is surfacing them. We wrote more about this in how to govern AI agent access to business metrics and why semantic layers alone won't stop AI hallucinations.
Your semantic layer is a slice, not the whole picture
A semantic layer does its job well. When an agent or BI tool maps a question like "monthly revenue" to a modeled metric, Cube, dbt MetricFlow, or LookML generates the governed query path for it. Point an agent at a well-modeled metric and the number comes back right. That part works, and it is worth having.
The catch is that modeled metrics are a narrow slice of what an agent actually uses to answer a question. Most of what decides whether a number is right is not a formula. It is knowing which of 2 conflicting "revenue" definitions finance treats as canonical, that a metric excludes a region for a policy reason, that a table was deprecated last quarter, what a field in the CRM actually means, and who is allowed to see account-level detail. A semantic layer can carry some of this as descriptions, dimensions, or ownership metadata, but it rarely holds all of it. The rest lives in docs, Confluence, Notion, BI tools, tickets, and people's heads.
There is also the upkeep problem. A modeled metric only stays correct while someone maintains it, and the modeling layer is time-consuming to keep current. Definitions drift out of date quietly, and nobody notices until a number looks wrong. In the meantime, an agent treats the stale definition as authoritative, because it has no way to tell a fresh definition from a stale one.
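Here is a minimal sketch of what capturing that non-formula knowledge could look like, assuming a simple Python record per metric. Every name here (`ContextEntry`, `staleness_warning`, `uses_deprecated_table`, the 90-day threshold) is hypothetical and is not a ktx or semantic-layer schema. The point is that the caveats, the canonical source, the deprecated tables, and a review date travel with the formula, so staleness becomes visible instead of silent.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ContextEntry:
    """Hypothetical record: one metric's formula plus the knowledge that is not a formula."""
    metric: str
    sql: str
    canonical_source: str  # the source finance treats as canonical
    exclusions: list[str] = field(default_factory=list)  # e.g. accounts left out for policy reasons
    deprecated_tables: list[str] = field(default_factory=list)
    field_notes: dict[str, str] = field(default_factory=dict)  # what a CRM field actually means
    last_reviewed: Optional[date] = None  # None means nobody has ever reviewed it


def staleness_warning(entry: ContextEntry, today: date, max_age_days: int = 90) -> Optional[str]:
    """Return a warning if the entry was never reviewed or its last review is too old."""
    if entry.last_reviewed is None:
        return f"{entry.metric}: never reviewed"
    age = (today - entry.last_reviewed).days
    if age > max_age_days:
        return f"{entry.metric}: last reviewed {age} days ago"
    return None


def uses_deprecated_table(entry: ContextEntry, query: str) -> list[str]:
    """Naive case-insensitive substring check for deprecated tables in a query."""
    lowered = query.lower()
    return [t for t in entry.deprecated_tables if t.lower() in lowered]


# Illustrative entry; the tables, dates, and notes are made up.
revenue = ContextEntry(
    metric="weekly_revenue",
    sql="SELECT SUM(amount) FROM finance.orders WHERE status = 'settled'",
    canonical_source="finance.orders",
    exclusions=["test accounts excluded by finance policy"],
    deprecated_tables=["legacy.orders_v1"],
    field_notes={"crm.stage": "deal stage as set by sales, not billing status"},
    last_reviewed=date(2024, 1, 15),
)
print(staleness_warning(revenue, today=date(2024, 6, 1)))
# → weekly_revenue: last reviewed 138 days ago
print(uses_deprecated_table(revenue, "SELECT * FROM legacy.orders_v1"))
# → ['legacy.orders_v1']
```

An agent-facing layer could surface these warnings alongside the definition, or refuse to serve it, instead of letting the model treat an old entry as current.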
This is exactly why you run a data catalog. A catalog sits horizontally across the stack, including the source systems upstream of the warehouse, and a mature one does real governance work: classification, lineage, stewardship, sometimes masking and access workflows. The semantic layer never describes that much surface. But even an active catalog is not, by itself, an executable context runtime for an agent. It documents and governs metadata; it does not hand the agent a runnable, approved definition or apply a masking rule to the SQL the agent just wrote. An agent needs the catalog's breadth made executable: an approved definition it can run, and policy enforced on its query, not just described.
So "outside the warehouse" is not a claim about where data is computed. It is about scope and ownership. The layer that governs agents needs the breadth your catalog already has, made executable, and it should not be confined to one vendor's warehouse-and-metrics tooling. To be clear, sitting outside does not mean bypassing your warehouse: the context layer is a control and context plane, and it still enforces policy through the warehouse, BI layer, identity provider, or query gateway you already trust. We compared those shapes in data catalog vs. context layer.
Definitions rarely have a clear owner
There is a second issue, and it is the one governance teams tend to react to most. Plenty of teams have real engineering rigor around the modeling layer: pull-request review, dbt CI, LookML validation, ownership metadata, controlled deploys. That rigor is worth having. But it checks whether the code is sound, not whether the business agreed with the number. A definition can pass every technical gate and still have no business owner who signed off that this is how the company counts revenue. So "who approved this metric?" usually has a clean engineering answer and a blank business one, and the trust signal an agent would need, that finance or the metric owner actually stands behind it, is the one that is missing.
That gap was survivable when the consumers were people who apply judgment. It is harder to justify when the consumer is an agent that treats whatever definition it finds as correct and uses it. The problem gets sharper when definitions conflict: the same "weekly revenue" defined one way in a modeling file and another way in a BI tool, both plausible, both discoverable. A person picks the one they trust. An agent may use whichever one retrieval or prompting happens to surface first.
Governing this means treating definitions more like code: proposed, reviewed, approved, versioned, with conflicts surfaced and settled by someone before anything reaches an agent. We go deeper on that workflow in reviewing metrics like code and why revenue metrics break in AI self-serve analytics.
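As a rough illustration of that workflow, the sketch below assumes each definition is stored as a record with its source, a version number, and the name of the metric owner who approved it. `Definition`, `find_conflicts`, and `servable` are hypothetical names. The sketch surfaces metrics whose current definitions disagree across sources. It serves a definition to an agent only once the conflict is gone and the named owner, not just a code reviewer, has signed off.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Definition:
    metric: str
    source: str  # where it was found, e.g. a modeling file or a BI tool
    sql: str
    version: int
    approved_by: Optional[str] = None  # metric owner who signed off; merge review alone does not count


def normalize(sql: str) -> str:
    """Collapse case and whitespace so cosmetic differences are not reported as conflicts."""
    return " ".join(sql.lower().split())


def latest_per_source(defs: list[Definition]) -> list[Definition]:
    """Keep only the newest version of each metric from each source."""
    latest = {}
    for d in defs:
        key = (d.metric, d.source)
        if key not in latest or d.version > latest[key].version:
            latest[key] = d
    return list(latest.values())


def find_conflicts(defs: list[Definition]) -> dict[str, set[str]]:
    """Metrics whose current definitions differ across sources."""
    variants = defaultdict(set)
    for d in latest_per_source(defs):
        variants[d.metric].add(normalize(d.sql))
    return {m: v for m, v in variants.items() if len(v) > 1}


def servable(defs: list[Definition], owners: dict[str, str]) -> dict[str, Definition]:
    """Definitions an agent may use: no open conflict and approved by the named owner."""
    conflicts = find_conflicts(defs)
    result = {}
    for d in latest_per_source(defs):
        if d.metric in conflicts:
            continue  # a person has to settle this before any agent sees it
        owner = owners.get(d.metric)
        if owner is not None and d.approved_by == owner:
            result[d.metric] = d
    return result


defs = [
    Definition("weekly_revenue", "dbt_model", "SELECT SUM(amount) FROM orders", 1),
    Definition("weekly_revenue", "bi_dashboard", "SELECT SUM(amount - refunds) FROM orders", 1),
    Definition("active_customers", "dbt_model", "SELECT COUNT(*) FROM customers", 1),
    Definition("active_customers", "dbt_model",
               "SELECT COUNT(DISTINCT id) FROM customers WHERE active", 2, approved_by="growth_owner"),
]
owners = {"weekly_revenue": "finance_owner", "active_customers": "growth_owner"}
print(sorted(find_conflicts(defs)))    # → ['weekly_revenue']
print(sorted(servable(defs, owners)))  # → ['active_customers']
```

Keeping the conflict check and the approval check in the same gate means a definition that clears CI but has no owner sign-off, or that disagrees with another system, never reaches the agent.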
The real choice: extend your core platform, or build alongside it
Put those 2 issues together (the narrow coverage of a modeled-metrics layer and definitions nobody formally owns) and you arrive at the decision most data leaders are actually weighing. It comes down to 2 paths:
- Extend your core platform. Invest more deeply in your warehouse and modeling vendor's tooling, govern agents through it, and accept that you are governing the modeled-metrics slice and tying that governance to one vendor.
- Build the capability alongside your stack. Stand up a layer that covers more ground (the business knowledge and conflicting definitions, not just modeled metrics), stays vendor-neutral, and remains portable if you change a part of the stack later.
Both are reasonable, and the right call depends on your situation. For governance specifically, though, building alongside has a structural advantage. Governance tied to one platform can only ever govern that platform's part of the picture. The reason many teams bought a data catalog that sits horizontally across the stack was to compare definitions, check quality, and trace lineage across every system, including the source systems the warehouse's models never describe. The layer that governs agents wants to live in the same place, for the same reasons. There is a practical upside too. If your platform contract changes in a couple of years, reviewed context that lives in portable files comes with you instead of staying behind in a tool you are leaving.
This is also the part that tends to land with a VP or CDO. Most organizations have far more data sources than they have warehouses and modeling tools. Governing agents inside one vendor's layer governs one slice and ties you to that vendor. Governing them in a portable layer alongside the stack covers more of the picture and keeps your options open if the stack changes later.
Who actually owns this?
That leads back to the title, and to the part with no settled answer yet. When a definition needs an owner, a stale doc needs retiring, or a conflict needs resolving, whose job is that? Data governance? Analytics engineering? Data engineering? It touches all 3, and that overlap is why the work keeps slipping between teams.
The cleanest approach we have seen is to stop treating "context" as a feature buried inside someone's existing tool and start treating it as a shared responsibility with its own home:
- Data engineering keeps the sources connected and the pipelines running, including the upstream systems.
- Analytics engineering writes and refines the definitions and the joins.
- Business data owners and metric owners approve the definitions in their domain and settle what a metric actually means. Governance runs the process, but the owners make the call.
- Data governance owns the operating model: the approval workflow, classification and access policy, conflict resolution, and the signal that an answer is sanctioned.
None of that holds together if each function edits a different tool. It works when all of these roles collaborate on the same reviewable files, the way engineering teams collaborate on code. That is the organizational point, and it is why the form of the context layer matters as much as its contents. For where this sits relative to the tools you already own, see context layer vs. semantic layer.
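One way to make that shared ownership concrete, in the spirit of a CODEOWNERS file, is a small rule table that says which roles must approve a change to which files. The directory layout, role names, and functions below are hypothetical. The sketch only shows that ownership can be checked mechanically at merge time instead of being negotiated tool by tool.

```python
from fnmatch import fnmatchcase

# Hypothetical repository layout mapped to the roles that must approve changes.
# Rules are checked in order and the first match wins, so specific patterns come first.
APPROVAL_RULES: list[tuple[str, set[str]]] = [
    ("sources/*", {"data_engineering"}),
    ("metrics/finance/*", {"analytics_engineering", "finance_metric_owner"}),
    ("metrics/*", {"analytics_engineering", "domain_metric_owner"}),
    ("policies/*", {"data_governance"}),
]


def required_approvers(path: str) -> set[str]:
    for pattern, roles in APPROVAL_RULES:
        if fnmatchcase(path, pattern):  # note: "*" also matches "/" in fnmatch patterns
            return set(roles)
    return {"data_governance"}  # unmapped files fall back to the governance team


def can_merge(changed_paths: list[str], approvals: set[str]) -> tuple[bool, set[str]]:
    """Return whether the change can merge and which approvals are still missing."""
    missing: set[str] = set()
    for path in changed_paths:
        missing |= required_approvers(path) - approvals
    return (not missing, missing)


print(can_merge(["metrics/finance/revenue.yml"], {"analytics_engineering"}))
# → (False, {'finance_metric_owner'})
```

Analytics engineering can write the change, but it cannot merge until the business owner of that metric approves. This mirrors the split above, where owners make the call and governance runs the process.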
What a governed context layer looks like
So what is the thing that sits alongside the stack and gives these teams something to maintain together? A governed context layer takes your schema, joins, approved definitions, BI logic, query history, and the business knowledge sitting in tools like Confluence and Notion, and turns it into reviewable files that agents search and run instead of guessing. We pulled apart the mechanics in building a context layer for the agentic era. For a governance lead, here is how it compares with a semantic layer on the properties that make it governed rather than just convenient:
| | Your semantic layer (Cube, dbt MetricFlow, LookML) | Governed context layer |
|---|---|---|
| What it covers | The metrics you have modeled | Metrics, plus business knowledge and conflicting definitions across systems |
| Approval and trust signal | Technical merge review, often without business-owner sign-off | Reviewed and signed off by the metric owner, versioned, conflicts resolved |
| Access and PII for agents | Rarely enforced on the agent's query | Classification, masking, and role policy applied to the generated SQL |
| Trust over time | Manual spot-checks | Tested against known-correct answers |
| Survives a tooling change | Tied to the vendor | Portable, vendor-neutral files |
Two of those rows deserve a closer look.
Access rules need to be enforced, not requested. Telling an agent "do not show PII" inside a prompt or a skill is a request to a system that does not guarantee compliance. Real enforcement applies your data classification, masking, row-level, and role-based policies to the SQL the agent generates, before it runs, with nothing left to the model's discretion. The cleanest version of this maps your existing access model onto agents rather than inventing a separate one, which is the pattern in how to govern AI agent access to business metrics.
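Here is a minimal sketch of deterministic enforcement. It assumes the agent emits a structured query request (table, columns, filters) instead of free-form SQL. Enforcing on arbitrary SQL text would need a real SQL parser, and in practice you would more likely rely on your warehouse's native masking and row-access policies or a query gateway. The classification tags, role names, `QueryRequest`, and `ROLE_POLICY` shape are all hypothetical. What matters is that code applies masking and row filters on every request, whatever the prompt said.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRequest:
    """Hypothetical structured request an agent emits instead of raw SQL."""
    table: str
    columns: tuple[str, ...]
    filters: tuple[str, ...] = ()  # passed through verbatim here; a real gateway would validate them


# Hypothetical classification tags, e.g. synced from your catalog.
CLASSIFICATION = {"customers.email": "pii", "customers.phone": "pii"}

# Hypothetical role policy: tags a role may see unmasked, plus a mandatory row filter.
ROLE_POLICY = {
    "emea_analyst": {"unmasked": set(), "row_filter": "region = 'EMEA'"},
    "finance_admin": {"unmasked": {"pii"}, "row_filter": None},
}


def enforce(request: QueryRequest, role: str) -> str:
    """Apply masking and row-level policy, then render SQL. No model discretion involved."""
    policy = ROLE_POLICY.get(role)
    if policy is None:
        raise PermissionError(f"no policy for role {role!r}; query blocked")
    select_items = []
    for col in request.columns:
        tag = CLASSIFICATION.get(f"{request.table}.{col}")
        if tag is not None and tag not in policy["unmasked"]:
            select_items.append(f"'***' AS {col}")  # masked regardless of the prompt
        else:
            select_items.append(col)
    conditions = list(request.filters)
    if policy["row_filter"]:
        conditions.append(policy["row_filter"])  # row-level policy is always appended
    sql = f"SELECT {', '.join(select_items)} FROM {request.table}"
    if conditions:
        sql += " WHERE " + " AND ".join(f"({c})" for c in conditions)
    return sql


req = QueryRequest("customers", ("customer_id", "email"), ("status = 'active'",))
print(enforce(req, "emea_analyst"))
# → SELECT customer_id, '***' AS email FROM customers WHERE (status = 'active') AND (region = 'EMEA')
```

Wrapping each condition in parentheses keeps an agent-supplied `OR` from escaping the mandatory row filter. Unknown roles are blocked outright rather than defaulting to open access.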
Trust needs to be measured. Writing down known-correct answers for your most important questions ("average order value last quarter should be roughly X") and re-checking them after each update is how you catch the moment a new definition quietly changes an answer you had already approved. Done properly, this looks less like a checklist and more like a test suite: golden queries over fixed time windows, snapshot fixtures so the data underneath does not move, tolerances for acceptable drift, expected outputs per permission level, and freshness checks. Pasting example SQL into a skill gets you started; a versioned suite like that is what scales it and puts it on governance footing. More on that in how to prove your AI analytics answers are trustworthy.
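A minimal sketch of such a suite, under stated assumptions: the `answer` callable stands in for "ask the agent through the context layer against a frozen data snapshot", and here a dictionary lookup replaces that call. `GoldenQuery`, `run_suite`, the metric names, and all numbers are hypothetical and made up for illustration. Each golden query pins a fixed window, a role, an expected value, and a relative tolerance.

```python
import math
from dataclasses import dataclass
from typing import Callable

Window = tuple[str, str]


@dataclass(frozen=True)
class GoldenQuery:
    name: str
    metric: str
    window: Window  # fixed dates, never "last quarter", so the target cannot move
    role: str  # expected answers may differ by permission level
    expected: float
    rel_tolerance: float = 0.0  # acceptable drift; 0.0 means exact match


def run_suite(suite: list[GoldenQuery], answer: Callable[[str, Window, str], float]) -> list[str]:
    """Return a description of every golden query whose answer drifted past its tolerance."""
    failures = []
    for g in suite:
        actual = answer(g.metric, g.window, g.role)
        if not math.isclose(actual, g.expected, rel_tol=g.rel_tolerance):
            failures.append(f"{g.name}: expected {g.expected}, got {actual}")
    return failures


Q3 = ("2024-07-01", "2024-09-30")
# Stand-in for answers computed on a snapshot fixture (made-up numbers).
SNAPSHOT = {
    ("average_order_value", Q3, "analyst"): 84.10,
    ("active_customers", Q3, "emea_analyst"): 1260.0,
}
suite = [
    GoldenQuery("aov_q3", "average_order_value", Q3, "analyst", 84.0, rel_tolerance=0.01),
    GoldenQuery("active_q3_emea", "active_customers", Q3, "emea_analyst", 1200.0),
]
failures = run_suite(suite, lambda m, w, r: SNAPSHOT[(m, w, r)])
print(failures)  # → ['active_q3_emea: expected 1200.0, got 1260.0']
```

Run on every change to the context files, a non-empty `failures` list blocks the update. That is the point where someone decides whether the new answer is a fix or a regression.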
This is the gap ktx is built to close. ktx is an open-source context layer for data agents. It reads from your stack (warehouses, BI tools, modeling code, query history, Notion, Confluence), reconciles what it finds, flags contradictions for a person to resolve, and exposes the approved context to any agent over MCP. Everything is stored as git files you can review, diff, and merge. There is also a hosted version for teams that adds managed sync, the review and conflict-resolution workflow, role-based access, per-agent access tokens, activity monitoring, and the evaluation checks, which is the multi-user setup a governance function tends to need.
How to start without replacing your stack
You do not replace your warehouse or your modeling tool, and you do not try to clean up years of scattered documentation in week one. A reasonable first quarter looks like this:
- Pick one high-stakes, contested metric, the one 2 teams define differently, and govern its definition end to end: gather the candidate definitions, surface the conflict, approve one, version it.
- Pull in the non-formula knowledge around that metric from your docs and Confluence (the caveats, the canonical-source decision, the deprecated tables to avoid), so the agent has the why and not just the how.
- Map your existing access model onto the agents querying it, enforced rather than prompted, so masking, row-level, and role boundaries actually hold.
- Stand up a small set of golden queries with fixed time windows and expected answers (per permission level where it matters), and re-run them on every update.
- Show the portable result to your VP: reviewed files that cover more than modeled metrics and do not belong to any single vendor.
That is a contained pilot a governance team can run alongside the stack, the same way you already run a horizontal catalog. If a fuller migration is ahead of you, moving from a semantic layer to a governed context layer walks through the path.
The agents are already in the building, answering questions in plain language, whether or not the governance is ready for them. The reassuring part is that this is not a brand-new discipline you have to invent. It is the governance you already practice, applied to a new and very literal consumer, and moved to a layer that can reach everything that consumer will touch. Govern the context once, outside the warehouse, and every agent works from the same approved version. If you want to see what that looks like on your data, book a walkthrough.
FAQ
What does it mean to govern AI agents that query company data?
Governing AI agents means deciding which metric definitions, source systems, business rules, and data an agent can use to answer a question, and being able to show the answer matches the approved definition. A dashboard is bounded, but an agent is open-ended: it can pick any table, any join, and any definition. So the governance has to follow the question into the agent runtime, not stop at the report or the warehouse permissions and policies.
Why isn't a semantic layer enough to govern agent analytics on its own?
A semantic layer (Cube, dbt MetricFlow, LookML, or a warehouse-native one) governs the metrics you have modeled, and an agent on top of it will compute those correctly. But that is a narrow slice of what an agent uses to answer a question. The knowledge that decides whether a number is right (which of 2 conflicting definitions is canonical, what a metric excludes, which tables are deprecated, what a field means across systems) is rarely captured there in full; most of it lives in docs, catalogs, and BI tools. Modeled metrics are also time-consuming to maintain and drift stale without anyone noticing, and definitions often have implicit or fragmented ownership, which is risky when an agent will use whatever it finds, or invent one.
What is a governed context layer, and how is it different from a data catalog?
A governed context layer turns your schema, joins, approved definitions, BI logic, and business knowledge into reviewable files that agents can search and run. A data catalog is your metadata and stewardship system of record: classification, lineage, glossary ownership, and policy, mostly serving people and governance workflows. A context layer is narrower and operational. It takes selected, approved context and makes it executable in the agent runtime, with conflict resolution, enforced access, and evaluation. Catalog and context layer are complementary: the catalog governs the metadata, the context layer governs what the agent actually runs.
Who should own the context layer: data governance, analytics engineering, or data engineering?
It touches all 3, which is why the work tends to fall between teams. In practice it works as a shared responsibility: data engineering keeps the sources and pipelines running, analytics engineering writes the definitions, business metric owners sign off on what their metrics mean, and data governance owns the approval workflow, access policy, and trust. A reviewable, version-controlled context layer lets everyone collaborate on the same files instead of arguing over which tool owns the truth.
How do we start without replacing our current stack?
You do not replace anything. A context layer reads from your warehouse, modeling code, BI tools, query history, and docs, reconciles them, and flags contradictions, then serves the approved result to agents. It sits alongside your stack the way a horizontal data catalog does, so it reaches upstream sources too and stays portable if your core platform changes.
Sources
- ktx, the open-source context layer for data agents (GitHub)
- The context layer (ktx docs)
- Data Modeling for the Agentic Era: Semantics, Speed, and Stewardship (Rill)
- How Looker's semantic layer enhances gen AI trustworthiness (Google Cloud)
- NIST AI 600-1: Generative AI risk management
- Model Context Protocol