Why Your Semantic Layer Alone Won't Stop AI Agent Hallucinations
By Luca Martial, CEO & Co-founder at Kaelio | Ex-Data Scientist | 2x founder in AI + Data | ex-CERN, ex-Dataiku
Semantic layers have become the standard answer to metric inconsistency. Tools like dbt, LookML, Cube, and AtScale promise that if you define your metrics in one place, every downstream consumer will get the same answer. That promise holds when the consumer is a human analyst writing SQL or building a dashboard. It breaks down when the consumer is an AI agent.
AI data agents are now embedded across the modern data stack, from conversational analytics interfaces to autonomous reporting pipelines. These agents need more than metric definitions to produce trustworthy answers. They need to know when a definition applies, who can see the result, where the data came from, how it was transformed, and what business context surrounds it. A semantic layer provides a fraction of that picture. The rest is what we call the context layer, and without it, your agents will hallucinate. Research on large language models confirms this pattern: a 2023 survey on LLM hallucination found that insufficient grounding context is a primary driver of factual errors in model outputs.
This post is a technical deep-dive into the six specific failure modes that semantic layers leave open, and how a context layer addresses each one.
At a Glance
- Semantic layers are necessary but insufficient. They define metric logic but omit temporal validity, sensitivity rules, dashboard-level calculations, domain knowledge, and lineage metadata.
- AI agents fill context gaps with assumptions. When an agent lacks the information it needs to answer a question, it guesses. In data analytics, guessing means hallucination. Understanding how accurate AI data analyst tools really are requires examining these context gaps.
- The same question yields different answers depending on which agent, which prompt, or which tool surfaces the response, because there is no unified context governing consumption.
- Governance ends at the warehouse. dbt models are governed, but BI tools, spreadsheets, and ad hoc SQL are not.
- A context layer captures four pillars beyond a bare metric catalog: schema and lineage, semantic models and metrics, dashboard logic, and domain knowledge. You can build a context layer in minutes, not months.
- Kaelio auto-builds this context layer from your existing stack, connecting to 900+ tools so that every AI agent works from the same governed truth.
What Semantic Layers Actually Do (And Where They Stop)
Before examining failure modes, it helps to be precise about what a semantic layer provides. A well-implemented semantic layer, whether in dbt's MetricFlow, LookML, or a headless BI platform like Cube, gives you three things:
- Metric definitions. A canonical formula for how a metric is calculated. For example, `MRR = SUM(subscriptions.amount) WHERE subscriptions.status = 'active'`.
- Dimension mappings. A translation layer between raw column names and business-friendly terms. `cust_id` becomes "Customer," `txn_amt` becomes "Transaction Amount."
- Join logic. Rules for how tables relate to each other, so that queries across multiple entities produce correct results.
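To make these three capabilities concrete, here is a minimal sketch of a semantic layer as data plus a SQL compiler. The structure and names are illustrative, not any real tool's API:

```python
# Hypothetical sketch of what a semantic layer encodes: metric formulas,
# dimension mappings, and join logic. Names are illustrative only.
SEMANTIC_LAYER = {
    "metrics": {
        "mrr": {
            "formula": "SUM(subscriptions.amount)",
            "filter": "subscriptions.status = 'active'",
        },
    },
    "dimensions": {
        "cust_id": "Customer",
        "txn_amt": "Transaction Amount",
    },
    "joins": {
        ("subscriptions", "customers"): "subscriptions.cust_id = customers.cust_id",
    },
}

def compile_metric_sql(name: str) -> str:
    """Render the one canonical SQL statement for a named metric."""
    m = SEMANTIC_LAYER["metrics"][name]
    return f"SELECT {m['formula']} FROM subscriptions WHERE {m['filter']}"

print(compile_metric_sql("mrr"))
# SELECT SUM(subscriptions.amount) FROM subscriptions WHERE subscriptions.status = 'active'
```

Note what this structure contains: formulas, labels, and join rules. Note also everything it does not contain, which is the subject of the rest of this post.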
These three capabilities solve a real and important problem. As we covered in Why Every Growing Company Needs a Semantic Layer, metric inconsistency is one of the most expensive problems in modern analytics. A semantic layer eliminates an entire class of "two dashboards, two numbers" issues.
But here is the critical gap. A semantic layer tells an AI agent how to calculate a metric. It does not tell the agent when that calculation is valid, who is allowed to see it, which team's version is canonical, what business logic exists outside the warehouse, or where the underlying data came from. Each of these missing dimensions creates a specific, predictable failure mode. The NIST AI Risk Management Framework identifies data provenance and context as core requirements for trustworthy AI systems, and this is precisely where semantic layers fall short.
Failure Mode 1: Temporal Context Gaps
The Problem
Semantic layers define the current state of a metric. They do not carry temporal validity metadata. When a metric definition changes, the old definition is overwritten. There is no record of what the metric meant last quarter, last year, or at any prior point in time.
This creates a serious problem for AI agents that answer historical questions. Consider a concrete example. Your company defined "active user" as "any user who logged in within the past 30 days" for the first three years of its existence. Last quarter, the product team narrowed the definition to "any user who logged in within the past 7 days" to better reflect actual engagement. The semantic layer was updated to reflect the new definition.
Now an executive asks an AI agent: "How has our active user count trended over the past 12 months?"
The agent retrieves the current metric definition and applies it uniformly across all 12 months. The result shows a dramatic decline in active users starting at the exact month the definition changed. The executive sees a product crisis. In reality, there was a measurement methodology change. The trend line is an artifact, not a signal.
Why Semantic Layers Cannot Fix This
Semantic layers are designed to be current-state systems. dbt's MetricFlow defines metrics in YAML files that are version-controlled in Git, so you could theoretically trace the history. But that Git history is not exposed as queryable metadata. An AI agent hitting the semantic layer API gets the current definition, period. It has no mechanism to detect that the definition changed on a specific date, or to apply the prior definition to the appropriate time window. Research on text-to-SQL systems confirms this limitation: a benchmark study on text-to-SQL accuracy showed that temporal reasoning is one of the hardest challenges for LLM-based query generation.
How a Context Layer Addresses This
A context layer maintains temporal validity windows for every metric definition. When the "active user" definition changes, the context layer records that Definition A was valid from January 2023 through December 2025, and Definition B became effective in January 2026. An AI agent querying this context layer receives instructions to apply the correct definition to each time period, or at minimum, to flag the definition change in its response so the consumer understands the discontinuity.
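The validity-window mechanism above can be sketched in a few lines. The record structure and dates are illustrative assumptions, not a real schema:

```python
from datetime import date

# Hypothetical context-layer records: each definition of "active user"
# carries an explicit validity window (None = still in force).
ACTIVE_USER_DEFINITIONS = [
    {"rule": "login within past 30 days",
     "valid_from": date(2023, 1, 1), "valid_to": date(2025, 12, 31)},
    {"rule": "login within past 7 days",
     "valid_from": date(2026, 1, 1), "valid_to": None},
]

def definition_for(day: date) -> dict:
    """Return the metric definition that was in force on a given date."""
    for d in ACTIVE_USER_DEFINITIONS:
        if d["valid_from"] <= day and (d["valid_to"] is None or day <= d["valid_to"]):
            return d
    raise LookupError(f"no definition valid on {day}")
```

An agent computing a 12-month trend would call `definition_for` once per month, applying the 30-day rule to 2025 data and the 7-day rule afterward, instead of projecting the current definition backward.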
Failure Mode 2: Missing Sensitivity Classification
The Problem
Semantic layers define what metrics mean. They do not define who can see them or under what conditions they can be surfaced. This omission becomes dangerous when AI agents operate in multi-tenant or role-sensitive environments.
Consider a scenario where an AI agent is embedded in a company's internal Slack workspace. A manager in the marketing department asks: "What is the average compensation by department?" The semantic layer contains a clean definition of "average compensation" with the correct formula. The AI agent generates accurate SQL, runs the query, and returns a table with average compensation broken down by department, including individual-level data for departments with only one or two employees.
The result is technically correct. It is also a policy violation. Compensation data at that granularity is restricted to HR and Finance. The semantic layer did not encode this access rule because access control is not part of its design scope.
Why Semantic Layers Cannot Fix This
Warehouse-level row-level security and role-based access control (RBAC) handle some of this at the query execution layer. Platforms like Snowflake, BigQuery, and Power BI each offer their own access control mechanisms, but these operate at the storage level. AI agents often operate at a higher abstraction level. They may aggregate, summarize, or rephrase data in ways that bypass row-level controls. An agent might not return the raw rows but still include derived insights ("Department X has significantly lower compensation than the company average") that reveal restricted information. Semantic layers have no concept of consumption sensitivity, the idea that certain data can exist in the warehouse but should never appear in an AI-generated response without specific authorization.
How a Context Layer Addresses This
A context layer attaches sensitivity classifications to metrics and dimensions. These classifications go beyond warehouse RBAC. They define rules like "compensation data may only be surfaced to users with HR or Finance roles," "any aggregation with fewer than 5 individuals must be suppressed," and "PII fields must be masked in conversational responses." When an AI agent queries the context layer, it receives these rules alongside the metric definition, allowing it to self-govern its outputs. This aligns with what Anthropic describes as grounding, providing models with sufficient context to constrain their outputs. Kaelio enforces these policies across all 900+ connected tools, ensuring that governed analytics extend to every AI interaction. For organizations in regulated industries, this is especially critical. See our guides on SOC 2-compliant AI analytics and HIPAA-compliant AI data analysis for more on compliance requirements.
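The compensation rules above reduce to a simple pre-response check. Role names and the group-size threshold are illustrative assumptions:

```python
# Hypothetical consumption-sensitivity policy a context layer could supply
# alongside a metric definition.
COMPENSATION_POLICY = {
    "allowed_roles": {"hr", "finance"},
    "min_group_size": 5,  # suppress aggregates covering fewer than 5 people
}

def can_surface(requester_role: str, group_size: int,
                policy: dict = COMPENSATION_POLICY) -> bool:
    """Decide whether an agent may include an aggregate in its response."""
    if requester_role not in policy["allowed_roles"]:
        return False
    return group_size >= policy["min_group_size"]

assert can_surface("marketing", group_size=12) is False  # wrong role
assert can_surface("hr", group_size=2) is False          # group too small
assert can_surface("finance", group_size=12) is True
```

The key design point is that the check runs at response time, on the derived aggregate, rather than at query time on raw rows, which is where warehouse RBAC stops.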
Failure Mode 3: Ambiguous Cross-Team Definitions
The Problem
In any company above a few dozen employees, different departments define the same metric differently. This is not a bug. It reflects genuine differences in how teams think about the business. The problem is that these definitions coexist across tools without a unifying layer to disambiguate them.
Take MRR (Monthly Recurring Revenue) as a common example. Finance calculates MRR from the billing system, including only finalized invoices and excluding trials. Sales calculates MRR from the CRM, including expected revenue from deals marked as "Closed Won" even before the first invoice is generated. Product calculates MRR from usage-based pricing tiers in the product database, which may not match either of the other two.
Each team has a legitimate reason for its calculation. Each team's semantic layer (if they have one) encodes its own version. When an AI agent receives the question "What is our current MRR?", which definition does it use?
Why Semantic Layers Cannot Fix This
If your company uses a single, centralized semantic layer (like a company-wide dbt project), you might have one canonical definition. But in practice, most companies have fragmented semantic coverage. The dbt project governs warehouse metrics. Looker has its own LookML models. Tableau has its own calculated fields. Salesforce has its own formula fields and report types. Each of these tools functions as a local semantic layer for the team that uses it. There is no single semantic layer that spans them all.
An AI agent that connects to multiple data sources encounters multiple definitions. Without guidance on which definition is canonical for a given context (board reporting vs. sales forecasting vs. product planning), the agent picks whichever definition it encounters first, or worse, blends them inconsistently. This is a well-documented challenge: Gartner estimates that poor data quality costs organizations an average of $12.9 million per year, and conflicting metric definitions are a leading contributor.
How a Context Layer Addresses This
A context layer maps every metric definition to a specific domain, audience, and use case. Instead of one definition of MRR, the context layer maintains all three and labels them: "MRR (Finance, Board Reporting)," "MRR (Sales, Pipeline Forecasting)," "MRR (Product, Usage Tiers)." When an AI agent receives a question, it uses the context layer to determine which definition applies based on who is asking and in what context. If the question is ambiguous, the agent can surface the disambiguation to the user rather than silently choosing a definition. This approach aligns with best practices for governed metrics, where transparency about metric provenance is as important as the metric itself. It is also central to the question of whether AI analytics tools can be trusted with business metrics.
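The labeled-definitions idea can be sketched as a lookup keyed by team and use case. The SQL fragments and keys are illustrative placeholders for the three MRR variants described above:

```python
# Hypothetical mapping from (team, use case) to the canonical MRR definition.
MRR_DEFINITIONS = {
    ("finance", "board_reporting"):
        "SUM(invoices.amount)  -- finalized invoices only, trials excluded",
    ("sales", "pipeline_forecasting"):
        "SUM(deals.amount) WHERE stage = 'Closed Won'",
    ("product", "usage_tiers"):
        "SUM(accounts.tier_price)",
}

def resolve_mrr(team: str, use_case: str) -> str:
    """Pick the canonical definition, or surface the ambiguity to the user."""
    key = (team, use_case)
    if key not in MRR_DEFINITIONS:
        options = ", ".join(f"{t}/{u}" for t, u in MRR_DEFINITIONS)
        raise ValueError(f"ambiguous MRR request; available definitions: {options}")
    return MRR_DEFINITIONS[key]
```

Raising on an unknown key models the disambiguation behavior: the agent asks which MRR the user means instead of silently picking whichever definition it found first.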
Failure Mode 4: Dashboard Logic Is Invisible
The Problem
A significant amount of business logic lives inside BI tools and never makes it into any semantic layer. Calculated fields in Tableau, table calculations in Looker, DAX measures in Power BI, and custom SQL in embedded analytics all encode business rules that exist nowhere else.
Consider a Tableau dashboard that shows "Net Revenue." The underlying data source contains gross revenue. A calculated field in Tableau subtracts refunds, chargebacks, and a custom adjustment for partner revenue-share agreements. This calculation is business-critical. It is what the CFO reviews every Monday. But it exists only inside the Tableau workbook. It is not in dbt. It is not in LookML. It is not in any YAML file.
When an AI agent answers "What was our net revenue last quarter?", it queries the warehouse or the semantic layer. It gets gross revenue. It has no knowledge of the Tableau-specific adjustments. The answer is wrong, and there is no signal to indicate that it is wrong.
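The CFO's Net Revenue field can be sketched as a small function; the partner revenue-share rate is an illustrative assumption, and in the scenario above this entire calculation lives only inside a Tableau workbook:

```python
# Hypothetical BI-level calculated field. A warehouse-only agent sees
# gross_revenue and nothing else; the adjustments below are invisible to it.
def net_revenue(gross_revenue: float, refunds: float, chargebacks: float,
                partner_share_rate: float = 0.15) -> float:
    """Gross revenue minus refunds, chargebacks, and a partner
    revenue-share adjustment (rate is illustrative)."""
    return gross_revenue - refunds - chargebacks - gross_revenue * partner_share_rate

assert net_revenue(1_000_000, 40_000, 10_000) == 800_000.0
```

An agent querying the warehouse directly would report $1,000,000, a 25% overstatement, with no signal that three adjustments were skipped.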
Why Semantic Layers Cannot Fix This
Semantic layers, by design, sit at the warehouse or transformation layer. They do not introspect BI tools. dbt does not parse Tableau workbooks. LookML does not scan Power BI files. Even Looker's own access grants and model permissions do not extend to logic living in other BI platforms. The logic embedded in these tools is a blind spot for any warehouse-centric governance approach.
This is not a theoretical edge case. A 2024 analysis by Atlan found that organizations with mature data governance practices still report that a significant share of business-critical logic lives outside their governed transformation layers, primarily in BI calculated fields, spreadsheets, and ad hoc scripts.
How a Context Layer Addresses This
A context layer connects to BI tools directly and extracts the business logic they contain. Kaelio connects to Tableau, Looker, Power BI, Metabase, and other visualization platforms through its 900+ connectors. It parses calculated fields, custom SQL, filters, and visualization-level transformations, then incorporates them into the context layer. When an AI agent asks about net revenue, it sees the full picture: the warehouse-level definition plus the BI-level adjustments. This is what distinguishes a best-in-class analytics solution from a warehouse-only semantic layer. For teams already using Looker or Snowflake, Kaelio layers on top of these tools without replacing them.
Failure Mode 5: Domain Knowledge Is Unstructured
The Problem
Every company has institutional knowledge that shapes how data should be interpreted but that has never been codified in any technical system. This knowledge lives in Confluence pages, Slack threads, email chains, onboarding documents, and most commonly, in people's heads.
Examples of this unstructured domain knowledge include:
- "We exclude the APAC region from global churn calculations because we only launched there six months ago and the cohort is too small to be meaningful."
- "Q4 numbers always look inflated because enterprise customers prepay annual contracts in December."
- "The 'Referral' source in HubSpot actually includes both organic referrals and our partner program. We split them manually in the quarterly report."
- "When we say 'customer' in board reporting, we mean accounts with ARR above $10K. In product analytics, 'customer' means any account with at least one active user."
An AI agent that lacks this knowledge will produce answers that are technically correct but contextually wrong. It will include APAC in global churn. It will flag Q4 revenue as an anomaly. It will treat all referrals as organic. It will conflate $500/year accounts with enterprise customers.
Why Semantic Layers Cannot Fix This
Semantic layers are structured systems. They define metrics using code: SQL, YAML, LookML, or similar formalisms. They are excellent at encoding formulas. They are not designed to capture the narrative context that surrounds those formulas. You cannot write a dbt YAML comment that says "exclude APAC because the cohort is too small" and have that comment influence how an AI agent answers questions. The comment might exist in a code repository, but it is not part of the semantic layer's queryable metadata. OpenAI's documentation on structured outputs and function calling demonstrates how AI models can consume structured context at inference time, but only if that context is available in the first place.
How a Context Layer Addresses This
A context layer provides a structured home for unstructured domain knowledge. Kaelio captures domain knowledge as governed annotations attached to specific metrics, dimensions, and data sources. These annotations are queryable. When an AI agent encounters a question about global churn, the context layer supplies the annotation: "APAC is excluded from global churn calculations per policy established in Q2 2025." The agent incorporates this into its response, either by excluding APAC automatically or by disclosing the convention to the user.
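The governed-annotation idea can be sketched as knowledge attached to metric keys. The structure and annotation text are illustrative, drawn from the examples above:

```python
# Hypothetical governed annotations attached to specific metrics.
ANNOTATIONS = {
    "global_churn": [
        "APAC is excluded from global churn calculations per policy "
        "established in Q2 2025.",
    ],
    "q4_revenue": [
        "Q4 numbers look inflated because enterprise customers prepay "
        "annual contracts in December.",
    ],
}

def context_for(metric: str) -> list[str]:
    """Annotations an agent should apply or disclose when answering
    a question about this metric."""
    return ANNOTATIONS.get(metric, [])
```

What makes this different from a YAML comment is that the annotation is queryable at inference time: the agent retrieves it alongside the metric definition and folds it into the answer.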
This is one of the four pillars of Kaelio's context layer: Domain Knowledge. Alongside Schema and Lineage, Semantic Models and Metrics, and Dashboard Logic, domain knowledge turns isolated data definitions into fully contextualized intelligence that AI agents can rely on. For a detailed breakdown of these pillars, see What Is a Context Layer? The Foundation AI Data Agents Need.
Failure Mode 6: Lineage Gaps
The Problem
Knowing that a metric exists and how it is calculated is not sufficient for trustworthy AI output. Agents also need to understand where data comes from, how it was transformed, what dependencies exist, and what the data quality characteristics are at each stage.
Consider an AI agent that is asked: "What is our customer acquisition cost (CAC) for Q1?" The semantic layer defines CAC as total sales and marketing spend divided by new customers acquired. The agent runs the calculation and returns $1,200. But the agent does not know that the marketing spend data is sourced from a Google Ads integration that failed for the last two weeks of March. It does not know that the "new customers" count is derived from a Salesforce sync that has a 48-hour lag. It does not know that a recent migration from one billing system to another created duplicate records that inflate the denominator.
Without lineage awareness, the agent has no way to assess the reliability of its own answer. It presents $1,200 with the same confidence it would present a number backed by complete, high-quality data.
Why Semantic Layers Cannot Fix This
Semantic layers operate at the logical layer. They define what a metric means. They do not track the physical lineage of data: which source systems feed which tables, what ETL pipelines transform the data, when those pipelines last ran successfully, and what quality checks exist at each stage. Some data catalog tools like Atlan, Alation, and DataHub provide lineage capabilities, and data observability platforms like Monte Carlo add quality monitoring. But these systems are not integrated with the semantic layer in a way that AI agents can query at inference time.
The result is that AI agents operate in a lineage vacuum. They know how to calculate the number but not whether the calculation can be trusted right now, given the current state of the underlying pipelines and sources.
How a Context Layer Addresses This
A context layer integrates lineage metadata as a first-class concern. Kaelio's Schema and Lineage pillar tracks data provenance across 900+ connected tools, from source systems through transformation layers to consumption endpoints. When an AI agent queries the context layer, it receives not just the metric definition but also freshness metadata ("marketing spend last updated 14 days ago"), quality signals ("Salesforce sync has a 48-hour lag"), and dependency information ("this metric depends on the billing migration being complete"). Armed with this context, the agent can qualify its answer, flag data quality issues, or decline to answer until the underlying data is reliable. This capability is essential for conversational analytics over modern data warehouses, where data freshness varies across sources.
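A freshness check of this kind can be sketched as comparing each upstream source's last successful sync against a staleness budget. Source names, dates, and the budget are illustrative assumptions matching the CAC scenario above:

```python
from datetime import date, timedelta

# Hypothetical lineage metadata: last successful sync per upstream source.
SOURCE_FRESHNESS = {
    "google_ads": date(2026, 3, 17),  # spend feed failed mid-March
    "salesforce": date(2026, 3, 29),  # 48-hour sync lag
}

def freshness_warnings(today: date,
                       max_staleness: timedelta = timedelta(days=2)) -> list[str]:
    """Return one warning per source whose data exceeds the staleness budget."""
    warnings = []
    for source, last_sync in SOURCE_FRESHNESS.items():
        age = today - last_sync
        if age > max_staleness:
            warnings.append(f"{source} last updated {age.days} days ago")
    return warnings

print(freshness_warnings(date(2026, 3, 31)))
# ['google_ads last updated 14 days ago']
```

With this metadata in hand, the agent can attach the warning to its $1,200 CAC answer, or decline to answer until the Google Ads feed recovers, instead of presenting a number built on two missing weeks of spend.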
The Context Layer: Putting All Four Pillars Together
The six failure modes described above map to four pillars that a context layer must provide beyond what a semantic layer offers:
| Pillar | What It Captures | Failure Modes Addressed |
|---|---|---|
| Schema and Lineage | Data provenance, transformation history, dependencies, freshness, quality signals | Lineage gaps, temporal context |
| Semantic Models and Metrics | Metric definitions with temporal validity, cross-team disambiguation, and canonical labels | Temporal context, cross-team ambiguity |
| Dashboard Logic | Calculated fields, filters, custom SQL, and visualization-level transformations from BI tools | Invisible dashboard logic |
| Domain Knowledge | Business rules, conventions, exceptions, and institutional context | Unstructured domain knowledge, sensitivity classification |
Kaelio is built around these four pillars. It connects to your existing stack (over 900 tools spanning data warehouses, BI platforms, CRMs, billing systems, support tools, and more) and auto-builds the context layer by learning from what already exists. There is no six-month modeling project. There is no YAML to write. The platform reads your dbt models, your Looker explores, your Tableau workbooks, your Salesforce reports, and your team's domain knowledge, then unifies them into a single governed layer that any AI agent can consume.
This is what it means for Kaelio to be "the context layer your data agents need." The emerging Model Context Protocol (MCP) standard is making this kind of governed context delivery even more powerful, enabling AI agents to request exactly the context they need through a standardized interface. The semantic layer tells agents how to calculate. The context layer tells them everything else they need to get the answer right.
Why This Matters Now
The urgency of this problem is increasing. McKinsey estimates that generative AI could add up to $4.4 trillion in annual value across industries, but only if organizations can trust the outputs. The EU AI Act is codifying data governance requirements for high-risk AI systems, making context-aware data access a compliance imperative, not just a best practice. As organizations deploy more AI agents across their analytics workflows, the surface area for hallucination grows. Every new agent that connects directly to a warehouse or a semantic layer without full context is a new vector for inconsistent, incorrect, or policy-violating outputs. This is why executives are increasingly asking for analytics copilots that they can actually trust.
The pattern is predictable. A company invests in a semantic layer. It deploys an AI agent. The agent produces mostly-right answers. Users begin to trust it. Then an edge case appears: a historical comparison that ignores a definition change, a response that surfaces sensitive data, a metric that uses the wrong team's definition. Trust erodes. Adoption stalls. The company concludes that "AI is not ready" when the real problem is that the context infrastructure was incomplete.
The companies that will succeed with AI-powered analytics are the ones that recognize the semantic layer as a foundation, not a ceiling. They are building context layers that provide AI agents with the full picture: not just what metrics mean, but when they apply, who can see them, where the data comes from, what business rules govern interpretation, and how the numbers in BI tools relate to the numbers in the warehouse. For teams that work with dbt and LookML, a context layer extends rather than replaces existing governance investments. And for organizations looking to clear their BI backlog, a context layer is what makes self-serve AI analytics safe to deploy at scale.
That is what Kaelio delivers. If you are evaluating how to make your AI agents trustworthy, the question is not "do we have a semantic layer?" It is "do we have a context layer?"
Frequently Asked Questions
Why do AI data agents hallucinate even with a semantic layer?
Semantic layers define what metrics mean but omit critical context such as temporal validity, sensitivity classifications, dashboard-level business logic, cross-team definition conflicts, and lineage metadata. Without this additional context, AI agents fill in gaps with assumptions, which produces hallucinated or inconsistent answers. For a deeper look at how governed metrics reduce this risk, see our guide on the best analytics copilot for governed metrics.
What is a context layer and how is it different from a semantic layer?
A context layer extends a semantic layer by capturing four additional pillars: schema and lineage metadata, semantic models and metrics (with temporal validity and cross-team disambiguation), dashboard logic from BI tools, and unstructured domain knowledge. While a semantic layer tells an AI agent how a metric is calculated, a context layer tells it when the definition applies, who can see it, where the data came from, and what business rules govern its interpretation. Learn more about how this fits into the modern data stack in our post on the best semantic layer solutions for data teams.
Can dbt or LookML solve the AI hallucination problem on their own?
No. dbt and LookML govern metric definitions at the warehouse level, but they do not capture business logic embedded in BI calculated fields, spreadsheet formulas, ad hoc SQL, or tribal knowledge. Governance ends at the warehouse. A context layer extends governance to the full consumption stack so AI agents have a complete picture. For more on how dbt integrates with modern AI analytics, see our post on whether AI analytics tools work with dbt models.
How does Kaelio prevent AI agent hallucinations?
Kaelio auto-builds a context layer from your existing stack by connecting to 900+ tools. It captures schema and lineage, semantic models, dashboard logic, and domain knowledge in a single governed layer. AI agents that query Kaelio receive the full context they need to produce consistent, accurate, and policy-compliant answers. There is no manual modeling required. Kaelio learns from your dbt projects, BI tools, CRMs, and operational systems automatically.
What types of hallucination does a context layer prevent?
A context layer prevents six categories of hallucination: temporal hallucinations (applying current metric definitions to historical data), sensitivity violations (surfacing restricted data in AI responses), definition conflicts (choosing the wrong team's metric logic), dashboard logic gaps (missing calculated fields and filters that exist only in BI tools), domain knowledge gaps (ignoring uncodified business rules and conventions), and lineage blind spots (not understanding data provenance or transformation dependencies). Each of these failure modes is predictable and preventable with the right context infrastructure.
Sources
- https://docs.getdbt.com/docs/build/about-metricflow
- https://cloud.google.com/looker/docs/what-is-lookml
- https://cube.dev/
- https://www.atscale.com/
- https://www.tableau.com/learn/whitepapers/row-level-security-entitlements-tables
- https://atlan.com/data-catalog-vs-data-lineage/
- https://datahubproject.io/
- https://www.alation.com/
- https://arxiv.org/abs/2311.05232
- https://arxiv.org/abs/2204.00498
- https://docs.snowflake.com/en/user-guide/security-row-intro
- https://cloud.google.com/bigquery/docs/column-level-security-intro
- https://learn.microsoft.com/en-us/power-bi/enterprise/service-admin-rls
- https://cloud.google.com/looker/docs/model-permissions
- https://www.montecarlodata.com/
- https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
- https://platform.openai.com/docs/guides/structured-outputs
- https://platform.openai.com/docs/guides/function-calling
- https://www.nist.gov/artificial-intelligence/executive-order-safe-secure-and-trustworthy-artificial-intelligence
- https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai
- https://www.gartner.com/smarterwithgartner/how-to-improve-your-data-quality
- https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier
- https://kaelio.com