How to Connect AI Agents to Your Data Stack Without Giving Them Raw Database Access
At a glance
- Direct database credentials for AI agents create three problems at once: shared identity, bypassed access controls, and a large prompt-injection blast radius, risks that map directly onto the OWASP Top 10 for Large Language Model Applications.
- The safer pattern is a governed access layer between the agent and the warehouse, enforcing user identity, row and column policies, query allowlisting, and audit logging in line with the NIST AI Risk Management Framework and its Generative AI Profile.
- Model Context Protocol (MCP) gives any agent a standard way to call a governed server instead of querying the database directly.
- Existing controls do most of the work: Snowflake row access policies, BigQuery column-level security, and Databricks Unity Catalog row filters all enforce at query time.
- A governed context layer, exposed through MCP and backed by warehouse-native controls such as Snowflake row access policies, makes the architecture portable across agents and tools while meeting the NIST AI Risk Management Framework's expectations for least privilege, traceability, and human oversight.
The OWASP Top 10 for Large Language Model Applications includes both prompt injection and excessive agency, which is why broad warehouse credentials are a poor fit for AI agents that touch sensitive data. Kaelio auto-builds a governed context layer from your data stack. Its built-in data agent (and any MCP-compatible agent) can then deliver trusted, sourced answers to every team.
Why Raw Database Access Fails Under Scrutiny
The "service account with broad permissions" pattern fails for three reasons that compound.
1. Shared Identity Erases User Permissions
Most warehouses enforce access at the user or role level. When an agent runs all queries as a single service identity, every user effectively sees the union of permissions across the organization. A junior analyst asking a casual question through Slack can surface board-level financials simply because the agent's account has access.
This is not a hypothetical edge case. It is the default behavior unless you actively prevent it. We covered the underlying control in how to enforce row-level security in AI analytics without rebuilding permissions.
2. Bypassed Row and Column Policies
Modern warehouses provide rich row-level and column-level access controls: Snowflake row access policies, BigQuery column-level security, and Databricks Unity Catalog row filters. These policies are evaluated at query time against the executing identity.
When the executing identity is a single service account, every policy that depends on the user is effectively neutered. Row policies that should hide other regions, mask other customers, or exclude PII fall through.
3. Prompt Injection Is a Privilege Escalation
A model that can write arbitrary SQL with broad credentials is a target. The OWASP Top 10 for LLM Applications explicitly calls out prompt injection and excessive agency as top risks. An attacker who can influence the prompt, directly or through indirect injection in a document the agent reads, can coerce the agent into running queries that the prompting user is not entitled to run.
Putting the warehouse credentials behind a governed access layer is a direct mitigation. The agent cannot leak what it cannot reach.
The Reference Architecture
The pattern that holds up under audit is straightforward. The agent never holds direct warehouse credentials. It calls a governed access layer that does five things on every request.
┌─────────────────┐   prompt    ┌────────────────────────┐   governed query   ┌──────────────┐
│ AI Agent        │ ─────────►  │ Governed Access Layer  │ ─────────────────► │ Warehouse    │
│ (Slack, app,    │             │ (context + policies)   │                    │ (Snowflake,  │
│ Claude, etc.)   │ ◄─────────  │ - user identity        │ ◄───────────────── │ BigQuery,    │
└─────────────────┘  answer +   │ - row/column policy    │  result + lineage  │ Databricks)  │
      citations                 │ - definition lookup    │                    └──────────────┘
                                │ - query allowlist      │
                                │ - audit log            │
                                └────────────────────────┘
The five jobs of the governed access layer:
- Resolve the prompting user. Tokens, OAuth, or SAML pass the user identity from the agent's surface (Slack, web app, embedded chat) into the layer.
- Apply row and column policies. The layer issues queries as the user, or with the user's effective permissions, so warehouse-native policies still fire.
- Restrict to governed definitions. The layer ensures the SQL the agent generates resolves to canonical metric definitions and valid join paths, not arbitrary table scans.
- Allowlist query patterns. The layer rejects queries that fall outside the set of safe, governed patterns. Reads are scoped; writes are blocked unless explicitly enabled.
- Log everything. Every prompt, every generated query, every result, every citation, with user, timestamp, and version.
The agent's job shrinks. It generates intent, asks the layer, and explains the result. The layer is where security lives.
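The five jobs above can be compressed into a single request pipeline. The sketch below is illustrative only: every name in it (Request, resolve_user, allowed, handle) is hypothetical, and a real layer would delegate identity to your IdP, policy enforcement to the warehouse, and definitions to a semantic layer.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Request:
    surface_user: str   # identity from Slack, web app, etc.
    prompt: str
    generated_sql: str  # SQL the agent proposes to run

AUDIT_LOG: list[dict] = []

def resolve_user(surface_user: str) -> str:
    """Job 1: map the surface identity to a warehouse identity."""
    mapping = {"alice@example.com": "ALICE"}  # stand-in for SSO/OAuth
    return mapping[surface_user]

def allowed(sql: str) -> bool:
    """Jobs 3-4 (simplified): reads only, nothing outside the governed set."""
    return sql.lstrip().upper().startswith("SELECT")

def handle(req: Request) -> str:
    warehouse_user = resolve_user(req.surface_user)
    if not allowed(req.generated_sql):
        raise PermissionError("query outside governed allowlist")
    # Job 2: a real layer executes as warehouse_user so row/column
    # policies fire at query time; here we only record the identity.
    result = f"ran as {warehouse_user}"
    AUDIT_LOG.append({                      # job 5: log everything
        "user": warehouse_user,
        "prompt": req.prompt,
        "sql": req.generated_sql,
        "ts": datetime.now(timezone.utc).isoformat(),
    })
    return result

print(handle(Request("alice@example.com", "What was Q3 revenue?",
                     "SELECT revenue FROM metrics")))
```

The point of the shape is that the agent only ever calls `handle`; every security decision lives inside it.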
Building the Layer: Five Practical Steps
Step 1: Run Connectors Read-Only and Scoped
The connectors between your access layer and the warehouse should use read-only credentials, scoped to the schemas the layer is allowed to expose. If the agent's job is to answer business questions, it does not need access to admin tables, raw event firehoses, or PII fields the rest of the company cannot see.
Kaelio ships read-only connectors as the default across 900+ tools, including Snowflake, BigQuery, Databricks, and Redshift. Metadata is ingested; raw rows stay where they live.
Step 2: Map User Identity End-to-End
When a user prompts the agent in Slack, the layer must know who that user is in the warehouse. The minimum is OAuth-style identity propagation. The cleaner version is a user mapping table that resolves Slack, email, or SSO identity to warehouse roles.
The goal is simple: when the layer queries Snowflake on behalf of Alice, Snowflake sees Alice. Row access policies that depend on current_user() then evaluate correctly.
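A minimal version of that mapping is a lookup from surface identity to warehouse principal. The table and function names below (WAREHOUSE_USERS, run_as) are hypothetical; in production the mapping lives in your IdP or a governed mapping table, not in code.

```python
# Hypothetical mapping from surface identities (Slack user ID, SSO
# email) to warehouse principals.
WAREHOUSE_USERS = {
    "U024BE7LH": "ALICE",          # Slack user ID
    "alice@example.com": "ALICE",  # SSO email
    "bob@example.com": "BOB",
}

def run_as(surface_id: str, sql: str) -> str:
    """Resolve the prompting user, then (conceptually) execute the query
    under that warehouse identity so current_user()-based row policies
    evaluate against the real person. Unknown identities fail closed."""
    principal = WAREHOUSE_USERS.get(surface_id)
    if principal is None:
        raise PermissionError(f"no warehouse identity for {surface_id!r}")
    # A real layer would open a session as `principal` (for example via
    # OAuth token exchange) and submit `sql`; this sketch only shows the
    # resolution step.
    return f"{sql} -- executed as {principal}"

print(run_as("U024BE7LH", "SELECT * FROM sales"))
```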
Step 3: Encode the Definitions and Joins the Agent Is Allowed to Use
The agent should not be free-styling SQL against your full schema. It should be resolving questions against governed definitions: "revenue" is a known metric, "active customer" is a known dimension, the join from orders to customers is a known path.
This is what a governed context layer does. It encodes the canonical metric definitions, valid join paths, deprecated tables, and business rules. The agent's generated SQL is constrained to those primitives. Anything outside the governed set requires a human in the loop.
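One way to picture the governed set is a small registry of canonical metrics and join paths that the agent's questions must resolve into. The registry contents here are illustrative; real systems derive them from dbt models, BI semantic models, or documentation.

```python
# Illustrative governed registry: canonical metric definitions and the
# only join paths the agent may use.
METRICS = {
    "revenue": "SUM(orders.amount)",
    "active_customers": "COUNT(DISTINCT orders.customer_id)",
}
JOINS = {
    ("orders", "customers"): "orders.customer_id = customers.id",
}

def compile_metric(metric: str, table: str = "orders") -> str:
    """Resolve a metric name to governed SQL, or refuse and route to a
    human. The agent never free-styles SQL against the raw schema."""
    if metric not in METRICS:
        raise ValueError(f"'{metric}' is not a governed metric; "
                         "route to a human for review")
    return f"SELECT {METRICS[metric]} FROM {table}"

print(compile_metric("revenue"))
# An ungoverned metric fails closed: compile_metric("profit") raises.
```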
Step 4: Apply a Query Allowlist and Cost Guards
Even with governed definitions, the layer should enforce a small number of hard rules:
- Reads only by default. Writes, schema changes, and DDL are blocked unless explicitly permitted.
- Query patterns within budget. Cost guards (row limits, time limits, scan limits) prevent runaway queries.
- No raw cross-tenant queries. In multi-tenant systems, every query must include the tenant filter.
This is the layer that catches prompt injection in practice. Even if the model is convinced to write a dangerous query, the allowlist refuses to run it.
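Those rules compress into a small validator that runs before any query reaches the warehouse. This sketch uses string and regex checks for brevity; a production layer should parse the SQL with a real parser rather than pattern-match, and the tenant-filter convention shown is an assumption.

```python
import re

# Hard rules: no writes/DDL, SELECT only, tenant filter required,
# row limit appended as a cost guard.
BLOCKED = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|GRANT|MERGE|TRUNCATE)\b",
    re.IGNORECASE,
)
MAX_LIMIT = 10_000

def validate(sql: str, tenant_id: str) -> str:
    """Reject anything outside the allowlist; return the guarded query."""
    if BLOCKED.search(sql):
        raise PermissionError("writes and DDL are blocked")
    if not sql.lstrip().upper().startswith("SELECT"):
        raise PermissionError("only SELECT is allowed")
    if f"tenant_id = '{tenant_id}'" not in sql:
        raise PermissionError("query must include the tenant filter")
    if "LIMIT" not in sql.upper():
        sql = f"{sql.rstrip().rstrip(';')} LIMIT {MAX_LIMIT}"  # cost guard
    return sql

safe = validate("SELECT amount FROM orders WHERE tenant_id = 'acme'", "acme")
print(safe)  # the LIMIT is appended automatically
```

Even a model coerced into emitting `DROP TABLE orders` gets a refusal here, not a warehouse round-trip.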
Step 5: Log Prompts, Queries, Answers, and Citations
Every interaction needs an audit trail. At minimum: the prompt, the generated SQL, the result row count (or the result itself for sensitive cases), the citations the agent surfaced, the user identity, and a timestamp.
This serves three audiences. Security uses it to investigate. The data team uses it to identify recurring questions and improve definitions. Auditors use it to confirm that the system behaved as documented. For governed-data teams in regulated industries, this is non-negotiable. See the best AI analytics tools for governed data and the SOC 2 considerations for context.
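The minimum record is small enough to show in full. A JSON-lines serialization is used here as a stand-in for whatever log sink you actually run; the field names and the layer_version value are assumptions.

```python
import json
from datetime import datetime, timezone

def audit_record(user: str, prompt: str, sql: str, row_count: int,
                 citations: list[str]) -> str:
    """Serialize one interaction as a JSON line. Fields mirror the
    minimum set above: prompt, generated SQL, result size, citations,
    user identity, timestamp, and a version for the governed definitions."""
    record = {
        "user": user,
        "prompt": prompt,
        "generated_sql": sql,
        "result_row_count": row_count,
        "citations": citations,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "layer_version": "2024-06-01",  # which definitions were in force
    }
    return json.dumps(record)

line = audit_record("ALICE", "What was Q3 revenue?",
                    "SELECT SUM(amount) FROM orders", 1,
                    ["metrics/revenue.md"])
print(line)
# In production, append each line to a sink with retention that
# satisfies your auditors.
```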
Where MCP Fits
Model Context Protocol (MCP), originally created by Anthropic, is the open standard that makes this architecture portable.
Without MCP, every agent needs a custom integration to the access layer: a Slack bot, a Claude integration, a ChatGPT plugin, and a custom in-house app each get their own connector. Each connector is its own security surface and its own bug surface.
With MCP, the access layer exposes tools (for example, query_metric, lookup_definition, explain_lineage) and any MCP-compatible agent can call them. Claude, ChatGPT, custom agents built with LangChain or CrewAI, and Kaelio's built-in agent all use the same governed interface. The security model is enforced once, at the layer, and inherited by every consumer. We expanded on this in the Model Context Protocol guide and in how to connect ChatGPT and Claude to governed business metrics.
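Concretely, the tool surface is just named operations with typed inputs. The sketch below describes it as plain JSON-Schema-style dicts rather than using an MCP SDK; the tool names come from the examples above, and the schemas themselves are assumptions about what such a server might expose.

```python
import json

# The governed layer's tool surface, described declaratively. A real
# server would register these through an MCP SDK; the input schemas
# here are illustrative.
TOOLS = [
    {
        "name": "query_metric",
        "description": "Run a governed metric query as the prompting user.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "metric": {"type": "string"},
                "filters": {"type": "object"},
            },
            "required": ["metric"],
        },
    },
    {
        "name": "lookup_definition",
        "description": "Return the canonical definition of a metric or dimension.",
        "inputSchema": {
            "type": "object",
            "properties": {"term": {"type": "string"}},
            "required": ["term"],
        },
    },
    {
        "name": "explain_lineage",
        "description": "Trace a metric back to its source tables.",
        "inputSchema": {
            "type": "object",
            "properties": {"metric": {"type": "string"}},
            "required": ["metric"],
        },
    },
]

print(json.dumps([t["name"] for t in TOOLS]))
```

Because every consumer sees the same three tools, adding a new agent surface is a client configuration change, not a new security review.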
How Kaelio Implements This Pattern
Kaelio is the governed access layer in this architecture. The platform follows the same five-step pattern described above:
- Connect. 900+ read-only connectors ingest metadata from warehouses, BI tools, transformation layers, and documentation. Raw data stays in place.
- Resolve identity. Kaelio maps the prompting user (in Slack, email, embedded chat, or via API) to warehouse roles. Queries execute with the user's effective permissions.
- Govern definitions. Kaelio auto-builds canonical metric definitions, lineage, and join paths from your existing dbt models, BI semantic models, and documentation. The data team reviews and approves.
- Restrict and allowlist. The agent's queries are constrained to governed primitives. Cost guards, read-only defaults, and pattern allowlists are enforced at the layer.
- Audit. Every prompt, query, and answer is logged with user, timestamp, citations, and reasoning.
Kaelio's built-in data agent uses this layer natively. Any MCP-compatible agent (Claude, ChatGPT, custom agents) calls the same layer through MCP or REST API. The security model is enforced once, regardless of which agent the user is talking to.
The architectural payoff: when you add a new agent surface tomorrow, you do not redo the security review. The new agent inherits the same governed access pattern.
A Short Checklist for Security Reviewers
If you are reviewing an AI agent integration, the questions worth asking are short.
- Does the agent hold direct warehouse credentials? If yes, stop here.
- Is user identity propagated end-to-end? Can you trace a prompt to a specific person?
- Do existing row and column policies fire when the agent queries? Verify with a test user.
- Are queries restricted to governed definitions and known join paths?
- Are reads allowed by default and writes blocked unless explicitly enabled?
- Are prompts, queries, answers, and citations logged with sufficient retention?
- Can the same governed layer serve multiple agent surfaces, or does each surface reimplement security?
A "yes" to all of these is the bar. Anything else is technical debt waiting for an incident.
FAQ
Why is giving an AI agent raw database access risky?
Raw database access typically means a single broad service account with permissions across many tables. The agent inherits those permissions for every user, so every request bypasses row-level and column-level controls. It also exposes you to prompt injection attacks that can coerce the agent into running queries the user could not run directly. The OWASP Top 10 for LLM Applications lists this category of failure among its most common risks.
What is the safer pattern?
The safer pattern is a governed access layer between the agent and the warehouse. The agent queries the layer, not the database. The layer enforces user identity, applies row and column policies, restricts the agent to governed metric definitions and valid join paths, allowlists query patterns, and logs every prompt and answer. The agent never sees raw credentials.
How does Model Context Protocol (MCP) help?
MCP is an open standard that defines how AI agents discover and consume external context and tools. With MCP, an agent does not need direct database credentials. It calls a governed context server that resolves prompts to safe queries, enforces access, and returns sourced answers. Any MCP-compatible agent can use the same governed interface, so you implement security once and inherit it across Claude, ChatGPT, and any custom agent.
Do I need to rebuild my permission model?
No. A governed access layer should inherit existing warehouse and BI permissions. Snowflake row access policies, BigQuery column-level security, and Databricks Unity Catalog row filters all evaluate at query time. The layer maps the prompting user to those policies so the agent runs as the user, not as a service account with broader access.
How does Kaelio implement this pattern?
Kaelio auto-builds a governed context layer over your warehouse, BI tools, and documentation. AI agents (Kaelio's built-in data agent and any MCP-compatible agent such as Claude or ChatGPT) query the context layer rather than the database. The layer enforces user identity, applies row and column policies, restricts queries to canonical definitions, allowlists patterns, and logs every prompt and answer with citations.
Sources
- https://www.nist.gov/itl/ai-risk-management-framework
- https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
- https://owasp.org/www-project-top-10-for-large-language-model-applications/
- https://modelcontextprotocol.io/
- https://www.anthropic.com/news/model-context-protocol
- https://docs.snowflake.com/en/user-guide/security-row
- https://cloud.google.com/bigquery/docs/column-level-security-intro
- https://docs.databricks.com/en/data-governance/unity-catalog/row-filters.html