Provable but Unqueryable

April 14, 2026

I wrote previously about QMDB — a verifiable database that collapses Merkle tree maintenance and key-value storage into a single system with O(1) I/O. After building a protocol on top of it, I’ve run into something benchmarks don’t capture: a database optimized for cryptographic proofs makes a poor query engine.

This is true of every merkleized state store — Ethereum’s MPT, Celestia’s NMT, QMDB’s append-only twigs. The key layout is designed to produce Merkle roots efficiently, not to answer arbitrary questions about the data.

What a merkleized KV store gives you

QMDB stores state as key-value pairs with prefix-byte namespacing. Every entity type gets a prefix, followed by identifiers:

[0x0A] [project_id:32]                     → project state
[0x0D] [project_id:32] [ref_name:variable] → ref state
[0x0E] [project_id:32] [commit_hash:32]    → commit state
[0x10] [owner:20] [target:32] [type:1]     → link (follow/star)

Lexicographic ordering within each namespace means prefix scans are fast — “list all refs for project X” is a single range scan over [0x0D | project_id]. Point lookups are O(1). The whole store merkleizes on every block, producing a state root any light client can verify.

The queries that are fast are the ones the key layout was designed for:

Query                       Key pattern                        Cost
Get project by ID           [0x0A | id]                        O(1)
List refs for project       scan [0x0D | project_id]           O(refs in project)
List commits for project    scan [0x0E | project_id]           O(commits in project)
Get link by owner+target    [0x10 | owner | target | type]     O(1)

The key is the index.
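To make that concrete, here is a std-only sketch of the layout, with a BTreeMap standing in for QMDB's ordered keyspace (key shapes from the table above; function names are illustrative). Lexicographic ordering keeps every key sharing a `[0x0D | project_id]` prefix contiguous, so "list all refs for project X" is one range scan:

```rust
use std::collections::BTreeMap;

const PREFIX_REF: u8 = 0x0D; // [0x0D | project_id:32 | ref_name] → ref state

/// Build a ref key: prefix byte, 32-byte project id, variable-length ref name.
fn ref_key(project_id: &[u8; 32], ref_name: &str) -> Vec<u8> {
    let mut key = vec![PREFIX_REF];
    key.extend_from_slice(project_id);
    key.extend_from_slice(ref_name.as_bytes());
    key
}

/// "List all refs for project X": every matching key is contiguous,
/// so one range scan over [0x0D | project_id] suffices.
fn scan_refs<'a>(
    store: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    project_id: &[u8; 32],
) -> Vec<(&'a [u8], &'a [u8])> {
    let mut prefix = vec![PREFIX_REF];
    prefix.extend_from_slice(project_id);
    store
        .range(prefix.clone()..)
        .take_while(|(k, _)| k.starts_with(&prefix))
        .map(|(k, v)| (k.as_slice(), v.as_slice()))
        .collect()
}

fn main() {
    let mut store = BTreeMap::new();
    let (a, b) = ([1u8; 32], [2u8; 32]);
    store.insert(ref_key(&a, "main"), b"ref-a1".to_vec());
    store.insert(ref_key(&a, "dev"), b"ref-a2".to_vec());
    store.insert(ref_key(&b, "main"), b"ref-b1".to_vec());
    // Only project a's refs come back, without touching project b's keys.
    assert_eq!(scan_refs(&store, &a).len(), 2);
}
```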

What it doesn’t give you

“Most starred projects.” Requires scanning every link entry (0x10), filtering for type Star, grouping by target, counting, and sorting. No secondary index on star count. The key layout groups links by owner, not target.

“Projects owned by address X, sorted by recent activity.” The project prefix (0x0A) groups by project_id, not owner. Activity timestamps are scattered across commit and ref entries in other prefixes.

In practice, listing an owner’s projects looks like this:

loop {
    let entries = store
        .scan_prefix_paginated(&prefix, &cursor, batch_size)
        .await?;

    for (key, value) in &entries {
        let project: ProjectState = serde_json::from_slice(value)?;

        // Client-side filtering — O(all projects)
        if !project_matches_owner_address(&project, &owner) {
            continue;
        }

        // project_id is the key minus the 0x0A prefix byte
        let project_id = &key[1..];
        projects.push(build_project_response(store, project_id, &project).await?);
    }

    // A short page means the namespace is exhausted
    let Some((last_key, _)) = entries.last() else { break };
    if entries.len() < batch_size {
        break;
    }
    cursor = last_key.clone();
}

Every project gets deserialized and checked. The batch_size = limit * 4 heuristic assumes 1-in-4 match. An owner with 3 projects out of 100K triggers a full scan.

“Search projects by name.” The name index (0x0C) maps [owner_address | name] → project_id. Exact match only. Substring and full-text search require scanning the entire namespace.

Any query that doesn’t align with the key layout degrades to a full scan.
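The “most starred projects” case can be sketched the same way (std-only; the BTreeMap stands in for QMDB, and the Star type tag is an assumed value). Because the 0x10 key groups by owner rather than target, every link entry must be visited before anything can be grouped or ranked:

```rust
use std::collections::{BTreeMap, HashMap};

const PREFIX_LINK: u8 = 0x10; // [0x10 | owner:20 | target:32 | type:1]
const LINK_TYPE_STAR: u8 = 1; // assumed tag for Star links

fn link_key(owner: &[u8; 20], target: &[u8; 32], link_type: u8) -> Vec<u8> {
    let mut key = vec![PREFIX_LINK];
    key.extend_from_slice(owner);
    key.extend_from_slice(target);
    key.push(link_type);
    key
}

/// Top-n starred projects: nothing orders by star count, so every entry
/// in the 0x10 namespace is scanned, filtered, grouped, counted, sorted.
fn most_starred(store: &BTreeMap<Vec<u8>, Vec<u8>>, n: usize) -> Vec<([u8; 32], u64)> {
    let mut counts: HashMap<[u8; 32], u64> = HashMap::new();
    for key in store.keys() {
        // Key layout: [prefix:1 | owner:20 | target:32 | type:1] = 54 bytes.
        if key.len() != 54 || key[0] != PREFIX_LINK || key[53] != LINK_TYPE_STAR {
            continue; // not a Star link, but still scanned
        }
        let mut target = [0u8; 32];
        target.copy_from_slice(&key[21..53]);
        *counts.entry(target).or_insert(0) += 1;
    }
    let mut ranked: Vec<_> = counts.into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1));
    ranked.truncate(n);
    ranked
}

fn main() {
    let mut store = BTreeMap::new();
    let (proj_a, proj_b) = ([1u8; 32], [2u8; 32]);
    for owner_byte in 1u8..=3 {
        store.insert(link_key(&[owner_byte; 20], &proj_a, LINK_TYPE_STAR), vec![]);
    }
    store.insert(link_key(&[9u8; 20], &proj_b, LINK_TYPE_STAR), vec![]);
    assert_eq!(most_starred(&store, 1), vec![(proj_a, 3)]);
}
```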

The two-system compromise

The standard solution is to run two systems:

  graph TB
    subgraph Fresh["Consensus-fresh, limited queries"]
        direction LR
        U1["User"] --> G1["Gateway"] --> Node["QMDB (node)"] --> Q1["get project X\nlist refs for project Y"]
    end

    subgraph Rich["Eventually consistent, rich queries"]
        direction LR
        U2["User"] --> G2["Gateway"] --> Idx["Indexer (edge)"] --> Q2["full-text search\nsorted lists\naggregates"]
    end

    Node -.->|"block stream"| Idx

    style Node fill:#2A40FF,color:#fff
    style Idx fill:#3BB273,color:#fff

The node produces blocks. An ingest pipeline consumes them and writes into an indexer that handles joins, aggregates, full-text search, and arbitrary sorting. Every chain with a block explorer does this. Reads are split between the two backends depending on what the query requires:

             Node (QMDB)                     Indexer
Freshness    Consensus-current               Lags by ingest pipeline
Queries      Point lookups + prefix scans    Full SQL
Proofs       Merkle proofs for any key       None
Trust model  Verifiable                      Trust the indexer

The choice is stark: fresh, provable data with limited queries, or rich queries over stale, unverifiable data. The indexer lag might be milliseconds or minutes depending on load. You also maintain two data paths — two schemas, two query APIs, two failure modes — and the indexer must replicate every state transition the node makes.

For a human browsing an explorer, this compromise is usually acceptable. The indexer often catches up before the page loads. The browser also carries context across navigations, so a little inconsistency is tolerable.

The agent problem

Agents operating in thin execution environments — V8 isolates, WebAssembly sandboxes, edge runtimes — face a different set of constraints. They are not browsing; they are making a decision that may immediately trigger a write. Consider an MCP tool that needs to answer: “What are this user’s most active projects, and do I have permission to update the default branch on each one?” This touches four entity types: projects (filtered by owner), refs (recent activity), commits (activity metrics), and keys (permission checks). Under the two-system model, the agent must query the indexer for the rich query and the node for permission state, then reconcile responses from two systems at two different points in time.

This arrangement is awkward for reasons that are structural rather than merely operational. Each invocation begins from a cold start, with no connection pool, no local cache, and no memory of earlier reads. Memory budgets are typically limited to 128–256MB, which rules out loading a local QMDB replica or embedding an in-process indexer. The read, decision, and write phases are also compressed into a single request cycle, so stale indexer data can translate directly into an incorrect action: an agent may update a ref that has already moved, miss a permission revocation, or write to a project that was archived between the read and the write.

Transport and auditability constraints make the problem sharper. Most isolate runtimes expose only HTTP interfaces, often without h2 support, which rules out long-lived gRPC streams and turns each state read into a separate round trip through a REST gateway. More importantly, when an agent acts on behalf of a user, the user should be able to verify the state the agent observed when making that decision. An indexer result can be useful, but it does not carry a proof of correctness.

  graph LR
    Agent["Agent\n(V8 isolate)"] --> GW["REST gateway"]
    GW --> Node["QMDB node"]
    GW --> Idx["Indexer"]
    Node --> Fresh["fresh\npoint reads"]
    Idx --> Rich["stale\nrich queries"]

    style Agent fill:#E84855,color:#fff
    style Node fill:#2A40FF,color:#fff
    style Idx fill:#3BB273,color:#fff

We run an MCP server with 15 tools that query protocol state. Each tool makes HTTP calls through a REST gateway, which routes to either the node or the indexer depending on the query type. As a result, the tools inherit the limitations of both backends: the node cannot answer richer analytical questions, while the indexer may answer them from a stale view of state. What agents need instead is a single query surface that is expressive, fresh, and verifiable.

A third path: SQL over the state store

Apache DataFusion is an extensible query engine in Rust. It takes SQL, optimizes it into a physical execution plan, and runs it against any storage backend that implements TableProvider:

// Abridged: the full trait also includes as_any() and a
// session-state parameter on scan().
trait TableProvider: Debug + Sync + Send {
    fn schema(&self) -> Arc<Schema>;
    fn table_type(&self) -> TableType;
    async fn scan(
        &self,
        projection: Option<&Vec<usize>>,
        filters: &[Expr],
        limit: Option<usize>,
    ) -> Result<Arc<dyn ExecutionPlan>>;
}

You describe the column schema; DataFusion handles joins, aggregation, sorting, GROUP BY, HAVING, and window functions. Execution is streaming, vectorized, and multi-threaded over Apache Arrow’s columnar format.

The idea: implement TableProvider for each QMDB key prefix, register them as virtual tables, and let DataFusion translate SQL into QMDB range scans. This does not magically add new indexes, but it does provide one query surface over consensus-fresh state.
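Under that model, “most starred projects” collapses into one statement over the virtual tables. The table and column names below are illustrative, chosen to match the key layouts; `link_type` in particular is an assumed column:

```sql
SELECT p.project_id, p.name, COUNT(*) AS stars
FROM links AS l
JOIN projects AS p ON p.project_id = l.target
WHERE l.link_type = 'star'
GROUP BY p.project_id, p.name
ORDER BY stars DESC
LIMIT 10;
```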

  graph LR
    SQL["SQL query\nJOIN + GROUP BY + ORDER BY"] --> DF["DataFusion\nplanner + executor"]
    DF --> P["Projects provider\nscan 0x0A"]
    DF --> L["Links provider\nscan 0x10"]
    P --> QMDB["QMDB"]
    L --> QMDB

    style SQL fill:#E84855,color:#fff
    style DF fill:#2A40FF,color:#fff
    style QMDB fill:#3BB273,color:#fff

A ProjectsTableProvider translates a scan into a QMDB prefix scan over 0x0A, deserializes each ProjectState, and emits Arrow RecordBatches with columns like project_id, owner_address, name, created_at, commit_count. DataFusion handles the join against LinksTableProvider (prefix 0x10), computes the aggregate, and sorts.

Filter pushdown lets DataFusion pass predicates like owner_address = X to the provider, which can narrow the QMDB scan range when the filter aligns with the key layout, or fall back to post-filtering when it doesn’t.
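A std-only sketch of that decision (DataFusion's actual hook is the provider's filter-pushdown support; here it is reduced to the range computation, with an invented filter enum): an equality on `project_id` collapses to a one-key range, while an equality on `owner_address` cannot narrow the scan and must be post-filtered.

```rust
use std::ops::Range;

const PREFIX_PROJECT: u8 = 0x0A; // [0x0A | project_id:32] → project state

/// The subset of pushed-down predicates this provider understands.
enum ProjectFilter {
    ProjectIdEq([u8; 32]), // aligns with the key layout: narrows the scan
    OwnerEq([u8; 20]),     // does not align: full scan plus post-filter
}

/// Returns (key range to scan, whether rows still need post-filtering).
fn plan_scan(filter: Option<&ProjectFilter>) -> (Range<Vec<u8>>, bool) {
    match filter {
        // project_id is the key suffix, so equality becomes a one-key range.
        Some(ProjectFilter::ProjectIdEq(id)) => {
            let mut start = vec![PREFIX_PROJECT];
            start.extend_from_slice(id);
            let mut end = start.clone();
            end.push(0x00); // exclusive bound just past the exact key
            (start..end, false)
        }
        // owner_address lives inside the value, not the key: scan the whole
        // 0x0A namespace and filter each deserialized row client-side.
        Some(ProjectFilter::OwnerEq(_)) | None => {
            (vec![PREFIX_PROJECT]..vec![PREFIX_PROJECT + 1], filter.is_some())
        }
    }
}

fn main() {
    let (range, post_filter) = plan_scan(Some(&ProjectFilter::ProjectIdEq([7u8; 32])));
    assert!(!post_filter);
    assert_eq!(range.start.len(), 33); // prefix byte + 32-byte id
    let (range, post_filter) = plan_scan(Some(&ProjectFilter::OwnerEq([1u8; 20])));
    assert!(post_filter);
    assert_eq!((range.start, range.end), (vec![0x0A], vec![0x0B]));
}
```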

What this enables

The immediate benefit of this design is that it changes the trade-off between freshness and expressiveness. Queries such as “most starred projects” could be answered from the state that committed 200ms ago, rather than from an indexer view that may be tens of seconds behind. For an agent, that matters less as a performance improvement than as a correctness property: the decision is formed from the same state the protocol has most recently committed.

It also collapses multi-backend orchestration into a single request. Instead of querying one system for activity, another for permissions, and then reconciling the two, an MCP tool can submit one SQL query over a consistent snapshot of state and receive one response. This simplifies tool design, but more importantly it eliminates a class of race conditions introduced by mixing fresh point reads with stale aggregate reads.

Because the result rows are sourced directly from QMDB, they can also be paired with proofs for the underlying state entries at block N. That does not by itself prove aggregate completeness or ranking correctness; those remain open problems that require additional machinery. Even so, it moves verification closer to the query surface and provides a clearer basis for auditing what an agent actually observed.

More broadly, the architecture remains centered on a single state machine. There is no shadow indexer replaying transitions, no duplicated schema that must be kept in sync, and no ambiguity about whether the read path and the write path are consulting the same state. When a block commits, that state becomes queryable through the same system that will later prove it.

The hard parts

None of this removes the physical constraints imposed by the underlying key layout. DataFusion can push predicates down into a TableProvider, but if a predicate does not align with the QMDB prefix structure, the provider still has to scan the full namespace. A query such as “most starred projects” therefore remains a scan over all link entries. Arrow-based execution should make that scan materially faster than row-by-row JSON deserialization, but it does not make the scan disappear.

The absence of native secondary indexes creates a second limitation. In QMDB, the key layout is the index. If the system eventually needs an ordering on (star_count, project_id) or an efficient access path by owner and recent activity, those structures must be maintained explicitly as additional prefixes during state transitions. At that point the design begins to reintroduce, inside the state machine itself, some of the indexing logic that external indexers currently absorb.
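What maintaining such an index inside the state transition might look like, as a std-only sketch (the 0x11 prefix and encoding are invented for illustration): the count is stored big-endian and bit-inverted so the index's natural lexicographic order is descending star count, and every star costs two extra writes.

```rust
use std::collections::BTreeMap;

// Hypothetical index prefix: [0x11 | !count_be:8 | project_id:32]
const PREFIX_STAR_IDX: u8 = 0x11;

/// Index key whose lexicographic order is descending star count:
/// big-endian, bitwise-inverted counts make higher counts sort first.
fn star_index_key(count: u64, project_id: &[u8; 32]) -> Vec<u8> {
    let mut key = vec![PREFIX_STAR_IDX];
    key.extend_from_slice(&(!count).to_be_bytes());
    key.extend_from_slice(project_id);
    key
}

/// Called inside the state transition that applies a Star link:
/// delete the old index entry, insert the new one.
fn apply_star(
    store: &mut BTreeMap<Vec<u8>, Vec<u8>>,
    counts: &mut BTreeMap<[u8; 32], u64>,
    project_id: [u8; 32],
) {
    let old = *counts.get(&project_id).unwrap_or(&0);
    store.remove(&star_index_key(old, &project_id));
    counts.insert(project_id, old + 1);
    store.insert(star_index_key(old + 1, &project_id), vec![]);
}

/// "Most starred" is now a prefix scan over 0x11, already in rank order.
fn top_starred(store: &BTreeMap<Vec<u8>, Vec<u8>>, n: usize) -> Vec<[u8; 32]> {
    store
        .range(vec![PREFIX_STAR_IDX]..)
        .take_while(|(k, _)| k.first() == Some(&PREFIX_STAR_IDX))
        .take(n)
        .map(|(k, _)| {
            let mut id = [0u8; 32];
            id.copy_from_slice(&k[9..41]); // skip prefix byte + 8 count bytes
            id
        })
        .collect()
}

fn main() {
    let (mut store, mut counts) = (BTreeMap::new(), BTreeMap::new());
    let (a, b) = ([1u8; 32], [2u8; 32]);
    apply_star(&mut store, &mut counts, a);
    apply_star(&mut store, &mut counts, a);
    apply_star(&mut store, &mut counts, b);
    // Index order is descending star count: a (2 stars) before b (1 star).
    assert_eq!(top_starred(&store, 2), vec![a, b]);
}
```

This is exactly the indexing logic an external indexer would otherwise absorb, now paid for on every state transition.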

There is also a concurrency question. The query layer reads from the same QMDB instance that consensus writes to, and block execution currently holds a mutex on the database. Long-running analytical queries could therefore interfere with block production unless reads are isolated from writes, perhaps through QMDB’s historical query support (OldId chains) or a related snapshotting mechanism. Finally, DataFusion is not a replacement for specialized information retrieval. It supports LIKE and regex predicates, but not inverted indexes or BM25-style ranking, so full-text search likely remains a separate concern.

Choosing the query surface

For human-facing applications, the conventional two-system architecture is often sufficient. Indexer lag is tolerable, the operational pattern is well understood, and a dedicated indexer still provides capabilities that a state-store query layer does not naturally replicate, especially around full-text search. If the user is browsing an explorer rather than issuing a state-dependent action, eventual consistency is usually an acceptable trade.

For heavier analytical workloads, open table formats offer a middle ground. R2’s Data Catalog provides a managed Apache Iceberg catalog, and R2 SQL adds a serverless query engine — also built on DataFusion — that runs directly against those tables without provisioning Spark or Trino clusters. State snapshots exported as Iceberg tables get partition pruning, column statistics, and distributed execution out of the box. The data is still eventually consistent, but the export format is open, the query engine is the same one proposed here for live state, and the infrastructure is serverless.

The argument for an embedded query layer becomes stronger in agentic settings. An agent making a permission-gated write from inside a sandboxed isolate, with a single request budget and no durable local state, cannot comfortably tolerate stale reads, cannot reliably reconcile two backends, and cannot ask the user to trust an unverifiable intermediate result. Those constraints are not edge cases; they are increasingly the default operating conditions for MCP tools and other systems that read protocol state before acting on it.

Today, light clients can verify individual keys through Merkle proofs, but they cannot yet verify aggregate claims such as “these are the top 10 starred projects” or “this agent had permission at block N” without additional proving machinery. A SQL layer over merkleized state does not solve that problem outright, but it provides a more plausible substrate for solving it. The relevant pieces already exist: QMDB provides storage and proofs, DataFusion provides planning and execution, and TableProvider offers a direct integration boundary between the two. The gap between provable state and queryable state is therefore narrower than the current architecture suggests.
