When Anthropic Started Doing Science, It Found That Data Infrastructure Is the Biggest Bottleneck

Last week, Anthropic published a research report titled Paving the Way for Agents in Biology. The team pointed several scientific AI agents (Claude, GPT, Biomni, and others) at virology databases such as NCBI Virus and had them run sequence data retrieval experiments. The results were surprising: without an added deterministic retrieval layer, not a single model could reliably hit the accuracy threshold required to build a dependable dataset.

But the models weren’t the problem.

Anthropic’s analysis identified three systemic weaknesses in current biological database infrastructure: fragmented data, highly customized formats, and inconsistent interfaces. These databases were designed around how human researchers interact with information — not how AI agents programmatically query it. Once the team inserted a deterministic retrieval tool (gget virus) as an intermediary layer between the agents and the databases, accuracy jumped to nearly 100%.
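To make the idea of a deterministic retrieval layer concrete, here is a minimal Python sketch. It is not gget virus and does not call NCBI Virus; the record fields, accession IDs, and the `retrieve` function are all hypothetical, and the "database" is an in-memory list. The point it illustrates is that the agent only supplies a small structured filter, while the filtering, validation, and ordering are done by ordinary code that returns the same answer every time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceRecord:
    accession: str
    organism: str
    host: str
    length: int


# Hypothetical in-memory stand-in for a remote sequence database.
RECORDS = [
    SequenceRecord("ACC0001", "Virus A", "Homo sapiens", 29903),
    SequenceRecord("ACC0002", "Virus A", "Sus scrofa", 29811),
    SequenceRecord("ACC0003", "Virus B", "Homo sapiens", 10723),
]

# The only filters the agent is allowed to send (an assumption of this sketch).
ALLOWED_FILTERS = {"organism", "host", "min_length"}


def retrieve(filters: dict) -> list[SequenceRecord]:
    """Return matching records in a fixed order; reject unknown filters."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        # Fail loudly instead of silently guessing what the agent meant.
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    results = [
        r for r in RECORDS
        if ("organism" not in filters or r.organism == filters["organism"])
        and ("host" not in filters or r.host == filters["host"])
        and r.length >= filters.get("min_length", 0)
    ]
    # Sorting by accession makes repeated calls return identical output.
    return sorted(results, key=lambda r: r.accession)
```

An agent would call something like `retrieve({"organism": "Virus A", "min_length": 29900})` and receive a validated, ordered result set rather than parsing pages built for human readers.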

The implications of this research reach far beyond biology. It exposes a structural tension that is accelerating: AI agents are becoming data consumers at unprecedented scale, but the data infrastructure they depend on was built for humans. That gap — wider in some fields, narrower in others — exists across every vertical.

An Infrastructure Gap Nobody’s Talking About

Pull the lens back from biology, and the contours of this problem become clearer.

For the past two decades, internet data infrastructure has rested on a single core assumption: the primary consumers of data are human beings. Database query interfaces, data format standards, and access authorization mechanisms were all designed around one question: how does a person view and operate this?

When AI agents began arriving as large-scale users, that logic started breaking down.

Agents don’t need graphical interfaces. They don’t need pagination. They don’t need dropdown menus. What they need is structured data that can be retrieved reliably, data identity that can be verified, and interfaces that support high-frequency programmatic calls. When those capabilities are absent, agents do exactly what Anthropic’s experiment showed — they keep hitting walls inside a fragmented data maze.

This isn’t just biology’s problem. Financial databases, healthcare data platforms, government open data repositories — every database designed for human use is facing the same agent incompatibility problem.

MEMO’s Response — From Storage Layer to Agent Infrastructure

This structural tension is precisely what explains the evolution in MEMO’s positioning over the past year.

MEMO entered the market as a decentralized storage project. But as the technology developed, the team arrived at a clearer realization: storage is only one slice of the problem. What actually needs to be rebuilt is the entire technical stack through which AI agents access, verify, and consume data.

MEMO is currently building this agent-native infrastructure around three core capabilities.

Data DID: Giving Data an On-Chain Identity

One critical pain point that Anthropic’s experiment exposed was the agent’s inability to confirm whether retrieved data could be trusted. Who submitted a particular gene sequence? Has it been altered? What’s its version history? That information is scattered across disparate metadata systems, forcing agents to spend significant compute reconstructing provenance after the fact.

MEMO’s Data DID protocol assigns every piece of data a unique on-chain identity. From the moment data is created, its origin, timestamp, update history, and reference relationships are recorded immutably on-chain. When an agent retrieves data, it simultaneously receives a complete, verifiable provenance chain — moving trust verification down into the infrastructure layer rather than leaving it as something the model has to repeatedly re-check on its own.
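The provenance chain described above can be sketched as a hash-linked list of version entries. This is an illustration only, not MEMO's Data DID protocol: the field names, the `append_version` and `verify` helpers, and the use of SHA-256 over JSON are assumptions, and a plain Python list stands in for the on-chain record.

```python
import hashlib
import json


def _digest(obj: dict) -> str:
    """Stable SHA-256 digest of a JSON-serializable dict."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def append_version(chain: list[dict], content: bytes, submitter: str,
                   timestamp: str) -> list[dict]:
    """Return a new chain with one more version entry linked to the last."""
    entry = {
        "content_hash": hashlib.sha256(content).hexdigest(),
        "submitter": submitter,
        "timestamp": timestamp,
        "prev": chain[-1]["entry_hash"] if chain else None,
    }
    # The entry hash covers every field above, including the back-link.
    entry["entry_hash"] = _digest(entry)
    return chain + [entry]


def verify(chain: list[dict], content: bytes) -> bool:
    """Check every link, then check that content matches the latest version."""
    prev = None
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev"] != prev or _digest(body) != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    latest = hashlib.sha256(content).hexdigest()
    return bool(chain) and chain[-1]["content_hash"] == latest
```

With a structure like this, an agent that receives a dataset together with its chain can run one inexpensive check instead of cross-referencing several metadata systems, and any edit to an earlier entry invalidates every later link.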

x402 + ERC-8004: A Two-Sided Market Designed for the Agent Economy

Current biological database operations are heavily dependent on government grants and institutional funding. Data is openly available but interfaces are outdated and inefficient. That model isn’t sustainable at agent-scale query volumes — not because costs blow up the budget, but because responsiveness can’t keep pace with call volume.

The x402 protocol provides an atomic, pay-per-use model for data consumption. Every time an agent calls a dataset, a micropayment is processed automatically, which gives database operators a direct economic incentive to maintain data quality and accessibility. ERC-8004, which MEMO applies to delegated computation, addresses the data transfer bottleneck: rather than downloading full datasets locally for analysis, agents offload computation to nodes close to where the data is stored and receive only the results.
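The pay-per-call and compute-near-data ideas can be combined in one small in-process simulation. This sketch does not implement the x402 wire format or any ERC-8004 contract; it borrows only the general pattern of answering an unpaid request with HTTP status 402 and a price, then serving the retried, paid request. `DataNode`, `Agent`, the price, and the operation names are all hypothetical.

```python
# Hypothetical dataset held by a storage node; the agent never downloads it.
DATASET = {"seq-1": "ATGCGC", "seq-2": "ATATAT"}
PRICE_PER_CALL = 1  # hypothetical smallest currency unit

# Operations the node agrees to run next to the data.
OPERATIONS = {
    "gc_fraction": lambda s: (s.count("G") + s.count("C")) / len(s),
    "length": len,
}


class DataNode:
    def __init__(self) -> None:
        self.revenue = 0

    def handle(self, dataset_id: str, operation: str, payment: int = 0) -> dict:
        if payment < PRICE_PER_CALL:
            # Unpaid request: quote the price instead of serving data.
            return {"status": 402, "price": PRICE_PER_CALL}
        if dataset_id not in DATASET or operation not in OPERATIONS:
            return {"status": 404}  # nothing is charged for a failed lookup
        self.revenue += PRICE_PER_CALL
        # Only the computed result leaves the node, not the raw data.
        return {"status": 200, "result": OPERATIONS[operation](DATASET[dataset_id])}


class Agent:
    def __init__(self, balance: int) -> None:
        self.balance = balance

    def query(self, node: DataNode, dataset_id: str, operation: str) -> dict:
        response = node.handle(dataset_id, operation)
        if response["status"] == 402:
            price = response["price"]
            if price > self.balance:
                raise RuntimeError("insufficient balance")
            response = node.handle(dataset_id, operation, payment=price)
            if response["status"] == 200:
                self.balance -= price  # pay only for a served request
        return response
```

In a real deployment the payment would be a signed, verifiable transfer rather than an integer argument; the sketch only shows how a quote, a payment, and a delegated result fit into a single request cycle.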

Together, these form a closed-loop, two-sided market between data providers and agent consumers. The design aims to be far more efficient than the legacy FTP-plus-static-page paradigm. More importantly, it offers a viable economic framework for agents to consume data at scale.

Unified Addressing and Decentralized Storage: A Ground-Level Fix for Fragmentation

The data fragmentation problem Anthropic identified has a natural solution in a decentralized storage architecture. All data on the MEMO network is addressed through a unified protocol. Instead of facing hundreds of databases with incompatible formats and inconsistent interfaces, an agent faces a single, unified, programmable data plane.
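One common way to get a single addressing scheme is content addressing, where the address of a blob is derived from its hash. The source does not say that MEMO works this way, so treat the following purely as an assumption-laden sketch; the `DataPlane` class and the `sha256:` address prefix are hypothetical.

```python
import hashlib


class DataPlane:
    """Toy unified store: every blob is reached through one address format."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    @staticmethod
    def address_of(content: bytes) -> str:
        return "sha256:" + hashlib.sha256(content).hexdigest()

    def put(self, content: bytes) -> str:
        address = self.address_of(content)
        self._blobs[address] = content
        return address

    def get(self, address: str) -> bytes:
        content = self._blobs[address]  # raises KeyError if unknown
        # Re-derive the address so a corrupted blob is never returned silently.
        if self.address_of(content) != address:
            raise ValueError("stored content does not match its address")
        return content
```

Whatever the original format of a dataset, the agent uses the same two calls, and identical content always resolves to the same address.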

From Biology to Every Field — A Universal Infrastructure Paradigm

Anthropic’s report is focused on biology, but its core argument applies to a much wider industrial landscape: databases need to be redesigned for agents as large-scale users.

This isn’t incremental improvement. It’s a paradigm shift at the infrastructure level.

Before the agent economy fully arrives, whoever builds AI-native data infrastructure first will control the critical intermediary layer between agents and data. That is exactly where MEMO is positioned: providing AI agents with a data layer that is trustworthy, queryable, and payable on demand — while giving data providers decentralized deployment and a revenue distribution mechanism.

Anthropic found a crack in the biological domain and patched it. MEMO’s goal is to rebuild the foundation for the agent era before that crack becomes a systemic collapse.

When a top AI research institution starts using experimental data to argue that infrastructure needs to be redone, the direction itself is no longer in dispute.

The only questions left are: who builds it, and how fast.

📢 Data Mining is Now Live — Earn Points Just by Leaving Your Browser Open

The DataDID plugin has launched its Data Mining feature. After installing the plugin, grant it permission to collect anonymized browsing data. Raw data is processed entirely locally and never uploaded; only proof of contribution is recorded on the blockchain via ZK Proofs. Users automatically earn points based on their data contribution. In a nutshell: Install the plugin, enable Data Mining, browse the web as usual, and watch your points grow automatically.

We invite you to try it out:
👉 [datadidapp.memolabs.net]