
We’ve Been Trying to Teach Machines to Read Since 2001. Here’s What Finally Worked.

In 2001, Tim Berners-Lee, together with James Hendler and Ora Lassila, published a paper in Scientific American that felt, at the time, less like a technical proposal and more like a promise. The web, they argued, could be far more than a collection of pages written for human eyes. With the right layer of structured metadata (expressed through RDF, OWL, and a family of related standards) machines would be able to navigate meaning, not just links. Software agents would move through this enriched web, booking appointments, finding research, connecting services, doing useful things on behalf of people. It was a compelling vision. For those of us working at the intersection of technology and human communication, it felt like the natural next chapter.

I was among the believers. The skeptics, to be honest, seemed to me at the time like people who had decided not to try. If the idea was right — and the idea was right — then the thing to do was to push forward and build it. The critics felt like a category of professional caution that occasionally mistakes inertia for wisdom.

With the benefit of hindsight, I owe those skeptics more credit than I gave them. Their objections were largely practical and, as it turned out, accurate. RDF, the basic triple-based data model at the core of the semantic web stack, was relatively approachable. OWL, however, was a different matter. It was formally grounded in description logics, a family of decidable fragments of first-order predicate logic. Its most expressive sublanguage, OWL Full, was technically undecidable, meaning no reasoning engine could perform complete reasoning over it, not as an engineering limitation but as a mathematical fact. The tools were difficult to use correctly and easy to get wrong, and the inference engines that were supposed to draw conclusions from all this structured knowledge were slow by mathematical necessity, not merely through poor implementation. Reasoning over an OWL file with a few hundred concepts was already laborious by mid-2000s standards. Scaling that to billions of web pages was never remotely plausible.

And perhaps most fatally: the entire vision depended on an assumption that competing organizations, companies with genuinely opposed commercial interests, would voluntarily agree on shared vocabularies, annotate their content meticulously, and cooperate to build a commons from which they derived no obvious individual advantage. As one critic put it at the time, with some justice: the people who designed the semantic web apparently never read their epistemology texts.

But here is the part that I think both camps got wrong, and it matters for what comes later in this post. Even setting aside the practical obstacles, even imagining a world where every developer was enthusiastic and every company cooperative and every page perfectly annotated, there was still a ceiling. A machine querying that ideal semantic web could find a product tagged with schema:color "blue" but it couldn't know that "navy," "midnight," and "indigo" are also blue, or that "relaxed fit" implies something about proportion, or that a user asking "something warm for a Scottish winter" is really expressing a need about weight and wind resistance, not just temperature. The semantic web encoded labels. What was actually needed was something that could handle meaning, and those are not the same thing.

HTML5 arrived in the late 2000s carrying some of the same ambitions in more modest packaging. Semantic elements (<article>, <section>, <nav>, <aside>) gave developers a richer vocabulary for describing page structure. And the HTML5 document outline algorithm promised something genuinely useful: truly modular content where heading levels would be determined by structural nesting rather than hardcoded numbers, freeing developers and content management systems from the awkward business of tracking heading hierarchies across components. It was a neat idea. It was also, as Adrian Roselli and others have carefully documented, a wish that never became reality. No browser ever implemented the outline algorithm in its accessibility tree. Screen reader users, for whom proper heading structure is a primary navigation tool, experienced nothing but a flat sequence of h1 elements wherever developers had followed the spec in good faith. The algorithm was formally withdrawn from the HTML Living Standard in July 2022, roughly a decade after it had stopped being defended with any conviction. As Bruce Lawson wrote without ceremony: it had always been a wish, but never reality.

What the semantic web era did leave behind, despite everything, was a foundation. Schema.org, co-created by R.V. Guha, who had earlier helped create RSS and RDF, gave the web a practical, widely adopted vocabulary for structured data that search engines would actually use. Developers marked up recipes, events, products, and reviews not out of idealism but because it improved their search rankings. The motivation was unromantic. The result was useful. And as we will see, it turns out to be exactly the layer that a new generation of tools is now building on.

What Actually Changed

So what was actually missing? The honest answer is that the semantic web's fundamental problem wasn't just tooling complexity or developer reluctance or corporate non-cooperation, though all of those were real. The deeper issue was that the entire approach assumed meaning could be encoded by humans in advance, exhaustively and correctly, in a formal language. That assumption was always going to buckle under the sheer diversity of actual human language.

What large language models introduced, and this is the part that tends to get lost in the general excitement about AI, wasn’t primarily the ability to chat. It was a fundamentally different way of storing and retrieving meaning. Instead of filing content under pre-agreed labels, LLMs learn from vast amounts of human language that certain words, concepts, and ideas consistently appear in similar contexts. That proximity gets encoded as a point in a very high-dimensional mathematical space, a vector. Things that mean similar things end up as neighbouring points. This is what embeddings are: meaning, expressed as geometry.

Think of it this way. The old semantic web approach was like a library where every book must be filed under an exact, pre-agreed category. If you ask for “books about losing someone you love,” the librarian can only help you if someone already created a category called exactly that and filed things under it consistently. “Grief,” “bereavement,” “loss,” and “mourning” would all be separate drawers with no connection between them unless a human had explicitly drawn that connection in advance. Embeddings work differently. The system learns, from the accumulated patterns of human language, that those words live in roughly the same neighborhood of meaning. When you ask a question, it doesn’t look for matching labels; it finds neighboring meaning. The catalogue emerges from the language itself, rather than being constructed by hand.
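To make the geometry concrete, here is a deliberately tiny sketch. The vectors are hand-assigned stand-ins; a real embedding model produces hundreds or thousands of learned dimensions, not three invented ones. Only the principle matters: related words sit close together, unrelated words sit far apart, and "closeness" is something you can compute.

```javascript
// Hand-assigned toy vectors standing in for real embeddings.
// A genuine model would produce hundreds or thousands of dimensions;
// three are enough to show the geometry.
const vectors = {
  grief:       [0.90, 0.10, 0.00],
  bereavement: [0.85, 0.15, 0.05],
  mourning:    [0.80, 0.20, 0.10],
  invoice:     [0.00, 0.10, 0.95],
};

// Cosine similarity: close to 1 means "pointing the same way",
// close to 0 means unrelated.
function cosine(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = v => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Rank every other word by its similarity to the given one.
function neighbours(word) {
  return Object.keys(vectors)
    .filter(w => w !== word)
    .sort((a, b) =>
      cosine(vectors[word], vectors[b]) - cosine(vectors[word], vectors[a]));
}

console.log(neighbours("grief")); // "bereavement" and "mourning" rank well ahead of "invoice"
```

No human filed "bereavement" under "grief" here; the ranking falls out of where the points sit. That is the whole trick, scaled up by many orders of magnitude.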

This is the gap that the semantic web was always trying to bridge and never could. And it is precisely what makes NLWeb interesting, not as a radical departure from what came before, but as the moment when a decades-old direction finally has the infrastructure it was waiting for.

NLWeb was introduced by Microsoft at Build 2025, conceived by R.V. Guha, the same person who created RSS, co-created RDF, and later built Schema.org. That lineage is not incidental. Guha has spent roughly thirty years working on the problem of making web content machine-readable, and NLWeb represents his current thinking on how to do it practically, at scale, today. The project’s own documentation states its goal plainly: “NLWeb is to MCP/A2A what HTML is to HTTP.” A content layer for the agentic web, in other words, built on the infrastructure layer underneath.

In practice, NLWeb works like this. A web publisher (a recipe site, an event listing platform, a product catalogue, a media archive) takes the structured data they already publish (Schema.org markup, RSS feeds, JSON-LD) and loads it into a vector database, generating embeddings in the process. NLWeb then exposes a simple API endpoint called ask, which accepts natural language queries. When a user or an AI agent sends a query, "find me a vegetarian dish that takes under 30 minutes," the query is converted to a vector, semantically similar content is retrieved from the database, and an LLM synthesises a response drawing on both the site's own data and the broader knowledge encoded in the model. The site's content doesn't just sit there waiting for keyword matches. It becomes queryable by meaning.
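As a rough sketch of what that looks like from the client's side, the snippet below builds a request against a hypothetical NLWeb deployment's ask endpoint. The host name is invented and the parameter name is an assumption based on the description above, not a verified API surface; the point is the shape of the interaction, a plain-language question sent to a plain HTTP endpoint.

```javascript
// Build a natural-language query against a hypothetical NLWeb "ask"
// endpoint. The host is made up; parameter names may differ per deployment.
function buildAskUrl(baseUrl, question) {
  const url = new URL("/ask", baseUrl);
  url.searchParams.set("query", question);
  return url.toString();
}

const askUrl = buildAskUrl(
  "https://recipes.example",
  "find me a vegetarian dish that takes under 30 minutes"
);
// An agent (or the site's own chat widget) would now fetch(askUrl) and
// receive an answer synthesised from the site's structured data.
console.log(askUrl);
```

Nothing about the request itself is exotic, which is rather the point: the intelligence lives behind the endpoint, in the embeddings and the model, not in the protocol.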

Early adopters include Shopify, Tripadvisor, Eventbrite, and O’Reilly Media, content-rich sites that had already invested in structured data. That last point is worth pausing on: NLWeb doesn’t require new markup conventions or new annotation schemas. It builds on the layer that the web standards community spent twenty years constructing, often thanklessly, often to skepticism. The semantic web camp and its critics were both, in some sense, vindicated simultaneously. The vision was right; the method was premature; and the foundation that actually got built (practical, SEO-motivated, Schema.org-shaped) turned out to be exactly sufficient.

One further detail that matters for what follows: every NLWeb instance is also a Model Context Protocol (MCP) server. MCP, originally developed by Anthropic and now supported across the industry including by Microsoft and Google, is a standardized protocol that lets AI agents connect to external tools and data sources. An NLWeb-enabled site is therefore not just queryable by humans through a conversational interface. It is automatically discoverable and usable by any AI agent in the growing MCP ecosystem. The site becomes, in effect, a participant in a larger network of AI-accessible services. First, though, there is a parallel development worth examining, one that operates not at the content layer but inside the browser itself.

When the Browser Wars Grew Up

There is a certain irony in the fact that two of the companies most responsible for the browser wars of the 2000s are now co-authoring a W3C specification together. Microsoft and Google spent years competing on proprietary web extensions, each browser adding its own interpretation of standards, its own non-standard APIs, its own reason why developers needed to test everything twice. The Ajax revolution that Microsoft inadvertently sparked through Internet Explorer’s XMLHttpRequest object was genuinely innovative; it also became a case study in what happens when capability outruns standardization. Progress happened, but it was fragmented, inconsistent, and left a long tail of compatibility problems that web developers are still occasionally tripping over today.

Which is why it is worth noting, without over-reading it, that Microsoft and Google jointly published WebMCP in August 2025 under the W3C Web Machine Learning Community Group, a specification for a new browser-native JavaScript API, written collaboratively, from the outset, as an open standard. The browser wars didn’t end so much as the competitive terrain shifted. Both companies now have larger platforms to protect than any individual browser feature, and open standards serve those platforms better than fragmentation does. That is a pragmatic observation, not a cynical one. The result, regardless of motivation, is a specification that deserves attention.

WebMCP is distinct from MCP in an important way. MCP is a backend protocol; it connects AI agents to external servers, databases, and tools running somewhere on the internet. WebMCP operates inside the browser, in the page itself. Its core idea is straightforward: web developers can use a new JavaScript API to expose their application’s own functionality as “tools,” named functions with natural language descriptions, that AI agents, browser assistants, and assistive technologies can discover and invoke directly.

The API looks like this in practice:

navigator.modelContext.registerTool({
  name: "filter_products",
  description: "Filter the product listing by category, price range, or availability",
  inputSchema: {
    type: "object",
    properties: {
      category: { type: "string" },
      maxPrice: { type: "number" },
      inStockOnly: { type: "boolean" }
    }
  },
  execute: async ({ category, maxPrice, inStockOnly }) => {
    return applyFilters(category, maxPrice, inStockOnly);
  }
});

A page that registers this tool is telling any compatible agent: here is something useful this page can do, here is how to ask for it in plain language, and here is the structure of what you need to provide. The agent doesn’t need to scrape the DOM, simulate clicks, or reverse-engineer the interface. The functionality is declared, described, and directly callable.

For accessibility practitioners, one sentence in the WebMCP specification is worth reading twice. The stated purpose of the API is to expose web application functionality to “AI agents, browser assistants, and assistive technologies.” Assistive technologies are not an afterthought in the framing. They are named in the same breath as AI agents as a primary intended consumer of these tools. The specification also includes a dedicated accessibility considerations section, which is, frankly, more than many web specifications manage.

The declarative form of the API makes this even more concrete. Rather than registering tools imperatively through JavaScript, developers can annotate existing HTML forms directly, allowing the browser to synthesise tool definitions from form structure and field labels automatically. A well-structured, semantically marked-up form, the kind accessibility practitioners have been asking developers to build for twenty-five years, becomes, without any additional work, a tool that assistive technologies and AI agents can use programmatically. Semantic HTML, it turns out, was good advice all along. It just needed a sufficiently compelling reason for the broader development community to follow it.
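The exact annotation syntax is still being worked out in the specification, so the markup below makes no claim about final attribute names. What it shows is the raw material the declarative approach depends on: an ordinary, well-labelled form whose structure already says, in machine-readable terms, what it does.

```html
<!-- A plain, well-labelled filter form. Under the declarative approach
     described above, the browser could synthesise a tool definition from
     exactly this structure: the field names, labels, and types already
     describe the inputs. No WebMCP-specific attributes are shown, because
     those are still being specified. -->
<form action="/products" method="get">
  <label for="category">Category</label>
  <select id="category" name="category">
    <option>Jumpers</option>
    <option>Shirts</option>
  </select>

  <label for="max-price">Maximum price</label>
  <input id="max-price" name="maxPrice" type="number" min="0">

  <label for="in-stock">In stock only</label>
  <input id="in-stock" name="inStockOnly" type="checkbox">

  <button type="submit">Filter products</button>
</form>
```

A screen reader already understands this form because of its labels; a declarative tool-synthesis mechanism would let an agent understand it for the same reason.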

What This Means for People

All of the above is interesting technically. What matters is whether any of it makes a practical difference in people’s lives.

Léonie Watson is a Director at TetraLogical, Chair of the W3C Board of Directors, co-Chair of the W3C Web Applications Working Group, and a screen reader user who has spent decades working at the intersection of web standards and lived accessibility experience. She is not someone who dismisses compliance frameworks lightly; she helped build several of them. She is also not someone who mistakes the framework for the goal. In August 2025 she published a piece on the agentic web that is worth reading in full, but one passage in particular captures something the accessibility community has been circling for years without quite saying plainly.

She describes the experience of shopping for clothes online as a blind user. Not the experience of encountering a broken carousel or a missing alt attribute, but the more fundamental problem that even well-built retail sites don’t provide enough descriptive information to let a blind user make a confident purchase decision. A product listed as “long sleeve knitted jumper, crew neck, black, 100% cotton” leaves unanswered questions that a sighted user resolves by looking at the image: is it ribbed or plain? Does it fall to the hip or the thigh? Is it lightweight or chunky? The result, as she puts it, is online clothes shopping as “an exercise in buy six things and return five.” This is not a WCAG failure. It is a gap that conformance testing was never designed to catch.

What she found genuinely useful was not an AI tool that automatically generated alt text, a common and often shallow application of AI to accessibility, but an agentic interface that let her have a conversation with a shopping platform: ask for product descriptions, request specific details, filter results by her own criteria, and ultimately add an item to her basket, all through natural language. The interface removed the navigation layer entirely. She didn’t need to locate the right filter, find the right input, tab through the right controls. She described what she wanted and the agent handled the rest.

This is the distinction worth dwelling on. The accessibility community has spent considerable effort, rightly, ensuring that interfaces are navigable: that controls are keyboard-accessible, that focus order is logical, that landmarks are present, that ARIA roles are correctly applied. That work is necessary and it should continue. But navigability and usability are not the same thing. A blind user navigating a retail website without assistance is managing two simultaneous cognitive loads: understanding the content and operating the interface. Agentic interfaces, when they work, collapse those two tasks into one. You express what you want. The agent handles how to get it.

This matters differently for different groups. For users with cognitive disabilities, complex multi-step interfaces (registration flows, checkout sequences, filtering systems) represent a barrier that no amount of correct ARIA labelling fully removes. Natural language interaction lowers that barrier in a way that structural markup alone cannot. For users with motor impairments who rely on voice input or switch access, every interaction that removes a navigation step is a real reduction in effort and fatigue. The potential here is not marginal.

It is also, to be honest, not guaranteed. Three tensions are worth naming directly.

The first is the compliance problem. WCAG is built around deterministic, testable criteria: a piece of content either has a text alternative or it does not, a form field either has a label or it does not. Agentic interfaces generate responses dynamically, and no two responses to the same query are identical. As Léonie Watson observed: when content is non-deterministic, the concept of a representative sample, the foundation of any accessibility audit, starts to dissolve. This is not a reason to reject agentic interfaces. It is a reason the accessibility community needs to be part of defining what “accessible enough” means in this context, before that definition gets set by people who aren’t thinking about it.

The second is what might be called the structured-data gap. NLWeb works best on sites that have already invested in Schema.org markup, RSS, and semantic structure. Sites that haven’t, which are disproportionately smaller organisations, under-resourced services, and sectors that have historically been slow on accessibility, will be less discoverable by agentic interfaces. Semantic debt and accessibility debt tend to accumulate in the same places. If the agentic web primarily serves the sites that were already well-built, it risks compounding existing inequities rather than reducing them. That said, there are encouraging signs that platforms are moving to close this gap by default: Wix President and COO Nir Zohar recently announced that Wix has integrated NLWeb directly into its dashboard, meaning that sites built on the platform become AI-agent-readable without any additional implementation work by their owners. When infrastructure decisions get made at the platform level, the barrier for smaller publishers drops considerably.

The third concerns WebMCP specifically. The ability to expose application functionality as named tools is powerful, but it introduces new questions: which tools get registered, with what descriptions, and with what scope? A developer who registers a “submit form” tool but omits a “clear form” tool has made an accessibility decision, whether they intended to or not. The WebMCP specification includes a readOnlyHint annotation and abort mechanisms that gesture toward user control, but the accessibility implications of tool design will need sustained community attention, not just a section in a spec.

Looking Forward to the Past

There is a pattern worth recognizing here, because we have been in a version of this moment before.

In the early 2000s, the people building the accessibility standards that govern the web today were doing so largely outside the rooms where the web’s future was being decided. Browser vendors, platform companies, and developer communities were moving fast in directions that occasionally remembered accessibility and more often didn’t. The standards that emerged from that era, WCAG, ARIA, the semantic elements of HTML, were hard-won precisely because they required sustained, knowledgeable advocacy from people who understood both the technical landscape and what was actually at stake for users. Those standards are imperfect. They are also the reason that screen reader users can navigate the web at all, that keyboard users have predictable focus management, that people with low vision have contrast requirements baked into procurement criteria globally. The work was worth doing, even when it was slow, even when it was ignored.

Something similar is happening now, at a different layer of the stack. The W3C AI Agent Protocol Community Group was established with a mandate to develop open, interoperable protocols that enable AI agents to discover, identify, and collaborate across the web, covering inter-agent communication, identity models, security, and protocol interoperability. WebMCP is being developed under the W3C Web Machine Learning Community Group. These are not fringe projects or vendor experiments. They are the early stages of standards that will shape how the agentic web works, for everyone, for a long time. The WebMCP specification already has an accessibility considerations section, which is a reasonable start. It is also, on its own, not enough.

The accessibility community’s relationship with AI is complicated at the moment, and that complication is understandable. Some early applications of AI to accessibility were genuinely naive: overlay products that promised to solve structural inaccessibility algorithmically, automated alt text that described images confidently and inaccurately, tools that generated WCAG conformance claims without meaningful testing. The skepticism those efforts earned was proportionate. The problem is when that proportionate skepticism calcifies into a general posture of suspicion toward anything involving AI, which is a different thing entirely and less useful.

WCAG is a remarkable achievement. It is also, at its core, a means to an end. The end is that people with different abilities can participate fully in digital life: shop, bank, work, learn, connect, and do the thousand ordinary things that the web makes possible. If new technology genuinely advances that goal, it deserves honest evaluation on those terms, not reflexive caution dressed up as principled rigor. And if it falls short, or introduces new barriers, that too deserves honest evaluation, not dismissal of the whole direction.

The agentic web is being built right now, by people, in working groups, in GitHub repositories, in specification drafts. The question of whether it ends up more or less accessible to people with disabilities is not settled. It will be shaped by who participates in those processes, what expertise they bring, and whether the community that has spent decades understanding what inclusion actually requires shows up to make the case. We’ve seen what happens when it doesn’t. We’ve also seen what happens when it does.