Programmatically Undetermined

The scenario

Picture a common UI pattern: a dialog opens, a dark backdrop covers the page behind it, focus moves into the dialog and is trapped there. Every visual cue tells a sighted user that the background is off-limits and cannot be interacted with. This is a modal dialog in the fullest design sense of the term.

Now look at the markup. The dialog has role="dialog", an accessible name, and correct focus management. But aria-modal="true" is absent. The developer omitted it, perhaps deliberately, perhaps not.

Does this pass WCAG?

I want to make the case that it doesn't, that the absence of aria-modal on a visually modal dialog is a meaningful candidate for a Success Criterion 1.3.1 failure. But to get there, I need to work through a counterargument that deserves to be taken seriously, and then say something about what "programmatically determined" should actually mean.

What SC 1.3.1 actually requires

Success Criterion 1.3.1, Info and Relationships, requires that information, structure, and relationships conveyed through presentation can be programmatically determined or are available in text.

The operative concept is relationships conveyed through presentation. In a modal dialog, the visual presentation conveys a specific relationship: the dialog is active and the rest of the page is not. That relationship isn't decorative. It carries functional meaning. A sighted user understands immediately that clicking outside the dialog does nothing, that the background content is inaccessible, that their context has narrowed to this one surface.

If that relationship is conveyed visually but not programmatically, 1.3.1 is the criterion it falls under. The question is whether it is, in fact, programmatically determined in the absence of aria-modal.

The `inert` counterargument, steelmanned

Here is the strongest case for the defense.

If the implementation places the dialog at the root of the body and wraps the remaining page content in a sibling container marked with the inert boolean attribute, then the background is genuinely, platform-level non-interactive. The browser enforces this. Focusable elements inside an inert subtree cannot receive focus. They are excluded from the accessibility tree. An assistive technology navigating the page will not reach them.
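As a rough sketch of the implementation this defense has in mind, assume the dialog and a background wrapper are siblings under the body. The names `InertHost`, `setBackgroundInert`, and `FakeElement` are hypothetical. The interface lists only `toggleAttribute` and `hasAttribute`, which real DOM elements already provide, so the helper should accept an actual element in a browser; the stub exists only so the sketch runs elsewhere.

```typescript
// Minimal structural type: a real DOM Element satisfies it, and so does the
// stub below, which lets the sketch run outside a browser.
interface InertHost {
  toggleAttribute(name: string, force?: boolean): boolean;
  hasAttribute(name: string): boolean;
}

// Hypothetical helper: mark the background wrapper inert while the dialog is
// open, and clear the attribute again when the dialog closes.
function setBackgroundInert(background: InertHost, dialogOpen: boolean): void {
  background.toggleAttribute("inert", dialogOpen);
}

// Stand-in element for illustration only (not a DOM implementation).
class FakeElement implements InertHost {
  private present: Record<string, boolean> = {};

  toggleAttribute(name: string, force?: boolean): boolean {
    const on = force ?? !this.hasAttribute(name);
    this.present[name] = on;
    return on;
  }

  hasAttribute(name: string): boolean {
    return this.present[name] === true;
  }
}

const pageContent = new FakeElement();

setBackgroundInert(pageContent, true); // dialog opens
const inertWhileOpen = pageContent.hasAttribute("inert");

setBackgroundInert(pageContent, false); // dialog closes
const inertAfterClose = pageContent.hasAttribute("inert");
```

Note what the helper touches: only the background container. Nothing in it records that a dialog is the reason.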

One could argue that this is sufficient programmatic determination. The relationship, that this content is inaccessible while the dialog is open, is encoded in the DOM. It is not merely visual. A sufficiently capable user agent could, in principle, expose this state. The information exists in the code. It is, by a reasonable reading, programmatically available.

This is not a frivolous argument. It is technically coherent, and I have seen experienced accessibility practitioners land here.

Why `inert` isn't enough

The problem is that inert is a general-purpose attribute. It tells the platform that a subtree is non-interactive. It does not say why.

A loading overlay makes background content inert. An inactive tab panel in a tabbed interface may use inert. An off-canvas navigation drawer, closed and slid out of view, may carry inert to prevent focus from reaching it. None of these are modal dialogs. In none of these cases would a user agent, a screen reader, an auditor, or a developer look at inert on a container and conclude: this implies a modal dialog is active.

The relationship between the dialog and the inert sibling is implicit. It requires inference. And inference is precisely what programmatic determination is supposed to eliminate. The point of encoding information in markup is that software can read it without guessing. inert on a sibling tells you something is inactive. It does not tell you that a modal dialog caused that inactivity, that the inactivity is scoped to this dialog's lifetime, or that a dialog-to-background relationship exists at all.
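The inference problem can be made concrete with a toy model. This is not a description of how any real user agent or screen reader works, and the scenario names and the `causesConsistentWith` function are hypothetical. It assumes that in each of the four situations above the author marks the affected container with `inert` and adds no other signal, then asks which causes are consistent with the markup that software would observe.

```typescript
type InertCause =
  | "modal-dialog"
  | "loading-overlay"
  | "inactive-tab-panel"
  | "closed-drawer";

// Assumption for illustration: in each scenario the author marks the
// affected container with `inert` and adds no other signal.
const containerMarkup: Record<InertCause, string> = {
  "modal-dialog": "inert",
  "loading-overlay": "inert",
  "inactive-tab-panel": "inert",
  "closed-drawer": "inert",
};

// Given only the observed markup, list every cause that could explain it.
function causesConsistentWith(observed: string): InertCause[] {
  const causes = Object.keys(containerMarkup) as InertCause[];
  return causes.filter((cause) => containerMarkup[cause] === observed);
}

// All four scenarios remain candidates; the modal dialog is only one of them.
const candidateCauses = causesConsistentWith("inert");
```

Under these assumptions the observed markup narrows nothing down: software that sees `inert` cannot tell which of the four situations it is in without guessing.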

There is exactly one attribute on the web platform whose explicit semantic purpose is to encode the modal relationship between a dialog and its surrounding content. That attribute is aria-modal.

How WCAG itself uses the term

Before arguing for a stricter interpretation of "programmatically determined," it is worth looking at how WCAG already applies the term across its own success criteria. The term appears 17 times across the specification, in 9 distinct success criteria. The pattern of those appearances is revealing.

SC 3.1.1 and 3.1.2 require that the human language of a page or passage can be programmatically determined. In practice, this is universally understood to mean one thing: the lang attribute must be present and correct. Nobody argues that a screen reader could detect the language from the text itself and that this satisfies the criterion. The platform provides an explicit API for declaring language, and that API is the test.

SC 4.1.2 requires that the name and role of user interface components can be programmatically determined. Here too, the community reads this as: use the correct semantic HTML element or ARIA role. An AT inferring a button's role from its visual appearance or position does not satisfy the criterion. The API is the test.

SC 4.1.3 goes furthest in making this explicit. It requires that status messages can be programmatically determined "through role or properties." That phrasing is as close as WCAG gets to naming the mechanism directly. The criterion does not say "through any means from which an AT could infer status." It identifies the platform constructs designed for the purpose and holds authors to them.

SC 1.3.5 adds another dimension worth noting. It conditions programmatic determination on the technology having support for identifying the expected meaning. That is a direct acknowledgment within the spec that programmatic determination is only meaningful when a mechanism exists that was designed to carry that meaning. Applied to the modal case: inert was not designed to carry the modal relationship. aria-modal was.

Across every SC where the mechanism is well understood, the accessibility community already interprets "programmatically determined" to mean: use the API the platform built for this semantic. The inert-as-sufficient argument is not the norm. It is the exception, and it is inconsistent with how the same term is read everywhere else in the spec.

What "programmatically determined" should mean

WCAG's glossary defines "programmatically determined" as:

Determined by software from author-supplied data provided in a way that different user agents, including assistive technologies, can extract and present this information to users in different modalities.

The definition is identical across WCAG 2.0 and WCAG 2.2: word for word, unchanged in over fifteen years. What has received far less attention than the definition itself is the pair of examples the spec provides:

Example 1: Determined in a markup language from elements and attributes that are accessed directly by commonly available assistive technology.

Example 2: Determined from technology-specific data structures in a non-markup language and exposed to assistive technology via an accessibility API that is supported by commonly available assistive technology.

Neither example describes inference. Neither describes a general-purpose mechanism whose side effects happen to be observable. Both describe explicit, purpose-built channels: attributes accessed directly by AT, or data structures exposed via a dedicated accessibility API. The spec's own illustrative examples already lean toward the platform-contract reading. The argument I'm making here is not against WCAG. It is drawn from it.

The phrase that does the heaviest lifting, and receives the least scrutiny, is "can extract and present." Can, under what conditions? With what level of AT support? As of which browser version? The definition, as written, is untestable in any stable way. AT behavior varies across products, versions, and platform combinations. A relationship that one screen reader surfaces correctly may be invisible to another.

I'd argue for a stricter, more principled reading: if the web platform has engineered an explicit API for a feature, conformance to that API is the test for programmatic determination. Not "could a capable AT infer this," since that bar shifts with every AT release. Not "is the information technically present in the DOM somewhere," since that proves too much. The test should be: does the markup use the API the platform built for exactly this purpose?

This reading has real advantages. It is stable, as APIs change slowly and deliberately. It is falsifiable, since either you used the API or you didn't. It places responsibility correctly: if an AT fails to consume a correctly implemented API, that is an AT bug, not an author conformance failure. And it respects the intent of the platform itself. When browser vendors and standards bodies define aria-modal, they are encoding a contract. That contract is the definition of programmatic determinability for the modal relationship.
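To show how falsifiable this reading is, here is a minimal sketch of the check it implies. It is an illustration under stated assumptions, not an official WCAG test procedure: the names `AttrReader`, `auditModalDeclaration`, and `stubElement` are hypothetical, the check covers only ARIA-authored dialogs like the one in the scenario, and it takes as given that the dialog is visually modal, since code cannot judge that.

```typescript
// Read-only slice of the DOM Element API; a real Element satisfies it.
interface AttrReader {
  getAttribute(name: string): string | null;
}

type ModalAudit = { pass: boolean; reason: string };

// Hypothetical check under the platform-contract reading: a dialog that is
// presented as modal passes only if it declares aria-modal="true".
function auditModalDeclaration(
  dialog: AttrReader,
  backgroundInert: boolean
): ModalAudit {
  const role = dialog.getAttribute("role");
  if (role !== "dialog" && role !== "alertdialog") {
    return { pass: false, reason: "no dialog role declared" };
  }
  if (dialog.getAttribute("aria-modal") === "true") {
    return { pass: true, reason: "modal relationship declared via aria-modal" };
  }
  return {
    pass: false,
    reason: backgroundInert
      ? "background is inert, but the modal relationship is not declared"
      : "modal relationship is not declared",
  };
}

// Stand-in for a DOM element, for illustration only.
function stubElement(attrs: Record<string, string>): AttrReader {
  return { getAttribute: (name) => attrs[name] ?? null };
}

// The dialog from the opening scenario: role and name present, aria-modal absent.
const scenarioDialog = stubElement({
  role: "dialog",
  "aria-label": "Confirm deletion",
});
const scenarioResult = auditModalDeclaration(scenarioDialog, true);

// The same dialog with the declaration added.
const declaredDialog = stubElement({ role: "dialog", "aria-modal": "true" });
const declaredResult = auditModalDeclaration(declaredDialog, true);
```

The result depends only on what the author wrote, not on which screen reader or browser version happens to be in use, which is the stability the stricter reading is after.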

There is one more distinction worth making explicit here, because it sharpens the contrast between inert and aria-modal considerably. HTML affects the DOM, and therefore affects all users uniformly: sighted users, keyboard users, and AT users alike. ARIA, by contrast, affects only the accessibility tree, and therefore affects only AT users. inert is an HTML attribute. When placed on a container, it makes that content non-interactive for everyone: keyboard focus cannot reach it, mouse interaction is blocked, and AT cannot navigate to it either. aria-modal is an ARIA attribute. It modifies only what AT perceives and communicates. Although it is written in the markup like any other attribute, it has no effect on how the DOM behaves, no effect on mouse interaction, no effect on visual rendering. An author who uses inert without aria-modal has addressed the DOM layer and left the AT layer unaddressed. These are not redundant mechanisms operating on the same surface. They operate on entirely different layers of the platform stack. Using one cannot substitute for the other.

The `aria-modal` irony

There is an uncomfortable wrinkle worth acknowledging directly, though it ultimately sharpens rather than undermines the article's argument.

aria-modal has a specific and separable dual purpose. The first is structural: it instructs assistive technologies to suppress background content from the accessibility tree and prevent the virtual cursor from escaping the dialog's boundary. The second is communicative: it informs the user, proactively, that they are in a modal context, that the world outside the dialog is temporarily unavailable, and that normal navigation shortcuts will not take them there.

These two obligations are not the same, and current AT behavior treats them unevenly.

The ARIA spec, both at its introduction in ARIA 1.1 and today, has never required ATs to restrict virtual cursor navigation as a hard constraint. The language has always been SHOULD and MAY, not MUST. Preventing background navigation was always optional for AT implementors. What aria-modal normatively guarantees is the declaration of a semantic. What ATs do with it has always been implementation-dependent.

In practice, JAWS and NVDA now handle the structural side reasonably well across major browser pairings, according to a11ysupport.io data from August 2025. VoiceOver on iOS has improved. The remaining gaps are real: VoiceOver on macOS is partial, and Narrator, Orca, and TalkBack show no meaningful support. The overall picture is partial, better than a few years ago, but not resolved.

But the communicative side is where the deeper problem sits, and it is one that even well-supported ATs do not fully address. When a screen reader enters a dialog, it announces the dialog role and accessible name, something like "Confirm deletion, dialog." What it does not reliably announce is that the dialog is modal. The word "modal," or any equivalent signal, is not consistently surfaced. The user discovers the modal boundary by encountering it, by finding that their navigation shortcuts stop working, rather than being told about it in advance.

This matters because the entire value of semantic declaration is anticipation. When a screen reader announces "button," the user does not need to experiment to discover that Enter or Space will activate it. The role announcement equips the user to act. aria-modal should work the same way: a user who is told they have entered a modal dialog knows, without having to probe, that background content is off-limits and that their navigation scope has narrowed. An AT that restricts navigation without announcing modality is implementing the mechanical consequence of the attribute while ignoring its communicative purpose. It is the equivalent of a screen reader that correctly prevents a button from being activated by arrow keys but never tells the user it is a button.

This does not weaken the case for requiring aria-modal. It strengthens it in two directions. First, it means the attribute's communicative obligation is not yet reliably met by the AT ecosystem, which is a gap worth naming. Second, and more importantly for the article's central argument, it confirms that aria-modal carries a semantic that nothing else on the platform encodes. inert on a sibling tells the DOM that content is non-interactive, for everyone. It does not tell AT users anything about why, or that a modal relationship exists at all. The communicative layer, the announcement, the anticipation, the informed experience, depends entirely on aria-modal being present and correctly declared. There is no fallback.

An open conclusion

So: is a missing aria-modal a 1.3.1 violation?

Here is where I land. If we accept that SC 1.3.1 requires visually conveyed relationships to be programmatically determined, and if we accept that "programmatically determined" should mean conformance to the platform API purpose-built for a given semantic, then yes: a visual modal dialog without aria-modal fails to programmatically expose the modal relationship, and that is a 1.3.1 failure. The presence of inert on a sibling does not resolve this, because inert does not encode the relationship. It only encodes a consequence of it, and only at the DOM layer, for all users equally. The AT layer remains unaddressed.

But I want to be honest that this depends on a reading of "programmatically determined" that WCAG itself does not explicitly endorse. The spec's definition leaves enough room for the inert argument to stand. Reasonable practitioners disagree here, and the disagreement is legitimate.

What I'm more confident in is this: the ambiguity in WCAG's definition of "programmatically determined" is doing real harm. It produces inconsistent audits, inconsistent implementations, and inconsistent AT support expectations. The platform-contract interpretation I've outlined isn't just a convenient framing for the aria-modal case. It's a more rigorous standard that would make accessibility conformance more testable and more honest across the board. And the stakes of getting this interpretation right are not trivial. "Programmatically determined" is the third most referenced term across WCAG's entire glossary of 101 definitions, appearing in 9 success criteria spanning Level A through AAA. How we read those two words shapes how we audit, how we build, and how we think about what conformance actually means.

The question of whether aria-modal is required is, in the end, a question about what we think standards are for.