
Markup: Wrap algorithms in <div> tags #11392


Draft
wants to merge 3 commits into
base: main

Conversation

jmdyck
Contributor

@jmdyck jmdyck commented Jun 21, 2025

(I'm creating this in draft mode, because at this stage it's more of a discussion-starter than a solid proposal.)

In issue #10483, @domenic raises the possibility of wrapping the HTML spec's algorithms in <div algorithm> (as in Bikeshed specs). This PR does something approximating that.

The first commit is a minor markup change that was originally in PR #11379, but fell out when I withdrew the <hN>-related commits.

The second commit just adds some linebreaks, so that the main commit has a cleaner diff.

In the main commit, I add 2269 <div>...</div> pairs.

Each <div> start tag has one or both attributes:

  • var-scope
  • algo="..."

(11 have only var-scope, 1081 have only algo, 1177 have both.)

var-scope means: this div is a scope for <var> elements. (The main commit also inserts the var-scope attribute into 4 <p> tags and 1 <dd> tag, rather than introduce a <div> to hold the var-scope.)

algo means: this div contains some kind of algorithm, where the value of the attribute suggests the kind. (Occasionally, the value will show 2 kinds.)
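As a rough illustration of how the two attributes could be tallied from the source, here is a minimal sketch using Python's standard `html.parser`. The class name `MarkupTally` and the sample markup are made up for this example. Note one difference from the tables in this description: the sketch splits a comma-separated `algo` value into its individual kinds, whereas the tables list combined values as their own rows.

```python
from collections import Counter
from html.parser import HTMLParser


class MarkupTally(HTMLParser):
    """Counts var-scope / algo usage on start tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.shape = Counter()  # "var-scope only", "algo only", "both"
        self.kinds = Counter()  # individual kinds named in algo="..."

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # valueless attributes map to None
        has_scope = "var-scope" in attr_map
        has_algo = "algo" in attr_map
        if has_scope and has_algo:
            self.shape["both"] += 1
        elif has_algo:
            self.shape["algo only"] += 1
        elif has_scope:
            self.shape["var-scope only"] += 1
        if has_algo:
            # A value may name two kinds, separated by a comma.
            for kind in (attr_map["algo"] or "").split(","):
                if kind.strip():
                    self.kinds[kind.strip()] += 1


# Made-up sample markup, not taken from the spec source.
SAMPLE = """
<div algo="define-procedure" var-scope><p>To <dfn>frob</dfn> a <var>widget</var>: ...</p></div>
<div algo="idl:attr:regular/get,idl:attr:regular/set"><p>...</p></div>
<p var-scope>... <var>x</var> ... <var>x</var> ...</p>
"""

tally = MarkupTally()
tally.feed(SAMPLE)
```

Because the tally looks at every start tag, it also picks up var-scope on `<p>` and `<dd>`, not just on `<div>`.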

Behavior for attributes and operations declared in Web IDL fragments:

    340  "idl:attr:regular/get"
     76  "idl:attr:regular/set"
     57  "idl:attr:regular/get,idl:attr:regular/set"
    184  "idl:attr:regular/reflect"
      1  "idl:attr:regular/act-like"
     10  "idl:attr:reflected/get"
      9  "idl:attr:reflected/set"
      1  "idl:attr:event-handler/get"
      1  "idl:attr:event-handler/set"
    238  "idl:op:regular"
     14  "idl:op:constructor"
      2  "idl:op:special:deleter:for-named-property"
     11  "idl:op:special:getter:for-indexed-property"
     10  "idl:op:special:getter:for-named-property"
      2  "idl:op:special:setter:for-indexed-property"
      2  "idl:op:special:setter:for-named-property"
      2  "idl:op:static"

The classic: an algorithm is defined with a unique name + ID:

    642  "define-procedure"

A named state of affairs exists if a given condition holds:

     37  "condition-holds-if"

Integration with JavaScript:

     13  "abstract op"
     21  "js:internal-method"
     14  "js:host-defined-op"

Some other spec declares the algorithm, this spec gives steps for it:

     15  "for-external-term"

(You could say that "js:host-defined-op" belongs here too.)

There's one 'abstract' declaration of the algorithm, but different kinds of object can have different definitions for the algorithm:

    154  "multiple-defns"
      3  "multiple-defns,condition-holds-if"

(You could say that "js:internal-method" belongs here too.)

When something occurs in the data model, the UA must perform some steps:

    125  "reaction"
      4  "reaction,define-procedure"

Parsing-related algorithms:

     80  "tokenization-state-behavior"
     24  "insertion-mode-rules"

See below:

      5  "main+subs"

Algorithms that appear in examples, which might or might not be 'serious':

      7  "appear in examples"

Abstract declarations of algorithms:

      4  "declare"

Things that are not actually algorithms, but I felt like identifying:

     33  "define-format"
     46  "define-struct"

I don't know how to categorize it:

     71  "?"
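As a sanity check on the numbers above, the following sketch copies the counts from this description and confirms that they are mutually consistent: the three div shapes add up to 2269, and the per-kind rows add up to 2258, which is the number of divs carrying an algo attribute (1081 + 1177). The dictionary names are just for this example.

```python
# Counts copied from the description above.
DIV_SHAPES = {"var-scope only": 11, "algo only": 1081, "both": 1177}

ALGO_COUNTS = {
    "idl:attr:regular/get": 340,
    "idl:attr:regular/set": 76,
    "idl:attr:regular/get,idl:attr:regular/set": 57,
    "idl:attr:regular/reflect": 184,
    "idl:attr:regular/act-like": 1,
    "idl:attr:reflected/get": 10,
    "idl:attr:reflected/set": 9,
    "idl:attr:event-handler/get": 1,
    "idl:attr:event-handler/set": 1,
    "idl:op:regular": 238,
    "idl:op:constructor": 14,
    "idl:op:special:deleter:for-named-property": 2,
    "idl:op:special:getter:for-indexed-property": 11,
    "idl:op:special:getter:for-named-property": 10,
    "idl:op:special:setter:for-indexed-property": 2,
    "idl:op:special:setter:for-named-property": 2,
    "idl:op:static": 2,
    "define-procedure": 642,
    "condition-holds-if": 37,
    "abstract op": 13,
    "js:internal-method": 21,
    "js:host-defined-op": 14,
    "for-external-term": 15,
    "multiple-defns": 154,
    "multiple-defns,condition-holds-if": 3,
    "reaction": 125,
    "reaction,define-procedure": 4,
    "tokenization-state-behavior": 80,
    "insertion-mode-rules": 24,
    "main+subs": 5,
    "appear in examples": 7,
    "declare": 4,
    "define-format": 33,
    "define-struct": 46,
    "?": 71,
}

total_divs = sum(DIV_SHAPES.values())
divs_with_algo = DIV_SHAPES["algo only"] + DIV_SHAPES["both"]
kinds_total = sum(ALGO_COUNTS.values())

# Every div with an algo attribute should appear in exactly one row.
consistent = total_divs == 2269 and kinds_total == divs_with_algo
```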

"main+subs" isn't a particular kind of algorithm, but rather a collection of algorithms, where there's one main algorithm that invokes others, but those others use variables defined in the main algorithm without taking them as parameters. (These need to be treated specially when it comes to checking for <var>s being defined.) One way to think of these is as macros that get inlined at the invocation-points. However, I prefer to think of them as algorithms that are conceptually nested at some point within the main algorithm, and so can 'see' the variables defined at that point. (Presumably they appear after the main algorithm to make the main algorithm easier to read.)

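To make the "conceptually nested" reading of main+subs concrete, here is a small sketch of a variable check in which a sub-algorithm can see whatever its enclosing algorithm declares. The `Scope` record, the `undeclared_uses` function, and the variable names are all hypothetical; a real checker would have to work out "declared" and "used" from the spec prose.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    """Hypothetical summary of one algorithm's variables."""
    name: str
    declared: set
    used: set
    children: list = field(default_factory=list)


def undeclared_uses(scope, inherited=frozenset()):
    """Return (algorithm, variable) pairs used without a visible declaration."""
    visible = inherited | scope.declared
    problems = [(scope.name, var) for var in sorted(scope.used - visible)]
    for child in scope.children:
        # A nested sub-algorithm sees everything visible in its parent.
        problems += undeclared_uses(child, visible)
    return problems


sub = Scope("sub", declared={"response"}, used={"request", "response"})
main = Scope("main", declared={"document", "request"},
             used={"document", "request"}, children=[sub])

nested_problems = undeclared_uses(main)   # sub checked inside main
flat_problems = undeclared_uses(sub)      # sub checked on its own
```

Checked on its own, the sub-algorithm appears to use an undeclared `request`; checked as nested inside the main algorithm, it does not. That is the special treatment main+subs groups need.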


@tabatkins
Contributor

Out of curiosity, why use a different attribute marker to indicate an algorithm than what Bikeshed uses? There's certainly no, like, interop between the documents, but using identical patterns when possible reduces mental load for spec authors that touch multiple specs.

It looks like a major reason might be that about half the "algorithms" don't scope their variables, per your numbers. Can you expand on that?

@jmdyck
Contributor Author

jmdyck commented Jun 24, 2025

Out of curiosity, why use a different attribute marker to indicate an algorithm than what Bikeshed uses?

Using the same marker might have suggested it has the same semantics, which I don't think it does (though there's a lot of overlap). One difference is that the (optional) value of Bikeshed's algorithm attribute is the name of the algorithm.

But certainly, if the HTML editors would like the spec to be more like Bikeshed in this regard, that can be done.

It looks like a major reason might be that about half the "algorithms" don't scope their variables, per your numbers. Can you expand on that?

It's not that they don't scope their variables, it's that they don't have variables to scope. E.g.

Although "a parallel queue" and "the parallel queue" are basically the declaration and use of a parameter, it doesn't use any <var> elements, so there's no need to establish a var-scope. (It might be nice to add a <var> to make that link explicit, in which case scanning for <div without a var-scope attribute will find candidates for such treatment.)

Other examples are where an algorithm's only use of (what you might think of as) a parameter is to say this element or <span>this</span>.

And then there are cases like in 2.6.1 Reflecting content attributes in IDL attributes, where "For a reflected target that is an element element, these are defined as follows:" introduces a <dl> of 4 algorithms (which refer to element). I put a <div var-scope> around the <p> + <dl> in order to capture the declaration of element, so the individual <div algo>s don't need var-scope. Mind you, some of those algorithms declare additional variables, which are presumably scoped to those algorithms, so you could argue that those algorithms should have var-scope. It depends on whether the editors want to support nested var-scopes.

One caveat is that I mostly ignore <var> elements within notes and examples, so there are probably spots where adding var-scope would make sense there. Or, e.g., if a note refers to variables of an algorithm, maybe it should be included in the var-scope of that algorithm (if it's adjacent).

@domenic
Member

domenic commented Jun 24, 2025

My reaction is the same as @tabatkins. This is more than I anticipated in #10483 (comment). And I worry that trying to guarantee this level of fine-grained metadata going forward will be a burden on contributors.

Can we just use class="algorithm", without the fine-grained algorithm typing, or the two separate attributes? Related to your last message, I don't think there's extra value in distinguishing algorithms with no variables, versus algorithms with variables. The former is just a special case of the latter, where the number of variables equals zero.

Additional note: dff98e6 does not match our style guide. https://github.com/whatwg/html/blob/main/CONTRIBUTING.md#element-hierarchy

@jmdyck
Contributor Author

jmdyck commented Jun 24, 2025

This is more than I anticipated in #10483 (comment).

It's certainly more than you described, and I knew that when I created the PR, but like I said, it's more of a discussion-starter than a solid proposal.

And I worry that trying to guarantee this level of fine-grained metadata going forward will be a burden on contributors.

Yup.

Can we just use class="algorithm", without the fine-grained algorithm typing, or the two separate attributes?

Certainly.

I'll explain more about why this draft has algo="..." attributes.

(1)
One of the problems I had was: how can I be fairly confident that I've found + tagged all the algorithms in the spec? E.g., based on the Infra spec, you'd think that to find all algorithms, all you have to do is search for sentences beginning with "To <dfn". That does find a lot, but only ~15% of the total.
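For illustration, the "search for sentences beginning with To <dfn" approach can be sketched as a regular expression over paragraph starts. The pattern and the three sample paragraphs below are made up for this example; they only show why such a search catches one phrasing and misses others.

```python
import re

# Matches a <p> whose content begins with "To <dfn ...".
TO_DFN = re.compile(r"<p\b[^>]*>\s*To\s+<dfn\b")

# Made-up paragraphs in three different phrasings.
PARAGRAPHS = [
    "<p>To <dfn>frob a widget</dfn> <var>widget</var>, run these steps:</p>",
    "<p>The <dfn attribute for='Widget'>size</dfn> getter steps are to return 1.</p>",
    "<p>A widget is <dfn>frobbable</dfn> if its size is greater than zero.</p>",
]

found = [p for p in PARAGRAPHS if TO_DFN.search(p)]
missed = [p for p in PARAGRAPHS if not TO_DFN.search(p)]
```

Only the first paragraph matches, even though all three describe behavior that this PR would tag, so other phrasings need their own patterns.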

Similarly, the Web IDL spec gives particular wording for defining behavior, but looking for just that wording will miss lots.

So one thing I did was read the spec's Web IDL fragments, to figure out what behavior-algorithms would need to appear elsewhere in the spec. So if the algorithms I had tagged so far didn't cover that, I would need to look further (i.e., look for other phrasings). The granular algo="idl:..." markup is a reflection of that checking.

(2)
I'm interested in parsing the spec's algorithms, and I thought I might need the value of the algo attribute to tell me how I needed to parse the content of the div. I've mostly convinced myself that this isn't the case, but not entirely.

(3)
There's still the question of what exactly should be marked up as an algorithm. So the algo="..." attributes will help stake out the boundary. If you decide that algo="foo" shouldn't be marked as an algorithm, then having the algo values (for now) makes that a lot easier for you to say and for me to do.


Related to your last message, I don't think there's extra value in distinguishing algorithms with no variables, versus algorithms with variables. The former is just a special case of the latter, where the number of variables equals zero.

To be clear, I didn't introduce var-scope to distinguish no-var algorithms from some-var algorithms. (I agree that isn't a hugely useful distinction, and it's one you can pretty easily reconstruct anyway.) I introduced var-scope to be able to identify variable scopes that don't coincide with an algorithm. I described one of those (in 2.6.1) in my previous message. There are others.

So sure, we can say that every algorithm is implicitly a variable scope, and so drop var-scope from any <div algo> that has it. But that will still leave at least 16 var-scopes (more if something tagged <div algo="..." var-scope> is deemed to not be an algorithm).


Additional note: dff98e6 does not match our style guide. https://github.com/whatwg/html/blob/main/CONTRIBUTING.md#element-hierarchy

Sure, but like I said, that commit's only there to make the main commit cleaner. (That is, in the main commit, every new div tag is on its own line, without disrupting any existing line. That made it easier for me to check that the main commit didn't have any unwanted changes in it.) I assume the commits will get squashed before merging, so if one commit introduces bad style and then another commit fixes it, is that a problem?

Or are you saying that the later commit doesn't fix it? Currently, the PR says

<li>
 <div ...>
 <p>The setter steps are to ...
 with the given value.</p>
 </div>
</li>

According to the style guide, I guess this should be

<li><div ...><p>The setter steps ...
with the given value.</p></div></li>

because both the li and the div contain a single "block" element, but the status quo spec has zero cases where a div start-tag is preceded (on its line) by anything other than spaces (even when it's the only child of its parent), so I hadn't considered that as a possibility.

Or maybe you'd prefer

<li>
 <div ...><p>The setter steps ...
 with the given value.</p></div>
</li>

but then I have to ask if you'd like a similar treatment for other cases where the new div wraps a single element.

Of course, rather than introducing a div to wrap this algorithm, we could just mark the p as an algorithm. That's fine, it was just easier for me to use a new div uniformly.

@domenic
Member

domenic commented Jun 25, 2025

Thanks for your detailed reply! I think I'm aligned on most points. To summarize:

  • We can use the fine-grained attributes during the review phase, even if we eventually end up with just class="algorithm".

  • There are at least some cases that are var-scopes but not algorithms. I'll try to look at those more closely to understand what we should do with them.

  • I was mistaken on dff98e6; in context it makes perfect sense.

Regarding the eventual end state, do you think class="algorithm", plus maybe class="var-scope", is sufficient? Or is there a case to be made for something more detailed? Again my main concern here is making it easy for future maintainers. However, if there are compelling tooling benefits to adding more detailed information, I'm happy to discuss.

Anyway, I'll try to do a more detailed review pass now!

@domenic
Member

domenic commented Jun 25, 2025

So, the diff is huge, and not reviewable on GitHub. But reviewing it locally, the following things stood out to me from, like, the first 10%, plus some skipping around.

  • I am sad about how non-uniform our algorithm declaration style is.

  • For single-paragraph "algorithms", the most natural thing for people to do while authoring will be <p class="algorithm"> instead of <div class="algorithm">\n<p>. However, I kind of like the idea of uniformly using <div> everywhere, as it makes refactoring easier, and is easy to spot.

  • In Bikeshed, because the main use of the algorithm markup is variable highlighting, many single-sentence definitions would not necessarily be marked up. For example, "matches about:blank", or "valid URL potentially surrounded by spaces".

    • I kind of like categorizing these as algorithms, since they could be equivalently written as algorithms. If we named the parameter, and expanded into a conditional step 1 which returns true, followed by a fallback step 2 which returns false, it's clearly algorithm-ey. And, that's what you would do in the implementation.

    • I do worry a bit that it will be hard for people to remember to mark up "definition-ish" algorithms in the future. Maybe clear Contributing.md documentation can help with this...

    • I found "CORS-same-origin", "CORS-cross-origin", and "internal response" which seem like they fit the criteria of definition-ish algorithm, and aren't marked up. Did you have specific criteria you used to exclude these?

    • One particular messy instance is the difference between "week number of the last day" (marked up as not an algorithm, but has a <dfn>) and "has 53 weeks" (marked up as an algorithm/var-scope, but doesn't have a <dfn>). These are really just a reflection of the non-uniform messiness of the current spec, so I am not sure if there's anything to be done, besides note them as future possibilities for the infinite cleanup backlog.

  • Going through all the <div var-scope> (with no algo=""):

    • Most of them are used for reflection, where there's a set of distinct algorithms that "close over" one variable. The user activation cases are similar, and the "In the following CSS block" case is at least kind of similar. These seem like reasonable use cases to me, and the need for this is rare enough that I am not worried about contributor burden, as long as we document it in Contributing.md.

    • The use for "a serialization of the bitmap as a file" could probably just be thought of as an algorithm. I think it's basically an algorithm with one step and lots of requirements on how that step behaves.

  • We should not mark up define-struct, unless you think there is a compelling tooling-related reason. (Maybe in the future we should standardize struct definition markup, but I don't even know which of the possibilities I like best.)

  • define-format seems pretty algorithmey. Similar to other definition-algorithms, it could be refactored into an algorithm over an explicit variable.

  • Are there any <var>s outside of var-scope? If not, that could be a nice check to introduce into the build process eventually.

  • It does seem likely that we'll want to keep main+subs in some format, so that any JavaScript variable-highlighting or any sort of build process variable checking can treat them specially. (Are there others like this?)

  • Overall this has me itching for an Ecmarkup-like process for generating algorithm headers. Perhaps one day.

  • (Not a request for more work, just something I noticed that is related.) We've had some complaints before about how it's hard to link to algorithms without definitions.

    • One case I found is "Whenever an element including HTMLOrSVGElement becomes browsing-context connected, the user agent must execute the following steps on element". We've done piecemeal fixes to these sorts of things when requested. In this case it would look like changing to "the user agent must execute the following HTMLOrSVGElement browsing-context connection steps on element:".

    • However, there are some cases where this doesn't make sense, like the adjacent "The [cloning steps] for elements that include HTMLOrSVGElement". (Your multiple-defns cases.)

    • Maybe this whole problem doesn't matter as much these days since scroll-to-text-fragment exists?

    • The introduction of <div class="algorithm"> wrappers may, in the future, allow us to generate synthetic link markers here... although it would probably require adding id=""s as well.

@jmdyck
Contributor Author

jmdyck commented Jun 25, 2025

There are at least some cases that are var-scopes but not algorithms. I'll try to look at those more closely to understand what we should do with them.

You looked at cases of <div var-scope>, but note there are also 4 <p var-scope> and 1 <dd var-scope>.

(Looking at the 3rd <p var-scope>, the one under :focus, I think I should have instead created a <div algo> around the p + ul for "element has the focus".)

Regarding the eventual end state, do you think class="algorithm", plus maybe class="var-scope", is sufficient? Or is there a case to be made for something more detailed? Again my main concern here is making it easy for future maintainers. However, if there are compelling tooling benefits to adding more detailed information, I'm happy to discuss.

I'm pretty sure that just class="algorithm" and class="var-scope" will suffice to allow some useful analysis/tooling. Maybe once we have experience with that, we'll be in a better position to decide about something more detailed. I don't think I could make a strong case at this point for more detailed markup.

@tabatkins
Contributor

I introduced var-scope to be able to identify variable scopes that don't coincide with an algorithm.

This makes sense, and is something I wouldn't mind introducing to Bikeshed as well. (And that name, var-scope, works for me.) Right now anything not in an algo is put in the global scope, which limits the ability of the typo-detection to work (copy-pasting a typo into two spots would cause Bikeshed to assume it's intended).

But in Bikeshed, at least, it would be in addition to the var scoping that happens automatically for algorithms. For ease of comparison, I think it would be a good idea to assume that algorithms automatically scope their variables. (We can worry about to what extent we need to care about marking up a shared variable "inheriting into" nested scopes later, if at all. This is mostly for human convenience, and lightly for linting and styling, so it doesn't necessarily need to be too precise.)
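To illustrate why a single global scope weakens typo detection, here is a minimal sketch. It assumes the check amounts to "flag any variable that occurs only once in its scope", which is inferred from the remark above rather than taken from Bikeshed's code; the function name `lone_vars` and the sample data are hypothetical.

```python
from collections import Counter


def lone_vars(scopes):
    """Map each scope id to the variables that occur exactly once in it.

    `scopes` maps a scope id to the list of <var> texts found in that scope.
    """
    report = {}
    for scope_id, names in scopes.items():
        counts = Counter(names)
        lonely = sorted(name for name, n in counts.items() if n == 1)
        if lonely:
            report[scope_id] = lonely
    return report


# Two passages, each containing the same copy-pasted typo "elemnt".
passage_a = ["element", "element", "elemnt"]
passage_b = ["element", "element", "elemnt"]

# Everything lumped into one global scope: the typo occurs twice, so it passes.
global_report = lone_vars({"global": passage_a + passage_b})

# Each passage given its own var-scope: the typo is flagged in both.
scoped_report = lone_vars({"a": passage_a, "b": passage_b})
```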

@jmdyck
Contributor Author

jmdyck commented Jun 25, 2025

(Responding to just a couple points for now...)

I am sad about how non-uniform our algorithm declaration style is.

Indeed, the variety of syntax is discouraging. I imagine I'll suggest some consistification to make analysis easier, but it probably won't make much of a dent in the non-uniformity.

[...] single-sentence definitions [...]. For example, "matches about:blank", or "valid URL potentially surrounded by spaces".

I kind of like categorizing these as algorithms, since they could be equivalently written as algorithms. [...]

Yeah, that's what I thought too. More generally, they are rich in statically checkable phrases, so it would be a shame not to check them.
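As a sketch of that "conditional step 1, fallback step 2" shape, here is "matches about:blank" written as a two-step predicate. The `URLRecord` type is a hypothetical stand-in for a URL record (with the path simplified to a list of strings), and the conditions reflect my reading of the definition, so treat them as an assumption rather than a quotation of the spec.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class URLRecord:
    """Hypothetical, simplified stand-in for a URL record."""
    scheme: str
    path: List[str] = field(default_factory=list)
    username: str = ""
    password: str = ""
    host: Optional[str] = None


def matches_about_blank(url: URLRecord) -> bool:
    # Step 1: if all of the (assumed) conditions hold, return true.
    if (
        url.scheme == "about"
        and url.path == ["blank"]
        and url.username == ""
        and url.password == ""
        and url.host is None
    ):
        return True
    # Step 2: return false.
    return False
```

Written this way, the single-sentence definition has a named parameter and explicit steps, which is what makes it checkable like any other algorithm.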

I do worry a bit that it will be hard for people to remember to mark up "definition-ish" algorithms in the future. Maybe clear Contributing.md documentation can help with this...

It may be difficult to draw the line between algorithm-y and non-algorithm-y definitions.

I found "CORS-same-origin", "CORS-cross-origin", and "internal response" which seem like they fit the criteria of definition-ish algorithm, and aren't marked up. Did you have specific criteria you used to exclude these?

I think the specific reason was just that I didn't look for dfn-paragraphs with that form. E.g., I probably would have included CORS-same-origin if it had been phrased as:

"A response is CORS-same-origin if ..."

or

"To determine whether a response is CORS-same-origin, ..."

Background: I didn't want to step through the whole HTML spec, asking for each paragraph in turn whether it should be marked as an algorithm. So instead [among other things] I 'scraped' paragraphs from the spec and tried to identify patterns of phrasing that looked like algorithms, and then (checked and) marked up the paragraphs using those patterns. Which has the failure mode of leaving things out.

Anyway, I'll take another look.

One particular messy instance is the difference between "week number of the last day" (marked up as not an algorithm, but has a <dfn>) and "has 53 weeks" (marked up as an algorithm/var-scope, but doesn't have a <dfn>). These are really just a reflection of the non-uniform messiness of the current spec, so I am not sure if there's anything to be done, besides note them as future possibilities for the infinite cleanup backlog.

The "has 53 weeks" paragraph got my attention because of the <var>s. I'm pretty sure I looked at the "week number of the last day" paragraph and just bailed because it was so weird.


Are there any <var>s outside of var-scope?

Yes, I'm currently counting 84, ignoring examples, notes, and domintros. I think I only bothered creating a var-scope if there were at least two occurrences of the same variable.
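A build-time check for this could look roughly like the sketch below, which reports `<var>` elements that are not inside any scope. It makes several assumptions: an element with either var-scope or algo counts as a scope, end tags are written explicitly, and notes, examples, and domintros are not treated specially. The class name `UnscopedVarFinder` and the sample markup are made up.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class UnscopedVarFinder(HTMLParser):
    """Collects (line, text) for each <var> outside every scope."""

    def __init__(self):
        super().__init__()
        self.stack = []        # (tag, opens_scope) for each open element
        self.scope_depth = 0   # number of enclosing scopes
        self.capturing = False
        self.unscoped = []     # [line, text] entries

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        names = {name for name, _ in attrs}
        opens = "var-scope" in names or "algo" in names
        if tag == "var" and self.scope_depth == 0 and not opens:
            self.unscoped.append([self.getpos()[0], ""])
            self.capturing = True
        self.stack.append((tag, opens))
        if opens:
            self.scope_depth += 1

    def handle_data(self, data):
        if self.capturing:
            self.unscoped[-1][1] += data

    def handle_endtag(self, tag):
        if not any(open_tag == tag for open_tag, _ in self.stack):
            return  # stray end tag; ignore it
        while self.stack:
            open_tag, opens = self.stack.pop()
            if opens:
                self.scope_depth -= 1
            if open_tag == tag:
                break
        if tag == "var":
            self.capturing = False


# Made-up sample markup.
SAMPLE = """<div algo="define-procedure">
<p>To <dfn>frob</dfn> a <var>widget</var>, return <var>widget</var>.</p>
</div>
<p>The <var>stray</var> value is unscoped.</p>
<p var-scope>Let <var>x</var> be 1; return <var>x</var>.</p>
"""

finder = UnscopedVarFinder()
finder.feed(SAMPLE)
unscoped_names = [text for _, text in finder.unscoped]
```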

Overall this has me itching for an Ecmarkup-like process for generating algorithm headers. Perhaps one day.

Right there with you, and I'd happily work on it, like I did for ECMAScript. That could conceivably be an alternative to this PR, since if you have a distinct enough syntax for algorithm headers, you might not need class="algorithm" in the source. But I'm guessing that's a much bigger project, and shouldn't forestall the benefits of class="algorithm" in the short term, even if it obsoletes it in the long term.

@jmdyck
Contributor Author

jmdyck commented Jun 27, 2025

(responding to the rest of Domenic's comment...)

Going through all the <div var-scope> (with no algo=""):
Most of them are used for reflection, where there's a set of distinct algorithms that "close over" one variable. The user activation cases are similar, and the "In the following CSS block" case is at least kind of similar. These seem like reasonable use cases to me, and the need for this is rare enough that I am not worried about contributor burden, as long as we document it in Contributing.md.

Okay, cool.

The use for "a serialization of the bitmap as a file" could probably just be thought of as an algorithm. I think it's basically an algorithm with one step and lots of requirements on how that step behaves.

I did have it marked as an algorithm for a while, but it was just too weird.


We should not mark up define-struct, unless you think there is a compelling tooling-related reason.

There's definitely a tooling-related reason to have struct info handy. E.g., if you want to validate (type-check) set X's Y to Z, you might determine that X is a struct, in which case you need to confirm that that kind of struct has a Y member, and that its type is compatible with Z's.
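A minimal sketch of that kind of check, assuming struct definitions have already been collected into a table: the `STRUCTS` table, the `check_set` function, and the member names and types below are all hypothetical, and "compatible" is simplified to "identical type name".

```python
# Hypothetical table of struct kinds -> member name -> type name.
STRUCTS = {
    "widget": {"size": "number", "label": "string"},
}


def check_set(struct_kind, member, value_type, structs=STRUCTS):
    """Validate "set X's Y to Z"; return an error string, or None if OK.

    struct_kind is the kind of X, member is Y, value_type is the type of Z.
    """
    members = structs.get(struct_kind)
    if members is None:
        return f"unknown struct kind: {struct_kind}"
    if member not in members:
        return f"{struct_kind} has no member {member!r}"
    if members[member] != value_type:
        return (f"{struct_kind}'s {member} is {members[member]}, "
                f"not {value_type}")
    return None


ok = check_set("widget", "size", "number")
bad_member = check_set("widget", "colour", "string")
bad_type = check_set("widget", "label", "number")
```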

However, I'm not sure that <div algo="define-struct"> (or similar) is a feasible way to collect the necessary info, so I'm okay with dropping that for this PR.

(Maybe in the future we should standardize struct definition markup, but I don't even know which of the possibilities I like best.)

Standardizing either the markup or the introductory wording would be good.


It does seem likely that we'll want to keep main+subs in some format, so that any JavaScript variable-highlighting or any sort of build process variable checking can treat them specially.

It occurs to me that <div algo="main+subs" var-scope> could maybe become just <div var-scope>, depending on how the <var>-handling code is written.

(Are there others like this?)

There might be other cases where main+subs could have applied. I don't think I was very thorough about how I found them.


We've had some complaints before about how it's hard to link to algorithms without definitions.

I understand what you mean, but I don't think I have much to add to your points.

3 participants