
Day 11 of #100DaysOfSpec: 2.4 Common microsyntaxes, contd.

I am reading and taking notes on the HTML specifications for 100 days as part of #The100DayProject. Read the initial intent/backstory. I am a Microsoft employee but all opinions, comments, etc on this site are my own. I do not speak on behalf of my employer, and thus no comments should be taken as representative of Microsoft's official opinion of the spec. Subsections not listed below were read without comment.

Currently reading in 2.4 Common microsyntaxes

2.4.5.4 Times

Like yesterday's month and date microsyntaxes, a "time" doesn't contain a time zone either: just hour, minute, second, and fraction of a second.

Hours are given in military time (0 <= hour <= 23). By the way, I always find it interesting when I come across an American who has their phone clock set to the 24-hour system.
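To make the shape of a valid time string concrete, here's a rough Python sketch. `parse_time` is a hypothetical helper, not anything from the spec. It assumes the fractional seconds are one to three digits, which is my reading of the grammar, so check the exact spec version you're reading.

```python
import re
from typing import Optional, Tuple

# HH:MM, optionally :SS, optionally a fraction of 1-3 ASCII digits (an assumption, see above).
TIME_RE = re.compile(r"([01][0-9]|2[0-3]):([0-5][0-9])(?::([0-5][0-9])(?:\.([0-9]{1,3}))?)?")


def parse_time(s: str) -> Optional[Tuple[int, int, float]]:
    """Return (hour, minute, second) for a valid time string, or None.

    Note that nothing in the result says anything about a time zone.
    """
    m = TIME_RE.fullmatch(s)
    if m is None:
        return None
    hour, minute = int(m.group(1)), int(m.group(2))
    second = float(f"{m.group(3) or '00'}.{m.group(4) or '0'}")
    return hour, minute, second


print(parse_time("23:59:59.5"))  # (23, 59, 59.5)
print(parse_time("24:00"))       # None: the hour only goes up to 23
```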

2.4.5.5 Floating dates and times

A "floating date and time" contains both the date and time information.

A "valid normalized floating date and time string" uses a "T" between the date and the time instead of a U+0020 SPACE character (which a plain "valid floating date and time string" may use), and its time is "expressed as the shortest possible string for the given time (e.g. omitting the seconds component entirely if the given time is zero seconds past the minute)".
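Here's a rough Python sketch of that normalization. `normalize_floating` is a made-up name, it skips range checks on the individual fields, and it assumes "shortest possible string" also means dropping trailing zeros from the fractional seconds (so `.500` becomes `.5`).

```python
import re

# A date, then "T" or a space, then a time (fraction limited to 1-3 digits here).
FLOATING_RE = re.compile(
    r"([0-9]{4,}-[0-9]{2}-[0-9]{2})[T ]([0-9]{2}):([0-9]{2})(?::([0-9]{2})(?:\.([0-9]{1,3}))?)?"
)


def normalize_floating(s: str) -> str:
    """Rewrite a floating date and time string into its normalized form (sketch only)."""
    m = FLOATING_RE.fullmatch(s)
    if m is None:
        raise ValueError(f"not a floating date and time string: {s!r}")
    date, hh, mm, ss, frac = m.groups()
    time = f"{hh}:{mm}"
    frac = (frac or "").rstrip("0")  # drop trailing zeros in the fraction
    seconds = ss or "00"
    if seconds != "00" or frac:      # keep seconds only when they carry information
        time += f":{seconds}"
        if frac:
            time += f".{frac}"
    return f"{date}T{time}"          # the normalized form always uses "T", never a space


print(normalize_floating("2015-04-15 09:30:00"))      # 2015-04-15T09:30
print(normalize_floating("2015-04-15T09:30:05.500"))  # 2015-04-15T09:30:05.5
```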

2.4.5.6 Time zones

Re, time zone offsets: "This format allows for time-zone offsets from -23:59 to +23:59. In practice, however, right now the range of offsets of actual time zones is -12:00 to +14:00, and the minutes component of offsets of actual time zones is always either 00, 30, or 45. There is no guarantee that this will remain so forever, however; time zones are changed by countries at will and do not follow a standard."
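The offset syntax itself is small enough to sketch: "Z", or a sign, two-digit hours, an optional colon, and two-digit minutes. This Python version (`parse_offset_minutes` is hypothetical) returns the offset in minutes, and happily accepts offsets like +23:59 that no real time zone uses, which is exactly the gap the quote describes.

```python
import re
from typing import Optional

# "Z", or a sign, hours 00-23, an optional colon, and minutes 00-59.
OFFSET_RE = re.compile(r"Z|([+-])([01][0-9]|2[0-3]):?([0-5][0-9])")


def parse_offset_minutes(s: str) -> Optional[int]:
    """Return the offset from UTC in minutes, or None if s isn't a valid offset string."""
    m = OFFSET_RE.fullmatch(s)
    if m is None:
        return None
    if s == "Z":
        return 0
    sign, hours, minutes = m.groups()
    total = int(hours) * 60 + int(minutes)
    return -total if sign == "-" else total


print(parse_offset_minutes("+05:45"))  # 345
print(parse_offset_minutes("-23:59"))  # -1439: syntactically fine, though no zone uses it
print(parse_offset_minutes("+24:00"))  # None
```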

Day 10 of #100DaysOfSpec: 2.4 Common Microsyntaxes, contd.

Currently reading in 2.4 Common microsyntaxes

2.4.5 Dates and times

Surprise, surprise, we're using the Gregorian (standard Western) calendar. The spec actually (and accurately) notes that this decision stems from "cultural biases".

Date-parsing libraries have to be reviewed to make sure they conform to the spec, because its formats are more specific than, say, the ISO 8601 formats a library might accept.

2.4.5.1 Months

A valid "month" string holds no information about time zones. It does, however, include the year. I would have assumed the browser parsed year, month, and day separately.

The spec says that the month portion of a month string is exactly two ASCII digits (the year is four or more), but, confusingly, wherever the spec mentions months in its prose it leaves out the leading zeroes.
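A quick Python sketch of a month string check, assuming a year of four or more ASCII digits (greater than zero) and a two-digit month from 01 to 12. `parse_month` is a hypothetical name. It also shows that the zero padding belongs to the string, not to the parsed value.

```python
import re
from typing import Optional, Tuple

# Four or more ASCII digits for the year, a hyphen, exactly two for the month.
MONTH_RE = re.compile(r"([0-9]{4,})-([0-9]{2})")


def parse_month(s: str) -> Optional[Tuple[int, int]]:
    m = MONTH_RE.fullmatch(s)
    if m is None:
        return None
    year, month = int(m.group(1)), int(m.group(2))
    if year < 1 or not 1 <= month <= 12:
        return None
    return year, month


print(parse_month("2015-04"))  # (2015, 4): "04" in the string, plain 4 once parsed
print(parse_month("2015-4"))   # None: the leading zero is required in the string
```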

2.4.5.2 Dates

A valid "date" string consists of a valid "month" string (which itself includes the year and month), a hyphen, and two digits for the day. It also does not hold any time zone information.

What's unclear to me is the difference between a "date string" and a "date component".
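The day part is where the calendar math comes in: the day can't be larger than the number of days in that month of that year. A rough Python sketch using the Gregorian leap-year rule (the helper names are mine, not the spec's):

```python
import re
from typing import Optional, Tuple

DATE_RE = re.compile(r"([0-9]{4,})-([0-9]{2})-([0-9]{2})")


def days_in_month(year: int, month: int) -> int:
    """Days in a month of the Gregorian calendar."""
    if month == 2:
        leap = year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)
        return 29 if leap else 28
    return 30 if month in (4, 6, 9, 11) else 31


def parse_date(s: str) -> Optional[Tuple[int, int, int]]:
    m = DATE_RE.fullmatch(s)
    if m is None:
        return None
    year, month, day = (int(g) for g in m.groups())
    if year < 1 or not 1 <= month <= 12:
        return None
    if not 1 <= day <= days_in_month(year, month):
        return None
    return year, month, day


print(parse_date("2016-02-29"))  # (2016, 2, 29)
print(parse_date("2015-02-29"))  # None: 2015 isn't a leap year
print(parse_date("1900-02-29"))  # None: century years need to be divisible by 400
```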

2.4.5.3 Yearless dates

Here's where things get interesting (non-web-nerds whisper, "you sure?")! You can have a valid "yearless date" string that contains just the month portion of a month string, a hyphen, and the day. So a month string on its own is invalid without a year, but a month without its year portion is fine as part of a yearless date string?

When you don't specify a year, Leap Day is allowed.
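Without a year there's no way to know whether it's a leap year, so February's maximum is 29. A Python sketch, assuming the grammar allows an optional leading "--" before the month (my reading; double-check it) and with a hypothetical `parse_yearless_date` helper:

```python
import re
from typing import Optional, Tuple

# Optional "--", two-digit month, a hyphen, two-digit day. No year at all.
YEARLESS_RE = re.compile(r"(?:--)?([0-9]{2})-([0-9]{2})")
# Days per month in a leap year, so February allows the 29th.
MAX_DAYS = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def parse_yearless_date(s: str) -> Optional[Tuple[int, int]]:
    m = YEARLESS_RE.fullmatch(s)
    if m is None:
        return None
    month, day = int(m.group(1)), int(m.group(2))
    if not 1 <= month <= 12 or not 1 <= day <= MAX_DAYS[month - 1]:
        return None
    return month, day


print(parse_yearless_date("02-29"))    # (2, 29)
print(parse_yearless_date("--02-29"))  # (2, 29)
print(parse_yearless_date("02-30"))    # None
```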

Day 9 of #100DaysOfSpec: 2.4 Common Microsyntaxes, contd.

The big takeaway from this section as a whole, for me, is seeing how much is involved in handling numbers.

2.4.4.3 Floating-point numbers

I had to look up floating-point numbers because the spec assumes some familiarity with them. I've either never heard the term "significand" or forgot it after a math class long ago, and there was no clear-cut example to visualize. Wikipedia (sorry) has more info on floating-point numbers.
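What helped me: in the spec's format, the significand is the part before the `e`/`E`, and the exponent is the (optionally signed) integer after it, so the value is significand × 10^exponent. A Python sketch, with `parse_float` as a made-up name and `Decimal` used so the result isn't rounded to binary floating point:

```python
import re
from decimal import Decimal
from typing import Optional

# Optional "-", then digits and/or "." plus digits (the significand), then an optional exponent.
FLOAT_RE = re.compile(r"(-?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+))(?:[eE]([-+]?[0-9]+))?")


def parse_float(s: str) -> Optional[Decimal]:
    """Return significand * 10**exponent for a valid floating-point number string, else None."""
    m = FLOAT_RE.fullmatch(s)
    if m is None:
        return None
    significand = Decimal(m.group(1))
    exponent = int(m.group(2) or 0)
    return significand.scaleb(exponent)  # shifts the decimal point exactly


print(parse_float("1.5e3") == 1500)  # True: significand 1.5, exponent 3
print(parse_float("1."))             # None: a "." must be followed by digits
print(parse_float("+1"))             # None: a leading "+" isn't allowed
```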

2.4.4.4 Percentages and lengths

Confused as to why the algorithm for "parsing dimension values" would return only numbers >= 1.0, or an error. For lengths I think that makes sense (for example, a text value has characters or it doesn't), but wouldn't there be a need for percentages smaller than 1? I think I'm misunderstanding something about how this algorithm would be used or what it's supposed to measure. Need to start tagging these notes with "TO-ASK" ;]

Day 8 of #100DaysOfSpec: 2.4 Common Microsyntaxes

Today's reading demonstrates how UAs should handle and format different types of data.

2.4.1 Common parser idioms

So many space characters: U+0020 SPACE, "tab" (U+0009), "LF" (line feed, U+000A), "FF" (form feed, U+000C), and "CR" (carriage return, U+000D). Having trouble figuring out what these different spaces are actually used for...
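One practical consequence: when the spec says to strip or split on "space characters", it means exactly these five, which is narrower than most languages' built-in notion of whitespace. A Python sketch (the helper names are hypothetical):

```python
import re
from typing import List

# The five space characters: SPACE, TAB, LF (line feed), FF (form feed), CR (carriage return).
HTML_SPACE = " \t\n\f\r"


def strip_html_space(s: str) -> str:
    """Strip only the spec's space characters from both ends of s."""
    return s.strip(HTML_SPACE)


def split_on_html_space(s: str) -> List[str]:
    """Split s on runs of the spec's space characters, dropping empty tokens."""
    return [token for token in re.split(r"[ \t\n\f\r]+", s) if token]


print(split_on_html_space("  nav\tmain\nfooter "))  # ['nav', 'main', 'footer']
print(repr(strip_html_space("\x0bhi ")))           # '\x0bhi': vertical tab isn't on the list
print(repr("\x0bhi ".strip()))                     # 'hi': Python's own strip() is broader
```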

2.4.2 Boolean attributes

You can't use "true" and "false" as values on a boolean attribute. Its state is simply whether the attribute exists or not. If the attribute does exist, its value must be either the empty string or an ASCII case-insensitive match for the attribute's own name (e.g. checked="checked"), with no leading or trailing whitespace.
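A small Python sketch of those two separate rules, with hypothetical helper names: the attribute's state depends only on presence, while conformance checking looks at the value.

```python
import string
from typing import Dict

# Lowercase only ASCII A-Z; every other character is left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def boolean_attr_is_on(attrs: Dict[str, str], name: str) -> bool:
    """The attribute's state is just presence: the value is never consulted."""
    return name in attrs


def is_conforming_boolean_value(name: str, value: str) -> bool:
    """Conforming: "" or the attribute's own name, ASCII case-insensitively, no padding."""
    return value == "" or value.translate(_ASCII_LOWER) == name.translate(_ASCII_LOWER)


print(boolean_attr_is_on({"disabled": "false"}, "disabled"))  # True: "false" still means on
print(is_conforming_boolean_value("checked", "CHECKED"))      # True
print(is_conforming_boolean_value("checked", " checked"))     # False: leading whitespace
print(is_conforming_boolean_value("checked", "true"))         # False
```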

2.4.3 Keywords and enumerated attributes

Enumerated attributes take as their value a keyword from a specified list. Each of those keywords maps to a "state". Several keywords can map to the same state (making them synonyms); some keywords are non-conforming and kept only for historical reasons, so old sites don't break; and there can be two defaults: an "invalid value" default and a "missing value" default.

Again, in order to match an expected value, the keyword is compared ASCII case-insensitively and can't have leading or trailing whitespace.

"The empty string can be a valid keyword."

2.4.4 Numbers

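The spec is very specific about what a negative number is: a valid integer is one or more ASCII digits, optionally prefixed with a single "-" (U+002D HYPHEN-MINUS) character.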
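In practice that means exactly one character counts as a minus sign, and "+" doesn't count at all. A Python sketch with a hypothetical `is_valid_integer` (note `[0-9]` rather than `\d`, since Python's `\d` also matches non-ASCII digits):

```python
import re

# One or more ASCII digits, optionally prefixed with a single U+002D HYPHEN-MINUS.
VALID_INTEGER = re.compile(r"-?[0-9]+")


def is_valid_integer(s: str) -> bool:
    return VALID_INTEGER.fullmatch(s) is not None


print(is_valid_integer("-42"))       # True
print(is_valid_integer("+42"))       # False: no plus sign
print(is_valid_integer("\u221242"))  # False: U+2212 MINUS SIGN isn't the hyphen-minus
```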

Day 7 of #100DaysOfSpec: 2.3 Case-sensitivity and string comparison

I don't have much to say on today's section (other than I now know the ASCII uppercase letters range from U+0041 to U+005A, and lowercase letters range from U+0061 to U+007A), so I guess I'll comment on progress so far:
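Those two ranges are the whole story behind the spec's "ASCII case-insensitive" comparisons: only A-Z get lowercased, nothing else. A Python sketch of the difference from a general Unicode-aware `lower()` (the function name is mine):

```python
import string

# Map only U+0041..U+005A (A-Z) to U+0061..U+007A (a-z); everything else is left alone.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_case_insensitive_equal(a: str, b: str) -> bool:
    return a.translate(_ASCII_LOWER) == b.translate(_ASCII_LOWER)


print(ascii_case_insensitive_equal("CHECKED", "checked"))  # True
print(ascii_case_insensitive_equal("\u212a", "k"))         # False: KELVIN SIGN isn't ASCII
print("\u212a".lower() == "k")                             # True: Unicode lower() folds it
```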

  1. I've already skipped a day (last Friday). Oops.
  2. At some point last week I was thinking "it's possible I might actually finish before the 100 days are up!" Meh, not feeling that way now. But I did start thinking about what I'd read after finishing the HTML W3C recommendation. My first thought was reading the CSS spec, but apparently CSS3 doesn't exist as a single document (possibly because, as I read in another section of this HTML spec, CSS support as a whole is not required of browsers). Rather, there appear to be separate documents covering different topics in CSS. I did see Nerd Twitter get excited over a new draft of the SVG spec...
  3. I'm somewhat regretting that I did not choose something visual for my 100-day project. It's been really fun seeing everyone else's work come in, and I'm a little jealous that my project isn't as sexy from a designer's perspective. This comes back to the classic problem of web designers (or anyone whose work straddles two disciplines): push to be really great at aesthetics? Work to improve dev skills? Try to be perfect at all the things?!

Day 6 of #100DaysOfSpec: 2.2 Conformance requirements, contd.

The spec discourages writing vendor-specific extensions, because, as we can see from history, the web works better when we work together (I CRINGE from the after-school-special nature of that last sentence, but I stand by the sentiment, dangit).

The W3C gives guidelines for how vendor-specific extensions should be done when they really are needed. For markup features that can be limited to the XML serialization (and don't need to work in the HTML serialization), vendors should define a custom namespace for that feature.

If you want something new in the HTML syntax, "extensions should be limited to new attributes of the form 'x-vendor-feature', where vendor is a short string that identifies the vendor responsible for the extension, and feature is the name of the feature." Browsers and other user agents can't just come up with new HTML elements; that would be a major threat to interoperability (things all working the same, cross-browser). New experimental attributes get an "x-vendor" prefix so they don't clash with other browsers' implementations or with future versions of the spec, until there's a new standard (much like what we've seen with CSS vendor prefixes). Attribute names starting with "x-" or containing "_" will never be an official part of the HTML language.
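A rough Python sketch of those naming rules. The helpers are hypothetical, the input is assumed to be an already-lowercased attribute name, and restricting vendor and feature to lowercase letters and digits is my own assumption; the spec only says the vendor is "a short string".

```python
import re
from typing import Optional, Tuple

# "x-" + vendor + "-" + feature; the vendor part can't contain "-", the feature part can.
VENDOR_ATTR_RE = re.compile(r"x-([a-z0-9]+)-([a-z0-9]+(?:-[a-z0-9]+)*)")


def is_reserved_attr_name(name: str) -> bool:
    """Names starting with "x-" or containing "_" will never become standard HTML."""
    return name.startswith("x-") or "_" in name


def split_vendor_attr(name: str) -> Optional[Tuple[str, str]]:
    """Return (vendor, feature) for an "x-vendor-feature" name, else None."""
    m = VENDOR_ATTR_RE.fullmatch(name)
    return (m.group(1), m.group(2)) if m else None


print(split_vendor_attr("x-acme-sparkle-mode"))  # ('acme', 'sparkle-mode')
print(is_reserved_attr_name("my_attr"))          # True
print(split_vendor_attr("sparkle"))              # None
```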

DOM extensions, such as new methods and IDL attributes, should be vendor prefixed as well.

No extension can contradict or be non-conformant to the spec.

When a vendor-neutral extension is needed, either the HTML spec itself can be updated, or a separate extension specification can be written. Once someone applying the HTML spec decides to recognize that extension spec's requirements, it becomes an "applicable specification".

"User agents must treat elements and attributes that they do not understand as semantically neutral."

Day 4 of #100DaysOfSpec: 2.2 Conformance requirements, contd.

Today's section summarizes all the sub-specifications the HTML spec relies on, and what terms are defined in those specifications.

Particularly interesting term lists (to me, anyway) include parts of URLs, CORS (cross-origin resource sharing), DOM features, and DOMException types.

"Web IDL" is referenced here. TIL that "IDL" stands for "interface definition language". According to the W3C recommendation on Web IDL, it is "an IDL variant with a number of features that allow the behavior of common script objects in the web platform to be specified more readily." This is still a bit unclear to me, seems like a library to help user agents wire up functionality more easily and consistently, but I think this is a can of worms to open another time. Unless you, dear hypothetical reader, have some insights. ;]

We're not really even getting to the meat of the spec yet, but from the list of DOM features you already get a sense of just how much is under the surface of our prettily-packaged elements and properties. The Comment interface is listed among these features; I wonder how many lines of code it takes from the UA just to have HTML comments work correctly.

"UIEvent" seems as though it would encompass quite a bit.

Note to self: figure out what a "Blob" actually is. I hear this word being thrown around and smile and nod to myself. W3C says "immutable raw binary data". Cool, very helpful. To read later.

QUITE CURIOUSLY, user agents are not required to support CSS "as a whole", but they do have to support the Media Query language.

And finally it should be mentioned that user agents aren't actually required to support CSS, JavaScript, or HTTP, even within the UA subcategory of web browsers and other interactive UAs. These are simply the most ubiquitous choices for styling, scripting, and network protocols.

Day 3 of #100DaysOfSpec: 2.2 Conformance Requirements

Today I realized the "Add developer-view styles" option not only changes some styling, it also hides entire chunks of content. So I guess that's useful if you want to ignore the information for user agents. For my purposes, I'm reading everything.

2.2.1 Conformance classes

"User agents are not free to handle non-conformant documents as they please". Sites gotta fail consistently!

The spec's list of the categories of user agents:

  • Web browsers and other interactive user agents
  • Non-interactive presentation user agents
    • Render HTML/XHTML docs w/o interactive capabilities
    • Same conformance rules as web browsers, etc, except for the interactive bits
  • Visual user agents that support the suggested default rendering
    • Can be one of the above two categories, but supporting the default is not required
    • This section talks about overriding some parts of the default rendering to make things more accessible to the user. It seems odd that that wouldn't just be baked into the suggested rendering (legacy issue? Political issue?).
  • User agents with no scripting support
  • Conformance checkers
    • "Automated conformance checkers are exempt from detecting errors that require interpretation of the author's intent."
  • Data mining tools
    • If you're not rendering a document, your requirements are centered around semantics.
  • Authoring tools and markup generators
    • Obviously, these need to spit out conforming documents.
    • WYSIWYGs are included in this category, which is interesting because it is very possible (speaking as a person who has built many WordPress sites for clients) for an individual to do some bananas stuff in a WYSIWYG editor.

Day 2 of #100DaysOfSpec: Common Infrastructure

2 Common Infrastructure

When writing up documentation, there's a (valid) temptation to write a different version of the document for each distinct user type. The authors of the HTML spec probably don't have that luxury (it's trouble enough getting everyone to agree on just one document), so the HTML spec covers the concerns of both web devs working on websites and browsers that need to adhere to the new standards.

It probably does make sense, though, to pay attention to what the other side needs to do. Web browsers can better design for how features are intended to be used; web developers can better grasp how things are working under the hood, and what the responsibilities of the web browser are.

All this is to say that this section seems, at the outset, more oriented towards web browsers as audience.

2.1 Terminology

If you're following along with these notes, it may be useful to read this section in case any specific ~~ vocab terms ~~ are referenced again.

2.1.1 Resources

A feature cannot be considered "supported" if the browser can handle some but not all of its "critical aspects". The example they use is a browser that knows the dimensions of an MPEG-4 file but doesn't support the compression format. I mean, makes sense.

2.1.3 DOM trees

"The root element of a Document object is that Document's first element child, if any. If it does not have one then the Document has no root element."

2.1.5 Plugins

Plugins aren't "child browsing contexts of the Document" and don't add any Node objects to the Document; the example they use is a PDF viewer inside the browser.

The spec actually doesn't prescribe how user agents (browsers) support and implement plugins, or whether they need to do so at all.

Day 1 of #100DaysOfSpec: Introduction

None of today's reading is part of the HTML standard itself; it's just background info.

1.1 Background

I figured most of this first chapter would be fluff I wouldn't need to cover, but waddaya know, something new in the first section: HTML was originally used for "semantically describing scientific documents".

Probably this is something they say when someone sits you down to have "the HTML talk"? I don't know, never had it.

1.4 History

This subsection is not that long but would be worth a complete read. Probably most interesting to me was the gap in HTML's evolution (as HTML) between 1998–2003/2004. This was something I was vaguely aware of but I hadn't realized the gap was quite that long.

1.5.3 Design notes > Extensibility

It's interesting that the spec refers to classes as a way devs can "effectively [create] their own elements". I have simply thought of this attribute as a "hook" I can use to apply styles and interactivity, but I suppose, yeah, you're basically creating a new Thing.

data-*="" attributes are "guaranteed to never be touched by browsers".
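Because browsers leave data-* attributes alone, they're safe for your own scripts to read. A small sketch using Python's standard `html.parser` to collect them (the class name and markup are made up for illustration):

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple


class DataAttributeCollector(HTMLParser):
    """Collects (tag, {data-* attributes}) for every start tag that has any."""

    def __init__(self) -> None:
        super().__init__()
        self.found: List[Tuple[str, Dict[str, str]]] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for valueless attributes.
        data = {name: (value or "") for name, value in attrs if name.startswith("data-")}
        if data:
            self.found.append((tag, data))


collector = DataAttributeCollector()
collector.feed('<li class="item" data-sku="A-100" data-in-stock>Widget</li>')
print(collector.found)  # [('li', {'data-sku': 'A-100', 'data-in-stock': ''})]
```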

1.7.1 Structure of this specification > How to read this specification

"First, it should be read cover-to-cover, multiple times. Then, it should be read backwards at least once." Very cute, W3C.

Interestingly (though logically), in the vocabulary of the spec, a "producer" is a developer of websites and a "consumer" is a web browser.

1.9 A quick introduction to HTML

This section contains a diagram that shows how markup gets split up into different nodes in the DOM tree, "an in-memory representation of a document". Curiously, spaces (except for those at the beginning/end of the HTML file) and line breaks yield "Text" nodes.
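You can see the same thing with Python's `xml.dom.minidom`. It's an XML parser rather than the HTML parsing algorithm (which has its own rules about where whitespace is kept), so treat this as an illustration of the idea only:

```python
from xml.dom import Node
from xml.dom.minidom import parseString

doc = parseString("<ul>\n  <li>One</li>\n</ul>")
ul = doc.documentElement

for child in ul.childNodes:
    if child.nodeType == Node.TEXT_NODE:
        print("Text node:", repr(child.data))
    else:
        print("Element node:", child.tagName)
# Text node: '\n  '
# Element node: li
# Text node: '\n'
```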

1.9.1 Writing secure applications with HTML

I was aware of cross-site scripting attacks via links and form submissions, but I hadn't thought of a hacker's ability to use img maliciously (using the onload attribute to run scripts is the example they cite).
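The usual defense is to escape untrusted text before putting it into markup, so an `<img onload=...>` arrives as harmless text rather than an element. A minimal Python sketch using the standard `html.escape` (`render_comment` and `stealCookies` are made up); real apps would typically rely on a templating system that escapes for the right context automatically:

```python
import html


def render_comment(user_text: str) -> str:
    """Wrap untrusted text in a <p>, escaping &, <, >, " and ' so no markup survives."""
    return f"<p>{html.escape(user_text, quote=True)}</p>"


attack = '<img src="cat.png" onload="stealCookies()">'  # stealCookies is made up
print(render_comment(attack))
# <p>&lt;img src=&quot;cat.png&quot; onload=&quot;stealCookies()&quot;&gt;</p>
```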

Cross-site request forgery: the vocab term for when another site maliciously makes server requests, acting as a user.

1.10 Conformance requirements for authors

Explains why "presentational markup" (attributes like color and border) from previous versions of HTML was dropped: bad for accessibility, harder to maintain, larger doc sizes. Aww, the good ol' days of tabular layouts (JUST KIDDING, folks).

"It is also worth noting that some elements that were previously presentational have been redefined in this specification to be media-independent: b,i, hr, s, small, and u."

1.11 Suggested reading

A bunch of links for further reading.