Software development Sunday, February 1, 2015

Solving HTML

I'm not happy with the progress of software development today. Like with other products of human thought, software systems and methodologies over time tend to become bloated and difficult to understand, instead of focusing on their essence.

To compare with another area of thought: for a country to have a great constitution, it requires a reboot from scratch, as with the US constitution in 1789. But it seems inevitable that this system of laws and rights swells up beyond recognition. People are ultimately motivated by self-interest. Let enough time pass and the legal volumes will grow to embarrassing levels.

The same is unfortunately true for software systems. For example, the programming language C# was created from scratch by a talented team at Microsoft headed by one of my heroes, Anders Hejlsberg. C# is without a doubt my favorite language, and one of the best things to happen to software development in decades.

I am still using version 2.0 of the language, which is from 2005, while the most current is version 5.0, and this illustrates my point. From my perspective, nothing important has happened to the language in ten years; it just got bloated with more functionality that makes it unnecessarily complex and takes away from its elegant simplicity. It tried to be everything for everyone, just like the tome of US law was amended to cover everyone's special interest.

The same bloat has happened to HTML. I think it is safe to say that we're stuck with HTML when it comes to developing pages for the web, because there is too much investment in it already. In its essence HTML is quite a powerful idea of markup tags that are very easy to read, and an open system where everyone has access to look at the document's underlying structure. But now HTML has become a beast that is very unpleasant to tangle with. Unfortunately, from its very first version to today's HTML5, the language maps very poorly to graphical layout. No matter all the fancy details that have been added to HTML, it's still a system you have to work around, not work with, to get your system implemented.

I have been watching this mess for 20 years now, hoping that someone would solve it. I can see some good attempts, but no real solution. If HTML were better, it would be the only way to develop systems, on all platforms and for all purposes, so it has great potential. But it will never reach that potential if it is not radically revised. I'm not sure if this is a political problem among the people who control the standardization, or if it is just a lack of good ideas. Instead of continuing to bitch about it, I decided to try and improve HTML myself.

The way I think about these issues probably sets me apart from most other people, both as a developer and a citizen. I believe that each system we use should be optimized for one specific difficult task, and do this task really well, and not try to please every user with tailored interfaces. The system cannot predict what each user needs. The key point is: the user knows better than anyone else in what manner he wants to interact with the system. This is true of how developers consume APIs in software, and how citizens interact with government. So the first step is to cut all the dead weight that these interfaces constitute.

Cutting dead weight

A browser's layout engine can be seen as a black box that takes input of HTML and produces output as an interactive visualization. To be useful for a developer, this black box must be very predictable in what it produces. This can only happen if the input specification (HTML) is easy to understand and maps well to how the developer thinks about the page layout.

Compare this to the data query language SQL. An RDBMS is a very complex system that is optimized for speed over huge data sets, but has an interface (SQL) that is easy to understand and which maps well to how we think about the data.

Today's web browsers are built from more than 10 million lines of code. This black box is horrendously complex. HTML development of today is too much guess-work and tweaking until the page kind of works. When working with HTML, how many times have you asked yourself "Is this a feature or a bug?" This is a clear sign of poor design.

Let's cut away all the dead weight from this browser and let it focus on the one thing that is hard to implement and which it is good at: arranging blocks of layout elements onto the screen. Remove all the hairy specifics such as borders, margins, paddings, floats, layers, etc., and keep the inner core only. Then design a new language, a forwards compatible version of HTML, that targets this native rendering engine directly, and let the developer himself implement the specifics. If he wants a browser where elements can have borders, let him implement it. If he wants a browser that is fully HTML5 compatible for some strange reason, he is free to waste his time on that too.

For now, let's call this language OUIML, Open User-Interface Markup Language. If designed well, this first language definition is the last version we will ever need, since all extensions live in specific plugin implementations.

OUIML

You can view OUIML as an extremely simplified version of HTML. When the browser reads a OUIML document, it builds a tree structure of nodes in memory, where each node is a single word or tag (XML) from the document. To lay these nodes out on the screen is a matter of giving size and coordinates to each of them in a flow-like, document-like manner. We are talking about the capabilities of the Mosaic browser of 1993, and with a few additions, this is the only thing we need.

HTML is really about two things: giving screen position to nodes, and drawing them with a certain style. We add to OUIML the ability to specify node relationships (a node's position in relation to other nodes), and the ability to run custom code that draws the graphics of each node on screen, and we are done. The browser will now be slimmed down to a black box that reads node relationship formulas, positions the nodes accordingly, and calls custom code to draw the graphics of each node.

The key points of OUIML:

Template driven tag definitions. No tags are predefined. All tags are defined by the developer using templates and code.
Predictable and understandable layout engine. Relationships between elements on the screen (nodes in the document) are governed by math expressions and node types. The way the browser lays these elements out on the screen is easy to understand.
Any programming language can be used. Any language (C#, Java, JavaScript, C++, etc.) can be used to manipulate the Document Object Model through the use of language plugins.
Supports existing standards. A OUIML document is well formed XML. Code libraries that implement specific standards can be used. For example, to support HTML5, include that specific plugin.
Universal user interface system. OUIML lends itself well to both dynamically formatted documents and statically positioned application interfaces. It can be used to read web site articles in a browser, as the form-style interface of an installed application, or as a combination of both that is so common on the web today.

Document format

No tags are predefined in OUIML. Through the use of a few predefined tag attributes, a template mechanism, and some custom code, you define your own palette of components that can be used to build the interfaces.

<ouiml>
  <body>
    <b>Hello world!</b>
    <footer />
  </body>
  <footer>----------</footer>
  <b fontWeight="bold">$$$</b>
</ouiml>

The <ouiml> tag defines the OUIML document. The tags at the root of the document indicate which templates are defined here, in this case <body>, <footer> and <b>. The tags and text inside each template are its definition.

To display a OUIML document, you load a number of files, each with one or more template definitions, and select one of the templates to instantiate and display. The browser will then replace all template markers with the template definitions. For example, if the body template is selected to be displayed, the <footer /> and <b> template markers will be replaced, and the following will be drawn to the screen:

  <body>
    <b fontWeight="bold">Hello world!</b>
    <footer>----------</footer>
  </body>
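
The substitution step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm of the actual browser: it assumes that `$$$` in a template marks where the marker's own content goes, that attributes on the template carry over to the instance, and it ignores child elements inside a template definition. The names `expand` and `instantiate` are made up for this sketch.

```python
import xml.etree.ElementTree as ET

SOURCE = """
<ouiml>
  <body>
    <b>Hello world!</b>
    <footer />
  </body>
  <footer>----------</footer>
  <b fontWeight="bold">$$$</b>
</ouiml>
""".strip()

PLACEHOLDER = "$$$"  # assumption: stands for the marker's own content


def instantiate(marker, template):
    """Build the node that replaces one template marker."""
    # Assumption: attributes set on the marker override those of the template.
    node = ET.Element(template.tag, {**template.attrib, **marker.attrib})
    node.text = (template.text or "").replace(PLACEHOLDER, marker.text or "")
    node.extend(list(marker))  # keep the marker's (already expanded) children
    node.tail = marker.tail
    return node


def expand(node, templates):
    """Return a copy of `node` with all template markers below it replaced."""
    result = ET.Element(node.tag, dict(node.attrib))
    result.text, result.tail = node.text, node.tail
    for child in node:
        child = expand(child, templates)
        if child.tag in templates:
            child = instantiate(child, templates[child.tag])
        result.append(child)
    return result


# Every child of <ouiml> is a template; pick "body" as the one to display.
templates = {template.tag: template for template in ET.fromstring(SOURCE)}
page = expand(templates["body"], templates)
print(ET.tostring(page, encoding="unicode"))
```

The selected template itself is never treated as a marker; only the tags inside it are looked up in the template table.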

Nodes

When the OUIML document is instantiated from the templates, a Document Object Model is built in memory, consisting of nodes. Each XML tag results in one node and each word also becomes a node.
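
As a rough illustration of that rule, the Python sketch below turns a parsed XML fragment into a tree where every tag and every whitespace-separated word is its own node. The `Node` class and the `build` function are hypothetical names for this sketch, and splitting words on whitespace is an assumption about what counts as a word.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "tag" or "word"
    value: str                     # the tag name, or the word itself
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def build(element):
    """Create one node per XML tag and one node per word of text."""
    node = Node("tag", element.tag, dict(element.attrib))
    for word in (element.text or "").split():
        node.children.append(Node("word", word))
    for child in element:
        node.children.append(build(child))
        # Text that follows a child tag still belongs to the parent.
        for word in (child.tail or "").split():
            node.children.append(Node("word", word))
    return node


tree = build(ET.fromstring(
    '<body><b fontWeight="bold">Hello world!</b><footer>----------</footer></body>'
))
for child in tree.children:
    print(child.value, [word.value for word in child.children])
```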

All nodes have a set of built-in attributes. New attributes can be defined and existing attributes can be overridden. The programmer gives a tag any name that makes the document more readable, such as <title>, <header> and <footer>. Since no tags are predefined, they are all seen as equal by the layout engine.

By default all nodes have top, left, width and height values to define a rectangular area on screen, and they act the same way a block element does in HTML. However, through the layout attribute they can be changed to act like an inline element as well.

<panel id="myPanel" top="0" left="0" width="100" height="100">Hello</panel>

The panel node in this example is positioned at the top left corner of the document area, and given a width and height of 100 pixels.

The layout engine positions all nodes according to what attributes are defined on them. Some have a fixed position while others are positioned in relation to other nodes, through the use of math expressions. In these expressions, nodes are identified through their id attribute.

<panel id="mainPanel" left="myPanel.left + myPanel.width / 2" />

In this case the mainPanel node will be positioned with its left edge at the center of the myPanel node.
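
To make the idea of such expressions concrete, here is a small Python sketch that evaluates an arithmetic expression over `id.attribute` references. It borrows Python's own expression grammar as a stand-in for the OUIML expression syntax, which is an assumption; the `evaluate` function and the `nodes` dictionary are hypothetical and only meant to show the principle.

```python
import ast
import operator

OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression, nodes):
    """Evaluate +, -, *, / over numbers and `id.attribute` references."""
    def walk(term):
        if isinstance(term, ast.Constant) and isinstance(term.value, (int, float)):
            return term.value
        if isinstance(term, ast.Attribute) and isinstance(term.value, ast.Name):
            # `myPanel.width` looks up attribute "width" of the node with id "myPanel".
            return nodes[term.value.id][term.attr]
        if isinstance(term, ast.BinOp) and type(term.op) in OPERATORS:
            return OPERATORS[type(term.op)](walk(term.left), walk(term.right))
        if isinstance(term, ast.UnaryOp) and isinstance(term.op, ast.USub):
            return -walk(term.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval").body)


# The myPanel node from the earlier example: 100 by 100 pixels at the top left.
nodes = {"myPanel": {"top": 0, "left": 0, "width": 100, "height": 100}}

# The horizontal center of myPanel: 0 + 100 / 2
print(evaluate("myPanel.left + myPanel.width / 2", nodes))  # 50.0
```

Anything other than plain arithmetic, such as a function call, is rejected with a `ValueError` instead of being executed.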

The nodes that do not have their position and size explicitly set by the developer are set by the layout engine by using rules for flow-layout. This is the way words flow on a page, from left to right and to the next row when a row becomes full, just like in HTML.
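
A flow-layout rule of this kind can be sketched as follows. This is a minimal Python illustration, assuming every node already has a known width and height and that a row's height is that of its tallest node; the function name `flow` is made up for the example.

```python
def flow(sizes, row_width):
    """Assign (left, top) to each (width, height) box, wrapping full rows."""
    positions = []
    left = top = row_height = 0
    for width, height in sizes:
        # Start a new row when the box no longer fits. A box wider than
        # the whole row still gets placed at the start of its own row.
        if left > 0 and left + width > row_width:
            top += row_height
            left = row_height = 0
        positions.append((left, top))
        left += width
        row_height = max(row_height, height)
    return positions


# Three word-sized boxes in a row that is 100 pixels wide.
print(flow([(40, 10), (40, 12), (40, 10)], row_width=100))
# [(0, 0), (40, 0), (0, 12)]
```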

In the process of instantiating a template, the browser builds a list of dependencies between the nodes of the document. It then figures out in which order to calculate each node position, so it all can be done in one pass. A layout that has circular references, or is unsolvable, will be detected and rejected at this point. As the browser evaluates the math expressions and assigns positions to flow-layout items, each node is put in place.
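
The ordering problem described here is a topological sort. The Python sketch below shows the principle using the standard library's `graphlib` module (Python 3.9 or later); it is not the algorithm of the actual browser. It assumes that each attribute is written as `id.attribute`, that dependencies can be found by scanning the expression text for such references, and the name `evaluation_order` is hypothetical.

```python
import re
from graphlib import CycleError, TopologicalSorter

# Matches references such as "myPanel.width" inside an expression.
REFERENCE = re.compile(r"[A-Za-z_]\w*\.[A-Za-z_]\w*")


def evaluation_order(expressions):
    """Order attributes so each one comes after everything it refers to."""
    graph = {
        target: set(REFERENCE.findall(text))
        for target, text in expressions.items()
    }
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as error:
        # error.args[1] holds the attributes that form the cycle.
        raise ValueError(f"circular layout: {error.args[1]}") from None


layout = {
    "myPanel.left": "0",
    "myPanel.width": "100",
    "mainPanel.left": "myPanel.left + myPanel.width / 2",
}
print(evaluation_order(layout))
```

With the attributes in this order, every expression can be evaluated in a single pass, because all the values it refers to are already known. A layout such as `a.left = b.left + 1` together with `b.left = a.left + 1` is rejected with a `ValueError`.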

After calculations are done, the browser draws each node to the screen, in order of how they were defined in the document. This can mean that some nodes are drawn on top of other nodes, similar to layers in HTML. This drawing can be done by the built-in system, or by plugins. These plugins can implement CSS or any other system for defining borders, colors and style.

Proof of concept

As proof of concept, I have built a browser for OUIML in C#, and it looks very promising. Especially the node expression and dependency algorithms were hard to solve. As soon as I feel I have sorted out all bugs, I will post it here to try out.
