Bad web typography: full justify

Sunday 15 June 2008

This is 15 years old. Be careful.

Full justification on the web is usually a bad idea.

Typographers use full justification to get an elegant-looking block of type. The straight right edge is a strong visual element on the page, and can add to the controlled overall look. But typographers care about more than just the outline of the rectangle. They care about the evenness of the type within the rectangle, something they call “color”. The goal is to get an evenly filled area, with no large changes in density.

Because full justification involves stretching word spaces, if a line has to be stretched too much, the spaces become wide enough to be noticeable white blobs on the page. The line of text is then “too loose”, and interrupts the flow of reading.

In traditional typography, hyphenation is used to reduce the need to make loose lines. By breaking words into smaller chunks, the lines can be filled more naturally, and they don’t have to be stretched too far.

But web browsers don’t hyphenate. As a result, paragraphs often suffer. Here are some examples from the OpenID news for February 2008:

[Image: A full-justified paragraph]

(I’ve blurred it a bit to emphasize the color.) This paragraph is OK, with just two problem lines: the fourth (“some of the top ...”) and fourth from the bottom (“to support the community ...”). These lines are loose enough that I stumble when reading them, as if they were typed “some .. of .. the .. top ..”

But then we come to the other problem with full justification on the web:

[Image: A full-justified paragraph with a URL in it]

Occasionally URLs appear in paragraphs, and these are very large “words” that completely screw up the line before them. Technical writing is especially prone to this as other non-word content appears in running text, such as function names.
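
To put a rough number on that looseness, here is a minimal Python sketch. It assumes a monospaced font and a simple first-fit wrapping rule (fill each line until the next word won't fit); the function names, the 40-character width, and the example.com URL are all made up for illustration.

```python
def greedy_lines(words, width):
    """Fill each line with as many words as fit (first-fit wrapping)."""
    lines, current = [], []
    for word in words:
        # Length of the line if this word is added, with single spaces between words.
        needed = sum(len(w) for w in current) + len(current) + len(word)
        if current and needed > width:
            lines.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(current)
    return lines


def gap_width(line, width):
    """Average width of each word space once the line is stretched to `width`."""
    gaps = len(line) - 1
    if gaps == 0:
        return 1.0
    letters = sum(len(w) for w in line)
    return (width - letters) / gaps


# Hypothetical sentence with a 31-character URL in it.
text = "Read the full announcement at http://example.com/news/2008/02 today"
lines = greedy_lines(text.split(), width=40)
for line in lines[:-1]:  # the last line of a paragraph is not stretched
    print(f"{gap_width(line, 40):.2f}  {' '.join(line)}")
```

In this example the URL doesn't fit on the first line, so "Read the full announcement at" has to be stretched to the full width alone, and each of its word spaces ends up nearly four characters wide.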

I think full justification is one of those technology hold-overs: the new technology trying to mimic the old. Books and newspapers use full justification, so we try to do it on the web also. But content on the web rarely appears in a constrained rectangle. Full justification in print is appealing partly because the justified right edge of the text is a good echo of the right edge of the paper, or of the left edge of the next column in a newspaper. In a single column of text in the middle of a browser window, full justification isn’t gaining you much, and brings you pain in the form of loose lines.

Except in specialized cases, or where you know very clearly what type of content will appear, you shouldn’t use full justification on the web. The lack of hyphenation is a killer.

As it happens, there are browser-side hyphenation solutions, but they also have their drawbacks: code size and execution time.

BTW: it isn’t just the web that suffers from hyphen-less justification. Amazon’s Kindle has the same problem, something I noticed right away when I first tried one out. I’m not sure why they wouldn’t have built hyphenation into a reading device. And I’m reading a Salman Rushdie book published by Penguin which uses no hyphenation. Why would a traditionally-published book forgo the tried and true technology of good-looking pages?


I'm not sure I agree that full justification isn't gaining much in the web world. What if the web browser knew about correct hyphenation rules and could do TeX-quality full justification in real time? I think that would be very interesting.
With the release of Firefox 3 on Tuesday, all major current browsers will support soft hyphens (&shy;). While browsers still have a long way to go to catch up with the quality of typography we are accustomed to from printed media (and the problems here are not just hyphenation), projects such as Hyphenator do provide a little hope.
It's true that browsers will advance, and soft hyphens are a great way to solve the problem: they let the server side insert hyphens, rather than do the work at the browser. Unfortunately, when hyphenation is separated from line-wrapping (as it would be in hyphenator.js or in server-side soft hyphens), you have to hyphenate every word, rather than just the ones that appear at the end of overly tight lines.
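
As a rough illustration of what "hyphenate every word" means on the server side, here is a minimal Python sketch. The BREAKS table and the add_soft_hyphens function are hypothetical; a real system would get its break points from a hyphenation dictionary or algorithm rather than a hand-written table.

```python
import re

SOFT_HYPHEN = "\u00ad"  # the character that the &shy; entity stands for

# Hypothetical break points, keyed by lowercase word.
BREAKS = {
    "hyphenation": ["hy", "phen", "ation"],
    "justification": ["jus", "ti", "fi", "ca", "tion"],
}


def add_soft_hyphens(text):
    """Insert a soft hyphen at every known break point of every word."""

    def replace(match):
        word = match.group(0)
        pieces = BREAKS.get(word.lower())
        if pieces is None:
            return word  # unknown word: leave it alone
        # Re-slice the original word so its capitalization is preserved.
        out, pos = [], 0
        for piece in pieces:
            out.append(word[pos:pos + len(piece)])
            pos += len(piece)
        return SOFT_HYPHEN.join(out)

    return re.sub(r"[A-Za-z]+", replace, text)


marked = add_soft_hyphens("Hyphenation helps justification.")
print(marked.replace(SOFT_HYPHEN, "|"))  # show the invisible break points
```

Note that every occurrence of every known word gets marked, whether or not it ends up anywhere near the end of a line.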

But all of these techniques have to be balanced against the current browser census. Firefox 3 will take some time to adopt, for example.
Three immediate points should be made.

First, full justification is not merely a matter of mimicry; it's a matter of usability. Ragged edges impose cognitive load, because the eye has to re-discover the edge of every single line. With fully justified text, by contrast, the eye can simply read to the edge of the big, visually-obvious rectangle it's scanning and not have to peer blearily in between two adjacent lines of text for exactly where the current one ends.

Second, the idea that a long word like a URL would cause only the previous line to gain wide spaces is a signal that a terribly, terribly primitive paragraph-breaking algorithm is being used, and that — in my opinion — someone is being allowed to write presentation software who doesn't know the field. Very fast algorithms have existed since the 1970s that do not merely find good ways to break a paragraph into lines, but will be guaranteed to find the best way to break the entire paragraph — so that the choice of where to break the very first line can "feel" the needs of the last lines to accommodate a long URL. The entire paragraph should be spaced a bit wider to help the last line; only a very poor algorithm punishes the line right before it.
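
The whole-paragraph idea can be sketched in a few lines of Python. This is not Knuth and Plass's algorithm itself, only a simplified dynamic-programming stand-in: it assumes a monospaced font, scores each line by the square of its leftover space, leaves the last line free, and ignores hyphenation. The optimal_lines name and the sample words are invented for illustration.

```python
from functools import lru_cache


def optimal_lines(words, width):
    """Break `words` into lines minimizing the total squared leftover space.

    The last line costs nothing. Assumes no single word is longer than `width`.
    """
    n = len(words)

    def slack(i, j):
        # Unused space if words[i:j] share one line with single spaces.
        return width - (sum(len(w) for w in words[i:j]) + (j - i - 1))

    @lru_cache(maxsize=None)
    def best(i):
        # Returns (cost, break positions) for the paragraph tail starting at word i.
        if slack(i, n) >= 0:
            return 0, (n,)  # the rest fits on the final, free line
        options = []
        for j in range(i + 1, n):
            s = slack(i, j)
            if s < 0:
                break  # adding more words only makes the line longer
            cost, breaks = best(j)
            options.append((s * s + cost, (j,) + breaks))
        return min(options)

    _, breaks = best(0)
    starts = (0,) + breaks[:-1]
    return [words[a:b] for a, b in zip(starts, breaks)]


for line in optimal_lines("aaa bb cc ddddd".split(), 6):
    print(" ".join(line))
```

With a width of 6, first-fit wrapping would produce "aaa bb", "cc", "ddddd", leaving "cc" badly underfilled. The sketch instead breaks after "aaa", giving up a little on the first line so that the second is nearly full.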

Third, soft hyphens should not be a stopgap measure, but are the optimum final solution, because hyphenation is a finicky enough beast that it cannot finally be simply left in the hands of the browser to do where it thinks it will work. Variations of language, proper names that look like normal words, and other issues make it impossible that browser-based hyphenation could ever work very well. Hyphenation should be left entirely in the hands of content producers, whose software should, under their watchful eye, produce documents with soft hyphens at every possible and appropriate point of hyphenation. It's something that needs to be in the content, not guessed at presentation time.
Brandon, you are right that full justification can help with the readability of the text. But it has to be balanced against the loss of usability due to the poor line breaks. We can't advocate full justification on the web simply because it's most readable in books: we have to measure the actual usability we get in web justification. My claim is that we're better off with ragged right.

You are also right that better line wrapping exists, but not in the browser. Again, I'm talking here about how to best use the technology we currently have. And I'm not sure even Knuth's algorithm could deal well with the affront of a 15-em URL in the middle of a paragraph!
I hadn't heard of the "&shy;" entity before and, now that I have, I confess to being a little dismayed. I understand the problem, but forcing servers to sprinkle these throughout an entire document seems wrong on a number of levels. Hyphenation is a display-time decision that should be the responsibility of the client - the client has the most information about the abilities and limitations of the display device, and the context in which the hyphenation is going to be done. Hyphenation is a characteristic of how the content is displayed (i.e. stylistic) and not an innate property of the content. To have that information embedded in the content would seem to be a step backward from the principle of separation of content and style that the web has been working toward in recent years.

Which is not to say the server shouldn't dictate what words can/cannot be hyphenated. As Brandon points out, there are nuances that clients may not be able to account for with a general, all-purpose algorithm. (That said, I suspect a reasonable hyphenation algorithm + dictionary would work just fine in all but the most esoteric cases).

What is needed here is a more stylistic solution to the problem, something akin to how CSS works. Instead of embedding soft-hyphens in the content, the server should provide hyphenation rules and allow the client to determine how those rules apply to the content.

For example, in reading about how OpenOffice does hyphenation, it appears that there is a fairly standard algorithm by Franklin Liang (1982) that forms the basis of hyphenation in most free software. This, combined with the per-language hyphenation dictionaries that OpenOffice uses, would seem to be a good basis for hyphenation logic in clients, and would address 99% of the cases where hyphenation is necessary. (Note: the per-language hyphenation dictionaries are surprisingly small. ~60KB per language, fwiw.)
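
For readers who haven't seen pattern-based hyphenation, here is a minimal Python sketch of the general idea as I understand it: patterns are letter fragments with digits between the letters, every matching pattern votes on each gap in the word, the highest digit wins, and odd numbers allow a break. The function names and the min_left/min_right parameters are my own, and the pattern list is a tiny illustrative set, nowhere near a real dictionary.

```python
import re

# A tiny illustrative pattern set; a real dictionary has thousands of patterns.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split 'hy3ph' into letters 'hyph' and per-gap digits [0, 0, 3, 0, 0]."""
    letters = re.sub(r"\d", "", pattern)
    digits = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            digits[pos] = int(ch)  # digit sits in the gap before letter `pos`
        else:
            pos += 1
    return letters, digits


def hyphenate(word, patterns, min_left=2, min_right=2):
    """Return the pieces of `word` between allowed break points."""
    text = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(text) + 1)   # scores[k] is the gap before text[k]
    for letters, digits in map(parse_pattern, patterns):
        start = text.find(letters)
        while start != -1:
            for offset, digit in enumerate(digits):
                scores[start + offset] = max(scores[start + offset], digit)
            start = text.find(letters, start + 1)
    pieces, last = [], 0
    # Keep at least min_left letters before and min_right after any break.
    for i in range(min_left, len(word) - min_right + 1):
        if scores[i + 1] % 2 == 1:  # odd score: a break is allowed before word[i]
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


print("-".join(hyphenate("hyphenation", PATTERNS)))
```

With these patterns, "hyphenation" comes out as hy-phen-ation. The even-numbered votes are what suppress the bad breaks, such as one between "a" and "t".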

In the cases where websites wanted to further refine how hyphenation took place, they could provide one or more custom dictionaries in the form of <LINK> tags, like this:

<link rel="hyphenation" type="text/plain" href="neds-hyphenation-rules.txt">

I won't try to define the format of these hyphenation dictionaries (although the OpenOffice ".hlab" format seems an obvious choice). The main point is that this has several benefits:
1. Separates hyphenation logic from content (w00t!)
2. Even in the worst-case scenario, where the client has no default algorithm or dictionaries, this approach is more efficient. The server need only provide one rule per word that appears in a document, instead of one &shy; entity per occurrence of each word.
3. More efficient still... the clients can/will cache the dictionary files
4. And more efficient still - servers can rely on clients to perform the (potentially CPU intensive) hyphenation logic.
(*dooh* forgot to check "Email me future comments". Ned, remember that setting for me would ya? ;) )
Full justification doesn't help readability at all; on the contrary, ragged right edges help the eyes remember where we are in the text. Also, we don't read letters but words and even groups of words. We recognize words not by reading each individual letter, but by their particular sequence of ascenders and descenders. The differences in the configuration of words and line lengths are what help us get to the meaning. Uniformity is counter-productive.
Bri Braithwaite 11:28 PM on 1 Feb 2012
Maybe I spend too much time reading legal briefs, all of which are text justified. I like it. I also prefer websites that do it. But then maybe I'm weird?
It's a bit of a shame. While I agree that the wide blocks of white space are not optimal, I've generally found that text looks cleaner when the right and left edges are vertically aligned. If it weren't for this being the normal practice, I'd see a jagged right edge of a block of text as looking very sloppy. Unfortunately "justified" looks best when the text region is wide and we've actually been going backwards with that due to the popularity of low resolution and vertically-aligned cellphone browsers.
