Random HTML factoid: no &apos;

Thursday 29 March 2007

A brief detour through some docs led me to PHP’s htmlspecialchars function, where I noticed that double-quote becomes &quot;, but apostrophe becomes &#039;. Seemed odd, since we’re all so used to &apos; as the apostrophe entity. A comment on the docs claimed that there’s no such thing as &apos; in HTML. I was already three or four levels deep on the distraction stack, so I went and looked.
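For comparison, here is a small sketch in Python rather than PHP. It assumes the standard library's html.escape as a rough analogue of htmlspecialchars. The helper name escape_for_html and the sample string are made up for illustration. Like the PHP function, it uses a named entity for the double-quote and a numeric reference for the apostrophe, never &apos;.

```python
import html


def escape_for_html(text: str) -> str:
    """Escape &, <, > and both quote characters for use in HTML.

    Hypothetical helper name; it only wraps the standard library call.
    """
    return html.escape(text, quote=True)


sample = 'say "hi" to Ned\'s blog'
escaped = escape_for_html(sample)

# The double-quote gets a named entity, the apostrophe a numeric one.
print(escaped)
```

The numeric reference is the safe choice here because it works in both HTML and XML, whichever entity names a given parser happens to know.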

Sure enough, the HTML 4.0 spec defines 252 different character entities, and &apos; is not among them.
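One way to check this without reading the spec by hand is Python's html.entities module, whose name2codepoint table is documented as mapping the HTML 4 entity names. This is only a sketch; the variable names are illustrative, and the html5 table is included just to show that a later standard did add the name.

```python
from html.entities import html5, name2codepoint

# name2codepoint holds the HTML 4 entity names, without "&" or ";".
html4_count = len(name2codepoint)
has_quot_html4 = "quot" in name2codepoint
has_apos_html4 = "apos" in name2codepoint

# The html5 table keys include the trailing semicolon.
has_apos_html5 = "apos;" in html5

print(html4_count, has_quot_html4, has_apos_html4, has_apos_html5)
```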

What does it mean? Nothing, really, since the browsers all understand the entity, but it demonstrates that sticking to a standard may be tougher than you think, since common practice so often exceeds what the standard guarantees.


Comments

Loz 7:54 AM on 29 Mar 2007

It is valid in xhtml though, just to confuse matters.

Fredrik 8:09 AM on 29 Mar 2007

That's because &apos; is part of XML. It's not something they added especially for XHTML.

Charles Miller 8:46 AM on 29 Mar 2007

The XHTML standard recommends you use &#39; instead: http://www.w3.org/TR/xhtml1/#C_16

Charles Miller 9:09 AM on 29 Mar 2007

(To follow up my own comment)

I'm not sure if this is still the case, but for a long time, Internet Explorer didn't recognise &apos; as an entity in HTML, but Mozilla would. So if you used an XML escape function to escape your HTML, it would render fine in Firefox, but be full of &apos;'s in IE.

The idea of Internet Explorer being inconvenient because it was following the standard always struck me as amusing.

Simon Willison 9:51 AM on 29 Mar 2007

Django's escape filter (and the django.utils.html.escape function that backs it) get this right, thankfully - it was news to me as well.

Ken Hirsch 11:12 AM on 29 Mar 2007

IE 6 won't render &apos;. Can't check IE7 'cuz they've asked us not to install it at work yet.

One thing I didn't know until last week was that in SGML, hence HTML, the semicolon at the end of the entity is optional (unless it's needed for tokenization). But in IE (up to v6, at least) this doesn't work for entities added with HTML 4.0. E.g. (look at this in I.E.),
These are okay:
&lt &eacute Φ
But this is not:
&Phi

Alex 12:42 PM on 9 Oct 2008

The appropriate workaround is &#39;, by the way - the XHTML website mentions this.

http://www.w3.org/TR/xhtml1/#C_16

elz 12:10 PM on 20 Dec 2008

I don't understand what &apos; means! I know that it is like an apostrophe mark, but why does it have that there? It is blinking annoying, isn't it, don't u agree? It doesn't do it on some things though x

Colin 8:26 AM on 15 Jul 2010

Yes, I have just put a mod into my html code and thought I would use the 'standard' abbreviation for an apostrophe (& apos), tested it on Firefox (no problem) then found IE 8 doesnt understand what this 'standard' abbreviation means, yet it does with & nbsp;! Bizarre ! It does understand the & #039; for apostrophe though. Its good to keep up with the latest standard abbreviations in the browsers ! & #039; is the standard for apostrophe - thats easy to remember isnt it! ;-)
