Django special character tag

Thursday 1 January 2009

I like using unusual text characters to decorate my site. For example, my home page uses lots of mid-dots (&#xb7; ·) and chevrons (&#xbb; »), as well as other special characters. To keep the HTML source from being cluttered with those inscrutable numeric entities, I wrote this Django tag:

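from django import template
from django.utils.safestring import mark_safe

register = template.Library()

# Map each mnemonic to the numeric entity it stands for.
special_ch = {
    '':     '',          # Empty piece left by a leading or trailing space
    '>>':   '&#xbb;',    # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    '<<':   '&#xab;',    # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    '(c)':  '&#xa9;',    # COPYRIGHT SIGN
    'S':    '&#xa7;',    # SECTION SIGN
    '*':    '&#x2022;',  # BULLET
    '.':    '&#xb7;',    # MIDDLE DOT
    '-':    '&#x2013;',  # EN DASH
    '--':   '&#x2014;',  # EM DASH
    ':>':   '&#x25b6;',  # BLACK RIGHT-POINTING TRIANGLE
    'o':    '&#x25e6;',  # WHITE BULLET
    '[]':   '&#x25ab;',  # WHITE SMALL SQUARE
    '<>':   '&#x25c7;',  # WHITE DIAMOND
    }

@register.simple_tag
def ch(value):
    # Look up each space-separated mnemonic and join with non-breaking spaces.
    # Recent Django versions autoescape simple_tag output, so mark it safe.
    return mark_safe('&#xa0;'.join([special_ch[s] for s in value.split(' ')]))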

Now I can use the ch tag with a mnemonic representation of the character in question. Spaces become non-breaking spaces to help control the layout around these characters:

<p>{% ch ">> " %}more text..</p>
<p>{% ch "(c) " %}2002{% ch "-" %}2009</p>

becomes

» more text..

© 2002–2009

The tag reference takes more space than the entities, but I can tell how they will display, without having to memorize the Unicode code points.
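
To check what the tag emits without setting up a Django project, the lookup can be sketched in plain Python. This is only an illustration: the table here is an abbreviated copy of special_ch, and ch_entities is a hypothetical helper name, not part of the tag library.

```python
# Abbreviated copy of the special_ch table, for illustration only.
special_ch = {
    '':    '',          # Empty piece left by a leading or trailing space
    '>>':  '&#xbb;',    # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    '(c)': '&#xa9;',    # COPYRIGHT SIGN
    '-':   '&#x2013;',  # EN DASH
}

def ch_entities(value):
    """Return the entity string the ch tag would build for value."""
    # Split on single spaces, translate each piece, and rejoin the pieces
    # with non-breaking spaces.
    return '&#xa0;'.join(special_ch[s] for s in value.split(' '))

print(ch_entities('>> '))    # &#xbb;&#xa0;
print(ch_entities('(c) '))   # &#xa9;&#xa0;
print(ch_entities('-'))      # &#x2013;
```

The trailing space in ">> " splits into an empty piece, which is why the table needs the '' entry: it turns that space into a non-breaking space after the character. A mnemonic that is not in the table raises KeyError, so a typo in a template fails loudly rather than rendering nothing.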

Comments

vvd 12:08 PM on 1 Jan 2009

Why not use a Unicode encoding for the templates and put those characters in directly, without HTML entities?

Bryan Price 1:35 PM on 1 Jan 2009

Because &copy; isn't mnemonic enough for ©? Or &raquo; for »?

Bryan Price 1:43 PM on 1 Jan 2009

http://www.bryanlprice.com/specials.html is my list of ornamentals.

http://www.chami.com/tips/internet/050798I.html seems to be a comprehensive list of named entities.

I thought we had a conversation about this a few years ago, but that was Keith Devens, not you. :-p

Ned Batchelder 2:03 PM on 1 Jan 2009

@vvd: I'm not accustomed to entering non-ASCII characters directly into source files. That's probably the best way to go in the long run.

@Bryan: I've got a bit of Stockholm syndrome from working with XML files that makes me think I have to use numeric entities. Named entities are a good option for HTML files (and templates) though.

Brandon Rhodes 8:15 PM on 1 Jan 2009

A big moment for me was when I realized that I could remap CapsLock, which I had not used for at least two decades, to the "Compose" key used to introduce multi-key Unicode character abbreviations under Linux. Now I just run:

less $(locate en_US.UTF-8/Compose)

every so often when I want to read back through the (many!) possible key combinations and see what characters I can type. Most of them were easy to guess without looking them up. An en-dash is "--." and an em-dash is "---", while the copyright symbol © is "co" or "oc" (most of the codes work when typed either way). The â character is either "a^" or "^a", and so forth. I type them; they appear in this text box; and now I'll hit "Post" and they'll appear. It's easy. It's magic.
