Django special character tag

Thursday 1 January 2009

I like using unusual text characters to decorate my site. For example, my home page uses lots of mid-dots (&#xb7; ·) and chevrons (&#xbb; »), as well as other special characters. To keep the HTML source from being cluttered with those inscrutable numeric entities, I wrote this Django tag:

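from django import template
from django.utils.safestring import mark_safe

register = template.Library()

special_ch = {
    '':     '',
    '(c)':  '&#xa9;',    # COPYRIGHT SIGN
    'S':    '&#xa7;',    # SECTION SIGN
    '*':    '&#x2022;',  # BULLET
    '.':    '&#xb7;',    # MIDDLE DOT
    '-':    '&#x2013;',  # EN DASH
    '--':   '&#x2014;',  # EM DASH
    '>>':   '&#xbb;',    # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    ':>':   '&#x25b6;',  # BLACK RIGHT-POINTING TRIANGLE
    'o':    '&#x25e6;',  # WHITE BULLET
    '[]':   '&#x25ab;',  # WHITE SMALL SQUARE
    '<>':   '&#x25c7;',  # WHITE DIAMOND
}

@register.simple_tag
def ch(value):
    # Look up each space-separated mnemonic; the spaces become non-breaking spaces.
    # mark_safe stops autoescaping from mangling the entities (the table is fixed, so this is safe).
    return mark_safe('&#xa0;'.join(special_ch[s] for s in value.split(' ')))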

Now I can use the ch tag with a mnemonic representation of the character in question. Spaces become non-breaking spaces to help control the layout around these characters:

<p>{% ch ">> " %}more text..</p>
<p>{% ch "(c) " %}2002{% ch "-" %}2009</p>


» more text..

© 2002–2009

The tag reference takes more space than the entities, but I can tell how they will display, without having to memorize the Unicode code points.
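
To check the lookup without a Django project, here is a minimal sketch of the same function as plain Python. The table is trimmed to the mnemonics used in the examples above, and the `>>` entry is assumed to map to `&#xbb;`, since that is what the rendered output shows.

```python
# Trimmed copy of the table: only the mnemonics used in the examples.
special_ch = {
    '':    '',
    '(c)': '&#xa9;',    # COPYRIGHT SIGN
    '-':   '&#x2013;',  # EN DASH
    '>>':  '&#xbb;',    # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK (assumed entry)
}

def ch(value):
    # Each space-separated mnemonic is looked up in the table; the
    # separating spaces come back as non-breaking spaces.
    return '&#xa0;'.join(special_ch[s] for s in value.split(' '))

print(ch('>> '))   # &#xbb;&#xa0;
print(ch('(c) '))  # &#xa9;&#xa0;
print(ch('-'))     # &#x2013;
```

The empty-string entry is what makes a trailing space work: `'>> '.split(' ')` yields `['>>', '']`, so the join adds a non-breaking space after the chevron. A mnemonic that isn't in the table raises `KeyError` rather than passing through silently.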


Why not use Unicode encoding for templates and place those characters directly, without HTML entities?
Because &copy; isn't mnemonic enough? &raquo; for »?
[link] is my list of ornamentals. [link] seems to be a comprehensive list of named entities.

I thought we had a conversation about this a few years ago, but that was Keith Devens, not you. :-p
@vvd: I'm not accustomed to entering non-ASCII characters directly into source files. That's probably the best way to go in the long run..

@Bryan: I've got a bit of Stockholm syndrome from working with XML files that makes me think I have to use numeric entities. Named entities are a good option for HTML files (and templates) though.
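
As a rough illustration of the named-entity option, here is a small sketch that swaps hex numeric references for named ones where HTML defines a name. It assumes Python 3's standard `html.entities` module; the `to_named` helper is hypothetical and not part of the tag above.

```python
import re
from html.entities import codepoint2name

def to_named(text):
    """Replace hex references like &#xa9; with named entities when a name exists."""
    def swap(match):
        name = codepoint2name.get(int(match.group(1), 16))
        # Leave the reference alone if HTML has no name for this code point.
        return '&%s;' % name if name else match.group(0)
    return re.sub(r'&#x([0-9a-fA-F]+);', swap, text)

print(to_named('&#xa9;&#xa0;2002&#x2013;2009'))  # &copy;&nbsp;2002&ndash;2009
print(to_named('&#x25b6;'))                      # &#x25b6; (no named entity)
```

Several of the ornamental characters, such as the triangle, have no HTML 4 name, so a numeric reference or the literal character is still needed for those.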
A big moment for me was when I realized that I could remap CapsLock, which I had not used for at least two decades, to the "Compose" key used to introduce multi-key Unicode character abbreviations under Linux. Now I just run:
less $(locate en_US.UTF-8/Compose)
every so often when I want to read back through the (many!) possible key combinations and see what characters I can type. Most of them were easy to guess without looking them up. An en-dash is "--." while an em-dash is "---" while the Copyright symbol © is "co" or "oc" (most of the codes work when typed either way). The â character is either "a^" or "^a", and so forth. I type them; they appear in this text box; and now I'll hit "Post" and they'll appear. It's easy. It's magic.
