Tuesday 13 January 2004 — This is 21 years old. Be careful.
So if I have Unicode strings in Python, and I print them, they get encoded using sys.getdefaultencoding(), and if that encoding can’t handle a character in my string, I get a UnicodeEncodeError. Can I set things up so that the encoding is done with ‘replace’ for errors rather than ‘strict’? As it is, I use a function instead of print:
import sys

# Safe printing: can print any unicode string
def safeprint(msg):
    # Encode explicitly so unencodable characters become '?' instead of raising
    print msg.encode(sys.getdefaultencoding(), 'replace')

# blah blah
safeprint(mytrickystring)
Isn’t there a way to set stdout to not care or something?
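For readers on a current interpreter, here is a minimal sketch of one way to get that behavior. It assumes Python 3.7 or later (the code in this post is Python 2), where text streams have a reconfigure() method. The helper name make_lenient is hypothetical, and the in-memory ASCII stream stands in for a stdout that cannot represent the text.

```python
import io

def make_lenient(stream):
    """Switch a text stream's encoding errors from 'strict' to 'replace'.

    Assumes Python 3.7+, where io.TextIOWrapper has reconfigure().
    """
    stream.reconfigure(errors="replace")
    return stream

# Demonstrate on an in-memory ASCII stream; a real program would pass sys.stdout.
raw = io.BytesIO()
ascii_out = make_lenient(io.TextIOWrapper(raw, encoding="ascii"))
print("\u00bfHabla espa\u00f1ol?", file=ascii_out)
ascii_out.flush()
written = raw.getvalue()  # the two non-ASCII characters come out as '?'
```

In a real script the call would be make_lenient(sys.stdout), provided sys.stdout has not been swapped for an object without reconfigure().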
Comments
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
It's a bit of a hack, but it's very useful for testing.
Why don't you use a terminal emulator with utf-8 encoding support?
import sys
import codecs
writer_class = codecs.getwriter(sys.getdefaultencoding())
sys.stdout = writer_class(sys.stdout, 'replace')
q = u'\u00bfHabla espa\u00f1ol?'
print q
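The same wrapping idea can be sketched for Python 3, which is an assumption here since the snippet above is Python 2. In Python 3 the codecs writer wraps a byte stream (for stdout, that would be sys.stdout.buffer) rather than sys.stdout itself. The function name make_replacing_writer is made up for illustration, and a BytesIO stands in for the terminal so the result can be inspected.

```python
import codecs
import io

def make_replacing_writer(byte_stream, encoding="ascii"):
    """Wrap a byte stream so unencodable characters are written as '?'."""
    writer_class = codecs.getwriter(encoding)
    return writer_class(byte_stream, errors="replace")

# BytesIO plays the role of sys.stdout.buffer in this sketch.
buffer = io.BytesIO()
out = make_replacing_writer(buffer, "ascii")
out.write("\u00bfHabla espa\u00f1ol?\n")
result = buffer.getvalue()  # b'?Habla espa?ol?\n'
```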
By default, when encoding text, an encoder should not raise an exception. Instead, the encoder should try to continue encoding as best it can. ("Replace" is one way to continue.) Why? Because humans are much better at interpreting text than machines are. A machine doesn't know what to do with the error. A human may be able to look at text containing a few '?' characters and still make sense of it.
I think there is a general principle here. When it comes to working with text, it's better to defer to human judgment than machine judgment.
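To make the difference between raising and carrying on concrete, here is a small sketch of the standard codec error handlers. It uses Python 3 string literals (an assumption; the thread's code is Python 2, where the text would be a u'' literal), and the variable names are only illustrative.

```python
text = "\u00bfHabla espa\u00f1ol?"

# 'strict' (the default) raises as soon as a character cannot be encoded.
try:
    text.encode("ascii", "strict")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True

# The other handlers keep going and mark or drop the problem characters.
replaced = text.encode("ascii", "replace")          # b'?Habla espa?ol?'
escaped = text.encode("ascii", "backslashreplace")  # b'\\xbfHabla espa\\xf1ol?'
dropped = text.encode("ascii", "ignore")            # b'Habla espaol?'
```

With 'replace' a reader can still guess the sentence. With 'backslashreplace' no information is lost, at the cost of noisier output.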
Unless explicitly silenced.
http://groups.google.com/groups?selm=23891c90.0306060626.24e6646d%40posting.google.com
You could always write a function or make a class which does the right thing for you, but since there's no right thing for everyone, the "go with the flow, man" argument for some kind of magic convenience function or output mode really doesn't hold water. That said, I'd like to see better locale support, though one runs into platform breakage too often for that to be a trivial piece of work.