Keep data out of your variable names
Ned Batchelder, December 2011
I saw this question this morning:
nouns = open('nouns.txt', 'r')
Naturally, the answer here is to make a dictionary keyed by first letter:
words = defaultdict(list)
The question reminded me of others I've seen on Stack Overflow or in the #python IRC channel:
The thing all these have in common is trying to bridge the gap between two domains: the data in your program, and the names of data in your program. Any time this happens, it's a clear sign that you need to move up a level in your data modeling. Instead of 26 lists, you need one dictionary. Instead of N tables, you should have one table, with one more column in it. These situations all seem really obvious, but there are more subtle situations where this dynamic appears. I just wish I could think of an example! :)
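As a rough sketch of the "one dictionary instead of 26 lists" idea: the code below assumes every input line holds one non-empty word, and the function name group_by_first_letter and the sample words are made up for illustration, not taken from the original question.

```python
from collections import defaultdict


def group_by_first_letter(lines):
    """Group words by first letter in one dictionary.

    Assumes every line contains a non-empty word (an assumption,
    since the original data file is unknown).
    """
    words = defaultdict(list)
    for line in lines:
        word = line.rstrip()
        # The first letter is data, so it becomes a key, not a variable name.
        words[word[0]].append(word)
    return words


# Hypothetical sample standing in for the lines of nouns.txt.
sample = ["apple\n", "avocado\n", "banana\n", "cherry\n"]
words = group_by_first_letter(sample)
print(words["a"])
```

The letter never appears in a variable name here, so handling a new letter needs no new code.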
Comments
locals() is your friend: 'var_name' in locals(), locals()['prefix_' + var_name].
The third question indicates an overly complicated design, unless you are writing another ORM.
@void: Ugh, no! The whole point is to use a dict instead of doing variable name hacks like this with locals(). This is the kind of literal answer that does no good for beginners to hear. You need to find out what they are really trying to accomplish, and give them the appropriate tools, not just answer their narrow, misguided question.
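To make the contrast concrete, here is a small sketch of what the dict-based version of those locals() lookups might look like. The settings dictionary and its keys are invented for illustration.

```python
# Hypothetical data that might otherwise live in variables named
# prefix_color, prefix_size, and so on.
settings = {
    "color": "red",
    "size": 10,
}

var_name = "color"

# Instead of: 'prefix_' + var_name in locals()
has_value = var_name in settings

# Instead of: locals()['prefix_' + var_name]
value = settings[var_name]

# .get() returns a default for a missing key instead of raising KeyError.
missing = settings.get("weight", None)
```

Membership tests, lookups, and defaults are all ordinary dictionary operations, so nothing has to inspect the namespace.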
Another way to do this:
nouns = open('nouns.txt', 'r')
for word in nouns:
    # Assumes lists named a, b, c, ... already exist.
    exec('{0}.append(word)'.format(word[0]))
About once per decade you come across a legitimate reason for testing for the existence of a variable. E.g., the following JavaScript I wrote the other day:
// Usable as a Node.js module or directly if 'exports' isn't defined.
var rfc3339;
try {
    // Assign if being used in Node.js or other CommonJS environment.
    rfc3339 = exports;
} catch(e) {
    // Not in a CommonJS environment, assume just in a browser or something.
    rfc3339 = {};
}
....
rfc3339.parse = parse;
rfc3339.format = format;

@dudi: Again, this is making me sad. Why create code like this? The point isn't to find weird obscure ways to get the job done. Would you put code like this into your own project? That isn't engineering, it's stupid pet tricks.
Let's take a step back for a second and just discuss some basic variable handling and protection which we must handle ourselves in python. In many real-world cases our file won't be perfectly formatted.
The first thing to think about in this case is that there may be leading whitespace on the line... this means we must use "word.strip" in place of "word.rstrip". That's pretty straightforward, now we can handle " foo" as well as "bar\n".
Next we have something that may not be quite so obvious at first glance. What if we have two newlines in a row _somewhere_ in the file, or a line composed purely of whitespace characters? In these cases the use of "word[0]" is actually problematic...
Provided we aren't doing extra steps to guard against this string state, this will result in "word.rstrip()" being of length 0 and thus indexing will throw an exception. In the first example the pythonic solution would be to use ".startswith" as in "word.startswith('a')" (as an added bonus, ".startswith" can check for strings with more than one character, as in "word.startswith('foo')", but that's tangential). This will return True in the case that the first character is 'a' and False for all other cases, including the empty string. This only handles the initial proposed code, not the cleaner version... so....
In the case of the dictionary approach, the use of indexing into a potentially 0 length string is still dangerous, but since we are using it directly as an index we can't use our new-found ".startswith" method. This sadly means that we have to fall back to a guard or do some gymnastics (and there are gymnastics we can do... but aren't we trying to avoid those?). Either we surround the block with a try/except or we add a conditional checking for some minimum length.
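A minimal sketch of the guard described above, assuming the input may contain leading whitespace, blank lines, and whitespace-only lines. The function name group_words and the sample input are hypothetical.

```python
from collections import defaultdict


def group_words(lines):
    """Group words by first letter, tolerating messy input lines."""
    words = defaultdict(list)
    for line in lines:
        word = line.strip()  # drops leading and trailing whitespace
        if not word:
            continue  # blank or whitespace-only line: nothing to index
        words[word[0]].append(word)
    return words


# Hypothetical messy input: leading spaces, a blank line, a whitespace-only line.
messy = ["  foo\n", "bar\n", "\n", "   \n", "baz\n"]
grouped = group_words(messy)

# startswith is safe on an empty string: it returns False, no IndexError.
empty_is_safe = "".startswith("a")
```

The conditional is used here rather than try/except because an empty line is an expected case in this sketch, not an error.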
@Alex, wow, we weren't talking about those issues! I have no idea what the original person's data file looked like, so I don't know whether this much care is needed.
Reminds me of situations where inheritance is used instead of aggregation and/or a strategy object. It leads to class hierarchies with leaf names like DirectDepositLifePolicy, DirectDepositGeneralPolicy, MonthToMonthLifePolicy, MonthToMonthGeneralPolicy etc, coupled with nested-if statements to create the correct type of object :(
Without knowing anything at all about this, really, here's my question: when you rely on a dictionary, aren't you relying on the quality of the dictionary? What if new words appear, or people are inputting names? It seems to me to be more robust to just look at the first letter, even if that isn't so elegant. The example you gave would work for any word written in the Latin alphabet, which is pretty robust. On the other hand, the dictionary approach weeds out nonsense and typos. It seems that the dictionary would get stumped a lot more often, which might be good or bad. Oh well, back to lurking.
@Paul, we have a terminology conflict here: In the real world, "dictionary" means a book containing words compiled by a lexicographer, and so is by definition a pre-determined set of words. In Python (and some other computing arenas) a dictionary is a data structure for storing a value under a key so that later, if you have the key, you can find the value. Any value can be stored with any key. In this case, the values are lists of words, and the keys are the first letters of the words.
The names are the same because both real-world dictionaries and Python's dictionaries are ways of finding something if you have the key. In the real-world dictionary, the key is the word, and the value is the definition. Ironically, though, they do this in very different ways. In the real-world dictionary, it is essential that the values be recorded in order, sorted by their key. In computing dictionaries, it is much faster if they are stored in an apparently random fashion, so that the dictionary has no sensible ordering at all!
I'm coming across this very late, but let me quickly note how incredibly true and important the post is (and how frustrating it is that so many comments miss the point!) Like you, I run into this kind of question regularly on Stack Overflow, and am glad to have somewhere to refer askers.