Saturday 31 December 2011 — This is exactly 13 years old. Be careful.
I saw this question this morning:
I’m adding words to lists depending on what character they begin with. This seems a silly way to do it, though it works:
nouns = open('nouns.txt', 'r')
for word in nouns:
    word = word.rstrip()
    if word[0] == 'a':
        a.append(word)
    elif word[0] == 'b':
        b.append(word)
    elif word[0] == 'c':
        c.append(word)
    # etc...
Naturally, the answer here is to make a dictionary keyed by first letter:
from collections import defaultdict

words = defaultdict(list)
for word in nouns:
    words[word[0]].append(word)
The question reminded me of others I’ve seen on Stack Overflow or in the #python IRC channel:
- How do I see if a variable exists?
- How do I use a variable as the name of another variable?
- How do I use a variable as part of a SQL table name?
The thing all these have in common is trying to bridge the gap between two domains: the data in your program, and the names of data in your program. Any time this happens, it’s a clear sign that you need to move up a level in your data modeling. Instead of 26 lists, you need one dictionary. Instead of N tables, you should have one table, with one more column in it.
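The one-table idea can be sketched with Python's built-in sqlite3 module. The table name `words`, the column `first_letter`, the helper `words_starting_with`, and the sample nouns are all made up for illustration; the only point is that the letter becomes data in a column instead of part of a table name.

```python
import sqlite3

# Hypothetical sample data; the contents of nouns.txt are not shown in the post.
nouns = ["apple", "anchor", "bridge", "cat"]

conn = sqlite3.connect(":memory:")
# One table with an extra column, instead of one table per letter.
conn.execute("CREATE TABLE words (first_letter TEXT, word TEXT)")
conn.executemany(
    "INSERT INTO words (first_letter, word) VALUES (?, ?)",
    [(word[0], word) for word in nouns],
)

def words_starting_with(letter):
    """Return the stored words for one letter, passed as a query parameter."""
    rows = conn.execute(
        "SELECT word FROM words WHERE first_letter = ? ORDER BY word",
        (letter,),
    )
    return [row[0] for row in rows]

a_words = words_starting_with("a")
```

Because the letter is passed as a bound parameter, nothing has to be spliced into the SQL text.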
These situations all seem really obvious, but there are more subtle situations where this dynamic appears. I just wish I could think of an example! :)
Comments
The third question indicates an overly complicated design, unless you are writing another ORM.
nouns = open('nouns.txt', 'r')
for word in nouns:
    exec('{0}.append(word)'.format(word[0]))
The first thing to think about in this case is that there may be leading whitespace on the line... this means we must use "word.strip" in place of "word.rstrip". That's pretty straightforward, now we can handle " foo" as well as "bar\n".
Next we have something that may not be quite so obvious at first glance. What if we have two newlines in a row _somewhere_ in the file, or a line composed purely of whitespace characters? In these cases the use of "word[0]" is actually problematic...
Unless we take extra steps to guard against this, the stripped word will have length 0 and indexing it will raise an IndexError. In the first example the Pythonic solution would be to use ".startswith", as in "word.startswith('a')" (as an added bonus, ".startswith" can check for prefixes longer than one character, as in "word.startswith('foo')", but that's tangential). This will return True when the first character is 'a' and False in all other cases, including the empty string. This only handles the initially proposed code, not the cleaner version... so....
In the case of the dictionary approach, indexing into a potentially zero-length string is still dangerous, but since we are using the result directly as a dictionary key we can't use our new-found ".startswith" method. This sadly means that we have to fall back to a guard or do some gymnastics (and there are gymnastics we can do... but aren't we trying to avoid those?). Either we surround the block with a try/except or we add a conditional checking for some minimum length.
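The conditional guard mentioned above might look something like this. The helper name `group_by_first_letter` and the sample lines are invented for illustration, standing in for the lines of nouns.txt.

```python
from collections import defaultdict

def group_by_first_letter(lines):
    """Group words by first character, skipping blank or whitespace-only lines."""
    words = defaultdict(list)
    for line in lines:
        word = line.strip()   # removes leading and trailing whitespace
        if not word:          # guard: an empty string has no word[0]
            continue
        words[word[0]].append(word)
    return words

# Hypothetical input: leading whitespace, an empty line, a whitespace-only line.
sample = [" foo\n", "bar\n", "\n", "   \n", "baz\n"]
grouped = group_by_first_letter(sample)
```

The try/except alternative would catch IndexError around the `word[0]` lookup instead; the explicit check is used here only because it keeps the skipped case visible.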
The names are the same because both real-world dictionaries and Python's dictionaries are ways of finding something if you have the key. In the real-world dictionary, the key is the word, and the value is the definition. Ironically, though, they do this in very different ways. In the real-world dictionary, it is essential that the values be recorded in order, sorted by their key. In computing dictionaries, it is much faster if they are stored in an apparently random fashion, so that the dictionary has no sensible ordering at all!
from collections import defaultdict

nouns = (word.strip() for word in open('nouns.txt', 'r') if word.strip())
words = defaultdict(list)
for word in nouns:
    words[word[0]].append(word)
I read this article and thought
"well yeah, that's obvious"
But after reading the comments, I would like to cry. You basically said
"water is wet"
and people argued with you.
Basically, I wrote a much more long-winded post about the same thing last year. Someone pointed out that you'd written a much simpler version of the same thing, so I added a link to yours from the top of mine. (You can see it at http://stupidpythonideas.blogspot.com/2013/05/why-you-dont-want-to-dynamically-create.html).
I'm just completing my master's in structural engineering, and one of my homework assignments had some very long and nasty equations that I had to program to run a Monte Carlo simulation. (Yes, statistics; now you know what I mean by 'long and nasty equations'.)
At first it looked something like
p=1, fy=2,alpha=3,...
variable[P]*variable[fy]^variable[alpha]+variable[...]...
and I realized this was stupid. I used exec(P=value), etc. and ended up with code more like
P*fy*(1-alpha*b*c/sigma....
This was very useful because I had to code in a number of such equations with varying levels of complexity, and with this method I could literally copy-paste my original equations directly into my code in an immediately understandable format.
Ok, I went through and found my code that used this:
if a < .414*l
    g_Defl = l / 240 - P * a * (l^2 - a^2)^3 / (3 * E * I * (3 * l^2 - a^2)^2);
else
    g_Defl = l / 240 - P * a * (l - a)^2 / (6 * E * I) * sqrt(a / (2 * l + a));
end
if g_Defl < 0
    counterDefl = counterDefl + 1;
end
%check moment
Mload = P * (l - a)^2 * (a + 2 * l) / 2 / l^3 * a;
Mfixed = P * a * (l - a) * (a + l) / 2 / l^2;
g_M = Fy * Z - max(Mload, Mfixed);
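One way to keep equations this readable without creating variables through exec is to hold the parameters in a dictionary and unpack them into a function's arguments. The following is a sketch in Python, not the commenter's code: the function name `moment_margin`, the parameter values, and the translation of the moment check above are assumptions made for illustration.

```python
def moment_margin(P, a, l, Fy, Z):
    """Python rendering of the moment check: capacity minus the larger demand."""
    Mload = P * (l - a)**2 * (a + 2 * l) / 2 / l**3 * a
    Mfixed = P * a * (l - a) * (a + l) / 2 / l**2
    return Fy * Z - max(Mload, Mfixed)

# Hypothetical parameter values; in a Monte Carlo run these would be sampled.
params = {"P": 10.0, "a": 2.0, "l": 4.0, "Fy": 50.0, "Z": 1.0}

# The dictionary keys become the names used inside the equation.
g_M = moment_margin(**params)
```

The equation body still reads like the original formula, while the set of names stays fixed in the function signature and the values stay in ordinary data.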
Do you really perceive programming as engineering?
Thanks for writing on this topic.
I came across a problem on JetBrains Academy that supplied variables like this in the problem template:
bloomberg_com = "something"
nytimes_com = "something else"
The problem then asked the student to print the values of those variables when the user inputs "bloomberg.com" or "nytimes.com".
You had a lot of students using eval() and values().keys() in order to access those variables based on the input, rather than using conditionals as other students did. Changing the problem template was obviously discouraged, even though it was not explicitly prevented, since no one changed the structure to just use a dictionary as you would probably suggest.
So it seems that educators are sometimes inviting the sorts of hacks you are denouncing here by supplying study problems that are either too inflexible or which lack proper direction. Poor habits forming early?
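The dictionary version of that exercise might look like the sketch below. The values are the placeholders from the comment, while the function name `lookup_site` and the fallback message are assumptions, not part of the original problem.

```python
# Keyed by the exact text the user types, so no name mangling is needed.
sites = {
    "bloomberg.com": "something",
    "nytimes.com": "something else",
}

def lookup_site(name):
    """Return the stored value for a site, or a message if it is unknown."""
    return sites.get(name, "unknown site")

result = lookup_site("nytimes.com")
```

With the data in a dictionary, unexpected input falls through to the default instead of raising a NameError or evaluating arbitrary text.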