|Ned Batchelder : Blog | Code | Text | Site|
Damien Katz is working on a big project called Couch (a non-relational database, plus a bunch of other stuff). He's given a lot of thought to how to make the software reliable, in particular, how to handle unforeseen conditions. He's just posted a long explanation of his thoughts on the matter: Error codes or Exceptions? Why is Reliable Software so Hard? It's very good, includes witty illustrations, and even (in the comments) an admission that the whole Hasselhoff thing is a gag.
Damien is implementing Couch in Erlang, which is one of those esoteric languages I wish I had the time to really understand. From what Damien has told me about it, it is truly mind-bending, in good ways. His article points in the Erlang direction, to give you a taste of how a different programming paradigm can change how you think about software.
James Tauber mentions the distinction between lists and tuples in Python. I have to admit: I never made this distinction properly before. I didn't even think of tuples as immutable lists. They were just the lists with parens, which made them less attractive, because it's ugly to make one with a single element (that weird orphaned comma), and the parens don't stand out from the rest of the code as well as square brackets do.
Of course, if you want to modify elements, you have to use a list, and if you want to provide a "list" of elements to the string formatting operator, you have to use a tuple. Other than that, I never really thought about it.
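A minimal sketch of those two practical differences, assuming Python 3. The variable names and values are made up for illustration:

```python
# Lists can be modified in place; tuples cannot.
as_list = [1, 2]
as_list[0] = 10                      # fine: lists are mutable

as_tuple = (1, 2)
try:
    as_tuple[0] = 10                 # tuples reject item assignment
except TypeError as exc:
    tuple_error = str(exc)

# The % formatting operator expects a tuple when there are several values.
formatted = "%s has %d items" % ("cart", 3)

# A list is treated as ONE value, so the second placeholder goes unfilled.
try:
    "%s has %d items" % ["cart", 3]
except TypeError as exc:
    list_error = str(exc)

# The "orphaned comma": a one-element tuple needs it, parens alone don't count.
single = (42,)
not_a_tuple = (42)

print(formatted)
print(type(single).__name__, type(not_a_tuple).__name__)
```

The trailing comma is what makes the tuple; `(42)` is just the integer 42 in parentheses.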
This isn't me, and this isn't my son, but this is so me and my son:
Yesterday's post about Code Monkey was fruitful in that my lament about the lack of shows about geeks led me to The IT Crowd, a British TV comedy about an IT department, and their newly-assigned computer-illiterate manager. The show is available from channel4.com, but only in the UK. Of course, Bob dug up the YouTube links: Episode 1, Episode 2, Episode 3, Episode 4, Episode 5 and Episode 6.
I watched the first episode, and found it very funny. There will of course be comparisons to The Office because of the setting (an office!), but the show is much more like Fawlty Towers in its style and its humor. It's not exactly about software engineers, but the set is filled with geek references (EFF stickers, a Commodore PET in the background, RTFM t-shirts, and so on). Good stuff.
With all the TV shows about policemen and doctors and lawyers, why has no one made at least one TV show about software engineers? Our lives are full of drama and comedy; we should have a show. Until the J.J. Abramses of the world figure out the gold mine that is a cubicle farm, we'll have to make do with rock music.
Jonathan Coulton has penned Code Monkey, a rock anthem to socially-challenged cubicle dwellers everywhere:
And then there's the romance...
Here are a few things I learned yesterday while making a Python program go faster:
1: When sorting a list, you can provide a comparison function, often in the form of a lambda:
mylist.sort(lambda x,y: -cmp(x.computeValue(), y.computeValue()))
This is cool because you can control how values are sorted. But it's bad because the function is invoked for every comparison of pairs in the list. If computeValue is truly intensive (for example, if it queries your database), there's a lot of work going on. Also, why'd I have to repeat "computeValue()" in the lambda?
Turns out I didn't. Since Python 2.4, sort also has a key= argument, which is a function of one element which returns the key to use for sorting the element:
mylist.sort(key=lambda x: x.computeValue(), reverse=True)
The key function is called once per element, and the returned values are stored and used to perform the comparisons. The reverse=True argument is also new in 2.4, to force the sort in the other direction, instead of the negative cmp trick shown above, or worse, the x,y then y,x trick I've sometimes seen in comparison functions.
I made this one-line change and saved myself 1000 database queries!
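To see the difference in call counts, here is a rough sketch, assuming Python 3, where sort() no longer accepts a comparison function directly and functools.cmp_to_key stands in for it. The Item class and its call counter are hypothetical stand-ins for an object whose computeValue() is expensive:

```python
from functools import cmp_to_key


class Item:
    """Hypothetical object whose computeValue() we pretend is expensive."""
    calls = 0  # counts every computeValue() invocation across all items

    def __init__(self, value):
        self.value = value

    def computeValue(self):
        Item.calls += 1
        return self.value


def count_calls(sort_fn, n=100):
    """Sort n items with sort_fn; return (computeValue calls, sorted values)."""
    items = [Item((i * 37) % n) for i in range(n)]  # a scrambled 0..n-1
    Item.calls = 0
    sort_fn(items)
    return Item.calls, [item.value for item in items]


def sort_with_cmp(items):
    # Comparison-function style: computeValue() runs twice per comparison.
    def compare(x, y):
        a, b = x.computeValue(), y.computeValue()
        return (b > a) - (b < a)  # same effect as -cmp(a, b): descending
    items.sort(key=cmp_to_key(compare))


def sort_with_key(items):
    # Key-function style: computeValue() runs once per element.
    items.sort(key=lambda x: x.computeValue(), reverse=True)


cmp_calls, cmp_result = count_calls(sort_with_cmp)
key_calls, key_result = count_calls(sort_with_key)
print("cmp-style calls:", cmp_calls)
print("key-style calls:", key_calls)
```

Both sorts produce the same descending order. The key version calls computeValue() exactly once per element, while the comparison version calls it twice for every comparison the sort makes.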
2: If you are measuring the time taken by a chunk of Python code, it matters what platform you are running on. On Windows, time.clock() is the wall time, but on Unix, it is the processor time. As the timeit module shows, time.time() is the best option for wall time on Unix. It makes a huge difference to measure wall time rather than processor time.
We spent a while yesterday trying to find a missing half-second. It turns out it was all the time that our process wasn't executing (for example, waiting for the database)!
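A small sketch of the wall-time versus processor-time gap, assuming Python 3.3 or later, where time.perf_counter() and time.process_time() make the distinction explicit on every platform (time.clock() was removed in Python 3.8). The fake_database_wait function is a made-up stand-in for time spent waiting on something outside the process:

```python
import time


def measure(fn):
    """Run fn once and return (wall seconds, processor seconds)."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time used by this process
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def fake_database_wait():
    # Hypothetical stand-in for a blocking database call:
    # the process sleeps and uses almost no CPU.
    time.sleep(0.2)


wall, cpu = measure(fake_database_wait)
print("wall: %.3fs  cpu: %.3fs" % (wall, cpu))
```

The processor-time figure barely moves while the process is waiting, which is how a half-second can go missing.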
3: If you have a number in fraction of seconds, and you want to display it in milliseconds, you multiply by 1000, but don't do it like this:
print "Elapsed time: %d" % secs*1000
Because that will print:
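and so on, 1000 times. The % and * operators have the same precedence and group left to right, so the string is formatted first and the result is then repeated 1000 times. What you really want is: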
print "Elapsed time: %d" % (secs*1000)
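The same pitfall, sketched for Python 3 with results kept in variables so they can be inspected. The value of secs is an arbitrary example:

```python
secs = 0.5  # arbitrary example: half a second

# Evaluated left to right: the string is formatted first, then repeated.
wrong = "Elapsed time: %d" % secs * 1000

# Parentheses force the multiplication to happen before formatting.
right = "Elapsed time: %d" % (secs * 1000)

print(len(wrong))   # 15000: the 15-character string, 1000 times over
print(right)        # Elapsed time: 500
```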
This past December, I was given a book of Sudoku puzzles, and it consumed a significant chunk of my spare time. The puzzles are addictive, partly because at a certain difficulty level, they are easy to start, and then hard to finish. So once I finally completed a puzzle, my eye would move on to the next, and I'd see a number to scribble in, and then I had to finish that puzzle, and the cycle would continue.
Andrew Stuart has an extensive and detailed list of Sudoku strategies: Advanced Sudoku Strategies. Some of these are so complicated, I can't quite imagine being able to put them into practice in the wild, but it is fascinating to see how elaborate the solving technique can become.
BTW: I can't understand why everyone uses the classic marking technique of writing teeny numerals indicating what a cell could be. Much easier in a number of ways is the dot technique (illustrated in the Sudoku entry at Wikipedia) whereby a dot in a cell indicates what number it cannot be. The placement of the dot indicates its numeral. The dots are easier to enter than tiny numbers, and as more constraints are discovered (that is, as you progress toward the solution), you write more dots, rather than erase numbers. If no mistakes are made, and you don't mind writing the answer numeral over existing dots, you never need to erase. Also, sometimes single-number constraints are discovered quite early in the solving process. A dot can be made to indicate this, while you'd never write seven or eight tiny numerals to remember which numbers are left as possibilities. I don't get why the nine-dot marking technique isn't more popular.
I'm not sure how the role of Quality Assurance got its name. It's not right.
Many people seem to believe that QA's job is to ensure that a product has high quality. This is not their job at all, and it is easy to see if you think about it. QA doesn't change the product at all, so there is no way anything they do can directly improve the quality of the product. The only way to improve the quality of a piece of software is to change lines of code to remove bugs and add features. QA doesn't do that.
So what do they do? QA's job is simple: they figure out what the software actually does. Developers produce software, and they believe it will do X, Y, and Z. They've set out to write it so it does these things. They worked hard to make it do these things. They believe it will do these things.
But does it?
That's QA's job: to figure out what the actual software actually does. Forget the designer's ideal, forget the fevered dreams of the developers. What does the software before us actually do? QA tests the software to figure that out. No matter what sub-discipline of testing we're talking about, the purpose is the same: to figure out what the software does.
Of course, the whole point of figuring out what the software does is so that you can compare it to the intention, measure the difference between theory and practice, and work to address the gap. That's the part where quality improves: the measuring of the gap, and the work to close it up.
Don't get me wrong: I love QA. I think it is an important role. But you're doing it wrong if you think you are improving quality. It should be named Reality Check, or just Testing, or something like that.
Pimp My Snack is a gallery of home-made giant-sized snacks, mostly candy: giant Snickers, giant Cadbury Creme Egg, giant M&M. The running commentary is pretty funny, as the creator describes his sometimes-failed ad-hoc attempts to recreate candy bars on a giant scale.
Mike Rundle has a handy overview of some design principles: How C.R.A.P. is Your Site Design? His acronym stands for Contrast, Repetition, Alignment, and Proximity. Four good principles to keep in mind when designing pages.
My rule of thumb has always been: use as few things as possible (faces, sizes, colors, widths, and so on). This inadvertently led me to Repetition and Alignment, but I think it makes my pages a little plain.
Remarkably, I haven't linked to The Brick Testament before. It's quite a body of work, a significant chunk of the Bible, rendered quite well as Lego scenes. I think it's sweet that the sections are marked with letters indicating a rating: N for nudity, V for violence, and so on.
Russ Olsen sums up Five Truths about Code Optimization. He's spot on, especially about "you don't know where the problem is." I can't remember how many times I've gone in with a nifty idea about speeding things up, tried it, and it didn't work.
At work recently, we started trying to seriously estimate how much work we could do in a week. We used an estimate of 55 work hours in the week. I try hard to keep normal office hours (home by 6:30), but I often work at home some in the evenings and on the weekends, so I didn't really know how many hours I work in a week. Out of curiosity more than anything else, this past week I kept track: I worked 53½ hours.
Maureen has also been wondering about whether she is working hard enough. By her rule of thumb (you are working hard enough if you think about work when you are doing other things), I am working plenty hard.
Something I could learn from Damien: not working when it is not productive. When we worked together, he had an amazing ability to get up from his desk in the middle of the day and say, "Yeah, I'm not getting anything done, I'm going to go shoot some hoops," and leave. I was his manager at the time, and there was a general anxiety among my managers that not enough was getting done. So the few times he did this, I was both impressed and annoyed at the same time (they say the essence of management is being able to keep two conflicting thoughts in your head at once).
Damien was one of the more productive engineers, and we've all had times when nothing was flowing. Leaving was probably the best thing to do, but it didn't make casual observers think, "There's a productive engineer!" Of course, casual observation is not the way to gauge if someone is productive, but it doesn't stop people from doing it. The other engineer reporting to me at the time was Nosh, a smart guy who had the habit of taking a nap after lunch, another habit which helped productivity, but didn't look it.
I now find myself as engineering manager, a position in which I will have to gauge others' (and my own) productivity. My engineers are another challenging bunch, by which I mean, they are people. The longer I work at writing software, the more I come to appreciate that people are the hardest thing to figure out.
I've watched The Simpsons for a number of years now, and I love it. I'm amazed at how they manage to maintain a huge stable of characters, and bring them in when needed for a gag, sometimes just a cameo per episode. They are crazy off-the-wall characters, but somehow fit together into a logical universe of humor. This page lists 274 characters alphabetically, and the List of Characters from the Simpsons page at Wikipedia has nearly 500 (take that, Britannica!), categorized by part of town.
Yet with all of those characters, a really obvious possibility has been overlooked: Homer's boss. He works at the nuclear plant, but seems to report directly to Mr. Burns. The sitcom has a long history of hapless bosses alternately lording it over the working class hero of the show, and cowering beneath the glare of the powerful head of the company. With all the characters in The Simpsons, how could they have left out a two-faced middle manager?
I always knew Ben Folds was cool. Now I know just how cool: One, he does his own laundry in a laundromat while on tour. And two, he'll put a cabbie on stage because he plays a mean harmonica.
Keri Smith has a handy reference chart for How to Feel Miserable as an Artist. I am not an artist. But Keri's advice isn't limited just to artists. Much of it is good for anyone. For example, replace "artist" with "person", and "patron" with "boss", and you've got a pretty good list.
That's an interesting "variant". Three of the four components are different than the LAMP acronym. They've changed Linux Apache MySQL PHP to Linux LigHTTPD PostgreSQL RubyOnRails, from LAMP to LLPR.
LAMP has changed its meaning from four particular technologies (and the P was always ambiguous about which of the P languages it was referring to), to mean, "an open-source stack of web technologies".
This is the funniest Dilbert I've read in a long time, combining two inexplicable phenomena, marketing and the biathlon.
I don't know what Japanese TV show this is from, but it's a video of some amazing Rube Goldberg contraptions. Skip about 1:30 into it, and when the dancers come on, skip over them too. There are a couple of dozen different devices demonstrated, using some truly innovative interactions and reactions. It's similar to the Honda Cog advertisement, but clearly hand-made, and with a repeating jingle at the end of each one, sometimes incorporated into the machine itself.
Spirit, the Mars explorer expected to last for 90 days but still going after more than two years, has lost the use of one of its six wheels. Remarkably, it is still going anyway. Mission directors are now working on figuring out where the little rover can best spend the winter to make the best use of scarce sunlight to power its batteries. In a quote that typifies the optimistic can-do attitude of the entire mission, mission planner Ray Arvidson said of the dead wheel dragging along the terrain,
For Ben's birthday this year, we made two cakes, one on the day, and one for the party. First was a Stickfus cake:
Stickfus is a world Ben created, populated by stick figures. Bejoq is Ben's self-assigned name on the world.
The party cake was in the shape of Lego pieces, at Ben's request:
Both cakes were really easy, and Ben loved them. Funny thing is, we made a Lego cake just like this one for Max when he was about the same age! The colors of both cakes are a bit off. The Stickfus cake is supposed to be skin colored, and we wanted to make the Lego pieces primary colors, but frosting always comes out too pastel. Anyone know how to get good saturated colors in frosting?
Just to make sure everyone understands: yesterday's post declaring Ruby on Rails the winner in the web framework wars was a joke. I hope no one thinks I might choose between two competing technologies because one's version number is numerically greater than the other's, even in part. As far as I'm concerned, none of the reasons I gave yesterday is a good reason to choose a technology. And I still love Python's significant whitespace.
I'm sure Ruby on Rails is deserving of most if not all of its current hype. I've never used it. In truth, my opinion of the different frameworks is that I don't really have one. I'm a Python guy, so I use a Python framework. I've tried Django and Turbogears, and both seemed like excellent pieces of software. Until I started my current job using Django, my only exposure to both was to complete their tutorials. Web.py seems lean and hip, but I know nothing about it.
Now that I am a full-time professional Django user, I like it very much, but I can't even remember how it differs from Turbogears, except for the URL mapper in Django, which I always liked better than the object mapping technique Turbogears inherited from CherryPy.
So I use Django, but I had no part in choosing it at work; the code base was well established by the time I got there. I'd love to have an in-depth knowledge of the different frameworks, and a cogent and meaningful comparison of Ruby on Rails to the other technologies, but I don't. Right now I'm quite well occupied learning all the corners in Django, and building a product.
Hot on the heels of the news about Google's China censorship comes this bad idea: Google Circles. It shows you what others in your demographic are searching for. This to me seems like the ultimate naive mistake: some techno-weenies throw together a search data mash-up because they can, with no thought about how the data could be misused in the real world. I imagine some White House staffers would be embarrassed if people knew they were searching for WMD.
Maybe once Damien gets there, he can help straighten them out.
Recently, a lot of people have asked my opinion of the various web frameworks out there, especially Django, Turbogears, and Ruby on Rails. I've looked into all three. I've been using Django at work for three months.
After carefully looking them all over, I've decided that Ruby on Rails is the winner, for a few reasons:
So it's on to Ruby for me, although I've also heard good things about Erlang...