
A server memory leak

Saturday 27 September 2008

We pushed new code to our production servers last week. There were a lot of changes, including our upgrade to Django 1.0. As soon as the servers restarted, they immediately suffered, with Python processes bloated to 2 GB or more of memory each. Yikes! We reverted to the old code, and began the process of finding the leak.

These are details on what we (Dave, Peter, and I, mostly them) did to find and fix the problem.


Evil apple

Tuesday 23 September 2008

I really don't know what Apple is thinking. First they release a really cool phone, good. Then they release an SDK for it, also good. But developers aren't allowed to talk to each other about developing for the phone. That's bad: doesn't Apple realize how developers learn? Then Apple sets up a store and keeps control over what apps can be sold there. Partly good (no malware can pollute the ecosystem), but partly bad (no one knows how Apple will decide what can be sold).

Then Apple started to reject apps from the app store. That's bad, because app developers only find out they've been rejected after they've expended all the effort to build the app. It can be hard to predict whether an app will be rejected, which makes it risky to build iPhone apps.

After this breathtaking descent into cluelessness, Apple has topped itself by deciding that app rejections are subject to the non-disclosure agreement, making it illegal for developers to talk about the fact that their app has been rejected! Is Apple actively trying to discourage app development? Is there any other company that could act this way without raising the ire of the development community? This is the company that used Gandhi in an ad? What exactly is Apple thinking?

Honda Civic hybrid

Saturday 20 September 2008

I've just bought a new car: a Honda Civic hybrid. I don't buy cars that often. The car I just replaced was a 1994 Civic. To keep the same pace, I'll add an entry to my calendar for 2022 to buy my next car.

I like the Civic for its gas mileage, 45 mpg highway. The extra expense over a non-hybrid Civic is actually more than I'll save on gas over the life of the car, but I like being the change I want to see in the world.

One thing that surprised me about this car is how familiar it felt after having driven a 1994 Civic. Lots of extra bells and whistles that I'd gotten used to in my wife's larger cars are still absent in this car.

Features in the hybrid I didn't have in my 1994 Civic (other than the hybrid engine):

  • A temperature setting in the climate control
  • Front seat map lights
  • A chime to alert me that I've left my headlights on
  • An auxiliary jack for the stereo
  • Electronic dashboard with thermometer, etc

Things that work in the hybrid that used to work in the 1994 Civic, but no longer do:

  • Remote entry buttons
  • Reliable low-speed wipers
  • Rear left passenger door handle
  • Exhaust system. The last thing that failed on the 94 was the exhaust. For its last two days, it sounded like a four-door Harley.

Fancy features the Hybrid doesn't have that my wife's car does:

  • Motorized seat adjustments with memory
  • Heated seats
  • Lighted mirrors in visors
  • Fold-in side mirrors
  • Leather seats
  • Separate temperature settings for driver and passenger
  • Individual lights for rear passengers

I'm pleased to have a new car that just works, and especially one that does so well on gas.

Competition inside corporations?

Thursday 18 September 2008

Having observed Hewlett-Packard from the inside for almost 18 months now, I'm struck by a paradox: our economy is a chaotic marketplace of capitalist competition, practiced and championed by corporations, but internally, companies are run as top-down, centrally-planned dictatorships. Why is that? Why isn't a company simply a microcosm of the larger economy?

Take the case of IT services: inside HP, there is a large IT organization, and they provide services to the rest of the company. When my group joined HP, we had no choice about how to get, for example, email service. The IT group provided email, and we used it. When we need to buy a laptop, there is one group that provides that service. When we need servers hosted, we have only one place to turn.

I'm sure the reason for this is the efficiency gained by eliminating redundancy. If there were two groups providing email services, surely one group could do the job of both, with less total staff, equipment, and so on.

That's certainly true, but then why don't we apply the same logic to the larger economy? After all, HP's email group has a huge overlap with Dell's, IBM's, Sun's, Microsoft's, and so on. Couldn't our economy gain by eliminating the overlap? When these questions are considered at the national level, we tout the increased efficiency produced by competition. The economy as a whole gains from the pressure competition puts on each company. Without competition, there is no incentive to improve, no reason to do your best. In a centrally-planned nationalized economy, incompetence is not punished, incentives are mis-aligned, and apathy takes over. There's no reason to improve because your customers have nowhere else to turn, poor service will not lead to loss of business, there's no price pressure, and your existence is guaranteed by the state.

That's logic that every capitalist believes, and we laugh at economies that have tried central planning and failed. So why doesn't the same logic hold inside companies? Why are monopolies and lack of competition not just accepted, but enforced? Don't we believe the same forces will be at work? Is there any compelling reason to improve if you have no competition?

Why couldn't a company have three IT groups (call them Red, Green, and Blue)? Each is separate, and lives or dies based on its ability to attract business from the rest of the company. When my group needs servers hosted, we shop around. Maybe Red is the deluxe service, and Blue is economy, and we've heard from friends that Green has the best service. For whatever reason, we choose one of them, and spend our internal dollars with them. The groups will compete, and that competition will force them to optimize and find the best solutions for their customers. If they don't, they will go out of business.

I know it seems wasteful to have all that going on inside a company. There will be duplication. But remember the capitalist logic: without competition, there's no reason to do your best. Just as with the larger economy, the duplication will be worth it because of the increased efficiency forced by competition. And without competition, your only option will be a poor one.

Of course, not all work inside corporations could be run this way. For example, legal departments deal with the outside world, and the corporation must speak with one voice there. But couldn't competition be used in at least some parts of large companies?

Where's the flaw in this logic? Why isn't competition inside corporations a good idea?

Self-diagnosing software

Wednesday 17 September 2008

At work we upgraded to the shiny-new Django 1.0, and we had to make a lot of small changes in the process. Most were what you would expect: adapting to the 1.0 way from the older 0.96 code we had been using.

But some of them were undoing ad-hoc patches to Django that we had accreted over the two years we'd been banging away at it. Over the course of a week or so, we'd found dozens of things broken, pointing to work yet to be done to finish the 1.0 upgrade, just as you'd expect. We have a large code base, and many things changed between 0.96 and 1.0.

Yesterday, I couldn't log in on my dev server. Everyone else had been working just fine for the last few days, so it seemed mysterious. I asked our main Django guy Dave for help, and together we logged some session information and saw that there was no session being established at all. He realized what the problem was. "Oh, I changed SESSION_COOKIE_DOMAIN back to a string, we don't use the list any more." Turns out it was one of our ad-hoc Django changes that we threw overboard, and my settings file still had the old setting in it.

This is where the software should have diagnosed itself. If the settings/main.py file had these two lines added to it:

if isinstance(SESSION_COOKIE_DOMAIN, list):
    raise Exception("SESSION_COOKIE_DOMAIN should be just a string now.")

Then I would have immediately gotten an exception on my server console (and browser) pointing to precisely what the problem was. I could have fixed it, and been running in two minutes, rather than being frustrated for half an hour, and bothering Dave for another ten minutes.

Our development team is small (five), and we all sit next to each other most days of the week, so the cost of this sort of out-of-band communication about changes to infrastructure is small. Also, I seem to have been the only developer who had a list in their settings file. So perhaps the cost here was a total of about an hour. Not so much, but adding those two lines in the first place would have cost about five minutes. And in addition to the five developers, there are probably five other "development environments" floating around for other purposes: intern work, demos, backups, evaluation tarballs sent to other groups, etc., and who knows if those will have the same problem.

And besides the simple time spent, there's the loss of focus, the distraction of the other developers, the frustration, and so on. Developer attention is a very valuable resource. A speed bump like this in the road is like a CPU cache miss: your pipelines are flushed, and you have to re-focus. The time taken doesn't tell the whole story.

Yesterday was just one of those days, because later, I was entering a zipcode into my dev machine, and was consistently told that there were no facilities near that zipcode, even though I knew there should be.

Turns out that somehow, my database table of zipcodes was empty. We still don't know how that happened, but it would have been great if the software could have helped diagnose this anomalous condition. I changed this:

try:
    z = ZipCode.objects.get(pk=zipcode)
except ZipCode.DoesNotExist:
    raise KeyError

to this:

try:
    z = ZipCode.objects.get(pk=zipcode)
except ZipCode.DoesNotExist:
    if settings.DEBUG:
        # Sometimes the problem isn't one bad zipcode, but that there
        # are no zipcodes in the db at all!
        if ZipCode.objects.all().count() == 0:
            print("*** You have no zipcodes! Run bin/load_zipcodes.py")
    raise KeyError

It would have been another half-hour saved. I don't know how the zipcodes were deleted, so it's hard to guess how often someone will be in this position again, but I know it is worth it to add these sorts of diagnostics. I'll take a guess that the next time the zipcodes are missing will be five minutes before a critical demo, when everyone is panicky and no one will be able to think through the possible causes clearly. An unambiguous diagnostic will be very welcome.

Take the time to make your software self-diagnosing. The more you can automate about the job of writing software, the better your software will be.
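The SESSION_COOKIE_DOMAIN check above could be generalized so that every setting with a known type gets the same treatment. This is only a sketch: check_setting_types is a made-up helper, not something Django provides, and it assumes settings are plain attributes on a module or object.

```python
# Hypothetical helper, not part of Django: fail fast when a setting
# has a type the current code no longer accepts.

def check_setting_types(settings, expected):
    """Raise one descriptive error listing every mistyped setting.

    `settings` is any object with attributes (such as a settings module).
    `expected` maps a setting name to the type it should have.
    Settings that are missing or None are skipped.
    """
    problems = []
    for name, wanted in sorted(expected.items()):
        value = getattr(settings, name, None)
        if value is None:
            continue
        if not isinstance(value, wanted):
            problems.append(
                "%s should be %s, but is %s: %r"
                % (name, wanted.__name__, type(value).__name__, value)
            )
    if problems:
        raise TypeError("Bad settings:\n  " + "\n  ".join(problems))
```

Calling something like check_setting_types(settings, {"SESSION_COOKIE_DOMAIN": str}) at startup would have stopped my dev server with a message naming the stale setting.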

Reductio ad absurdum

Thursday 11 September 2008

At work, there are security awareness posters that read,

HP is protected by you

A colleague, in a fit of linguistic pique, railed against the passive voice. He pasted a new message over the poster:

You protect us.

I suggested a more powerful version:

Protect us!

Or even,

Help!

Maybe something got lost along the way...

OpenID is too hard

Friday 5 September 2008

OpenID is one of those web technologies I would love to love: it addresses a need, seems pretty well thought-out, and all the cool kids are doing it. But the fact is, it's still a bit too hard for what it's trying to be. When I first heard about OpenID, I read about it, and didn't quite get it. People kept talking about it, so I kept going back to read about it, and it still mystified me.

Big players started adopting it (AOL, Yahoo), so it seemed like it was here to stay, but I still didn't have the incentive to get over the learning curve. Earlier this week I visited yet another site that encouraged me to get an OpenID, and I decided I would finally cross OpenID off my list of technologies I should at least understand and probably use.

The simplest way to use OpenID is to pick a provider like Yahoo, go to their OpenID page, and enable your Yahoo account to be an OpenID. This in itself was a little complicated, because when I was done, I got to a page that showed me my "OpenID identifiers", which had one item in it:

https://me.yahoo.com/a/.DuSz_IEq5Vw5NZLAHUFHWEKLSfQnRFuebro-

What!? What is that, what do I do with it? Am I supposed to paste that into OpenID fields on other sites? Are you kidding me? Also, in the text on that page is a stern warning:

This step is completely optional. After you choose an identifier, you cannot edit or delete it.

(Emphasis theirs). So now I have a mystifying string of junk, with a big warning all over it that I can't go back. "This step" claims it's optional, but I seem to have already done it! Now I'm afraid, and I'm a technical person — you expect my wife to do this?

Luckily I can choose to enable other identifiers, so I also enable my Flickr account as an OpenID.

Since I am a technical person, I've learned that OpenID supports delegation. That's a way to have your website act as an OpenID simply by adding some HTML to your page. The HTML points to another OpenID behind the scenes. That way, I can use nedbatchelder.com as my OpenID, and later be able to change who is actually hosting my OpenID.

Simon Willison shows the simple way to delegate your OpenID on your home page. You need the id you just got from your provider, and you need a URL for the provider's server. Oh, bad news: Yahoo won't say what their server's URL is. I can't delegate to Yahoo. Why? Don't know. Time to get another provider.

So I go to a more savvy provider, get an ID and a delegate server URL, edit my page, and I can't log in to my desired site. I must have messed something up. A good debugging tool for this is to log in to jyte.com. Since it was built by JanRain, the company behind a lot of OpenID, they helpfully provide very geeky error messages if the OpenID login fails for some reason. Turns out I had omitted one place in the HTML that I had to put my user id. Once I fixed that, all was well.
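For the curious, the delegation HTML is small: in OpenID 1.x it is a pair of link tags in the page's head, one naming the provider's server and one naming your id at that provider. Here is a rough sketch that generates them. The function name is mine, and both URLs in the example are placeholders rather than real endpoints. (OpenID 2.0 uses a second pair, openid2.provider and openid2.local_id, which this sketch leaves out.)

```python
# Sketch: build the <link> tags used for OpenID 1.x delegation.
# delegation_links is a hypothetical helper; the URLs below are placeholders.
from xml.sax.saxutils import quoteattr

def delegation_links(server_url, delegate_id):
    """Return the HTML to paste into the <head> of your home page."""
    return "\n".join([
        # Where the consuming site should send the login request.
        '<link rel="openid.server" href=%s>' % quoteattr(server_url),
        # The id you actually hold at that provider.
        '<link rel="openid.delegate" href=%s>' % quoteattr(delegate_id),
    ])

print(delegation_links(
    "https://openid.example.com/server",
    "https://yourname.openid.example.com/",
))
```

Both places need filling in: leave your id out of the delegate tag and the login fails.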

But what have I really gained? Ted Dziuba exuberantly rants about OpenID, since it is why he hates the Internet, and his points are accurate: OpenID is still really difficult, and doesn't gain you that much.

Stefan Brands rounds up lots of issues with OpenID, and I think they need to be taken seriously. OpenID may be one of those Internet technologies that will be fabulous among the savvy and well-intentioned, but falters when spread to the wider population on the web.

Caches aplenty

Thursday 4 September 2008

My laptop has a 100 GB drive, and recently it was 98% or so full! As part of the job of cleaning it up, I used SpaceMonger to see where the space was going. I noticed a few largish directories whose names indicated they were caches of some sort, and wondered how much disk was being lost to copies of files that I didn't really need to keep around.

I cobbled together this Python script to recursively list the size of folders and files, but only if they exceed specified minimums:

""" List file sizes recursively, but only if they exceed
    certain minimums.
"""

import os
import stat

# Minimum size in bytes for a file or directory to be listed.
min_file = 10000000
min_dir = 1000000

file_format = "%15d   %s"
dir_format = "%15d / %s"
err_format = "            !!! ! %s"

def do_dir(d):
    """ Process a single directory, return its total size,
        and print intermediate results along the way.
    """

    try:
        files = os.listdir(d)
    except Exception as e:
        # Unreadable directory: report it and count it as empty.
        print(err_format % str(e))
        return 0

    files.sort()
    total = 0

    for f in files:
        f = os.path.join(d, f)
        try:
            st = os.stat(f)
        except Exception as e:
            # Locked file, broken link, etc: report it and move on.
            print(err_format % str(e))
            continue
        size = st[stat.ST_SIZE]
        is_dir = stat.S_ISDIR(st[stat.ST_MODE])
        if is_dir:
            # A directory's size is the total of everything inside it.
            size = do_dir(f)
        else:
            if size >= min_file:
                print(file_format % (size, f))
        total += size

    if total >= min_dir:
        print(dir_format % (total, d))

    return total

if __name__ == '__main__':
    do_dir(".")

Running this on my disk, and grep'ing for "cache", I came up with this list of cache directories:

       77428736 / .\Documents and Settings\All Users\Application Data\Apple\Installer Cache
      193088296 / .\Documents and Settings\All Users\Application Data\Apple Computer\Installer Cache
      127431856 / .\Documents and Settings\All Users\Application Data\Symantec\Cached Installs
        1283586 / .\Documents and Settings\All Users\DRM\Cache
        8904444 / .\Documents and Settings\batcheln\Application Data\Adobe\CameraRaw\Cache
        3109555 / .\Documents and Settings\batcheln\Application Data\Dropbox\cache
        9141658 / .\Documents and Settings\batcheln\Application Data\Microsoft\CryptnetUrlCache
        6639905 / .\Documents and Settings\batcheln\Application Data\Sun\Java\Deployment\cache
      244047364 / .\Documents and Settings\batcheln\Local Settings\Application Data\Adobe\CameraRaw\Cache
       35706839 / .\Documents and Settings\batcheln\Local Settings\Application Data\Mozilla\Firefox\Profiles\0ou4abpz.default\Cache
        1559441 / .\Documents and Settings\batcheln\Local Settings\Application Data\johnsadventures.com\Background Switcher\FolderQuarterScreenCache
      381984768   .\Documents and Settings\batcheln\My Documents\My Pictures\Lightroom\Lightroom Catalog Previews.lrdata\thumbnail-cache.db
       44671279 / .\Program Files\Adobe\Adobe Help Center\AdobeHelpData\Cache
        1093120 / .\Program Files\Common Files\Microsoft Shared\SFPCA Cache
     1139888470 / .\Program Files\Cyan Worlds\Myst Uru Complete Chronicles\sfx\streamingCache
       73237698 / .\Program Files\Hewlett-Packard\PC COE 3\OV CMS\Lib\Cache
       46559334 / .\WINDOWS\assembly\GAC
       20606686 / .\WINDOWS\assembly\GAC_32
       55143608 / .\WINDOWS\assembly\GAC_MSIL
      105975390 / .\WINDOWS\Driver Cache
       96353450 / .\WINDOWS\Installer\$PatchCache$
        1898024 / .\WINDOWS\SchCache
        1174871 / .\WINDOWS\pchealth\helpctr\OfflineCache
      451465998 / .\WINDOWS\system32\dllcache

(I also included the GAC directories: .NET Global Assembly Caches.) Summing these sizes, I see that 3 GB or so of space is occupied by self-declared caches. For many of these I don't know whether it is safe to delete them. Luckily the largest was a game I installed for Max and could completely uninstall.
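The summing can be scripted too. This is a sketch, with total_cache_bytes being a name I made up: it assumes each line looks like the script's output (a size, an optional slash marking a directory, then the path). It counts every matching line, so a cache directory nested inside another listed cache directory would be counted twice.

```python
# Sketch: total the sizes reported by the listing script for paths that
# contain a given word. Assumes lines of the form "<size> / <path>" for
# directories and "<size>   <path>" for files.

def total_cache_bytes(lines, needle="cache"):
    """Sum the leading sizes of lines whose path contains `needle`."""
    total = 0
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # error lines and blank lines have no leading size
        if needle in parts[1].lower():
            total += int(parts[0])
    return total
```

Any iterable of lines will do as input, such as a file holding the script's saved output. The GAC directories don't have "cache" in their names, so they need a second pass with needle="gac".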

Windows provides the Disk Cleanup utility, which knows how to get rid of a bunch of stuff you don't really need. Application developers can even write a handler to clean up their own unneeded files, but it seems application developers don't, as I don't have any custom handlers on my machine.

CCleaner is a Windows utility to scrub a little harder at your disk, but even it missed some of these folders: for example, it removed the smaller of the CameraRaw caches (8 MB), but left the larger (244 MB). I read online that CameraRaw really doesn't need those files, so I removed them by hand.

I'm all for applications making use of disk space to improve the user experience, but they should do it responsibly: give me a way to see what's being used, and give me a way to get it back. And only keep what makes sense: why do my Apple Installer Cache directories have kits for three different versions each of iTunes, QuickTime, and Safari, and seven kits for Apple Mobile Device Support? Why do I need to keep installers for versions that have already been superseded?

Google Chrome

Monday 1 September 2008

The rumors of Google building their own browser are true: Google Chrome is the project, and includes a number of interesting features, introduced in a comic by Scott McCloud. At the top of the list is running every tab in its own process, and also running each plug-in in its own process. That will definitely help with limiting the bleeding when web pages misbehave, as well as with diagnosing the bad component.

Interestingly, one of the lead developers is Ben Goodger, who was a big part of Firefox, but Chrome is based on WebKit rather than Gecko. Chrome claims a number of UI innovations, but they seem fairly simple things to me: where the tabs are, how the auto-complete works, and so on.

For web developers, there's a downside: it's one more browser to worry about. Yes, it's based on WebKit, so in theory it will behave like Safari in all the important ways.

The difference between theory and practice: in theory there is no difference, but in practice there is.

Another downside: this will likely only take market share from Firefox, and not help with the Internet Explorer problem at all...
