Mac un-installs

Monday 7 November 2016

The Mac is a nice machine and operating system, but there's one part of the experience I don't understand: software installation and uninstallation. I'm sure the App Store is meant to solve some of this, but the current situation is oddly manual.

Usually when I install applications on the Mac, I get a .dmg file, I open it, and there's something to copy to the Applications folder. Often, the .dmg window that opens has a cute graphic as a background, to encourage me to drag the application to the folder.

Proponents of this say, "it's so simple! The whole app is just a folder, so you can just drag it to Applications, and you're done. When you don't want the application any more, you just drag the application to the Trash."

This is not true. Applications may start self-contained in a folder, but they write data to other places on the disk. Those places are orphaned when you discard the application. Why is there no uninstaller to clean up those things?

As an example, I was cleaning up my disk this morning. Grand Perspective helped me find some big stuff I didn't need. One thing it pointed out to me was in a Caches folder. I wondered how much stuff was in folders called Caches:

sudo find / -type d -name '*Cache*' -exec du -sk {} \; -prune 2>&-

(Find every directory with 'Cache' in its name, show its disk usage in KB, and don't show any errors along the way.) This found all sorts of interesting things, including folders from applications I had long ago uninstalled.
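The same hunt can be done from Python if you prefer; here is a small sketch using only the standard library (the function name is mine, not from any tool mentioned above):

```python
import os

def dir_size_kb(path):
    """Total size of the files under `path`, in KB (roughly like `du -sk`)."""
    total = 0
    for root, dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # unreadable file: skip it, the way 2>&- hides errors
    return total // 1024
```

Walking from / this way is slower than find, but it makes it easy to filter or total the results in Python.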

Now I could search for other directories belonging to these long-gone applications. For example:

sudo find / -type d -name '*TweetDeck*' -exec du -sh {} \; -prune 2>&-
 12K    /Users/ned/Library/Application Support/Fluid/FluidApps/TweetDeck
 84K    /Users/ned/Library/Caches/com.fluidapp.FluidApp.TweetDeck
 26M    /Users/ned/Library/Containers/com.twitter.TweetDeck
1.7M    /Users/ned/Library/Saved Application State/com.fluidapp.FluidApp.TweetDeck.savedState
sudo find / -type d -name '*twitter-mac*' -exec du -sh {} \; -prune 2>&-
288K    /private/var/folders/j2/gr3cj3jn63s5q8g3bjvw57hm0000gp/C/com.twitter.twitter-mac
 99M    /Users/ned/Library/Containers/com.twitter.twitter-mac
4.0K    /Users/ned/Library/Group Containers/

That's about 128MB of junk left behind by two applications I no longer have. In the scheme of things, 128MB isn't that much, but it's a lot more disk space than I want to devote to applications I've discarded. And what about other apps I tried and removed? Why leave this? Am I missing something that should have handled this for me?

One of Them

Thursday 3 November 2016

I have not written here about this year's presidential election. I am as startled, confused, and dismayed as many others about how Trump has managed to con people into following him, with nothing more than bluster and lies.

It feels enormous to take it on in writing. Nathan Uno also feels as I do, but for different reasons. I've never met Nathan: he's an online friend, part of a small close-knit group who mostly share a religious background, and who enjoy polite intellectual discussions of all sorts of topics. I'm not sure why they let me in the group... :)

Nathan and I started talking about our feelings about the election, and it quickly became clear that he had a much more visceral reason to oppose Trump than I did. I encouraged him to write about it, and he did. Here it is, "One of Them."

•    •    •

One of Them

Armed police came in the middle of the night and in the middle of winter, to take a husband away from his wife and a father away from his children. No explanation was given and his family was not allowed to see him or even know where he was being held. A few months later the man’s wife and children were also rounded up and taken away. They had only the belongings that they could carry with them, leaving everything else to be lost or stolen or claimed by others, including some of the family’s most precious possessions. The family was imprisoned in a camp surrounded by barbed wire and armed soldiers. They had little food and little heat and absolutely no freedom. A few months after the wife and children arrived they were finally reunited with their husband and father, seven months after he was taken from them in the night. They remained together at the camp for years until being released, given $25 and a bus ticket each, and left to try to put their shattered lives back together.

No member of the family was ever charged with a crime. In fact, no member of the family was ever even suspected of a crime. They were imprisoned, along with tens of thousands of others, simply for being “one of them.”

This is the story of my grandfather’s family. And my grandmother’s family. And tens of thousands of other families of Japanese descent who had the misfortune of living on the Pacific coast of the United States after the attack on Pearl Harbor.

In the 1980s the U.S. government formally apologized, acknowledging their mistake, and financial reparations were made. Growing up I believed that we, as a country, had moved on, had learned a lesson. It never occurred to me that such a thing could happen again. And yet here we are, with a presidential candidate who has openly advocated violence against his opponents and detractors, offered to pay legal fees for those who break the law on his behalf, recommended policies that would discriminate against people based on their ethnicity, religion, or country of ancestry, suggested that deliberately killing women and children might be an appropriate response to terrorism, and yes, even said that he “might have” supported the policies that imprisoned my family.

Xenophobic public policy leaves enduring scars on our society, scars that may not be obvious at first. We have Chinatowns today largely because public policy in San Francisco in the late 1800s pushed Chinese immigrants to live in a specific neighborhood. The proliferation of Chinese restaurants and Chinese laundries in our country can be traced back to the same time period, when policy restricted employment opportunities for Chinese immigrants and pushed them into doing low-paying “women’s work,” like cooking and cleaning.

I’ve chosen to make my point with these simple examples from the history of Asian Americans because that’s my heritage. But these examples are trivial compared to the deep, ugly scars left on our society by slavery, and Jim Crow, and the near genocide of the Native American peoples. And despite many positive gains, women continue to be at a significant disadvantage from millennia of policies designed to keep “them” from being on equal footing with “us.”

But the real danger of Donald Trump isn’t that he, himself, is a xenophobe and threatens to enact xenophobic policy. The danger is that Trump rallies xenophobes, and justifies and condones their behavior and attitudes. The harsh, unfair internment of my family during World War II was only the beginning of decades of discrimination and abuse. Members of my family were spat upon and threatened and passed over for employment and educational opportunities. And they were the lucky ones — other Japanese Americans were shot at and had their homes set on fire.

In 1945, four men were accused of causing an explosion and a fire on the property of the Doi family, who had recently returned from Colorado’s Granada internment camp. One of the men confessed and implicated the others. At trial, their lawyer simply argued that “this is a white man’s country” and that his clients’ actions were necessary to keep it that way. All four men were acquitted by the jury, a jury doubtless influenced by the fact that the federal government had chosen to imprison the Doi family for years. The federal government declared them to be a danger simply because of their Japanese heritage, a declaration that was used to justify violence.

And we’re seeing the same again today: violence at Trump’s rallies and by some of Trump’s supporters. Violence that is either condoned or ignored by Donald Trump. My wife is not an American, nor is the rest of her family who currently reside in the United States. I am not white, nor is the rest of my family, which means that my children aren’t white either. We have family members of various ethnicities and friends of different ethnicities and religions. Donald Trump’s rhetoric and proposed policies pose an existential threat to myself, my family, and a number of our friends. But Donald Trump’s supporters may pose a physical threat to our collective safety.

While it worries me that, at the time of this writing, FiveThirtyEight puts Donald Trump’s chances of winning at somewhere around 33%, what I simply cannot fathom is their prediction that roughly 45% of the American public will choose to vote for Donald Trump. 45% of Americans apparently consider themselves to be “one of us,” and seem unconcerned about what might happen to “them.” If you are still reading this you may not be one of those people. But if you are considering voting for Donald Trump, or know others who are, I implore you to carefully consider your decision.

Donald Trump does not deserve your support, because he is not on your side. He does not share your ideology. He does not support your viewpoints in any meaningful way. Donald Trump is many things, but more than anything he’s an opportunist. His pursuit of the presidency is about his own self interest, whether that be feeding his ego or preparing for his next set of business schemes. It’s not about what’s best for you, or for the country.

Perhaps you’re a Republican and believe that your party’s interests are of paramount importance. Donald Trump is not a champion of your party’s interests - he is an opportunist who only cares about his own interests. He does not hold to the Republican party line, has attacked key members of your leadership, and is actively dividing, and possibly destroying, your party right now. A vote for Donald Trump isn’t a vote to save the Republican party, it’s a vote for the destruction of the Republican party so that one man can promote his own public persona and guarantee himself the attention he so desperately craves.

Perhaps you’re a Christian and believe the Christian leaders who’ve told you that Trump is the right choice for Christians. Donald Trump is not a defender of the Christian faith - he is an opportunist interested only in defending his own fame and expanding his power and influence. His behavior is consistently antithetical to Christian values and he has shown a dramatic lack of understanding of Christ and the Bible. A vote for Donald Trump isn’t a vote to protect Christian values, it’s a vote to protect the personal interests and appalling lack of character of a man whose behavior is entirely un-Christ-like.

Perhaps you’re pro-life and believe that the sanctity of human life must take precedence over all other issues. Donald Trump doesn’t care about the sanctity of human life - he is an opportunist who puts the sanctity of his own life above all others, and is happy to look out for the lives of those who support him, but cares not about the lives of those who oppose him. A man who openly advocates the murder of the wives and children of suspected terrorists does not care about the sanctity of a pregnant woman’s life or the sanctity of the life of that woman’s unborn child. Trump has no real plans to end abortion. In fact, if you look carefully, you can find the week in his campaign where he changed his position on abortion five different times, carefully experimenting to find the position that would gain him the most support. A vote for Donald Trump isn’t a vote to protect the sanctity of human life, it’s a vote that protects the idea that “our” lives matter and “their” lives don’t.

Perhaps you’re concerned about the threat of terrorism and value the safety of our country over all other concerns. Donald Trump is not interested in defusing the threat of terrorism - he is an opportunist who can’t wait to exercise more power than he’s ever had before. His approach to guaranteeing the “safety” of our nation is to abandon our allies, pulling out of strategic partnerships like NATO, and ramp up the level of violence against terrorists and “terrorist nations.” He has openly talked about attacking countries in the Middle East simply to seize their oil, without any regard to how that might affect America’s relationship with other nations or encourage additional forms of terrorism. A vote for Donald Trump isn’t a vote to fight the growing threat of terrorism, it’s a vote to give dangerous amounts of power to a man committed to wielding that power to fight whomever he sees as an opponent, regardless of the consequences or the impact on others.

Perhaps you’ve faced economic hardship for some time and you hope that he will provide you with more financial or job security. Donald Trump is unconcerned with your economic security - he is an opportunist who is concerned only with his own economic security. He doesn’t want you to see his tax returns because he doesn’t want you to see how much he’s earned while you’ve suffered, or how many taxes he’s avoided paying while you’ve been struggling to pay yours. He’s been consistently accused of refusing to pay people for work that they’ve done on his behalf. A vote for Donald Trump isn’t a vote to improve the prosperity of the working class, it’s a vote to improve the prosperity of Donald Trump, perhaps not in the short term, but certainly in the long term.

Or perhaps you have an entirely different reason for voting for Trump. Regardless of your reason, Trump is not on your side. He is an opportunist, and nothing more. It’s possible that you might benefit if your interests are directly aligned with his, but please consider the many many lives that may be negatively impacted along the way, and understand that Trump has a history of taking people from the “us” category and putting them into the “them” category at the slightest provocation. A vote for Trump is a vote guaranteed only to benefit Donald Trump. Others might benefit, but only as a secondary effect to the benefits gained by Donald Trump.

To be clear: I am not a fan of Hillary Clinton. Or of Bill Clinton. Or of the Democratic party, or of their policies. I disagree with many so-called “liberal” viewpoints. The prospect of Hillary Clinton as a president is not at all ideal from my perspective. But that prospect does not fill me with fear, and so I will be obliged, for the first time in my life, to cast a vote for the Democratic party’s candidate for president. I implore you to carefully consider doing the same.

Multi-parameter Jupyter notebook interaction

Saturday 29 October 2016

I'm working on figuring out retirement scenarios. I wasn't satisfied with the usual online calculators. I made a spreadsheet, but it was hard to see how the different variables affected the outcome. Aha! This sounds like a good use for a Jupyter Notebook!

Using widgets, I could make a cool graph with sliders for controlling the variables and seeing how they affect the result. Nice.

But there was a way to make the relationship between the variables and the outcome more apparent: choose one of the variables, and plot its multiple values on a single graph. And of course, I took it one step further, so that I could declare my parameters, and have the widgets, including the selection of the variable to auto-slide, generated automatically.

I'm pleased with the result, even if it's a little rough. You can download retirement.ipynb to try it yourself.

The general notion of a declarative multi-parameter model with an auto-slider is contained in a class:

%pylab --no-import-all inline

from collections import namedtuple

from ipywidgets import interact, IntSlider, FloatSlider

class Param(namedtuple('Param', "default, range")):
    """A parameter for `Model`."""
    def make_widget(self):
        """Create a widget for a parameter."""
        is_float = isinstance(self.default, float)
        is_float = is_float or any(isinstance(v, float) for v in self.range)
        wtype = FloatSlider if is_float else IntSlider
        return wtype(
            value=self.default,
            min=self.range[0], max=self.range[1], step=self.range[2],
        )

class Model:
    """A multi-parameter model."""

    output_limit = None
    num_auto = 7

    def _show_it(self, auto_param, **kw):
        if auto_param == 'None':
            plt.plot(self.inputs,, **kw))
        else:
            autop = self.params[auto_param]
            auto_values = np.arange(*autop.range)
            if len(auto_values) > self.num_auto:
                lo, hi = autop.range[:2]
                auto_values = np.arange(lo, hi, (hi-lo)/self.num_auto)
            for auto_val in auto_values:
                kw[auto_param] = auto_val
                output =, **kw)
                plt.plot(self.inputs, output, label=str(auto_val))
            plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
        if self.output_limit is not None:
            plt.ylim(*self.output_limit)

    def interact(self):
        widgets = {
            name: p.make_widget() for name, p in self.params.items()
        }
        param_names = ['None'] + sorted(self.params)
        interact(self._show_it, auto_param=param_names, **widgets)
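The value-thinning in `_show_it` is the only subtle part: take values across the parameter's (min, max, step) range, but never more than `num_auto` of them. Here is a pure-Python sketch of that logic (the helper name is mine, not part of the notebook):

```python
def sample_values(lo, hi, step, max_count=7):
    """Values lo, lo+step, ... below hi, widening the step
    so that at most max_count values are produced."""
    count = int((hi - lo) / step)
    if count > max_count:
        step = (hi - lo) / max_count
        count = max_count
    return [lo + i * step for i in range(count)]
```

So a parameter with range (0, 25, 1) would be sampled at seven widely-spaced values instead of all twenty-five.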

To make a model, derive a class from Model. Define a dict called params as a class attribute. Each parameter has a default value, and a range of values it can take, expressed as (min, max, step):

class Retirement(Model):
    params = dict(
        invest_return=Param(3, (1.0, 8.0, 0.5)),
        p401k=Param(10, (0, 25, 1)),
        retire_age=Param(65, (60, 75, 1)),
        live_on=Param(100000, (50000, 150000, 10000)),
        inflation=Param(2.0, (1.0, 4.0, 0.25)),
        inherit=Param(1000000, (0, 2000000, 200000)),
        inherit_age=Param(70, (60, 90, 5)),
    )

Your class can also have some constants:

start_savings = 100000
salary = 100000
socsec = 10000

Define the inputs to the graph (the x values), and the range of the output (the y values):

inputs = np.arange(30, 101)
output_limit = (0, 10000000)

Finally, define a run method that calculates the output from the inputs. It takes the inputs as an argument, and also has a keyword argument for each parameter you defined:

def run(self, inputs, 
    invest_return, p401k, retire_age, live_on,
    inflation, inherit, inherit_age
    ):
    for year, age in enumerate(inputs):
        if year == 0:
            yearly_money = [self.start_savings]
            continue
        inflation_factor = (1 + inflation/100)**year
        money = yearly_money[-1]
        money = money*(1+(invest_return/100))
        if age == inherit_age:
            money += inherit
        if age <= retire_age:
            money += self.salary * inflation_factor * (p401k/100)
        else:
            money += self.socsec
            money -= live_on * inflation_factor
        yearly_money.append(money)

    return np.array(yearly_money)
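The only arithmetic worth pausing on is the inflation factor: a cost in year `year` is scaled by `(1 + inflation/100)**year`. For example, with the model's assumed figures, living expenses of $100,000 today cost about 22% more ten years out at 2% inflation:

```python
# Compounding sketch: what $100,000 of expenses becomes after 10 years
# of 2% inflation (these are the sample figures, not advice).
inflation = 2.0
live_on = 100000
inflation_factor = (1 + inflation / 100) ** 10
future_cost = live_on * inflation_factor
assert round(inflation_factor, 3) == 1.219
assert round(future_cost) == 121899
```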

To run the model, just instantiate it and call interact():

Retirement().interact()

You'll get widgets and a graph like this:

Jupyter notebook, in action

There are things I would like to be nicer about this:

  • The sliders are a mess: if you make too many parameters, the slider and the graph don't fit on the screen.
  • The values chosen for the auto parameter are not "nice", like tick marks on a graph are nice.
  • It'd be cool to be able to auto-slide two parameters at once.
  • The code isn't packaged in a way people can easily re-use.

I thought about fixing a few of these things, but I likely won't get to them. The code is here in this blog post or in the notebook file if you want it. Ideas welcome about how to make improvements.

BTW: my retirement plans are not based on inheriting a million dollars when I am 70, but it's easy to add parameters to this model, and it's fun to play with...

A failed plugin

Saturday 22 October 2016

A different kind of story today: a clever test runner plugin that, in the end, did not do what I had hoped.

At edX, our test suite is large, and split among a number of CI workers. One of the workers was intermittently running out of memory. Something (not sure what) led us to the idea that TestCase objects were holding onto mocks, which themselves held onto their calls' arguments and return values, which could be a considerable amount of memory.

We use nose (but plan to move to pytest Real Soon Now™), and nose holds onto all of the TestCase objects until the very end of the test run. We thought, there's no reason to keep all that data on all those test case objects. If we could scrub the data from those objects, then we would free up that memory.

We batted around a few possibilities, and then I hit on something that seemed like a great idea: a nose plugin that at the end of a test, would remove data from the test object that hadn't been there before the test started.

Before I get into the details, the key point: when I had this idea, it was a very familiar feeling. I have been here many times before. A problem in some complicated code, and a clever idea of how to attack it. These ideas often don't work out, because the real situation is complicated in ways I don't understand yet.

When I had the idea, and mentioned it to my co-worker, I said to him, "This idea is too good to be true. I don't know why it won't work yet, but we're going to find out." (foreshadowing!)

I started hacking on the plugin, which I called blowyournose. (Nose's one last advantage over other test runners is playful plugin names...)

The implementation idea was simple: before a test runs, save the list of the attributes on the test object. When the test ends, delete any attribute that isn't in that list:

from nose.plugins import Plugin

class BlowYourNose(Plugin):

    # `test` is a Nose test object. `test.test` is the
    # actual TestCase object being run.

    def beforeTest(self, test):
        test.byn_attrs = set(dir(test.test))

    def afterTest(self, test):
        obj = test.test
        for attr in dir(obj):
            if attr not in test.byn_attrs:
                delattr(obj, attr)

By the way: a whole separate challenge is how to test something like this. I did it with a class that could report on its continued existence at the end of tests. Naturally, I named that class Booger! If you are interested, the code is in the repo.

At this point, the plugin solved this problem:

class MyLeakyTest(unittest.TestCase):
    def setUp(self):
        self.big_thing = big_thing()

    def test_big_thing(self):
        self.assertEqual(self.big_thing.whatever, 47)

The big_thing attribute will be deleted from the test object once the test is over, freeing the memory it consumed.
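Outside of nose, the core trick is easy to demonstrate. Here's a standalone sketch of the idea (not the plugin itself):

```python
class Scrubber:
    """Snapshot an object's attributes; later, delete any additions."""
    def before(self, obj):
        self.attrs = set(dir(obj))

    def after(self, obj):
        for attr in dir(obj):
            if attr not in self.attrs:
                delattr(obj, attr)

class FakeTest:
    pass

test = FakeTest()
scrubber = Scrubber()
scrubber.before(test)
test.big_thing = list(range(100000))  # stands in for setUp's allocation
scrubber.after(test)
assert not hasattr(test, "big_thing")
```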

The next challenge was tests like this:

@patch('os.listdir')
def test_directory_handling(self, mock_listdir):
    blah blah ...

The patch decorator stores the patches on an attribute of the function, so I updated blowyournose to look for that attribute, and set it to None. This nicely reclaimed the space at the end of the test.

But you can see where this is going: as I experiment with using the plugin on more and more of our test suite, I encounter yet-more-exotic ways to write tests that exceed the capabilities of the plugin. Each time, I add more logic to the plugin to deal with the new quirk, hoping that I can finally deal with "everything."

We use ddt for data-driven tests like this:

@ddt
class FooTestCase(unittest.TestCase):

    @data(3, 4, 12, 23)
    def test_larger_than_two(self, value):
        self.assertTrue(value > 2)

This turns one test method into four test methods, one for each data value. When combined with @patch, it means that we can't clean up the patch when one method is done, we need to wait until all the methods are done. But we don't know which is the last. To deal with this, the plugin sniffs around for indications that ddt is being used, and defers the cleanup until the entire class is done.

But then comes test case inheritance:

class BaseTest(unittest.TestCase):
    __test__ = False

    def test_something(self, something):
        ...

class Setting1Test(BaseTest):
    __test__ = True

    def setUp(self):
        self.setting = 1

class Setting2Test(BaseTest):
    __test__ = True

    def setUp(self):
        self.setting = 2

Now we have patches on generated methods, and even the end of the class is too early to clean up, because there are other classes using them later. We have no way to know when it is safe to clean up, except at the very end of all the tests. But the whole point was to reclaim memory sooner than that.

So the good news is, I was right: there were reasons my simple brilliant idea wasn't going to work. The bad news is, I was right. This is so typical of this kind of work: it's a simple idea, that seems so clearly right when you are in the shower, or on your bike, or swimming laps. Then you get into the actual implementation and all the real-world complexity and twistiness reveals itself. You end up in a fun-house of special cases. You chase them down, thinking, "no problem, I can account for that," and maybe you can, but there are more creepy clowns around the next corner, and chances are really good that eventually one will be too much for your genius idea.

In this case, just to top it off, it turns out the memory problem in our test suite wasn't about long-lived mocks at all. It was due to Django 1.8 migrations consuming tons of memory, and the solution is to upgrade to 1.9 (someday...). Sigh.

One of the challenges of software engineering is remaining optimistic in the face of boss battles like this. Occasionally a simple genius idea will work out. Sometimes, solving 90% of the problem is a good thing, and the other 10% can remain unsolved. Even total losses like blowyournose are good experience, good learning exercises.

And the next idea will be better!

Computing primes with CSS

Thursday 29 September 2016

I've been working on a redesign of this site, so doing more CSS, finally internalizing Sass, etc. During my reading, the nth-child pseudo-class caught my eye. It's oddly specific, providing syntax like "p:nth-child(4n+3)" to select every fourth paragraph, starting with the third. It isn't an arbitrary expression, it has to be of the form An+B, where A and B are integers, possibly negative. An element is selected if it is the An+B'th child of its parent, for some value of n ≥ 0.

It struck me that this is just enough computational power to compute primes with a Sieve of Eratosthenes, so I whipped up a demonstration (see it live here):

/* A stupid pet trick by Ned Batchelder @nedbat */
html { max-width: 40rem; }
span { display: inline-block; width: 2em; text-align: right; }
span:first-child { display: none; }
span:nth-child(2n+4) { display: none; }
span:nth-child(3n+6) { display: none; }
/* ... one rule per factor, up to span:nth-child(32n+64) ... */

The code has only linear sequences of numbers. There are spans for 1 through 999, the candidate numbers. These are arranged so that the number N is the Nth child of its containing div. The CSS has nth-child styles for 2 through 32, the possible factors.

The Sieve will hide numbers that are determined not to be primes with a "display: none" rule. A first-child selector hides 1, which is typical, seems like you always have to treat 1 specially when looking for primes. The other selectors for the display:none rule select the multiples of each number in turn. "nth-child(2n+4)" will hide elements 4, 6, 8, and so on. "nth-child(3n+6)" will hide 6, 9, 12, and so on.

So CSS has two features that together are just enough to implement the Sieve. The nth-child selector accomplishes the marking of factors. The overlapped application of separate rules implements the multiple passes, one for each factor.
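For comparison, here's the same sieve in plain Python, shaped like the CSS rules (my sketch, not the generator program mentioned below):

```python
def css_sieve(limit):
    """Sieve of Eratosthenes, structured the way the CSS does it."""
    hidden = {1}  # the first-child rule hides 1
    for i in range(2, 33):
        # like span:nth-child({i}n+{2*i}): hide 2i, 3i, 4i, ...
        for multiple in range(2 * i, limit + 1, i):
            hidden.add(multiple)
    return [n for n in range(1, limit + 1) if n not in hidden]
```

With limit=999, factors up to 32 are enough, since any composite below 1000 has a factor no larger than 31.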

Of course, I didn't write this file by hand, I wrote a Python program to do it. It's pretty simple, I won't clog up this post with the whole thing. But, it was my first use of a new feature in Python 3.6: f-strings. The loop that writes the nth-child selectors looks like this:

for i in range(2, 33):
    selectors.append(f"span:nth-child({i}n+{2*i})")

The f"" string has curly-bracketed expressions in it which are evaluated in the current scope. This string in Python 3.6:

f"span:nth-child({i}n+{2*i})"

is equivalent to this in previous Pythons:

"span:nth-child({i}n+{i2})".format(i=i, i2=2*i)

It felt really natural to use this new feature, and really convenient.

Don't follow me on Instagram

Monday 5 September 2016

This summer I started taking pictures and posting them on Instagram. It started with a conversation with my son Max, and his assertion that posting more than one picture a day on Instagram was Instaspam. That constraint appealed to me. I like the idea of photography as a way of attending to what I am seeing. So I started trying to look around me to find interesting shots for Instagram posts.

My summer has been different than I expected, so I've had chances to look around places I didn't expect to be. Ironically, thinking about what can go on Instagram can be a way to focus on the here-and-now. You have to see what is immediately around you in order to get a shot.

Normally, thinking about stuff to post on a social network can be the furthest thing from being in the moment. You're thinking about how people will react to your tweet, or who will look at your Facebook status. It's easy to fall into second-guessing what will get the biggest response. You spend time going back to look at what happened to your recent activity.

I have mixed stances toward different social media. I like Twitter, and like having followers. I want my tweets to get widely retweeted. I ignore Facebook, except to find out what my sons are up to. Pinterest and Snapchat might as well not exist. Now I'm putting pictures on Instagram, but not to get followers or tons of likes. The photos have no message, I rarely put any words on them. If I can post a picture I like, and have one other person like it, that's enough.

If you want to follow someone good on Instagram, my brother is an actual photographer who knows what he is doing. Follow him!

Walks in the morning

Thursday 25 August 2016

The summer is wrapping up, and it's been a strange one. On July 4th weekend, we discovered a serious bruise on Nat's chest. We took him to the emergency room to have it properly documented so we could make a formal investigation. The doctor there told us that Nat had a broken rib, and what's more, he had another that had healed perhaps a year ago.

Nat is 26, and has autism. We tried asking him what had happened, but his reports are sketchy, and it's hard to know how accurate they are. We moved him out of his apartment, and back home with us. We ended his day program. He'd had a good experience at a camp in Colorado a few years ago, so we sent him back there, which was expensive, and meant two Colorado trips for us.

The investigation has not come up with any answers. A year ago, he had been acting oddly, very still and reluctant to move. At the time we couldn't figure out why, but now we know: he had a broken rib.

We've found a new day program for Nat which seems really good. It starts full-time on Monday. During the last month, we've been cobbling together things for Nat to do during the day. He has a lot of energy and likes walking, so I've switched my exercise from swimming to doing early-morning walks with Nat before work.

Parenting is not easy. No matter what kind of child(ren) you have, there are challenges. You have to understand their needs, decide what you want for them, and try to make a match. You have to include them in the many forces that shape your days and your life.

This summer has been a challenge that way, figuring out how to fit this complicated man into our day. The walks have been something Nat and I do together, one of the few things we both enjoy. I'll be glad to be back to my swimming routine, but I'm also glad to have had this expansion of our walking together, something that used to only happen on weekends.

Nat, walking

We still have to find a place for Nat to live, and we have to make sure the day program takes hold in a good way. I know this is not the last time Nat will need our energy, worry, and attention, and I know we won't always know when those times are coming. This is what it means to be his parent. He needs us to plan and guide his life.

And he needs to walk in the morning.

Lists vs. Tuples

Thursday 18 August 2016

A common beginner Python question: what's the difference between a list and a tuple?

The answer is that there are two different differences, with complex interplay between the two. There is the Technical Difference, and the Cultural Difference.

First, the things that are the same: both lists and tuples are containers, a sequence of objects:

>>> my_list = [1, 2, 3]
>>> type(my_list)
<class 'list'>
>>> my_tuple = (1, 2, 3)
>>> type(my_tuple)
<class 'tuple'>

Either can have elements of any type, even within a single sequence. Both maintain the order of the elements (unlike sets and dicts).
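
To make that concrete, here's a quick illustration (not from the original post) of mixed types and preserved order:

```python
# Both lists and tuples can mix element types within one sequence,
# and both keep elements in the order they were given.
mixed_list = [1, "two", 3.0]
mixed_tuple = (1, "two", 3.0)

print(mixed_list[1])   # "two" stays in position 1
print(mixed_tuple[2])  # 3.0 stays in position 2
```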

Now for the differences. The Technical Difference between lists and tuples is that lists are mutable (can be changed) and tuples are immutable (cannot be changed). This is the only distinction that the Python language makes between them:

>>> my_list[1] = "two"
>>> my_list
[1, 'two', 3]
>>> my_tuple[1] = "two"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

That's the only technical difference between lists and tuples, though it manifests in a few ways. For example, lists have a .append() method to add more elements to the list, while tuples do not:

>>> my_list.append("four")
>>> my_list
[1, 'two', 3, 'four']
>>> my_tuple.append("four")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'append'

Tuples have no need for an .append() method, because you can't modify tuples.
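
Tuples do keep the sequence methods that don't modify anything, like .count() and .index(). A quick sketch:

```python
t = (1, 2, 3, 2)

# Non-mutating sequence methods work fine on tuples.
print(t.count(2))   # how many times 2 appears: 2
print(t.index(3))   # position of the first 3: 2
```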

The Cultural Difference is about how lists and tuples are actually used: lists are used where you have a homogeneous sequence of unknown length; tuples are used where you know the number of elements in advance, because the position of each element is semantically significant.

For example, suppose you have a function that looks in a directory for files ending with *.py. It should return a list, because you don't know how many you will find, and all of them are the same semantically: just another file that you found.

>>> find_files("*.py")
['a.py', 'b.py', 'c.py', 'd.py']

On the other hand, let's say you need to store five values to represent the location of weather observation stations: id, city, state, latitude, and longitude. A tuple is right for this, rather than a list:

>>> denver = (44, "Denver", "CO", 40, 105)
>>> denver[1]
'Denver'

(For the moment, let's not talk about using a class for this.) Here the first element is the id, the second element is the city, and so on. The position determines the meaning.

To put the Cultural Difference in terms of the C language, lists are like arrays, tuples are like structs.

Python has a namedtuple facility that can make the meaning more explicit:

>>> from collections import namedtuple
>>> Station = namedtuple("Station", "id, city, state, lat, long")
>>> denver = Station(44, "Denver", "CO", 40, 105)
>>> denver
Station(id=44, city='Denver', state='CO', lat=40, long=105)
>>> denver[1]
'Denver'

One clever summary of the Cultural Difference between tuples and lists is: tuples are namedtuples without the names.

The Technical Difference and the Cultural Difference are an uneasy alliance, because they are sometimes at odds. Why should homogeneous sequences be mutable, but heterogeneous sequences not be? For example, I can't modify my weather station, because a namedtuple is a tuple, which is immutable:

>>> denver.lat = 39.7392
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute

And sometimes the Technical considerations override the Cultural considerations. You cannot use a list as a dictionary key, because only immutable values can be hashed, so only immutable values can be keys. To use a list as a key, you can turn it into a tuple:

>>> d = {}
>>> nums = [1, 2, 3]
>>> d[nums] = "hello"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> d[tuple(nums)] = "hello"
>>> d
{(1, 2, 3): 'hello'}

Another conflict between the Technical and the Cultural: there are places in Python itself where a tuple is used when a list makes more sense. When you define a function with *args, args is passed to you as a tuple, even though the position of the values isn't significant, at least as far as Python knows. You might say it's a tuple because you cannot change what you were passed, but that's just valuing the Technical Difference over the Cultural.

I know, I know: in *args, the position could be significant because they are positional parameters. But in a function that's accepting *args and passing it along to another function, it's just a sequence of arguments, none different from another, and the number of them can vary between invocations.

Python uses tuples here because they are a little more space-efficient than lists. Lists are over-allocated to make appending faster. This shows Python's pragmatic side: rather than quibble over the list/tuple semantics of *args, just use the data structure that works best in this case.
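
You can see the size difference with sys.getsizeof. The exact byte counts vary by CPython version and platform, so treat this as a sketch rather than a guarantee:

```python
import sys

# On CPython, a tuple stores its element pointers inline, while a list
# keeps a separately allocated (and possibly over-allocated) array, so
# the tuple is typically at least as compact for the same contents.
print(sys.getsizeof((1, 2, 3)))
print(sys.getsizeof([1, 2, 3]))
```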

For the most part, you should choose whether to use a list or a tuple based on the Cultural Difference. Think about what your data means. If it can have different lengths based on what your program encounters in the real world, then it is probably a list. If you know when you write the code what the third element means, then it is probably a tuple.

On the other hand, functional programming emphasizes immutable data structures as a way to avoid side-effects that can make it difficult to reason about code. If you are a functional programming fan, you will probably prefer tuples for their immutability.

So: should you use a tuple or a list? The answer is: it's not always a simple answer.

Breaking out of two loops

Thursday 4 August 2016

A common question is, how do I break out of two nested loops at once? For example, how can I examine pairs of characters in a string, stopping when I find an equal pair? The classic way to do this is to write two nested loops that iterate over the indexes of the string:

s = "a string to examine"
for i in range(len(s)):
    for j in range(i+1, len(s)):
        if s[i] == s[j]:
            answer = (i, j)
            break   # How to break twice???

Here we are using two loops to generate the two indexes that we want to examine. When we find the condition we're looking for, we want to end both loops.

There are a few common answers to this. But I don't like them much:

  • Put the loops into a function, and return from the function to break the loops. This is unsatisfying because the loops might not be a natural place to refactor into a new function, and maybe you need access to other locals during the loops.
  • Raise an exception and catch it outside the double loop. This is using exceptions as a form of goto. There's no exceptional condition here, you're just taking advantage of exceptions' action at a distance.
  • Use boolean variables to note that the loop is done, and check the variable in the outer loop to execute a second break. This is a low-tech solution, and may be right for some cases, but is mostly just extra noise and bookkeeping.
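
For reference, the boolean-variable workaround from the last bullet looks something like this (using the same example string):

```python
s = "a string to examine"
done = False
for i in range(len(s)):
    for j in range(i+1, len(s)):
        if s[i] == s[j]:
            answer = (i, j)
            done = True
            break       # ends the inner loop...
    if done:
        break           # ...and this ends the outer one
print(answer)
```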

My preferred answer, and one that I covered in my PyCon 2013 talk, Loop Like A Native, is to make the double loop into a single loop, and then just use a simple break.

This requires putting a little more work into the loops, but is a good exercise in abstracting your iteration. This is something Python is very good at, but it is easy to use Python as if it were a less capable language, and not take advantage of the loop abstractions available.

Let's consider the problem again. Is this really two loops? Before you write any code, listen to the English description again:

How can I examine pairs of characters in a string, stopping when I find an equal pair?

I don't hear two loops in that description. There's a single loop, over pairs. So let's write it that way:

def unique_pairs(n):
    """Produce pairs of indexes in range(n)"""
    for i in range(n):
        for j in range(i+1, n):
            yield i, j

s = "a string to examine"
for i, j in unique_pairs(len(s)):
    if s[i] == s[j]:
        answer = (i, j)
        break

Here we've written a generator to produce the pairs of indexes we need. Now our loop is a single loop over pairs, rather than a double loop over indexes. The double loop is still there, but abstracted away inside the unique_pairs generator.

This makes our code nicely match our English. And notice we no longer have to write len(s) twice, another sign that the original code wanted refactoring. The unique_pairs generator can be reused if we find other places we want to iterate like this, though remember that multiple uses is not a requirement for writing a function.
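
As it happens, the standard library already provides this abstraction: itertools.combinations yields the same unique index pairs, so the same loop can be written without a hand-rolled generator:

```python
from itertools import combinations

s = "a string to examine"
# combinations(range(n), 2) yields every (i, j) with i < j,
# exactly the pairs the unique_pairs generator produces.
for i, j in combinations(range(len(s)), 2):
    if s[i] == s[j]:
        answer = (i, j)
        break
```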

I know this technique seems exotic. But it really is the best solution. If you still feel tied to the double loops, think more about how you imagine the structure of your program. The very fact that you are trying to break out of both loops at once means that in some sense they are one thing, not two. Hide the two-ness inside one generator, and you can structure your code the way you really think about it.

Python has powerful tools for abstraction, including generators and other techniques for abstracting iteration. My Loop Like A Native talk has more detail (and one egregious joke) if you want to hear more about it.

Coverage.py 4.2

Wednesday 27 July 2016

Coverage.py 4.2 is done.

As I mentioned in the beta 1 announcement, this contains work from the sprint at PyCon 2016 in Portland.

The biggest change since 4.1 is the only incompatible change. The "coverage combine" command now will ignore an existing .coverage data file, rather than appending to it as it used to do. This new behavior makes more sense to people, and matches how "coverage run" works. If you've ever seen (or written!) a tox.ini file with an explicit coverage-clean step, you won't have to any more. There's also a new "--append" option on "coverage combine", so you can get the old behavior if you want it.

The multiprocessing support continues to get the polish it deserves:

  • Now the concurrency option can be multi-valued, so you can measure programs that use multiprocessing and another library like gevent.
  • Options on the command line weren't being passed to multiprocessing subprocesses. Now they still aren't, but instead of failing silently, you'll get an error explaining the situation.
  • If you're using a custom-named configuration file, multiprocessing processes now will use that same file, so that all the processes will be measured the same.
  • Enabling multiprocessing support now also enables parallel measurement, since there will be subprocesses. This reduces the possibility for error when configuring it.

Finally, the text report can be sorted by columns as you wish, making it more convenient.

The complete change history is in the source.

Coverage.py 4.2 beta 1

Tuesday 5 July 2016

Coverage.py 4.2 beta 1 is available.

This contains a few things we worked on during a day of sprinting at PyCon 2016 in Portland. Thanks to my fellow sprinters: Dan Riti, Dan Wandschneider, Josh Williams, Matthew Boehm, Nathan Land, and Scott Belden. Each time I've sprinted on coverage.py, I've been surprised at the number of people willing to dive into the deep end to make something happen. It's really encouraging to see people step up like that.

What's changed? The biggest change is the only incompatible change. The "coverage combine" command now will ignore an existing .coverage data file, rather than appending to it as it used to do. This new behavior makes more sense to people, and matches how "coverage run" works. If you've ever seen (or written!) a tox.ini file with an explicit coverage-clean step, you won't have to any more. There's also a new "--append" option on "coverage combine", so you can get the old behavior if you want it.

A new option lets you control how the text report is sorted.

The concurrency option can now be multi-valued, if you are using multiprocessing and some other concurrency library, like gevent.

The complete change history is in the source.

This isn't going to be a long beta, so try it now!

Math factoid of the day: 54

Thursday 16 June 2016

54 can be written as the sum of three squares in three different ways:

7² + 2² + 1² = 6² + 3² + 3² = 5² + 5² + 2² = 54

It is the smallest number with this property.
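
A brute-force check in Python (a throwaway sketch, not part of the original factoid):

```python
def three_square_sums(n):
    """All ways to write n as a² + b² + c² with a ≥ b ≥ c ≥ 1."""
    return [(a, b, c)
            for a in range(1, int(n**0.5) + 1)
            for b in range(1, a + 1)
            for c in range(1, b + 1)
            if a*a + b*b + c*c == n]

print(three_square_sums(54))                                     # three ways
print(all(len(three_square_sums(n)) < 3 for n in range(1, 54)))  # no smaller number has three
```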

Also, a Rubik's cube has 54 colored squares.

