5.0a6: context reporting

Wednesday 17 July 2019

I’ve released another alpha of 5.0: 5.0a6. There are some design decisions ahead that I could use feedback on.

Important backstory:

  • The big feature in 5.0 is “contexts”: recording data for varying execution context, also known as Who Tests What. The idea is to record not just that a line was executed, but also which tests ran each line.
  • Some of the changes in alpha 6 were driven by a hackathon project at work: using who-tests-what on the large Open edX codebase. We wanted to collect context information, and then for each new pull request, run only the subset of tests that touched the lines you changed. Initial experiments indicate this could be a huge time-savings.

Big changes in this alpha:

  • Support for contexts when reporting. The --show-contexts option annotates lines with the names of contexts recorded for the line. The --contexts option lets you filter the report to only certain contexts. Big thanks to Stephan Richter and Albertas Agejevas for the contribution.
  • Our largest test suite at work has 29k tests. The .coverage SQLite data file was 659 MB, which was too large to work with. I changed the database format to use a compact bitmap representation for line numbers, which reduced the data file to 69 MB, a huge win.
  • The API to the CoverageData object has changed.
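To make the bitmap idea concrete, here is a minimal sketch of packing a set of line numbers into bytes, one bit per line. This is only an illustration of the general technique, not the actual storage code in coverage.py; the function names lines_to_bitmap and bitmap_to_lines are made up for this example.

```python
def lines_to_bitmap(lines):
    """Pack line numbers into bytes: bit (n % 8) of byte (n // 8) marks line n.

    Hypothetical helper for illustration, not coverage.py's own code.
    """
    if not lines:
        return b""
    buf = bytearray(max(lines) // 8 + 1)
    for n in lines:
        buf[n // 8] |= 1 << (n % 8)
    return bytes(buf)


def bitmap_to_lines(bitmap):
    """Recover the sorted list of line numbers from a packed bitmap."""
    return [
        byte_index * 8 + bit
        for byte_index, byte in enumerate(bitmap)
        for bit in range(8)
        if byte & (1 << bit)
    ]


executed = [1, 2, 3, 5, 8, 13, 21]
packed = lines_to_bitmap(executed)
round_trip = bitmap_to_lines(packed)
```

Seven line numbers fit in three bytes here, where a row-per-line table would need seven rows. The trade-off is that the packed blob can't be queried directly with SQL.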

Some implications of these changes:

  • The HTML reporting on contexts is good for small test suites, but very quickly becomes unwieldy if you have more than 100 tests. Please try using it and let me know what kind of reporting would be helpful.
  • The new more-compact data file is harder to query. The larger data file had a schema designed to be useful for ad-hoc querying: a classic third-normal-form representation of the data. Now I consider the database schema to be a private implementation detail. Should we have a new “coverage sql” report command that exports the data to a convenient SQLite file?
  • Because CoverageData has changed, you will need an updated version of pytest-cov if you use that plugin. The future of the plugin is somewhat up in the air. If you would like to help maintain it, get in touch. You can install the up-to-date code with:
    pip install git+
  • To support our hackathon project, we wrote a new pytest plugin: it uses pytest hooks to indicate the test boundaries, and can read the database and the code diff to choose the subset of tests to run. This plugin is in very rough shape (as in, it hasn’t yet fully worked), but if you are interested in participating in this experiment, get in touch. The code is here: nedbat/coverage_pytest_plugin. I don’t think this will remain as an independent plugin, so again, if you want to help with future maintenance or direction, let me know.
  • All of our experimentation (and improvements) for contexts involve line coverage. Branch coverage only complicates the problems of storage and reporting. I’ve mused about how to store branch data more compactly in the past, but nothing has been done.
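The test-selection idea behind the hackathon project can be sketched in a few lines. This is a rough illustration, not the plugin’s code: the dictionaries stand in for data that would really come from the .coverage file and from a diff, and the names select_tests, contexts_by_line, and changed_lines are all hypothetical.

```python
def select_tests(contexts_by_line, changed_lines):
    """Return the tests whose recorded contexts touched any changed line.

    contexts_by_line: {filename: {lineno: set of test names}}  (assumed shape)
    changed_lines:    {filename: list of changed line numbers} (assumed shape)
    """
    selected = set()
    for filename, lines in changed_lines.items():
        per_file = contexts_by_line.get(filename, {})
        for lineno in lines:
            # Lines with no recorded context contribute nothing.
            selected.update(per_file.get(lineno, ()))
    return sorted(selected)


# Made-up who-tests-what data for two files.
contexts_by_line = {
    "cart.py": {
        10: {"test_add_item", "test_checkout"},
        11: {"test_add_item"},
        20: {"test_checkout"},
    },
    "users.py": {5: {"test_login"}},
}

# Made-up diff: lines 11 and 12 of cart.py changed, and line 5 of users.py.
changed_lines = {"cart.py": [11, 12], "users.py": [5]}

tests_to_run = select_tests(contexts_by_line, changed_lines)
```

Note that line 12 has no recorded data, so it selects nothing. A real tool would need a fallback for new or never-executed lines, probably running more tests rather than fewer.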

I know this is a lot, and the 5.0 alpha series has been going on for a while. The features are shaping up to be powerful and useful. All of your feedback has been very helpful, keep it coming.

Changelog podcast: me, double-dipping

Saturday 29 June 2019

I had a great conversation with Jerod Santo on the Changelog podcast: The Changelog 351: Maintainer spotlight! Ned Batchelder. We talked about Open edX and about maintaining open source software.

One of Jerod’s questions was unexpected: what other open source maintainers do I appreciate? Two people that came to mind were Daniel Hahler and Julian Berman. Some people are well-known in the Python community because they are the face of large widely used projects. Daniel and Julian are known to me for a different reason: they seem to make small contributions to many projects. I see their names in the commits or issues of many repos I wander through, including my own.

This is a different kind of maintainership: not guiding large efforts, but providing little pushes in lots of places. If I had had the presence of mind, I would have also mentioned Anthony Sottile for the same reason.

And I would have mentioned Mariatta, for a different reason: her efforts are focused on CPython, but on the contribution process and tooling around it, rather than the core code itself. A point I made in the podcast was that people and process challenges are often the limiting factor to contribution, not technical challenges. Mariatta has been at the forefront of the efforts to open up CPython contribution, and I wish I had mentioned her in the podcast.

And I am sure there are people I am overlooking that should be mentioned in these appreciations. My apologies to you if you are in that category...

A year of light and dark

Sunday 23 June 2019

Friday was the summer solstice, when day and night are at their most extreme imbalance. It reminded me of the last summer solstice — and the year of light and dark, ups and downs, since then — all revolving around Nat, my 29-year-old autistic son.

Last year on the solstice we were at a block party in a neighborhood in Boston. We had become close friends with a young couple, and planned for Nat to move in with them, and for them to be his caregivers.

The couple was eager to extend their family from two to three. We had had long serious discussions with them about the challenges involved. They knew Nat pretty well, and had deep experience with similar disabled populations. They had even moved to a new apartment in order to have space for Nat.

The new apartment was on this quirky cul-de-sac on a hill, a small tight-knit community, complete with a summer solstice party. It seemed magical, like an entirely new experience opening up to us. Nat would be moving in with a couple his own age, with young enthusiasms, and an anything-is-possible approach to the world. The neighborhood only added to the sense of expanding possibilities. It seemed like a good plan, almost too good to be true.

We planned for Nat to move at the end of August. We spent lots of time with his new caregivers over the summer, doing new things from their world. This helped them understand Nat’s full-day routines better, and was exciting for us.

The move went great. But over the course of the fall, things started not going well. Nat has always had periods of anxiety, but it’s hard to pinpoint the causes. He was going through a bad time, with alarming head-hitting. The caregivers were having health issues of their own, which made it difficult for them to give Nat the routine and stability he needs.

We tried to support the new arrangement by having Nat on weekends, and generally being there for everyone. For reasons that are still not clear to me, it wasn’t enough, and things just kept going downhill, including our interactions with the couple. By March, the arrangement that seemed too good to be true proved to be exactly that. Nat moved back with us.

This was a hard time. Everyone involved reacted in their own ways to the stress, which caused conflict between us and the caregivers. I think Nat overall was happy to be back in our house, but his anxieties had not lessened. Was the change of home part of the cause? We’ll never know.

Parenting Nat has involved a long series of choices for him: where he’ll be schooled, where he will live, what he will do during his days and nights. These choices often fall into two broad categories: the exciting but risky, and the safe but underwhelming — another kind of light and dark. And underlying those decisions is always the impossible question: are we doing enough?

Now that he was back with us, we had to decide where he would go next. He could stay with us permanently, but we know that we are perhaps the least stimulating place for him. We get caught up in our own activities and interests, and he is passive enough that lots of time passes doing nothing. He might be fine with it, but it makes us wonder: are we doing enough?

And looming over all of our planning for him is what will happen at the end of our lives? Now he is 29 and we are 57, but when he is 49 and we are 77 (or 87 or 97!), living together will be a very different story. We want him to have a life separate from us. We think it will be better for him.

The arrangement we had with the couple is known as “shared living,” and we thought about whether we wanted to try that again. Nat had been in two shared living situations by now, and our feeling was that it was too reliant on too few people. We know shared living has worked for other people, but that’s another constant in parenting Nat: just because something works for one autism family doesn’t mean it will work for us. Shared living didn’t seem right for Nat.

We talked with other families we know about what they were planning to do. But most of them had younger guys, or far more resources, or were making decisions on longer timescales than us for other reasons. And honestly, housing together with families we’re already friends with could be like going into business with friends: a good way to strain or ruin the friendship. We didn’t want to do that again.

We asked for a new placement in a group home, figuring we’d take a look at what opened up and see how we felt about it. Two months later we were offered a placement, in a house run by the same organization as Nat’s previous group home that he had moved out of the year before.

The residents are a much better fit with Nat this time, and the staff seems eager and energetic. It’s hard to know whether we are getting accurate answers from Nat when asked his opinion, but he has been nothing but positive about moving to this new house. Being in the same organization means we are familiar with some of the logistics, and Nat will know some of the residents from other houses when they do things together.

Although a group home generally falls into the safe category rather than the risky, it feels like this one might be safe without being underwhelming. We moved him in yesterday, and all seems good. We’ve been through this enough to know that it won’t be perfect. There will be miscommunications with the rotating staff, and he’ll come home wearing another resident’s shirt, but nothing is perfect.

We are still connected to the couple, through other circles. But it is awkward now, because we have never directly talked about the strains from the move-out. I hope that we can do that some day.

As I have said before, I know this is not the last time we will have to make big decisions for Nat. This one feels good, but others have felt good in the past too. I’m optimistic but alert.

Marketing factoid of the day: 57 varieties

Sunday 16 June 2019

The Heinz company has been using the slogan “57 varieties” for more than 120 years. But it was never factual. When they introduced the slogan in 1896, the company already had more than 60 products. The number was chosen for its sound and “psychological effect.”

It’s hard to know the exact number, but today Heinz has thousands of products, including at least 20 ketchups.

BTW, you might be interested in other posts I’ve written on this day in the past.

Corporations and open source: why and how

Friday 7 June 2019

Here’s a really simplistic model: if you want someone to do something, you have to give them a compelling reason to do it, and you have to make it as easy as possible for them to do it. That is, you need to have good answers to Why? and How? (I don’t know much about marketing, but I think these are the value proposition and the call to action.)

Let’s look at the Why and How model as it applies to corporations funding open source. They don’t do it because the answers to Why and How are really bad right now.

Why should a corporation fund open source? As much as I wish it were different for all sorts of reasons, corporations act only in purely selfish ways. In order to spend money, they need to see some positive benefit to them that wouldn’t happen if they didn’t spend the money.

This frustrates me because a corporation is a collection of people, none of whom would act this way. I could say much more about this, but we aren’t going to be able to change corporations.

Companies only spend money if doing so will bring them a (perceived) benefit. Funding open source would make it stronger and better, but that is a very long-term effect, and not one that accrues directly to the funder. This is the famous Tragedy of the Commons. It’s a fair question for companies to ask: if they fund open source, what do they get for their money?

That’s the difficulty with Why, but let’s imagine for a moment that we could somehow convince someone to spend their company’s money funding open source: now what? How do they do it? A significant Python project could have a hundred library dependencies. How do they decide how to allocate the funding budget among them? Once that decision is made, how does the money get delivered? Very few open source projects are equipped to receive funds. If even 10% of the projects have a clear path for funding, now there are 10 checks to write, or 10 PayPal links to click through, or whatever. Some of that money will need to be sent internationally, and it has to be considered at tax time. Does it have to be done again next year, and the year after that? It’s a logistical nightmare!

So when we try to convince companies to fund open source, we don’t have good answers for either Why? or How? It’s no wonder it doesn’t happen.

This is one of the reasons I am optimistic about Tidelift: they have good answers for both of these questions. The Tidelift subscription gives companies information and services around their open source dependencies, which answers the why. And the payment to Tidelift solves the how: Tidelift looks at the list of dependencies, decides an allocation, and distributes the money to the maintainers.

Sure, there are still lots of questions to be answered: is the allocation algorithm right? Will enough companies subscribe to make Tidelift itself sustainable? And even larger questions, like: if an interesting amount of money does flow to open source maintainers, what will be the cultural change in open source?

I don’t know the answers to those questions, but Tidelift seems like the most promising answer to how to support open source. I’m an enthusiastic participant. You should be too.

Why Python class syntax should be different

Saturday 25 May 2019

If you’ve used any programming language for a long enough time, you’ve found things about it that you wish were different. It’s true for me with Python. I have ideas of a number of things I would change about Python if I could. I’ll bore you with just one of them: the syntax of class definitions.

But let’s start with the syntax for defining functions. It has this really nice property: function definitions look like their corresponding function calls. A function is defined like this:

def func_name(arg1, arg2):

When you call the function, you use similar syntax: the name of the function, and a comma-separated list of arguments in parentheses:

x = func_name(12, 34)

Just by lining up the punctuation in the call with the same bits of the definition, you can see that arg1 will be 12, and arg2 will be 34. Nice.

OK, so now let’s look at how a class with base classes is defined:

class MyClass(BaseClass, AnotherBase):

To create an instance of this class, you use the name of the class, and parens, but now the parallelism is gone. You don’t pass a BaseClass to construct a MyClass:

my_obj = MyClass(...)

Just looking at the class line, you can’t tell what has to go in the parens to make a MyClass object. So “def” and “class” have very similar syntax, and function calls and object creation have very similar syntax, but the mimicry in function calls that can guide you to the right incantation will throw you off completely when creating objects.
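Here is a small runnable illustration of that mismatch. The \_\_init\_\_ parameters (width and height) are invented for the example; the point is only that the call arguments come from \_\_init\_\_, while the parens on the class line name the bases.

```python
import inspect


class BaseClass:
    pass


class AnotherBase:
    pass


class MyClass(BaseClass, AnotherBase):
    # The real "call signature" of the class lives here, not on the class line.
    def __init__(self, width, height):
        self.width = width
        self.height = height


# What goes in the parens when creating an object: the __init__ arguments.
my_obj = MyClass(3, 4)

# inspect.signature reports what the call expects (without self).
call_signature = str(inspect.signature(MyClass))

# The names in the parens on the class line end up here instead.
bases = MyClass.__bases__
```

So you have to look past the class line and find \_\_init\_\_ (possibly in a base class) to learn what the call needs.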

This is the sort of thing that experts glide right past without slowing down. They are used to arcane syntax, and similar things having different meanings in subtly different contexts. And a lot of that is inescapable in programming languages: there are only so many symbols, and many many more concepts. There are bound to be overlaps.

But we could do better. Why use parentheses that look like a function call to indicate base classes? Here’s a better syntax:

class MyClass from BaseClass, AnotherBase:

Not only does this avoid the misleading punctuation parallelism, but it even borrows from the English we use to talk about classes: MyClass derives from BaseClass and AnotherBase. And “from” is already a keyword in Python.

BTW, even experts occasionally make the mistake of typing “def” where they meant “class”, and the similar syntax means the code is valid. The error isn’t discovered until the traceback, which can be baffling.
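A small sketch of how that mistake plays out, with made-up names:

```python
class BaseClass:
    pass


# Typo: "def" where "class" was intended. This is still valid Python:
# it defines a function named MyClass with one parameter that happens
# to be called BaseClass, and a nested function that is never used.
def MyClass(BaseClass):
    def greet(self):
        return "hello"


# Nothing goes wrong until someone tries to create an "instance".
try:
    my_obj = MyClass()
except TypeError as exc:
    error = str(exc)

# Passing an argument is no better: the function just returns None.
result = MyClass(BaseClass)
```

The TypeError complains about a missing positional argument named BaseClass, which makes no sense if you still believe MyClass is a class.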

I’m not seriously proposing to change Python. Not because this wouldn’t be better (it would), but because a change like this is impractical at this late date. I guess it could be added as an alternative syntax, but it would be hard to argue that having two syntaxes for classes would be better for beginners.

But I think it is helpful to try to see our familiar landscape as confused beginners do. It can only help with explaining it to them, and maybe help us make better choices in the future.

