Marketing factoid of the day: 57 varieties

Sunday 16 June 2019

The Heinz company has been using the slogan “57 varieties” for more than 120 years. But it was never factual. When they introduced the slogan in 1896, the company already had more than 60 products. The number was chosen for its sound and “psychological effect.”

It’s hard to know the exact number, but today Heinz has thousands of products, including at least 20 ketchups.

BTW, you might be interested in other posts I’ve written on this day in the past.

Corporations and open source: why and how

Friday 7 June 2019

Here’s a really simplistic model: if you want someone to do something, you have to give them a compelling reason to do it, and you have to make it as easy as possible for them to do it. That is, you need to have good answers to Why? and How? (I don’t know much about marketing, but I think these are the value proposition and the call to action.)

Let’s look at the Why and How model as it applies to corporations funding open source. They don’t do it because the answers to Why and How are really bad right now.

Why should a corporation fund open source? As much as I wish it were different for all sorts of reasons, corporations act only in purely selfish ways. In order to spend money, they need to see some positive benefit to them that wouldn’t happen if they didn’t spend the money.

This frustrates me because a corporation is a collection of people, none of whom would act this way. I could say much more about this, but we aren’t going to be able to change corporations.

Companies only spend money if doing so will bring them a (perceived) benefit. Funding open source would make it stronger and better, but that is a very long-term effect, and not one that accrues directly to the funder. This is the famous Tragedy of the Commons. It’s a fair question for companies to ask: if they fund open source, what do they get for their money?

That’s the difficulty with Why, but let’s imagine for a moment that we could somehow convince someone to spend their company’s money funding open source: now what? How do they do it? A significant Python project could have a hundred library dependencies. How do they decide how to allocate the funding budget among them? Once that decision is made, how does the money get delivered? Very few open source projects are equipped to receive funds. If even 10% of the projects have a clear path for funding, now there are 10 checks to write, or 10 PayPal links to click through or whatever? Some of that money will need to be sent internationally, and it has to be considered at tax time. Does it have to be done again next year, and the year after that? It’s a logistical nightmare!

So when we try to convince companies to fund open source, we don’t have good answers for either Why? or How? It’s no wonder it doesn’t happen.

This is one of the reasons I am optimistic about Tidelift: they have good answers for both of these questions. The Tidelift subscription gives companies information and services around their open source dependencies, which answers the why. And the payment to Tidelift solves the how: Tidelift looks at the list of dependencies, decides an allocation, and distributes the money to the maintainers.

Sure, there are still lots of questions to be answered: is the allocation algorithm right? Will enough companies subscribe to make Tidelift itself sustainable? And even larger questions, like: if an interesting amount of money does flow to open source maintainers, what will be the cultural change in open source?

I don’t know the answers to those questions, but Tidelift seems like the most promising answer to how to support open source. I’m an enthusiastic participant. You should be too.

Why Python class syntax should be different

Saturday 25 May 2019

If you’ve used any programming language for a long enough time, you’ve found things about it that you wish were different. It’s true for me with Python. I have ideas of a number of things I would change about Python if I could. I’ll bore you with just one of them: the syntax of class definitions.

But let’s start with the syntax for defining functions. It has this really nice property: function definitions look like their corresponding function calls. A function is defined like this:

def func_name(arg1, arg2):

When you call the function, you use similar syntax: the name of the function, and a comma-separated list of arguments in parentheses:

x = func_name(12, 34)

Just by lining up the punctuation in the call with the same bits of the definition, you can see that arg1 will be 12, and arg2 will be 34. Nice.
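
As a small illustration of that lining-up, the standard library can do the pairing for you. This is only a sketch: the function body is made up (the definition above doesn't show one), and inspect.signature(...).bind() is used just to match call arguments to parameter names without running the function.

```python
import inspect

def func_name(arg1, arg2):
    # Hypothetical body, only here so the definition is complete.
    return arg1 + arg2

# bind() matches the arguments of a call to the parameter names in the
# definition, the same way you would by lining up the punctuation by eye.
bound = inspect.signature(func_name).bind(12, 34)
print(dict(bound.arguments))
```

The printed mapping pairs arg1 with 12 and arg2 with 34.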

OK, so now let’s look at how a class with base classes is defined:

class MyClass(BaseClass, AnotherBase):

To create an instance of this class, you use the name of the class, and parens, but now the parallelism is gone. You don’t pass a BaseClass to construct a MyClass:

my_obj = MyClass(...)

Just looking at the class line, you can’t tell what has to go in the parens to make a MyClass object. So “def” and “class” have very similar syntax, and function calls and object creation have very similar syntax, but the mimicry in function calls that can guide you to the right incantation will throw you off completely when creating objects.
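
To make the mismatch concrete, here is a minimal sketch. The __init__ parameters (width and height) are invented for illustration; the point is only that they, and not the names in the parentheses on the class line, decide what a caller has to pass.

```python
import inspect

class BaseClass:
    pass

class AnotherBase:
    pass

class MyClass(BaseClass, AnotherBase):
    # Hypothetical constructor: these parameters are what goes in the
    # parens when creating an object, not BaseClass and AnotherBase.
    def __init__(self, width, height):
        self.width = width
        self.height = height

my_obj = MyClass(3, 4)

# The class line tells you the bases...
print(MyClass.__bases__)
# ...but the call signature comes from __init__.
print(inspect.signature(MyClass))
```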

This is the sort of thing that experts glide right past without slowing down. They are used to arcane syntax, and similar things having different meanings in subtly different contexts. And a lot of that is inescapable in programming languages: there are only so many symbols, and many many more concepts. There are bound to be overlaps.

But we could do better. Why use parentheses that look like a function call to indicate base classes? Here’s a better syntax:

class MyClass from BaseClass, AnotherBase:

Not only does this avoid the misleading punctuation parallelism, but it even borrows from the English we use to talk about classes: MyClass derives from BaseClass and AnotherBase. And “from” is already a keyword in Python.

BTW, even experts occasionally make the mistake of typing “def” where they meant “class”, and the similar syntax means the code is valid. The error isn’t discovered until the traceback, which can be baffling.
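
Here is a sketch of how that mistake can play out, using a made-up Point class on Python 3. The definition is accepted without complaint because it is a perfectly good function whose one parameter happens to be named object; the problem only shows up at the call.

```python
import inspect

def Point(object):          # meant to be: class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

# No error so far: Point is just a function with an unused nested function.
print(inspect.isfunction(Point))

try:
    p = Point(1, 2)
except TypeError as err:
    # The complaint is about argument counts, not about def vs. class.
    message = str(err)
    print(message)
```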

I’m not seriously proposing to change Python. Not because this wouldn’t be better (it would), but because a change like this is impractical at this late date. I guess it could be added as an alternative syntax, but it would be hard to argue that having two syntaxes for classes would be better for beginners.

But I think it is helpful to try to see our familiar landscape as confused beginners do. It can only help with explaining it to them, and maybe help us make better choices in the future.

Tidelift

Monday 20 May 2019

I’m a firm believer that open source software is woefully under-supported. The value people get from using open source far far far exceeds the resources they collectively put into the open source ecosystem.

There have been attempts to improve this situation, but they usually are some form of internet tip jar that goes nowhere. Businesses won’t put money in tip jars because they don’t know what to contribute to (they have hundreds of open source dependencies), and as infuriating as it is, they wonder what they get for that money (they already got the software!!)

Tidelift is approaching the problem of sustainable open source differently: what help do enterprises need with open source? What services would they be willing to pay for? How can enterprises be connected with open source maintainers to benefit both?

They sell the Tidelift Subscription, a collection of tools, information, and assurances to close some of the gaps businesses typically face when using open source.

The people behind Tidelift have deep experience at the intersection of open source and enterprises, having come from Red Hat, Gnome, and Mozilla. They’ve thought a lot about the problem of open source sustainability from both sides, and know what they are doing.

Coverage.py is part of the Tidelift Subscription, which makes me “a Lifter.” I get a small but not insignificant amount of money each month as a result. I want Tidelift to succeed partly for myself, but more importantly, because it could mean that open source is more sustainable overall.

If you are an open source maintainer, take a look at whether you can make money from Tidelift. What they ask of you is pretty much what well-maintained projects already do (good release notes, accurate metadata, points of contact), and they can help with some things that are difficult, like security reporting and license compliance.

If your company uses open source, consider whether the subscription is something you would use. It could help your business, and it would definitely help open source.

Thanks.

Coverage.py 5.0a5: pytest contexts

Monday 13 May 2019

Development of version 5 of coverage.py is going slowly, but it is progressing. The latest alpha is out: coverage.py 5.0a5.

The biggest changes are due to Stephan Richter and Justas Sadzevičius, from Shoobx. They improved the support for recording dynamic contexts, informally known as Who Tests What.

Now third-party code, either as a coverage.py plugin or using the coverage.py API, can set the dynamic context.

I’ve added support for this to the pytest-cov plugin, to record the pytest test id as the dynamic context. If you’d like to try it:

pip install coverage==5.0a5
pip install git+https://github.com/nedbat/pytest-cov.git@nedbat/contexts
pytest --cov=. --cov-context

The .coverage data file is now a SQLite database. Coverage.py has no support yet for using the collected context data, but you can examine the raw data in the database:

$ sqlite3 .coverage
SQLite version 3.19.3 2017-06-27 16:48:08
Enter ".help" for usage hints.

sqlite> select * from context;
id          context
----------  --------------------------------------------------
1
2           test_it.py::test_prod1|setup
3           test_it.py::test_prod1|call
4           test_it.py::test_prod1|teardown
5           test_it.py::test_prod2|setup
6           test_it.py::test_prod2|call
7           test_it.py::test_prod2|teardown
8           test_it.py::test_prod3[1-1]|setup
9           test_it.py::test_prod3[1-1]|call
10          test_it.py::test_prod3[1-1]|teardown
11          test_it.py::test_prod3[10-100]|setup
12          test_it.py::test_prod3[10-100]|call
13          test_it.py::test_prod3[10-100]|teardown
14          test_it.py::test_prod3[11-121]|setup
15          test_it.py::test_prod3[11-121]|call
16          test_it.py::test_prod3[11-121]|teardown

sqlite> select * from arc where context_id = 9;
file_id     context_id  fromno      tono
----------  ----------  ----------  ----------
1           9           -14         15
1           9           15          16
1           9           16          17
1           9           17          -14

sqlite> select * from file where id = 1;
id          path
----------  --------------------------------------------------
1           /Users/ned/lab/pytest_func_test/src/product.py
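
The same raw data can be pulled out with Python's sqlite3 module. The following is a rough sketch, not a coverage.py feature: it builds an in-memory stand-in database whose table and column names are copied from the session above (the column types are a guess), and lines_by_context is a hypothetical helper name. To try it on real data you would connect to the .coverage file instead.

```python
import sqlite3

# In-memory stand-in for a .coverage file. Table and column names come from
# the sqlite3 session above; the column types are assumptions.
db = sqlite3.connect(":memory:")
db.executescript("""
    create table file (id integer primary key, path text);
    create table context (id integer primary key, context text);
    create table arc (file_id integer, context_id integer, fromno integer, tono integer);
""")
db.execute("insert into file values (1, '/Users/ned/lab/pytest_func_test/src/product.py')")
db.executemany("insert into context values (?, ?)", [
    (8, "test_it.py::test_prod3[1-1]|setup"),
    (9, "test_it.py::test_prod3[1-1]|call"),
])
db.executemany(
    "insert into arc values (1, 9, ?, ?)",
    [(-14, 15), (15, 16), (16, 17), (17, -14)],
)

def lines_by_context(db):
    """Map (context, file path) to the sorted line numbers seen in its arcs."""
    query = """
        select context.context, file.path, arc.fromno, arc.tono
        from arc
        join context on context.id = arc.context_id
        join file on file.id = arc.file_id
    """
    result = {}
    for context, path, fromno, tono in db.execute(query):
        lines = result.setdefault((context, path), set())
        # Negative numbers mark entering or leaving a code object,
        # so only the positive ends of each arc are real line numbers.
        lines.update(n for n in (fromno, tono) if n > 0)
    return {key: sorted(lines) for key, lines in result.items()}

for (context, path), lines in lines_by_context(db).items():
    print(context, path, lines)
```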

I’m looking for feedback about what kinds of reporting would be useful. Stephan has a pull request to provide some context-based reporting. Does it do what you want? Have you used contexts? What needs to happen before they are ready for everybody?

Startup.py

Tuesday 16 April 2019

Someone recently asked how to permanently change the prompt in the Python interactive REPL. The answer is you can point the PYTHONSTARTUP environment variable at a Python file, and that file will be executed every time you enter the interactive prompt.
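
For the prompt itself, a minimal sketch of such a startup file might look like this. sys.ps1 and sys.ps2 are the interpreter's primary and continuation prompts; the particular strings are just examples.

```python
import sys

# Example prompt strings; use whatever you like.
sys.ps1 = "py> "    # primary prompt, normally ">>> "
sys.ps2 = "... "    # continuation prompt for multi-line statements
```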

I use this to import modules I often want to use, define helpers, and configure my command history.

In my .bashrc I have:

export PYTHONSTARTUP=~/.startup.py

Then my .startup.py file is:

# Ned's startup.py file, loaded into interactive python prompts.
# Has to work on both 2.x and 3.x

print("(.startup.py)")

import collections, datetime, itertools, math, os, pprint, re, sys, time
print("(imported collections, datetime, itertools, math, os, pprint, re, sys, time)")

pp = pprint.pprint

# A function for pasting code into the repl.
def paste():
    import textwrap
    exec(textwrap.dedent(sys.stdin.read()), globals())

# Readline and history support
def hook_up_history():
    try:
        # Not sure why this module is missing in some places, but deal with it.
        import readline
    except ImportError:
        print("No readline, use ^H")
    else:
        import atexit
        import os
        import rlcompleter

        history_path = os.path.expanduser(
            "~/.pyhistory{0}".format(sys.version_info[0])
        )

        def save_history(history_path=history_path):
            import readline
            readline.write_history_file(history_path)

        if os.path.exists(history_path):
            readline.read_history_file(history_path)

        atexit.register(save_history)

# Don't do history stuff if we are IPython, it has its own thing.
is_ipython = 'In' in globals()
if not is_ipython:
    hook_up_history()

# Get rid of globals we don't want.
del is_ipython, hook_up_history

A few things could use an explanation. The paste() function lets me paste code into the REPL that has blank lines in it, or is indented. Basically, I can copy code from somewhere, and use paste() to paste it into the prompt without having to fix those things first. Run paste(), then paste the code, then type an EOF indicator (Ctrl-D or Ctrl-Z, depending on your OS). The pasted code will be run as if it had been entered correctly.
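
To see what paste() is doing without typing at a prompt, here is a small sketch that feeds it from a string instead of stdin. paste_from is a hypothetical variant that takes the stream and namespace as arguments, and the pasted snippet is made up; it is indented and contains blank lines, the two things the REPL normally chokes on.

```python
import io
import textwrap

def paste_from(stream, namespace):
    """Like paste(), but reads from any stream and runs in a given namespace."""
    # dedent() removes the common leading whitespace; exec() runs the text
    # as a whole, so blank lines inside a function body don't end it early.
    exec(textwrap.dedent(stream.read()), namespace)

# Stand-in for code copied from somewhere: indented, with blank lines.
copied = io.StringIO(
    "    def double(x):\n"
    "\n"
    "        return x * 2\n"
    "\n"
    "    result = double(21)\n"
)

ns = {}
paste_from(copied, ns)
print(ns["result"])
```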

The history stuff gives me history that persists across Python invocations, and keeps the Python 2 history separate from the Python 3 history. “pp” is very handy to have as a short alias.

Of course, you can put anything you want in your own .startup.py file. It’s only run for interactive sessions, not when you are running programs, so you don’t have to worry that you will corrupt important programs.
