Evolutionary ladder

Sunday 30 March 2008

I’ve noticed a useful pattern in some Python polymorphic functions. The bulk of the function is written assuming the input is of one preferred type. But other types are also accepted by first converting them (evolving them) up to the preferred type.

A common example is a function that operates on an open file, but will also accept a file name:

def doit(f):
    # If f is a string, it's a filename to be opened
    if isinstance(f, basestring):
        f = open(f)
    # .. do stuff with the file ..

A richer example comes from Jared Kuolt’s StaticGenerator. Here the goal is to build a list of URLs, but the function accepts strings as well as a variety of Django ORM objects:

extracted = []
for resource in resources:
    # A URL string
    if isinstance(resource, (str, unicode, Promise)):
        extracted.append(str(resource))
        continue
    
    # A model instance; requires get_absolute_url method
    if isinstance(resource, Model):
        extracted.append(resource.get_absolute_url())
        continue
    
    # If it's a Model, we get the base Manager
    if isinstance(resource, ModelBase):
        resource = resource._default_manager
    
    # If it's a Manager, we get the QuerySet
    if isinstance(resource, Manager):
        resource = resource.all()
    
    # Append all paths from obj.get_absolute_url() to list
    if isinstance(resource, QuerySet):
        extracted += [obj.get_absolute_url() for obj in resource]

In this case, there are two cases that branch off to get handled completely, and three different types which evolve up through each other.

There’s something very satisfying about this style of code. The polymorphism comes not from switching on the type and doing different actions for different types, but from evolving types one to the next until you have the type needed to operate on.

Statically typed languages can use a similar design, where overloaded functions adapt their arguments and call each other. The dynamic language pattern is more flexible in that different kinds of adaptation can be used, not just type-based. Here’s a hypothetical example:

def doit_with_html(h):
    if h.startswith('http:'):
        # Turn a URL into its HTML data.
        h = urllib2.urlopen(h).read()
    #.. do something with the HTML ..

Here we aren’t distinguishing between two different types, but between two different structures of string. If the string smells like a URL, we use it to fetch HTML to operate on, otherwise we assume the string is HTML to begin with.

This style of coding is a great way to make functions more flexible and useful to callers. It’s not only concise, but it gives you a natural way to extend the function to operate on more types. Often a new kind of input can be accomodated simply by adding an adaptation clause to the top of the existing function, rather than proliferating similar functions with similar (but different) names.

Comments

[gravatar]
James Thiele 12:32 PM on 30 Mar 2008

I like polymorphism in Python and use it but I think should always be inside a try:.

[gravatar]
Kevin Reid 7:07 PM on 30 Mar 2008

In Common Lisp, this is known as a designator. For example, characters are string designators (for single-character strings), and symbols are function designators (for the function bound to that symbol).

[gravatar]
Ned Batchelder 7:24 PM on 30 Mar 2008

@James: it seems to me that there are at least three styles of polymorphism: 1) using isinstance as shown here to determine what type the object is, 2) using hasattr to determine if the object has needed methods, or 3) using the methods directly, and catching the exceptions that will occur if the method is not present or somehow unsuitable.

I think you are saying you prefer style 3, is that right?

In this case, I prefer the explicit check for the types you are looking for.

[gravatar]
Ned Batchelder 7:25 PM on 30 Mar 2008

@Kevin: I'm a bit baffled by the Lisp terminology here, and the page you linked isn't helping me. Can you explain?

[gravatar]
Kevin Reid 10:02 PM on 30 Mar 2008

In your first example (doit), if the behavior it has recurs elsewhere in the API, we would speak of file designators, which are either files or basestrings.

The general concept is that you have some fundamental type — file — and a broader type — file or basestring — whose members all denote some file (are normalized into some file) under the "file designator" interpretation.

Another Python example would be the dict constructor, which could be viewed as accepting a dict designator:
>>> dict([(1,2),(3,4)])
{1: 2, 3: 4}
>>> dict({1:2, 3:4})
{1: 2, 3: 4}

(The page I linked is mostly setting out the rules for how "X is a y designator" in the specification is to be interpreted.)

[gravatar]
Kevin Reid 10:06 PM on 30 Mar 2008

P.S. The purpose of making designators part of a library design, I would say, is to let the user avoid having to write in conversions or lookups when they are obvious from context.

[gravatar]
nes 11:19 AM on 31 Mar 2008

I don't know that explicit if statements are better than
http://en.wikipedia.org/wiki/Generic_function
but generic functions requires some additional implementation in infrastructure
http://www.python.org/dev/peps/pep-3124/

The later example is a
http://en.wikipedia.org/wiki/Type_system#Dependent_types
and I think it is currently a hot topic in type research.

[gravatar]
SuperJared 8:43 PM on 31 Mar 2008

Ned, thanks for the nice critique. Not sure why this didn't show up on my radar till now, but it's very satisfying indeed.

I felt it important to accept as many types of arguments as possible, and as easily as possible. The evolution of the object will basically always filter into the same type of object which we can then use in one way, which is the point of course.

[gravatar]
Sam Penrose 1:18 AM on 3 Apr 2008

1) I've been using this pattern for a couple years now; I still like it, but I've found you have to watch for YAGNI. Even though it may be perfectly logical for API consumers to pass a diverse array of types in, in practice one (or some) of the types may dominate, leading the branches for the others to become dead code.

2) It occurs to me that this pattern might be called "semantic typing", as follows: the REAL type in your example is "file"; whether the user passes in a Python type of string or file handle is exactly the sort of detail that libraries should manage without troubling programmers. That of course suggests that the pattern is useful only insofar as a given "semantic type" maps cleanly to 2 or more language types ...

[gravatar]
Ned Batchelder 5:58 AM on 3 Apr 2008

@Sam: those are good points. Except for your last sentence in #2, because as I mention at the end of the post, there's no need for the polymorphism to be across types: the "semantic types" could be distinguished by any test you can express, not just one based on types.

[gravatar]
Robert Brewer 4:26 PM on 3 Apr 2008

Well done. Bytecode manipulation is another arena rife with this, since you can obtain it from functions, code objects, a string of chr(opcode), or a list of opcodes. See http://www.aminus.net/geniusql/browser/trunk/geniusql/codewalk.py#L137 for an example.

Add a comment:

Ignore this:
Leave this empty:
Name is required. Either email or web are required. Email won't be displayed and I won't spam you. Your web site won't be indexed by search engines.
Don't put anything here:
Leave this empty:
URLs auto-link and some tags are allowed: <a><b><i><p><br><pre>.