What’s the point of os.path.commonprefix?

Monday 22 March 2010

Most of the Python standard library is great, providing functions and classes that do their jobs well, often even before you knew you needed the job done (urlsafe_b64encode FTW!).

Which makes my disappointment with os.path.commonprefix all the stronger. This function is worse than useless: it’s misleading. Although it’s in the os.path module, it knows nothing about paths, working instead character-by-character:

>>> os.path.commonprefix(['/home/ned/cog', '/home/ned/coverage'])
'/home/ned/co'      # That's not an actual path!

The docs helpfully include the warning:

Note that this may return invalid paths because it works a character at a time.

But it should say:

This function is in the wrong place, and has nothing to do with paths. Don’t use it if you are interested in file paths!

I accepted a patch to coverage.py which used this function, and it looked good. But eventually I turned up cases it got wrong, and had to re-discover what people seem to have understood for at least eight years. *Sigh*
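The post doesn't list the cases that went wrong, so the following is only a hypothetical illustration of one way character-wise matching can mislead: the result can be a perfectly real directory that still isn't a parent of every input. The directory names are made up for the example.

```python
import os.path

# Hypothetical directories: 'lib64' is a sibling of 'lib', not inside it.
root = '/usr/lib'
other = '/usr/lib64/python'

prefix = os.path.commonprefix([root, other])
print(prefix)  # /usr/lib

# A naive "is other inside root?" check based on the prefix says yes,
# even though /usr/lib64/python is not under /usr/lib.
looks_inside = (prefix == root)
print(looks_inside)  # True
```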

Comments

Guyon Moree 9:07 PM on 22 Mar 2010

ugly one-liner to get the job done (if I understood correctly)

>>> p1 = "/home/ned/cog"
>>> p2 = "/home/ned/coverage"
>>> "/".join([p for i, p in enumerate(p1.split("/")) if p == p2.split("/")[i]])

'/home/ned'

Guyon Moree 9:09 PM on 22 Mar 2010

oh nevermind, that was silly and just happens to work in this case :)

Lucas 12:12 AM on 23 Mar 2010

I notice that it works on lists as well as strings. It's almost like they wrote an auxiliary function to what they actually needed and stopped there.
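A minimal sketch of that observation, relying on how the CPython implementation happens to behave rather than on anything the docs promise: hand commonprefix lists of path components instead of strings and it compares whole names. Splitting on '/' assumes POSIX-style paths.

```python
import os.path

paths = ['/home/ned/cog', '/home/ned/coverage']

# Each path becomes a list of components: ['', 'home', 'ned', 'cog'], ...
components = [p.split('/') for p in paths]

# commonprefix compares list items here, so whole names must match.
common = os.path.commonprefix(components)
print(common)            # ['', 'home', 'ned']
print('/'.join(common))  # /home/ned
```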

Ian Bicking 12:15 AM on 23 Mar 2010

The Apache Alias command works the same way, I loathe it as well ;)

Alec Munro 8:46 AM on 23 Mar 2010

Oddly enough, I actually needed exactly this functionality two years ago, and decided to look in os.path just in case. So while it may be annoying and broken to other people, for me it was "Batteries included" at just the right time.

Paddy3118 8:54 AM on 23 Mar 2010

The following works for your example, but testing is limited, and any true replacement should not have to rely on the current broken implementation:

>>> import os
>>> def commonprefix(*args):
...     return os.path.commonprefix(*args).rpartition(os.path.sep)[0]
...
>>> commonprefix(['/home/ned/cog', '/home/ned/coverage'])
'/home/ned'

Ned Batchelder 9:12 AM on 23 Mar 2010

@Paddy3118: thanks for the implementation. As it happens, the extra annoyance (not stdlib's fault) was that the patch shouldn't have even been trying to find a common prefix!

uolot 1:12 PM on 23 Mar 2010

Wouldn't

os.path.dirname(os.path.commonprefix(
    ['/home/ned/cog', '/home/ned/coverage']))
be better?

Paddy3118 1:52 PM on 23 Mar 2010

Here is a version that does not rely on the faulty version. It compares the directories by whole directory names, level-by-level.

>>> from os.path import sep
>>> from itertools import takewhile
>>> def allnamesequal(name):
...     return all(n==name[0] for n in name[1:])
...
>>> def commonpaths(paths):
...     bydirectorylevels = zip(*[p.split(sep) for p in paths])
...     return sep.join(list(zip(*takewhile(allnamesequal, bydirectorylevels)))[0])
...
>>> paths = ['/home/ned/cog', '/home/ned/coverage']
>>> commonpaths(paths)
'/home/ned'

Paddy3118 2:35 PM on 23 Mar 2010

It seemed right for a Rosetta code task: http://rosettacode.org/wiki/Find_Common_Directory_Path

Brett 9:34 PM on 24 Mar 2010

Yes, commonprefix is not really an os.path thing as it's simply longest prefix. Something like commonpath() or commonroot() would be better (and probably would be accepted into the stdlib).

Roger Lipscombe 1:16 PM on 29 Mar 2010

I've just found exactly the same problem with a GetRelativePath method that I wrote in C# a while back.

It does exactly what I needed at the time, but breaks horribly in almost every other case. I keep meaning to go back and write it properly, but...

Of course, I've not released it as part of a standard library.

Petr Viktorin 8:11 PM on 30 Nov 2012

Please, if you ever try to implement a "correct" version, take os.altsep into account -- at least by calling normpath().
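A small sketch of why that matters, using ntpath (the Windows flavour of os.path) so it behaves the same on any platform. The two input paths are hypothetical; the point is only that splitting on sep alone misses separators written as altsep.

```python
import ntpath  # the Windows flavour of os.path; importable on any platform

# Hypothetical inputs mixing '/' (ntpath.altsep) and '\\' (ntpath.sep).
mixed = ['C:/home/ned/cog', 'C:\\home\\ned\\coverage']

# Splitting on sep alone fails to break up the first path at all.
naive = [p.split(ntpath.sep) for p in mixed]
print(naive[0])       # ['C:/home/ned/cog']

# normpath() rewrites altsep to sep, so both paths split the same way.
normalized = [ntpath.normpath(p).split(ntpath.sep) for p in mixed]
print(normalized[0])  # ['C:', 'home', 'ned', 'cog']
print(normalized[1])  # ['C:', 'home', 'ned', 'coverage']
```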

Gerard 1:50 PM on 14 Aug 2017

Fixed in Python 3.5 with addition of os.path.commonpath
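For comparison, here is a short sketch of the two functions side by side on the example from the post. It uses posixpath, the POSIX flavour of os.path, only so the output is the same on every platform.

```python
import posixpath  # POSIX flavour of os.path, same results on any platform

paths = ['/home/ned/cog', '/home/ned/coverage']

by_character = posixpath.commonprefix(paths)
by_component = posixpath.commonpath(paths)

print(by_character)  # /home/ned/co
print(by_component)  # /home/ned
```

Unlike commonprefix, commonpath raises ValueError for an empty sequence or a mix of absolute and relative paths, so callers may need to handle that.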
