hyphenate
Created 9 July 2007, last updated 12 July 2007
hyphenate.py implements Frank Liang’s hyphenation algorithm (the one used in TeX) in Python.
The module provides a single function, hyphenate_word, which takes a string (the word) and returns a list of the parts that hyphens can separate:
>>> hyphenate_word("hyphenation")
['hy', 'phen', 'ation']
>>> hyphenate_word("supercalifragilisticexpialidocious")
['su', 'per', 'cal', 'ifrag', 'ilis', 'tic', 'ex', 'pi', 'ali', 'do', 'cious']
>>> hyphenate_word("project")
['project']
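Since the result is a plain list of strings, turning it into display text is just a join. Here is a small sketch of that step. The helper name join_parts and the SOFT_HYPHEN constant are mine, not part of hyphenate.py, and the list is copied from the "hyphenation" example above rather than computed:

```python
# Hypothetical helper, not part of hyphenate.py.
SOFT_HYPHEN = "\u00ad"  # U+00AD: normally rendered only where a line breaks


def join_parts(parts, separator="-"):
    """Join the pieces of a hyphenated word with the given separator."""
    return separator.join(parts)


parts = ["hy", "phen", "ation"]  # the list shown above for "hyphenation"
print(join_parts(parts))         # hy-phen-ation
soft = join_parts(parts, SOFT_HYPHEN)
```

Joining with a soft hyphen instead of "-" is one way to let a browser or renderer decide whether to break at those points.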
This Python code is in the public domain.
The module as provided only hyphenates English words, but if you can find TeX hyphenation patterns for another language (and can deal with the character encoding issues you’ll encounter in them), the same algorithm will work for other languages.
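As a rough idea of what getting at another language's patterns might involve, here is a sketch that pulls the entries out of a TeX-style \patterns{...} block. The function name extract_patterns and the sample fragment are made up for illustration. It assumes the text has already been decoded to Unicode and contains no TeX accent macros, which real pattern files often do not guarantee:

```python
import re


def extract_patterns(tex_source):
    """Return the whitespace-separated entries of the first patterns block."""
    # Drop TeX comments: an unescaped % through the end of the line.
    no_comments = re.sub(r"(?<!\\)%.*", "", tex_source)
    match = re.search(r"\\patterns\s*\{([^}]*)\}", no_comments)
    return match.group(1).split() if match else []


# Hypothetical fragment, not taken from any real pattern file.
SAMPLE = r"""
% made-up example, already decoded to Unicode
\patterns{
.ab1a
1äu   % a pattern containing a non-ASCII letter
}
"""

print(extract_patterns(SAMPLE))
```

The hard part is usually before this step: working out which encoding or macro scheme the file uses so that a letter like ä arrives as a single character.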
The Liang algorithm does not find every possible hyphenation point. It tries to find some of them without producing any wrong ones, so the breaks from hyphenate.py will be a subset of the full set of break points. A word such as "project" can therefore come back unsplit.
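To make the idea concrete, here is a minimal sketch of how Liang-style patterns decide on breaks. It is not the code in hyphenate.py: the names parse_pattern, hyphenate_sketch, min_left and min_right are my own, the nine patterns are a hand-picked set for illustration rather than the full TeX list, and it looks up every substring in a dict where a serious implementation would more likely use a trie. Digits in a pattern are weights for the gaps between letters, the highest weight seen for each gap wins, and an odd result allows a break:

```python
import re

# A few illustrative patterns (an assumption, not the full TeX set).
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and gap weights."""
    letters = re.sub(r"[0-9]", "", pattern)
    # One weight per gap: before the first letter, between letters, after the last.
    weights = [int(d or "0") for d in re.split(r"[.a-z]", pattern)]
    return letters, weights


def hyphenate_sketch(word, patterns, min_left=2, min_right=2):
    """Return the pieces of word, breaking only where the patterns allow it."""
    table = dict(parse_pattern(p) for p in patterns)
    work = "." + word.lower() + "."      # dots mark the word boundaries
    points = [0] * (len(work) + 1)       # points[i] is the gap before work[i]
    for start in range(len(work)):
        for end in range(start + 1, len(work) + 1):
            weights = table.get(work[start:end])
            if weights:
                for offset, weight in enumerate(weights):
                    points[start + offset] = max(points[start + offset], weight)
    pieces = [""]
    for i, ch in enumerate(word):
        pieces[-1] += ch
        letters_left = i + 1             # letters before the gap after word[i]
        allowed = min_left <= letters_left <= len(word) - min_right
        if allowed and points[i + 2] % 2:  # odd weight: a break is permitted
            pieces.append("")
    return pieces


print(hyphenate_sketch("hyphenation", PATTERNS))  # ['hy', 'phen', 'ation']
```

The even weights are what keep bad breaks out: "he2n" and "2io" vote against splitting "he-n" and "t-io" unless a stronger odd pattern overrules them. A word that matches no pattern at all simply comes back whole, which is the conservative behavior described above.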
Download: hyphenate.py
See also
- Hyphenation algorithm at Wikipedia, with links to other implementations of this same algorithm.
- My blog, where other topics of interest to Python hyphenators are likely to appear.
Comments
http://code.djangoproject.com/ticket/4821
Project is indeed ['proj', 'ect'] as a noun and ['pro', 'ject'] as a verb.