Friday 10 August 2007 — This is more than 17 years old. Be careful.
I’m amazed at how many regular expression libraries there are, and at how each invents new syntax for some new feature. The Oniguruma library, for example, describes character class operators:
^... negative class (lowest precedence operator)
x-y range from x to y
[...] set (character class in character class)
..&&.. intersection (low precedence at the next of ^)
ex. [a-w&&[^c-g]z] ==> ([a-w] AND ([^c-g] OR z)) ==> [abh-w]
as well as greedy, reluctant, and possessive quantifiers? Yikes.
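To check that the example really does reduce to [abh-w], here is a rough sketch in Python. Python's standard re module has no class intersection operator, so the pieces are modeled as plain sets instead. The restriction to lowercase ASCII letters and the helper name char_range are assumptions made for illustration, not anything from the Oniguruma documentation.

```python
import re
import string

# Assumption for illustration: the universe is just the letters a-z.
universe = set(string.ascii_lowercase)

def char_range(first, last):
    """Return the set of characters from first to last, inclusive."""
    return {chr(c) for c in range(ord(first), ord(last) + 1)}

a_to_w = char_range("a", "w")                  # [a-w]
not_c_to_g = universe - char_range("c", "g")   # [^c-g]
inner = not_c_to_g | {"z"}                     # [^c-g]z  (union with z)
result = a_to_w & inner                        # [a-w] && ([^c-g] OR z)

# Collect what the flattened class from the example actually matches.
flat = re.compile(r"[abh-w]")
flat_matches = {ch for ch in universe if flat.fullmatch(ch)}

print("".join(sorted(result)))   # abhijklmnopqrstuvw
print(result == flat_matches)    # True
```

The z in the inner class never survives, because the intersection with [a-w] throws it away, which is part of why the example is harder to read than its result.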
Comments
> ^... negative class (lowest precedence operator)
> x-y range from x to y
These two are straight from POSIX (i.e. standard, lowest-common-denominator) regular expressions.
> [...] set (character class in character class)
> ..&&.. intersection (low precedence at the next of ^)
These just seem confusing. Who the heck needs such convoluted character classes? If it's that complex, spell out the list of characters in the class.
> ex. [a-w&&[^c-g]z] ==> ([a-w] AND ([^c-g] OR z)) ==> [abh-w]
It's quite revealing that the best example they can give of "class-in-a-class" and "intersection" is far more confusing to read than the resulting class. If someone chose to write '[a-w&&[^c-g]z]' where they could write '[abh-w]', I'd keep them away from any programs I care about.