I’m amazed at how many regular expression libraries there are, and at how each invents new syntax for some new feature. The Oniguruma library, for example, describes character class operators:
^... negative class (lowest precedence operator)
x-y range from x to y
[...] set (character class in character class)
..&&.. intersection (low precedence at the next of ^)
ex. [a-w&&[^c-g]z] ==> ([a-w] AND ([^c-g] OR z)) ==> [abh-w]
as well as greedy, reluctant, and possessive quantifiers? Yikes.
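For what it's worth, the arithmetic in the quoted example does check out. Here is a minimal sketch in Python that verifies it. Python's `re` module has no `&&` operator, so the intersection is modeled with ordinary sets rather than with Oniguruma itself; the `members` helper is a hypothetical name invented for this illustration, and the alphabet is assumed to be lowercase ASCII only.

```python
import re
import string


def members(char_class, alphabet=string.ascii_lowercase):
    """Return the set of characters in `alphabet` matched by a simple class.

    Hypothetical helper for illustration; only plain classes such as
    "[a-w]" or "[^c-g]" are passed in, never the && syntax itself.
    """
    pattern = re.compile(char_class)
    return {ch for ch in alphabet if pattern.fullmatch(ch)}


# ([a-w] AND ([^c-g] OR z)), spelled out with set operations.
left = members("[a-w]")
right = members("[^c-g]") | members("[z]")
intersection = left & right

# The flat class the documentation says this reduces to.
flat = members("[abh-w]")

assert intersection == flat
print("".join(sorted(intersection)))  # abhijklmnopqrstuvw
```

The `z` on the right-hand side contributes nothing, because `z` is outside `a-w` and is dropped by the intersection. That is part of why the flat form `[abh-w]` is easier to read.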
Comments
> ^... negative class (lowest precedence operator)
> x-y range from x to y
These two are straight from POSIX (i.e. standard, lowest-common-denominator) regular expressions.
> [...] set (character class in character class)
> ..&&.. intersection (low precedence at the next of ^)
These just seem confusing. Who the heck needs such convoluted character classes? If it's that complex, spell out the list of characters in the class.
> ex. [a-w&&[^c-g]z] ==> ([a-w] AND ([^c-g] OR z)) ==> [abh-w]
It's quite revealing that the best example they can give of "class-in-a-class" and "intersection" is far more confusing to read than the resulting class. If someone chose to write '[a-w&&[^c-g]z]' where they could write '[abh-w]', I'd keep them away from any programs I care about.