I'm amazed at how many regular expression libraries there are, and at how each invents new syntax for some new feature. The Oniguruma library, for example, describes character class operators:

^...    negative class (lowest precedence operator)
x-y     range from x to y
[...]   set (character class in character class)
..&&..  intersection (low precedence at the next of ^)
        
  ex. [a-w&&[^c-g]z] ==> ([a-w] AND ([^c-g] OR z)) ==> [abh-w]

as well as greedy, reluctant, and possessive quantifiers? Yikes.
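
For anyone who hasn't met those terms, here is a small sketch of the first two using Python's built-in re module rather than Oniguruma. The sample string and variable names are made up for illustration. A greedy quantifier takes as much as it can, while a reluctant (lazy) one takes as little as it can.

```python
import re

# Hypothetical sample text, chosen only to make the difference visible.
text = "<a><b>"

# Greedy: .+ grabs as much as possible, then backs off to find the last ">".
greedy = re.match(r"<.+>", text).group()

# Reluctant: .+? grabs as little as possible, stopping at the first ">".
reluctant = re.match(r"<.+?>", text).group()

print(greedy)     # <a><b>
print(reluctant)  # <a>
```

A possessive quantifier (written with a trailing +, as in .++) takes as much as a greedy one but never gives characters back. Python's re only accepts that syntax from version 3.11 on, so it is left out of the runnable part.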

Comments

Ben Finney 9:47 AM on 10 Aug 2007

> ^... negative class (lowest precedence operator)
> x-y range from x to y

These two are straight from POSIX (i.e. standard, lowest-common-denominator) regular expressions.

> [...] set (character class in character class)
> ..&&.. intersection (low precedence at the next of ^)

These just seem confusing. Who the heck needs such convoluted character classes? If it's that complex, spell out the list of characters in the class.

> ex. [a-w&&[^c-g]z] ==> ([a-w] AND ([^c-g] OR z)) ==> [abh-w]

It's quite revealing that the best example they can give of "class-in-a-class" and "intersection" is far more confusing to read than the resulting class. If someone chose to write '[a-w&&[^c-g]z]' where they could write '[abh-w]', I'd keep them away from any programs I care about.

Ned Batchelder 3:28 PM on 10 Aug 2007

I couldn't agree more about the example of the character class set operators! It took me a while to figure out what was happening to the z in that example...
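
For what it's worth, the fate of the z can be checked with plain set arithmetic. This is only a sketch using Python sets, not anything Oniguruma does internally. The char_range helper and the lowercase-only universe are assumptions made for illustration. A real [^c-g] covers every character outside c-g, but the intersection with [a-w] comes out the same either way.

```python
import string

def char_range(first, last):
    """Return the set of characters from first to last, inclusive."""
    return {chr(code) for code in range(ord(first), ord(last) + 1)}

# Assumption: treat lowercase ASCII as the whole universe of characters.
universe = set(string.ascii_lowercase)

left = char_range("a", "w")                     # [a-w]
negated = universe - char_range("c", "g")       # [^c-g]
right = negated | {"z"}                         # [^c-g]z  (union with z)
result = left & right                           # [a-w] && [^c-g]z

print("".join(sorted(result)))  # abhijklmnopqrstuvw
print("z" in result)            # False
```

The z is added to the right-hand side by the union, then thrown away by the intersection because it was never in [a-w]. So it contributes nothing to the final class.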
