Single Regular Expression to identify words with alphabetised characters

Steven Elling ellings at kcnet.com
Tue Nov 5 21:26:57 CST 2002


On Tuesday 05 November 2002 00:18, Ed Allen wrote:

>     If you reread the subject line slowly and think about why I chose
>     those particular words you might begin to understand why 'sort'
>     which would alphabetise the words differs from the 'grep' which
>     only prints lines whose characters are in alphabetic order.

I was wondering why the subject was worded that way.  I was having an off 
day and thought you meant to say 'alphabetized words'.

>     Linux comes with a words list containing some 45,000 entries.
>     Suppose you show us how to use 'sort' to find all the ones with
>     the vowels in ascending order ?

It isn't possible to find all words with vowels in ascending order with 
sort, since sort only reorders lines and never filters them.
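
To make the difference concrete, here is a small sketch. The three sample 
words are my own, and the a*b*c*...z* pattern is only my guess at the kind 
of "characters in alphabetic order" expression the subject line refers to.

```shell
# Hypothetical three-word sample, one word per line, deliberately unsorted.
sample() { printf '%s\n' zebra below almost; }

# Assumed pattern: every letter may repeat, but only in a..z order.
alpha='^a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$'

# sort rearranges whole lines; every word survives.
sorted=$(sample | sort)

# grep keeps only the matching lines and leaves them in input order.
filtered=$(sample | grep "$alpha")

printf 'sort:\n%s\n\ngrep:\n%s\n' "$sorted" "$filtered"
```

sort returns all three words in a new order, while grep drops "zebra" and 
keeps the other two in the order they arrived.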

>     Or you can show a "better" version of this...
>
>         grep
>
> '^[^aeiouy]*[aeiouy]*[^aeiouy]*[aeiouy]*[^aeiouy]*[aeiouy]*[^aeiuoy]*[aeiouy]*
> [^aeiouy]*[aeiouy]*[^aeiouy]*[aeiouy]*[^aeiouy]*[aeiouy]*[^aeiuoy]*[aeiouy]*[^aeiouy]*$'
>
>             (Or explain how it is intended to work)
>
>         Had to wrap that so it did not mess up your screen.

Starting at the beginning of the line, match zero or more non-vowel 
characters (I assume the y was a fat-finger), followed by zero or more 
vowel characters, followed by... rinse and repeat 7 times... followed by 
zero or more non-vowel characters up to the end of the line.
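
The repetition can be written more compactly. The sketch below assumes the 
wrapped pieces of the quoted pattern are meant to be joined into one 
expression; the {8} shorthand, the variable name and the check helper are 
mine, and {8} needs extended syntax (grep -E).

```shell
# Compact form: 8 "non-vowels then vowels" groups, then trailing non-vowels.
compact='^([^aeiouy]*[aeiouy]*){8}[^aeiouy]*$'

# Prints "match" or "no match" for the single string passed as $1.
check() {
    if printf '%s\n' "$1" | grep -Eq "$compact"; then
        echo match
    else
        echo 'no match'
    fi
}

check 'a-e-i-o-u-a-e-i'      # 8 separate runs of vowels
check 'a-e-i-o-u-a-e-i-o'    # 9 separate runs of vowels
```

The only thing the pattern really limits is the number of separate vowel 
runs on a line: eight runs fit, a ninth does not.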

Notice I said non-vowel characters rather than consonants.  That is 
because '[^aeiou]' will match whitespace, punctuation, etc. as well.  
Also, only lower case vowels are treated as vowels; an upper case 'A' 
counts as a non-vowel character.

However, the regular expression will also match empty lines, lines with 
whitespace only, punctuation only, numbers only, words with no vowels 
(e.g. cry, cyst, dry, fly, fry, nymph, syzygy), and words with vowels in 
any order (e.g. acquiesce, ambiguous).
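
A quick way to see the over-matching is to count matches on a handful of 
made-up lines. This is only a sketch: the test lines are my own, and the 
pattern is a {8} shorthand for the quoted one with the y left out.

```shell
# Shorthand for the quoted pattern without the y (needs grep -E).
loose='^([^aeiou]*[aeiou]*){8}[^aeiou]*$'

# Six lines: empty, spaces only, punctuation, digits, a vowel-free word,
# and a word whose vowels are out of order.
lines() { printf '%s\n' '' '   ' '?!' '12345' nymph ambiguous; }

# -c prints the number of matching lines instead of the lines themselves.
count=$(lines | grep -Ec "$loose")
echo "$count of 6 lines matched"
```

All six lines get through, so the expression is not filtering on vowel 
order at all.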

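A better version might be as follows (the '+' needs extended regular 
expressions, hence -E; the quoting lets the shell rejoin the wrapped 
pattern into a single string):

grep -E '^[^aeiou]*[aeiou]+[^aeiou]*[aeiou]*'\
'[^aeiou]*[aeiou]*[^aeiou]*[aeiou]*'\
'[^aeiou]*[aeiou]*[^aeiou]*[aeiou]*'\
'[^aeiou]*[aeiou]*[^aeiou]*[aeiou]*[^aeiou]*$'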

With one slight change (the '+' after the first vowel class, which demands 
at least one vowel), this regular expression takes care of most of the 
problems stated above. However, it will also match words with a single 
vowel (e.g. ably, abyss, cozy, zonk) and, for that matter, it will still 
match vowels in any order.
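
For what it's worth, here is one way the ordering itself could be enforced. 
This is a sketch of my own, not anything from the thread: it assumes 
"ascending" means a, e, i, o, u with repeats and gaps allowed, and the 
names ordered and ascending are made up for the example.

```shell
# Each vowel gets its own group, and the groups can only occur in
# a, e, i, o, u order. Any group may be empty.
ordered='^[^aeiou]*(a[^aeiou]*)*(e[^aeiou]*)*(i[^aeiou]*)*(o[^aeiou]*)*(u[^aeiou]*)*$'

# The second grep drops vowel-free lines, which the first pattern
# alone would accept.
ascending() { grep -E "$ordered" | grep '[aeiou]'; }

printf '%s\n' almost cozy nymph ambiguous acquiesce | ascending
```

Of the five sample words only "almost" and "cozy" come out. Single-vowel 
words such as "cozy" still pass, since one vowel is trivially in order.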

There is one other problem. The pattern will not match words whose only 
vowels carry grave, acute, circumflex, tilde, diaeresis, ring above, or 
stroke accents --- if those can actually be classified as vowels.

I digress.

>     Regular Expressions are meant for selecting/matching patterns in
>     text.  That is one of the powerful parts which has kept Unix systems
>     still being actively developed after thirty years.

You're preaching to the choir.



