Do you write readable regexes?

 Why not?

Imagine you have a regex to validate a password:

^(?=.*\p{Lu})(?=.*\P{L})\S{8,}$

That's not too complicated, but the readability could be better. The solution here is the option x, which .NET calls RegexOptions.IgnorePatternWhitespace.

Most regular expression flavours support the option x. Everybody who wants to write longer patterns should know it. The option x does two things:

  1. It allows whitespace inside the pattern to make it more readable
  2. It allows comments inside the pattern

The whitespace used inside the pattern then does not belong to the pattern itself; it is only there to improve readability. That also means that if you want to match, e.g., a space, you have to escape it with a backslash or use the whitespace class \s.
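The escaping rule can be checked with a minimal sketch like the following. The class name WhitespaceDemo and its field names are made up for this illustration; only Regex and RegexOptions.IgnorePatternWhitespace come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // The unescaped space is dropped, so this pattern is effectively "helloworld".
    public static readonly Regex Unescaped =
        new Regex(@"hello world", RegexOptions.IgnorePatternWhitespace);

    // The backslash keeps the space as part of the pattern.
    public static readonly Regex Escaped =
        new Regex(@"hello\ world", RegexOptions.IgnorePatternWhitespace);

    // \s matches any whitespace character; the spaces around it are ignored.
    public static readonly Regex WithClass =
        new Regex(@"hello \s world", RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        Console.WriteLine(Unescaped.IsMatch("hello world")); // False
        Console.WriteLine(Unescaped.IsMatch("helloworld"));  // True
        Console.WriteLine(Escaped.IsMatch("hello world"));   // True
        Console.WriteLine(WithClass.IsMatch("hello world")); // True
    }
}
```

Note that \s is broader than an escaped space: it also accepts tabs and line breaks, so pick the one that matches your intent.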

Example in C#

Regex password = new Regex(@"
        ^ # Match the start of the string
        (?=.*\p{Lu}) # Positive lookahead assertion, is true when there is an uppercase letter
        (?=.*\P{L}) # Positive lookahead assertion, is true when there is a non-letter
        \S{8,} # At least 8 non whitespace characters
        $ # Match the end of the string
    ", RegexOptions.IgnorePatternWhitespace);

Looks a lot better, what do you think?

That way it's much clearer what this regex is doing, and together with meaningful comments, even a regex novice will quickly see what this part of the code does.
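To see the pattern at work, here is a small sketch that wraps it in a helper and runs a few candidate passwords through it. The class PasswordCheck, the method IsValid and the sample passwords are assumptions made for this illustration, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PasswordCheck
{
    private static readonly Regex Password = new Regex(@"
        ^             # Match the start of the string
        (?=.*\p{Lu})  # Lookahead: there is an uppercase letter somewhere
        (?=.*\P{L})   # Lookahead: there is a non-letter somewhere
        \S{8,}        # At least 8 non-whitespace characters
        $             # Match the end of the string
    ", RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: true when the candidate satisfies all rules above.
    public static bool IsValid(string candidate)
    {
        return Password.IsMatch(candidate);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("Passw0rd"));  // True
        Console.WriteLine(IsValid("password1")); // False: no uppercase letter
        Console.WriteLine(IsValid("Password"));  // False: no non-letter
        Console.WriteLine(IsValid("Pass w0rd")); // False: contains whitespace
        Console.WriteLine(IsValid("Pw0rd!"));    // False: shorter than 8 characters
    }
}
```

Each rejected sample fails exactly one commented line of the pattern, which is the kind of check the comments make easy to reason about.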

This useful feature is available in most of the important regex flavours, such as .NET, Java, Perl, PCRE, Python and Ruby. It is not supported by, e.g., ECMAScript (JavaScript) or POSIX BRE/POSIX ERE (the PHP ereg functions use POSIX ERE, the preg functions use PCRE).

So, in future, hopefully everyone is going to write readable regexes. You have now seen that being unreadable is not a feature of regular expressions; as always, it is the programmer who writes unreadable code.

For more details about this option you can have a look at