Do you write readable regexes?

 Why not?

Imagine you have a regex to validate a password:

^(?=.*\p{Lu})(?=.*\P{L})\S{8,}$ 

That's not too complicated, but the readability could be better. The solution here is the option x, called IgnorePatternWhitespace in .NET.

Most regular expression flavours support the option x. It is an important option that everybody who wants to write longer patterns should know. The option x does two things:

  1. It allows whitespace inside the pattern to make it more readable
  2. It allows comments inside the pattern

The whitespace used inside the pattern then does not belong to the pattern itself; it is only there to improve readability. That also means that if you want to match, e.g., a space, you have to escape it with a backslash or use the whitespace class \s.
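
As a minimal sketch of that last point, the three patterns below differ only in how the space between the two words is written. The class name LiteralSpaceDemo and the sample words are made up for illustration; only Regex and RegexOptions.IgnorePatternWhitespace come from .NET.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of any library.
public static class LiteralSpaceDemo
{
    private const RegexOptions X = RegexOptions.IgnorePatternWhitespace;

    // The unescaped spaces are layout only, so this pattern is really ^helloworld$.
    public static readonly Regex Unescaped = new Regex(@"^ hello world $", X);

    // "\ " (backslash followed by a space) matches exactly one literal space.
    public static readonly Regex EscapedSpace = new Regex(@"^ hello \  world $", X);

    // \s matches any single whitespace character, e.g. a space or a tab.
    public static readonly Regex WhitespaceClass = new Regex(@"^ hello \s world $", X);
}
```

With these definitions, Unescaped matches "helloworld" but not "hello world", EscapedSpace matches "hello world" but not the tab-separated variant, and WhitespaceClass accepts both a space and a tab between the words.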

Example in C#

Regex password = new Regex(@"
        ^ # Match the start of the string
        (?=.*\p{Lu}) # Positive lookahead assertion, is true when there is an uppercase letter
        (?=.*\P{L}) # Positive lookahead assertion, is true when there is a non-letter
        \S{8,} # At least 8 non whitespace characters
        $ # Match the end of the string
    ", RegexOptions.IgnorePatternWhitespace);

Looks a lot better, what do you think?

That way it's much clearer what this regex is doing, and together with meaningful comments, even a regex novice will quickly see what this part of the code is doing.

This useful feature is available in the most important languages and libraries, like .NET, Java, Perl, PCRE, Python, and Ruby. It is not supported by, e.g., ECMAScript (JavaScript) and POSIX BRE/POSIX ERE (the PHP ereg functions use POSIX ERE, the preg functions use PCRE).

So, in future, hopefully everyone is going to write readable regexes. You have now seen that being unreadable is not a feature of regular expressions; as always, it is the programmer who writes unreadable code.

For more details about this option, have a look at regular-expressions.info.

How to test your regular expression

Of course, the ultimate tool is RegexBuddy by Jan Goyvaerts. I personally don’t have it at the moment, but I surely need to get it soon. It supports various languages, gives you a detailed explanation of each part, lets you step through the matching process …

But if you don’t use regexes often and write only small ones, you should be fine with free online regular expression testers. It is important that you test your regex: extract test cases from your real data, and also add test data that you don’t want to match!

Online regex testers are a great help while developing a regular expression. As long as you do not need features they don’t support, they visualize your match and instantly show you the content of the capturing groups.

The critical features that you should test in your real language are mainly lookbehind assertions and Unicode features like Unicode properties.
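
The same idea can also be kept next to the code. The following is only a sketch: it reuses the password pattern from the C# example, and the class PasswordRegexCheck, the helper FindFailures and the sample passwords are all invented for illustration. Real test data should come from your own inputs.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: checks the password regex against inputs that
// should match (true) and inputs that should NOT match (false).
public static class PasswordRegexCheck
{
    public static readonly Regex Password = new Regex(@"
        ^
        (?=.*\p{Lu})   # at least one uppercase letter
        (?=.*\P{L})    # at least one non-letter
        \S{8,}         # at least 8 non-whitespace characters
        $
    ", RegexOptions.IgnorePatternWhitespace);

    // Made-up sample data; the value is the expected result of IsMatch.
    public static readonly Dictionary<string, bool> SampleCases = new Dictionary<string, bool>
    {
        ["Passw0rd"] = true,    // uppercase, a digit, 8 characters
        ["password1"] = false,  // no uppercase letter
        ["Password"] = false,   // no non-letter
        ["Pass w0rd"] = false,  // contains whitespace
        ["Pw0"] = false         // too short
    };

    // Returns every input whose actual result differs from the expected one.
    public static List<string> FindFailures(IDictionary<string, bool> cases)
    {
        var failures = new List<string>();
        foreach (var pair in cases)
        {
            if (Password.IsMatch(pair.Key) != pair.Value)
            {
                failures.Add(pair.Key);
            }
        }
        return failures;
    }
}
```

Calling PasswordRegexCheck.FindFailures(PasswordRegexCheck.SampleCases) should return an empty list; any entry in the result is an input where the regex behaves differently from what you expected.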
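
A quick probe in the target language can settle such questions. The sketch below assumes .NET as the real language; the class EngineFeatureProbe and both patterns are invented for illustration. It exercises a lookbehind of variable length and a Unicode property, two things an online tester built on another engine may handle differently.

```csharp
using System.Text.RegularExpressions;

// Hypothetical probe class for features that differ between regex engines.
public static class EngineFeatureProbe
{
    // Lookbehind of variable length: digits preceded by "USD" and any
    // amount of whitespace. .NET accepts the \s* inside the lookbehind.
    public static readonly Regex AmountAfterCurrency = new Regex(@"(?<=USD\s*)\d+");

    // Unicode property: a run of uppercase letters from any script.
    public static readonly Regex UppercaseRun = new Regex(@"\p{Lu}+");
}
```

For example, EngineFeatureProbe.AmountAfterCurrency.Match("USD  42").Value is "42", while the input "EUR 42" gives no match. EngineFeatureProbe.UppercaseRun.Match("stra\u00DFe \u00C4\u00D6\u00DC").Value is the three uppercase umlauts at the end of the string.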

My favourites are

Regexr is based on ActionScript 3, which means it implements regex according to the ECMA-262 standard. See regular-expressions.info for more info. (I am not sure whether the standard or Regexr has changed, because Regexr supports simple lookbehind assertions, which older editions of ECMA-262 did not include; lookbehind was added to the standard in ECMAScript 2018.)

Matches are highlighted, and the content of the capturing groups is shown when the mouse hovers over the match. It also allows you to test replacements and to create permalinks to your tested regex (to share it easily on SO ;)).

Rubular is based on Ruby. See regular-expressions.info for more info. Matches are highlighted, and the matches and the content of the corresponding capturing groups are shown in a list. The regex is processed on a server, so the result is not shown instantly, but it is quite quick most of the time. It also allows you to create permalinks to your tested regex.

Other online testers are (I don’t use them often, but try them to find your personal favourite)

And finally, this one is not a tool for testing: you can give it a regex and it tells you what the regex is doing: