A beginner's guide to regular expressions

Filed under: regular expressions, technology, web development

Over the past few months I've noticed quite a few developers with little to no knowledge of regular expressions (regex from here on out). For whatever reason, they haven't taken the time, or had the chance, to learn what I consider one of the most powerful and useful tools available in programming. Even knowing a few basics can really streamline your workflow and improve your code. Regex aren't just useful IN code; they can even help you write code. In this post I'm going to cover some regex basics, then show you some real examples of how they can solve problems for you.

The basics are always a great place to start, so let's look at some syntax. At its heart, a regex is simply a way to match (and optionally replace) one string with another. So we can replace the literal string cat with mouse, or the number 8675309 with 42. That's useful if we want to replace many occurrences of 8675309, but that's not all that common. Wouldn't it be nice if, in addition to replacing 8675309 with 42, we could also replace 5318008 with 42 using the same pattern? Regular expressions let you do that with special characters called meta characters. Meta characters are what give regex their incredible power. Here's a list of some of the most common meta characters and how they're used.

  • Meta Characters
    • ^ beginning of a string
      • Except when used inside [] as part of a character set
    • $ end of a string
    • . matches any single character
      • . matches a or A, & or @, 0 or 4
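    • ? matches the preceding character 0 or 1 times
      • https? matches both http and https
    • + matches the preceding character 1 or more times
      • bo+ matches bo and boo
    • * matches the preceding character 0 or more times
      • bo* matches b, bo, and boo
      • .* matches everything (generally to the end of the line)
    • \ escapes a meta character so it is matched as a literal character
      • \. matches a literal period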
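
Here's a quick sketch of the meta characters above in action. I'm using Python's re module purely for illustration (the post isn't tied to any one language), and the sample strings are made up.

```python
import re

# ^ and $ anchor the pattern to the start and end of the string.
starts_with_cat = bool(re.search(r"^cat", "catalog"))   # "catalog" starts with cat
ends_with_cat = bool(re.search(r"cat$", "catalog"))     # ...but does not end with it

# . matches any single character (in Python, anything except a line break).
dot_matches = re.findall(r"c.t", "cat cot c@t ct")

# ? makes the preceding character optional.
schemes = re.findall(r"https?", "http and https")

# + requires one or more of the preceding character.
plus_matches = re.findall(r"bo+", "b bo boo")

# * allows zero or more of the preceding character.
star_matches = re.findall(r"bo*", "b bo boo")

# \ escapes a meta character so it is matched literally.
literal_dot = re.findall(r"3\.14", "3.14 and 3x14")

print(starts_with_cat, ends_with_cat)
print(dot_matches, schemes, plus_matches, star_matches, literal_dot)
```

Note that "ct" is not matched by c.t, because the dot has to consume exactly one character.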

You can probably start to tell that meta characters are powerful, but often you want to match any one of several characters rather than a single specific one. Character sets let you define your own group of characters to use in a match. Character classes are predefined shorthands that offer the same functionality. To create a character set, simply wrap any number of characters in square brackets []. Let's take a look at character sets and character classes.

  • Character Sets
    • [abc] Matches only a, b, or c
    • [a-zA-Z0-9] Matches any alphanumeric character (a hyphen defines a range from the first character to the second)
    • [A-Z] Matches any uppercase letter
    • [a-z] Matches any lowercase letter
    • [0-9] Matches any digit
    • [x-z] Matches x through z (x, y, or z)
    • [^abc] Matches any character except a, b, or c (a ^ at the start of the brackets means "everything but these characters")
  • Character Classes
    • \d matches digits, same as [0-9]
    • \w matches "word" characters, same as [A-Za-z0-9_]
    • \W matches "non-word" characters, essentially everything BUT the previous set
    • \t matches a tab
    • \n matches a line break
    • \s matches whitespace characters, at minimum [ \t\n], and in most flavors also \r, \f, and \v
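
To make the sets and classes above concrete, here's a small sketch, again assuming Python's re module and some made-up input strings.

```python
import re

# A character set matches any ONE of the listed characters.
only_abc = re.findall(r"[abc]", "cabbage")

# A leading ^ inside the brackets negates the set.
not_abc = re.findall(r"[^abc]", "cabbage")

# A hyphen inside the brackets defines a range.
uppercase = re.findall(r"[A-Z]", "Hello World")

# \d is shorthand for a digit; combined with + it grabs whole numbers.
numbers = re.findall(r"\d+", "call 555 1234")

# \w matches letters, digits, and the underscore.
words = re.findall(r"\w+", "user_name: bob!")

# \W is the opposite of \w.
non_words = re.findall(r"\W", "a-b c!")

# \s matches spaces, tabs, and line breaks, so it is handy for splitting.
tokens = re.split(r"\s+", "a b\tc\nd")

print(only_abc, not_abc, uppercase)
print(numbers, words, non_words, tokens)
```

One Python-specific caveat: on regular (Unicode) strings, \d and \w also match digits and letters outside ASCII unless you pass the re.ASCII flag, so "same as [0-9]" is only exactly true with that flag.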

It's nice to have the ?, +, and * meta characters available to us, but wouldn't it be nice if we could specify exactly how many times something should match? Character ranges (usually called quantifiers) allow you to do this by specifying minimum and maximum counts within curly braces { }, applied to whatever comes right before them.

  • Character Ranges
    • {x} Matches exactly x times
    • {x,y} Matches at least x times, but not more than y (e.g. {3,8})
    • {x,} Matches at least x times
    • {,y} Matches no more than y times (not supported in every flavor; {0,y} is the portable form)
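
Here's how the curly-brace ranges above behave, sketched with Python's re module (my choice for illustration). I use re.fullmatch so the whole string has to fit the pattern; the test strings are invented.

```python
import re

# {x}: exactly four digits.
exactly_four = re.fullmatch(r"\d{4}", "2024") is not None
five_digits = re.fullmatch(r"\d{4}", "20245") is not None

# {x,y}: between 3 and 8 lowercase letters.
in_range = re.fullmatch(r"[a-z]{3,8}", "regex") is not None
too_short = re.fullmatch(r"[a-z]{3,8}", "ab") is not None

# {x,}: two or more o's in a row.
two_or_more = re.findall(r"o{2,}", "o oo ooo")

# {0,y}: at most two a's. Python also accepts {,2}, but {0,2} is portable.
at_most_two = re.fullmatch(r"a{0,2}", "aa") is not None
three_as = re.fullmatch(r"a{0,2}", "aaa") is not None

print(exactly_four, five_digits, in_range, too_short)
print(two_or_more, at_most_two, three_as)
```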

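Finally, regular expressions allow you to store one or more of your matches in temporary variables, called back references, which can be used at other points in your expression or in the replacement string. You store a match by wrapping part of the pattern in parentheses, and you refer to it by number: \1 for the first group, \2 for the second, and so on. Here's how they work.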

  • Back References
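    • (.*) Matches the entire string (up to a line break) and stores it in \1
      • Could be used to wrap a string in a bold tag by replacing with <b>\1</b>
    • (.+)\.php Matches a PHP filename and stores the name without the extension in \1
      • Replacing with \1.cfm swaps the PHP extension for a CFM extension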
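
The two replacements above look like this in Python's re module (an assumption on my part; most editors and languages use the same \1 syntax, though some use $1 instead). I've added ^ and $ anchors that aren't in the patterns above, because Python's re.sub would otherwise also wrap the empty match left at the end of the string.

```python
import re

# Wrap a whole string in a bold tag. The anchors are my addition (see above).
bold = re.sub(r"^(.*)$", r"<b>\1</b>", "hello")

# Swap a .php extension for .cfm, keeping the captured file name.
# The trailing $ (also my addition) makes sure only the extension matches.
renamed = re.sub(r"(.+)\.php$", r"\1.cfm", "index.php")

# A back reference can also be used inside the pattern itself:
# here \1 must repeat whatever word the first group captured.
repeated = re.search(r"\b(\w+) \1\b", "this is is a test")
doubled_word = repeated.group(1) if repeated else None

print(bold)
print(renamed)
print(doubled_word)
```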

Whew...fingers are tired. Now that we have a reference for the important aspects of regex, let's look at how we'd use them in real world examples.

  • Common regular expressions
    • ^[\w_]{3,16}$ Matches common usernames (3 to 16 letters, digits, or underscores; the _ is redundant because \w already includes it)
    • <myTag[^>]*>(.*?)</myTag> Matches a <myTag> HTML/xHTML element and captures its contents
      • By default, .* is greedy: it would match everything from that point to the last </myTag> in the string. You probably would not want that. Adding ? after the * turns the expression into a "lazy match", which gives you "just enough".
    • [a-f0-9]{6} Matches a six-digit lowercase hex color (without the leading #)
    • ([\w]+[-._+&])*[\w]+@([-\w]+[.])+[a-zA-Z]{2,6} Matches most email addresses
      • Note that this is absolutely not perfect, but it should catch most common email addresses
    • (([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))? Matches most URLs
      • When matching URLs with the range of characters found in links these days, it's best to find something "off the shelf" as it were. There are just too many possibilities to try to catch everything. This should get you most of the way there, however.
    • [ \t]+$ Matches spaces and tabs at the end of a line; replace the match with an empty string to remove them
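
To see a few of these in action, here's a rough sketch using Python's re module (my choice for illustration). The patterns are copied from the list above; the sample inputs are invented, and the greedy variant is there only for comparison.

```python
import re

# Username: 3 to 16 word characters, anchored at both ends.
username = re.compile(r"^[\w_]{3,16}$")
good_name = bool(username.search("regex_fan42"))
short_name = bool(username.search("ab"))

# Lazy vs. greedy matching on two adjacent tags.
html = "<myTag>one</myTag> and <myTag>two</myTag>"
lazy = re.findall(r"<myTag[^>]*>(.*?)</myTag>", html)
greedy = re.findall(r"<myTag[^>]*>(.*)</myTag>", html)

# Six-digit lowercase hex colors (the # is not part of the match).
colors = re.findall(r"[a-f0-9]{6}", "#ff8800 and #00aaff")

# Email: fullmatch requires the whole string to fit the pattern.
email = re.compile(r"([\w]+[-._+&])*[\w]+@([-\w]+[.])+[a-zA-Z]{2,6}")
valid_email = email.fullmatch("jane.doe@example.com") is not None
invalid_email = email.fullmatch("not-an-email") is not None

# Trailing whitespace: re.MULTILINE makes $ match at the end of every line.
cleaned = re.sub(r"[ \t]+$", "", "line one  \nline two\t", flags=re.MULTILINE)

print(good_name, short_name)
print(lazy, greedy)
print(colors, valid_email, invalid_email)
print(repr(cleaned))
```

The greedy version returns a single match that swallows both tags, which is exactly the behavior the lazy ? avoids.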

I hope this will help you get started with regular expressions. Feel free to post your questions in the comments.
