What are Regular Expressions?

Regular expressions are a pattern-matching technology for processing text. They have existed in the UNIX world for years and have now been incorporated into the .NET Base Class Library. A regular expression itself is a string that represents a pattern, encoded using the regular expression language and syntax. Using a regular expression, you can parse HTML, log files, documents, or any other string source, looking for substrings that match the pattern, and perform extraction and editing functions.

Although there is no standards body governing the regular expression language, Perl 5, by virtue of its popularity, has set the standard for regular expression syntax. The .NET Framework Regular Expressions library is designed to be compatible with Perl 5 regular expressions, and it includes additional features not found elsewhere.

Example

To give you a taste of how regular expressions work, let us look at an example. I once had a professor who proclaimed that history was summed up by the -isms (words ending in i-s-m, such as existentialism). Suppose you were given a document and asked to extract the isms mentioned in it. Our sample contains the following passage:

Buddhism, Confucianism, and Taoism form the basis of Chinese philosophy, and are as central to the culture as Individualism is to the United States.

How would you extract the isms? You could manually proceed line by line, without the help of a computer, and record the isms you see. That would work, but it would take a while, and after all, you are a programmer. You could also write a string parsing utility to separate the document into words, and then see which words ended in ism. That too would work, but would require more effort. With regular expressions, you can extract the isms with this pattern:

\w*ism

This pattern says to look for a series of zero or more word characters (\w*, where \w matches a letter, a digit, or an underscore) followed by the literal text ism. Running this pattern against the above passage results in the following matches extracted for you:

Buddhism
Confucianism
Taoism
Individualism
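
The match above can be reproduced in a few lines of code. The sketch below uses Python's re module as a stand-in, on the assumption that this simple pattern behaves the same way there as in the .NET library; the names PASSAGE and find_isms are hypothetical and exist only for this illustration.

```python
import re

# The sample passage from the example above.
PASSAGE = (
    "Buddhism, Confucianism, and Taoism form the basis of Chinese "
    "philosophy, and are as central to the culture as Individualism "
    "is to the United States."
)

def find_isms(text):
    """Return every substring of text that matches the pattern \\w*ism."""
    # \w* consumes zero or more word characters, then "ism" must follow.
    return re.findall(r"\w*ism", text)

if __name__ == "__main__":
    for word in find_isms(PASSAGE):
        print(word)
```

Note that the pattern does not require ism to fall at the end of a word, so a word such as prismatic would yield the match prism. Appending a word-boundary anchor (\w*ism\b) is one way to restrict matches to words that actually end in ism.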

Regular expressions can perform more complex searches. In this guide, we'll cover the syntax of regular expressions, the classes in the System.Text.RegularExpressions namespace, and how to use them. You can try out regular expressions with our online RegEx Tester. We'll finish off our discussion with some examples, such as how to extract information from web pages.