Groups

Groups are user-defined subsets of the Regular Expression pattern, and are used when processing a match to identify subsets of the matching string. You can think of Groups as Sub-matches.

While extracting U.S. phone numbers with a regular expression that matches the whole phone number, you may want to identify the three-digit area code. Groups allow you to extract that subset from a match. Later on, we'll look at an implementation for doing just that.

Syntax of Grouping Constructs
The syntax for indicating a group within a regular expression is to enclose the subpattern within parentheses: (). When indicating a group, you can specify whether the group should be retrievable through the Groups property, and the name of the group. As a result, there are three variations on the Group syntax to handle these cases.

Normal Capturing Groups - ()
This syntax tells the regular expression engine to capture the group so that it can be retrieved from the Groups property of the Match by a numeric index. Note that if the ExplicitCapture RegexOption is set, then only named groups will be captured, and unnamed capturing groups will not be a part of the final Match.

Consider the following string of characters:

K9 DG OK D1
If we use this regular expression, which matches an uppercase letter followed by a digit (notice that the letter is in parentheses)...
([A-Z])\d
...it will return the following matches:
Match 1: K9
Match 2: D1
Each Match also has a group, because we used a capturing group around the uppercase letter:
Match 1 Group: K
Match 2 Group: D
We'll see later how to programmatically access the group via the Groups property of the Match object.
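As a quick preview, here is a minimal sketch of the example above. The class name NumberedGroupDemo and the helper MatchesWithFirstGroup are made up for this illustration; Regex.Matches, Match.Value, and Match.Groups are the standard System.Text.RegularExpressions members.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NumberedGroupDemo
{
    // Hypothetical helper: returns "match:group" for every match, e.g. "K9:K".
    public static List<string> MatchesWithFirstGroup(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            // Groups[0] is the whole match; Groups[1] is the first parenthesized group.
            results.Add(m.Value + ":" + m.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        foreach (string line in MatchesWithFirstGroup("K9 DG OK D1", @"([A-Z])\d"))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // K9:K
        // D1:D
    }
}
```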

Named Capturing Groups - (?<name>)
Named capturing groups are an extension of normal capturing groups and allow us to specify a name for the group. This makes the regular expression more understandable, and the Group can later be extracted from the Match by name as well, making for more readable and less fragile code.

Extending our earlier example, we could explicitly name the group "letter" using the following regular expression:

(?<letter>[A-Z])\d
This will result in the same matches being found, and the same groups, except now we can access the group by name and not just by numeric index. The following snippet comes from code that accesses the groups by name.

objGroupsCollection["letter"].Value;

We'll see how to use the Group object shortly, after we look at the non-capturing group.
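In the meantime, the named lookup can be sketched as a complete program. The class NamedGroupDemo and the helper FirstLetter are hypothetical names used only for this illustration; the string indexer on Match.Groups is the standard way to fetch a group by name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Hypothetical helper: returns the "letter" group of the first match,
    // or an empty string when the pattern does not match at all.
    public static string FirstLetter(string input)
    {
        Match m = Regex.Match(input, @"(?<letter>[A-Z])\d");
        return m.Success ? m.Groups["letter"].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(FirstLetter("K9 DG OK D1")); // prints "K"
    }
}
```

Because the lookup uses the name rather than a position, adding another group earlier in the pattern would not change what Groups["letter"] returns.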

Non-Capturing Groups - (?:)
Non-capturing groups are used to instruct the regex parser to treat the subpattern as a group, but not to capture the results as a Group.

Using our earlier example, we can make our capturing group a non-capturing group by adding a question mark (?) and a colon (:) immediately after the opening parenthesis.

(?:[A-Z])\d
This will find the same Matches; however, no groups will be captured. You might be asking, what is the purpose of grouping a subexpression if it's not going to be used? Non-capturing groups become useful when you are using the OR (|) construct within the regular expression. Look at the following pattern, where we want to limit our matches to those that begin with A, B, or C, and are followed by a digit:
A|B|C\d
What matches do you expect when the above pattern is run against the following input string?
K9 CC C3 A1
The matches returned are:
Match 1: C3
Match 2: A
Why did the match return A, and not A1? The answer is in the order of operations. The OR (|) construct has the lowest precedence in a regular expression, so the \d attaches only to the C, and the expression gets translated into (parentheses used for clarification):

Match an (A) OR a (B) OR a (C followed by a digit)

Therefore, in order to properly group the letters, we need to use a grouping construct. If we didn't need the group in the output, then we could use a non-capturing group and save some processing. The following corrected expression will return the two expected matches, C3 and A1.

(?:A|B|C)\d
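The difference between the two patterns can be checked with a short sketch. The class AlternationDemo and the helper MatchValues are hypothetical names for this illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Hypothetical helper: collects the text of every match of pattern in input.
    public static List<string> MatchValues(string input, string pattern)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            values.Add(m.Value);
        }
        return values;
    }

    public static void Main()
    {
        string input = "K9 CC C3 A1";

        // Without grouping, \d applies only to the C alternative.
        Console.WriteLine(string.Join(", ", MatchValues(input, @"A|B|C\d")));     // C3, A

        // With a non-capturing group, \d applies to A, B, and C alike.
        Console.WriteLine(string.Join(", ", MatchValues(input, @"(?:A|B|C)\d"))); // C3, A1

        // A non-capturing group adds nothing to Groups: only group 0 (the whole match) exists.
        Console.WriteLine(Regex.Match(input, @"(?:A|B|C)\d").Groups.Count);       // 1
    }
}
```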
The GroupCollection and Group objects
Now that you have learned about the grouping syntax, I'm sure you are eager to see how to programmatically retrieve groups. Each Match object has a Groups property. This returns a GroupCollection containing a series of Group objects.

Regex r = new Regex(@"([A-Z])\d");
Match m = r.Match("K9 DG OK D1");
GroupCollection gc = m.Groups;

Because the GroupCollection, like the MatchCollection, implements ICollection and IEnumerable, as well as an Indexer, you can access the Group objects using the foreach syntax, or using the array syntax. To output the first match, you could simply write:

Console.WriteLine("Match 1:" + gc[0].Value);

As mentioned, the foreach syntax can be used to iterate through all the Group objects in the GroupCollection:

foreach(Group g in gc)
{
    Console.WriteLine("Group:" + g.Value);
}

One behavior of groups that you need to watch out for is that the first Group in a GroupCollection will be the entire Match string itself.
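A minimal sketch of this behavior, using the K9 example from earlier; the class GroupZeroDemo and the helper DescribeGroups are hypothetical names for this illustration. Iterating by index makes it easy to see that index 0 holds the whole match and the parenthesized groups start at index 1.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupZeroDemo
{
    // Hypothetical helper: lists every group of a match together with its index.
    public static List<string> DescribeGroups(Match m)
    {
        var lines = new List<string>();
        for (int i = 0; i < m.Groups.Count; i++)
        {
            lines.Add("Group " + i + ": " + m.Groups[i].Value);
        }
        return lines;
    }

    public static void Main()
    {
        Match m = Regex.Match("K9 DG OK D1", @"([A-Z])\d");
        foreach (string line in DescribeGroups(m))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // Group 0: K9
        // Group 1: K
    }
}
```

If you only want the captured subpatterns, start the loop at index 1 instead of 0.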

Example
Having covered the purpose and syntax of groups, let's look at a practical example. In our previous sections, we have used the following pattern to identify phone numbers in a string:
\d\d\d-\d\d\d-\d\d\d\d
Now, what if we wanted to extract the area code from the match? We could take the first 3 characters of the entire match, but that kind of manual parsing isn't flexible, and it quickly becomes cumbersome for more complicated regular expressions. Instead, we can use the grouping construct, a pair of parentheses (), around the portion we are interested in. Our new regular expression becomes:
(\d\d\d)-\d\d\d-\d\d\d\d

static void Main(string[] args)
{
     //Look For Match
     Match m1 = Regex.Match("Our phone number is 508-888-8888.", @"(\d\d\d)-\d\d\d-\d\d\d\d");
     if (m1.Success)
     {
         //Output details of Match
         Console.WriteLine("The value '{0}' was found at index {1}, and is {2} characters long.", m1.Value, m1.Index, m1.Length);

         //Output details of Groups
         foreach(Group g in m1.Groups)
         {
             Console.WriteLine("A Group, '{0}', was found at index {1}, and is {2} characters long.", g.Value, g.Index, g.Length);
         }
     }
}


The results, which also show that the first group is the match itself: