Tuesday, 12 June 2012

Simple Regex #5: Named Groups


Almost every Regex question landing on my desk has the potential to get its own blog post. This month's candidate is almost straightforward enough to be painless. Almost because, well, there's always some pain with Regex! Colleague 7 wanted to know how to extract coordinate data tidily from a string...
Regex question for you if you don't mind:
Looking to pick the x and y coordinates out of that string and return a point. I can obviously get the bits separately. But that would be too easy!
Should I use named capture groups?
Well, personally I would, because the outcome is (1) marginally more readable than indexed groups, in my opinion; and (2) more robust, when subsequently you have to extend the pattern to incorporate further groups. Relabeling index-based groups is a nightmare!

Notice that the suggested digit filters use zero-width positive look-behind assertions, (?<= ), which were covered in the previous article in this series. Each one checks that an X or a Y followed by an equals sign sits immediately before the current position, then matches the string of one or more digits which we wish to extract; the look-behind itself contributes nothing to the matched text. Let's make these constants in our code, so it's easier to concentrate on what's around them (I've replaced the numeral set [0-9] with the digit class \d, another personal preference):
const string x = @"(?<=X=)\d+";
const string y = @"(?<=Y=)\d+";
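To see what each filter picks up on its own (the "bits separately" from the question), here is a minimal sketch. The LookBehindDemo class and FirstMatch helper are names I've made up for illustration, and the input is the sample string used in the rest of this post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookBehindDemo
{
    // Same filters as above: digits that are immediately preceded by "X=" or "Y=".
    const string x = @"(?<=X=)\d+";
    const string y = @"(?<=Y=)\d+";

    // Hypothetical helper: returns the first matched run of digits,
    // or an empty string when nothing matches (Match.Value is empty on failure).
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        var input = "@1,X=123@1,Y=456";
        Console.WriteLine(FirstMatch(input, x)); // 123
        Console.WriteLine(FirstMatch(input, y)); // 456
    }
}
```

The "X=" and "Y=" prefixes never show up in the output, because a look-behind only asserts what precedes the match.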
Without named groups, we would simply wrap x and y in a pair of capturing groups, with the wildcard .* in between to skip whatever separates them:
const string pattern = "(" + x + ").*(" + y + ")";
var input = "@1,X=123@1,Y=456";
var match = Regex.Match(input, pattern);
if (match.Success)
    return match.Result("($1,$2)"); // Output: (123,456)
Two changes are required to convert to named groups. First, the names have to be applied. This involves adding a ? at the start of each group, followed by its name in either angle brackets or, as here, single quotes:
const string pattern = "(?'X'" + x + ").*(?'Y'" + y + ")";
Then, after the match succeeds, these names, this time prefixed with $ and enclosed in curly braces, can be used in the replacement pattern to extract the relevant matched values:
if (match.Success)
    return match.Result("(${X},${Y})"); // Output: (123,456)
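The original question asked for a point rather than a formatted string. One way to get there, sketched below, is to read the named groups through match.Groups and parse them as integers. The CoordinateParser class and its TryParse method are hypothetical names of my own, and I've used plain int out parameters instead of committing to any particular Point type.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CoordinateParser
{
    const string x = @"(?<=X=)\d+";
    const string y = @"(?<=Y=)\d+";
    const string pattern = "(?'X'" + x + ").*(?'Y'" + y + ")";

    // Hypothetical helper: true when both coordinates were found and parsed.
    public static bool TryParse(string input, out int px, out int py)
    {
        px = 0;
        py = 0;
        var match = Regex.Match(input, pattern);
        if (!match.Success)
            return false;
        // Groups["X"] looks the group up by name, not by position,
        // so adding further groups to the pattern later won't disturb it.
        return int.TryParse(match.Groups["X"].Value, out px)
            && int.TryParse(match.Groups["Y"].Value, out py);
    }

    public static void Main()
    {
        int px, py;
        if (TryParse("@1,X=123@1,Y=456", out px, out py))
            Console.WriteLine("({0},{1})", px, py); // (123,456)
    }
}
```

From here the two integers can be handed to whichever point type the calling code happens to use.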
Follow-Up Questions
Any reason you didn't use string.Format to define the regex?
This is what I ended up with:
const string xCoordGroupName = "XCOORD";
const string yCoordGroupName = "YCOORD";
string pattern = string.Format("X=(?<{0}>[0-9]+).*Y=(?<{1}>[0-9]+)", xCoordGroupName, yCoordGroupName);
var match = Regex.Match(coords, pattern);
Readability by any chance?
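Try changing that to "const string pattern...". It won't compile, because a call to string.Format is not a constant expression.
The + string concatenation operator costs nothing at run time when its operands are constants: the compiler joins them into a single string.
Also, you often get a lot of {}s in your patterns, and Format doesn't like that: it treats them as placeholders, so each one has to be escaped by doubling it.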
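Here is a small sketch of both points side by side. The group names come from the snippet in the question; the {1,4} quantifier and the PatternBuilding class are assumptions of mine, added only so that the pattern contains some braces. Left undoubled, {1,4} would not even throw: Format would read it as "argument 1, padded to width 4" and quietly produce the wrong pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternBuilding
{
    const string XName = "XCOORD"; // group names from the question's snippet
    const string YName = "YCOORD";

    // Constants joined with + are still a constant; no work is left for run time.
    public const string Concatenated =
        "X=(?<" + XName + @">\d{1,4}).*Y=(?<" + YName + @">\d{1,4})";

    // Cannot be const. The quantifier braces are doubled so that Format
    // emits them literally instead of treating them as format items.
    public static readonly string Formatted =
        string.Format(@"X=(?<{0}>\d{{1,4}}).*Y=(?<{1}>\d{{1,4}})", XName, YName);

    public static void Main()
    {
        Console.WriteLine(Concatenated == Formatted); // True
        var match = Regex.Match("@1,X=123@1,Y=456", Concatenated);
        Console.WriteLine(match.Groups[XName].Value + "," + match.Groups[YName].Value); // 123,456
    }
}
```

Both routes produce the same pattern; the concatenated one simply stays readable as the number of braces grows.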
