By Kurt Cagle | March 14, 2007 02:00 PM EDT
This content is reprinted from Real-World AJAX: Secrets of the Masters published by SYS-CON Books. Aimed at everyone from enterprise developers to self-taught scripters, Real-World AJAX: Secrets of the Masters is the perfect book for anyone who wants to start developing AJAX applications.
Getting Expressive with Regular Expressions
Regular expressions (or Regexes, as they are sometimes called) provide a way of defining text patterns that can be used for validation, testing, and string replacement. The Regex language has expanded considerably over the years, providing a remarkably rich and robust set of tools for parsing content and building new content, something that comes in handy when dealing with AJAX-based systems.
In JavaScript, regular expressions are core objects just like strings and arrays and can be defined using either a specific object (in this case the RegExp() object), or by using the forward slash delimiters // (just as [] designates an array and "" designates a string). Thus, a regular expression matching the string sequence 'test' could be declared as:
var retest = new RegExp('test');
var retest = /test/;
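Note: Be careful to distinguish the forward-slash delimiters used in regexes from the comment delimiter //. Two slashes with nothing between them always begin a comment, so the statement
retest = //
does not create an empty regular expression; everything after the slashes is ignored. To declare an empty pattern, write /(?:)/ or new RegExp("").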
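As a minimal sketch of the two declaration forms, the following compares a literal with the constructor. It assumes a host that provides console.log (the print() calls used elsewhere in this section are shell-specific), and the variable names are illustrative only. Note that backslashes must be doubled inside the constructor's string argument.

```javascript
// Literal and constructor forms describe the same pattern.
var reLiteral = /\d+-\d+/;
var reObject = new RegExp("\\d+-\\d+"); // backslashes doubled inside a string

console.log(reLiteral.source === reObject.source); // true
console.log(reObject.test("555-1234"));            // true

// The constructor is handy when the pattern is only known at run time.
var word = "test"; // hypothetical user-supplied search term
var reDynamic = new RegExp(word, "i");
console.log(reDynamic.test("Testament"));          // true

// An empty pattern cannot be written as //, which starts a comment.
var reEmpty = new RegExp("");
console.log(reEmpty.test("anything"));             // true: matches the empty string
```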
Regular expressions consist of two parts:
- Pattern. The pattern is the sequence of characters that identifies the regular expression.
- Flags. The flags consist of three distinct character indicators that determine the scope of the regex:
- Global (g): The global flag indicates that the regular expression should be applied to all potential matches in a string rather than just the first. If the global flag is false, only the first occurrence of a regular expression will be returned.
- Ignore Case (i): This flag indicates that the regular expression should match upper- and lower-case alphabetic characters interchangeably. If the ignoreCase flag is false, the regular expression matches only text that has the same case as the pattern.
- Multiline (m): Normally the caret "^" and dollar "$" anchors match only at the very start and end of the whole string. If the multiline flag is set, they also match at the start and end of each line, that is, immediately after and before a carriage return or newline character.
var reTest = /test/gmi;
or
var reTest = new RegExp("test","gmi");
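The flags must be supplied when the regular expression is created. Afterward they are exposed as the read-only properties global, multiline, and ignoreCase, which can be inspected but not assigned:
var reTest = new RegExp("test","gmi");
print(reTest.global);
=> true
print(reTest.ignoreCase);
=> true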
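The effect of each flag can be sketched with a few one-line comparisons. This is an illustrative example rather than part of the original text; it assumes console.log is available, and the sample strings are made up.

```javascript
var text = "Test one\ntest two"; // two lines separated by a newline

// g: replace every match instead of only the first.
console.log("a-b-c".replace(/-/, "+"));  // a+b-c
console.log("a-b-c".replace(/-/g, "+")); // a+b+c

// i: ignore case while matching.
console.log(/test/.test("Test one"));    // false
console.log(/test/i.test("Test one"));   // true

// m: ^ and $ also match at line boundaries inside the string.
console.log(/^test/.test(text));         // false: "test" is not at the start of the string
console.log(/^test/m.test(text));        // true: "test" starts the second line

// The flags can be read back from the finished regex.
var reFlags = /test/gmi;
console.log(reFlags.global, reFlags.ignoreCase, reFlags.multiline); // true true true
```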
The simplest operation on a regular expression is the test() method. Called on the regex, it takes a string argument and returns true if the pattern is matched anywhere in that string, and false otherwise. For instance:
var reTest=/test/i;
print(reTest.test("Testament"));
=> true
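One subtlety worth sketching: when a regex carries the g flag, test() records where the last match ended in the lastIndex property and resumes from there on the next call. The snippet below is a small illustration assuming console.log is available; the variable names are not from the original.

```javascript
var reTest = /test/i;
console.log(reTest.test("Testament")); // true
console.log(reTest.test("Tesla"));     // false

// With the g flag, the regex remembers where the last match ended.
var reGlobal = /test/g;
console.log(reGlobal.test("test"));    // true
console.log(reGlobal.lastIndex);       // 4
console.log(reGlobal.test("test"));    // false: the search resumed at index 4
console.log(reGlobal.lastIndex);       // 0: reset after the failed match
```

For simple yes/no validation it is therefore usually safer to leave the g flag off.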
Beyond test(), the next most useful regular expression command is actually located on the String() object – the replace() method. This particular method uses the string it's attached to as its base and a Regular Expression argument to find a set of matches, then replaces matches with the second argument.
For instance, suppose you wanted to suppress the appearance of all numbers in a credit card sequence and replace them with asterisk characters. You could use the following commands:
cc = "123-456-789";
reNum=/[0-9]/g;
print(cc.replace(reNum,"*"));
=> ***-***-***
Note that, unlike array methods such as sort() or reverse() that modify the array in place, the replace() method doesn't alter the string but returns a new one (that is to say, the value in the variable cc remains the same).
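The behavior described above can be checked directly. The following sketch assumes console.log is available; the last line is a hypothetical variation (not from the original) that uses a lookahead to leave the final group of digits visible.

```javascript
var cc = "123-456-789";

var masked = cc.replace(/[0-9]/g, "*");
console.log(masked); // ***-***-***
console.log(cc);     // 123-456-789 (the original string is unchanged)

// Without the g flag only the first digit is replaced.
console.log(cc.replace(/[0-9]/, "*")); // *23-456-789

// Hypothetical variation: mask only digits that still have a hyphen
// somewhere after them, which leaves the last group readable.
console.log(cc.replace(/[0-9](?=.*-)/g, "*")); // ***-***-789
```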
The notation [0-9] is one of many abbreviations that make regexes at least notionally easier to work with. In this particular case, it matches any single character in the range 0 to 9, i.e., any numeric digit. If you wanted to match any alphanumeric character you'd set up three ranges – [0-9A-Za-z]. You could also use the pipe "|" character to indicate alternatives:
(0|1|2|3|4|5|6|7|8|9)
But obviously this is going to be more cumbersome. The pipe does come in handy, however, when you're trying to provide a range of potential values to be used for validation, such as a range of colors:
reColors = /^(red|blue|green|yellow|orange|purple|black|white)$/;
color="red";
print(reColors.test(color));
=> true;
color="gold";
print(reColors.test(color));
=> false;
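When the list of allowed values lives in an array, the same alternation can be assembled at run time. This is a sketch under the assumption that the values contain no regex special characters; the function name isKnownColor is hypothetical and console.log is assumed to be available.

```javascript
var colors = ["red", "blue", "green", "yellow"];

// Join the values with the pipe and anchor the result at both ends.
var reColorList = new RegExp("^(" + colors.join("|") + ")$");
console.log(reColorList.source); // ^(red|blue|green|yellow)$

// Hypothetical helper wrapping the test.
function isKnownColor(color) {
  return reColorList.test(color);
}

console.log(isKnownColor("red"));     // true
console.log(isKnownColor("gold"));    // false
console.log(isKnownColor("redblue")); // false: the anchors require a whole-string match
```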
The two characters caret "^" and dollar "$" indicate that the regular expression should be valid from the start of the search range (the first character) to the end of the search range (the last character). Without them, the regular expression would return true if the target sequence was found anywhere in the source string. Thus,
reColors1 = /^red$/;
color="red";
print(reColors1.test(color));
=> true;
color="barred";
print(reColors1.test(color));
=> false;
reColors1 = /red/;
color="red";
print(reColors1.test(color));
=> true;
color="barred";
print(reColors1.test(color));
=> true;
There are numerous other specialized characters that are used with regular expressions. As with strings, these character sequences are indicated with an escaping backslash, and for the most part correspond to string notation (see Table 2.2).
In general, if a character has a specialized meaning in a regular expression, escaping it causes the character itself to be matched instead; for example, \( matches a literal parenthesis character rather than starting a group.
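Escaping matters most when a pattern is built from arbitrary text. The helper below is a hypothetical sketch (the name escapeRegex is not from the original) that prefixes each special character with a backslash; it assumes console.log is available.

```javascript
// Hypothetical helper: put a backslash in front of every regex metacharacter.
// In the replacement string, $& stands for the matched character itself.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

var literal = "(1+1)";
var reEscaped = new RegExp("^" + escapeRegex(literal) + "$");

console.log(escapeRegex(literal));    // \(1\+1\)
console.log(reEscaped.test("(1+1)")); // true

// Unescaped, the parentheses form a group and + means "one or more".
console.log(/^(1+1)$/.test("(1+1)")); // false
console.log(/^(1+1)$/.test("111"));   // true
```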
In addition to these characters, the regular expression library includes a number of operators to determine existence, repetition, and negation, as given in Table 2.3.
For instance, let's say you want to ensure that a given content block has the form of a credit card number such as 123-456-789. You could use a regular expression with the abbreviated forms to check not only the boundaries but the repetitions:
var cc = "123-456-789";
var reCC = /^\d{3}-\d{3}-\d{3}$/;
print(reCC.test(cc));
=> true
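The existence, repetition, and negation operators mentioned above can be sketched with a handful of small tests. These patterns are illustrative examples chosen for this sketch, not taken from the original tables; console.log is assumed to be available.

```javascript
// ? : the preceding item is optional (zero or one).
console.log(/^colou?r$/.test("color"));  // true
console.log(/^colou?r$/.test("colour")); // true

// * : zero or more; + : one or more.
console.log(/^ab*c$/.test("ac"));        // true
console.log(/^ab+c$/.test("ac"));        // false

// {n,m} : between n and m repetitions.
console.log(/^\d{2,4}$/.test("123"));    // true
console.log(/^\d{2,4}$/.test("12345"));  // false

// [^...] : any character not in the set.
console.log(/^[^0-9]+$/.test("abc"));    // true
console.log(/^[^0-9]+$/.test("a1c"));    // false
```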
Postal codes are a little more complex, especially if you want to accept both American ZIP codes and Canadian postal codes. (British postcodes follow a different format that the pattern below doesn't cover.) If you have to check both in the same field, the regex might look something like:
var rePostalCode = /^\d{5}(-\d{4})?$|^[a-z]\d[a-z](\-|\s)?\d[a-z]\d$/i;
This rather cryptic string can be broken down fairly handily into several component parts, as shown in Table 2.4:
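One way to read the pattern is to split it at the top-level pipe into its two alternatives and test each on its own. The sketch below does that; the names reZip, reCanadian, and isPostalCode are hypothetical, the sample codes are made up for illustration, and console.log is assumed to be available.

```javascript
// First alternative: five digits, optionally followed by a hyphen and four more.
var reZip = /^\d{5}(-\d{4})?$/;

// Second alternative: letter-digit-letter, an optional hyphen or space,
// then digit-letter-digit. The i flag allows lower-case letters.
var reCanadian = /^[a-z]\d[a-z](\-|\s)?\d[a-z]\d$/i;

// Hypothetical helper combining the two halves.
function isPostalCode(code) {
  return reZip.test(code) || reCanadian.test(code);
}

var samples = ["12345", "12345-6789", "K1A 0B1", "k1a-0b1", "1234", "SW1A 1AA"];
for (var i = 0; i < samples.length; i++) {
  console.log(samples[i] + " => " + isPostalCode(samples[i]));
}
// 12345 => true
// 12345-6789 => true
// K1A 0B1 => true
// k1a-0b1 => true
// 1234 => false
// SW1A 1AA => false
```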
While you can do straight validations with regular expressions (especially useful for forms processing), regexes are actually more powerful when combined with the String().replace() method. While replace() can take a plain string as its first argument to identify the text to be replaced, supplying a regular expression instead gives you considerably richer capabilities.
For instance, suppose you wanted to replace everything that looks like it might be an e-mail address with a mailto: link. You can use regexes to solve this problem quite easily:
msg = "For more information, please contact Kurt Cagle at kurt.cagle@gmail.com or " +
"Tom Generic at generic@generic.com.";
// No i flag here: [A-Z] must match only capitalized words so that $1 holds just the name.
reAtMail = /((?:[A-Z]\w+\s?)+)at\s((?:\w+[._-])*\w+@(?:\w+\.)*\w+)/g;
linkedMsg = msg.replace(reAtMail,'<a href="mailto:$2">$1</a>');
print(linkedMsg);
=> For more information, please contact <a href="mailto:kurt.cagle@gmail.com">Kurt Cagle </a> or <a href="mailto:generic@generic.com">Tom Generic </a>.
This particular regular expression looks for the pattern "Name Name at username@server" and rewrites it as <a href="mailto:username@server">Name Name</a>. This illustrates both capturing groups (anything in parentheses) and non-capturing groups (anything in parentheses starting with ?:). Internally, each capturing group is numbered from left to right and saved as $1, $2, $3, and so on, and the replace() method's second parameter can then reference these as part of a string template to insert the matched text back into the resulting string.
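A smaller sketch of the same mechanism may make the numbering easier to see. The strings and patterns here are made up for illustration, and console.log is assumed to be available.

```javascript
// Two capturing groups, swapped in the replacement template.
var name = "Cagle, Kurt";
console.log(name.replace(/(\w+),\s*(\w+)/, "$2 $1")); // Kurt Cagle

// A non-capturing group (?:...) groups without taking a number,
// so the surname below is group 1, not group 2.
var match = /(?:Mr|Ms)\. (\w+)/.exec("Ms. Generic");
console.log(match[0]);     // Ms. Generic (the whole match)
console.log(match[1]);     // Generic (the first capturing group)
console.log(match.length); // 2

// The second argument of replace() may also be a function that
// receives the matched text and returns its replacement.
var bracketed = "a1b22".replace(/\d+/g, function (digits) {
  return "<" + digits + ">";
});
console.log(bracketed);    // a<1>b<22>
```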
Regular expressions are incredibly powerful for parsing and converting both text- and XML-based content and should be considered an indispensable part of any AJAX-based toolkit. Indeed, especially in validation types of applications, you can actually create libraries of commonly used regexes consolidated as a single object, such as:
var RegexLib = {
reMail: /((?:\w+[._-])*\w+@(?:\w+\.)*\w+)/g,
reAtMail: /((?:[A-Z]\w+\s?)+)at\s((?:\w+[._-])*\w+@(?:\w+\.)*\w+)/g,
reDoubleQuote: /"([^"]*)"/g,
reSingleQuote: /'([^']*)'/g
};
msg = "For more information, please contact Kurt Cagle at kurt.cagle@gmail.com or " +
"Tom Generic at generic@generic.com.";
print(msg.replace(RegexLib.reAtMail,"<a href='mailto:$2'>$1</a>"));
=> For more information, please contact <a href='mailto:kurt.cagle@gmail.com'>Kurt Cagle </a> or <a href='mailto:generic@generic.com'>Tom Generic </a>.
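The other entries in such a library can be used the same way. The sketch below repeats two of the patterns so that it runs on its own; the object name RegexSketch, the sample sentence, and the choice of a <q> element are assumptions made for illustration, and console.log is assumed to be available.

```javascript
// Two patterns copied from the library above so the sketch is self-contained.
var RegexSketch = {
  reMail: /((?:\w+[._-])*\w+@(?:\w+\.)*\w+)/g,
  reDoubleQuote: /"([^"]*)"/g
};

var note = "For more information, please contact Kurt Cagle at kurt.cagle@gmail.com or " +
  "Tom Generic at generic@generic.com.";

// With the g flag, match() returns every matching substring as an array.
var addresses = note.match(RegexSketch.reMail);
console.log(addresses.join(", ")); // kurt.cagle@gmail.com, generic@generic.com

// Wrap each double-quoted phrase in a <q> element.
var quoted = 'He said "hello" and "goodbye".';
console.log(quoted.replace(RegexSketch.reDoubleQuote, "<q>$1</q>"));
// He said <q>hello</q> and <q>goodbye</q>.
```

Because these shared patterns carry the g flag, calling test() or exec() on them directly advances their lastIndex between calls, so match() and replace() are the safer entry points for a shared library object.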
Copyright © 2007 SYS-CON Media, Inc. — All Rights Reserved.
Kurt Cagle is a developer and author, with nearly 20 books to his name and several dozen articles. He writes about Web technologies, open source, Java, and .NET programming issues. He has also worked with Microsoft and others to develop white papers on these technologies. He is the owner of Cagle Communications and a co-author of Real-World AJAX: Secrets of the Masters (SYS-CON books, 2006).