Real-World AJAX Book Preview: Getting Expressive with Regular Expressions

This content is reprinted from Real-World AJAX: Secrets of the Masters, published by SYS-CON Books. Aimed at everyone from enterprise developers to self-taught scripters, Real-World AJAX: Secrets of the Masters is the perfect book for anyone who wants to start developing AJAX applications.

Getting Expressive with Regular Expressions
Regular expressions (or regexes, as they are sometimes called) provide a way of defining text patterns that can be used for validation, testing, and string replacement. The regex language has expanded considerably over the years, providing a rich set of tools for parsing content and building new content, something that comes in handy when dealing with AJAX-based systems.

In JavaScript, regular expressions are core objects just like strings and arrays, and can be created either with the RegExp() constructor or with a literal enclosed in forward slashes (just as [] designates an array and "" designates a string). Thus, a regular expression matching the string sequence 'test' could be declared in either of these ways:

var retest = new RegExp('test');
var retest = /test/;
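The two forms are interchangeable for a simple pattern, but they differ in how backslashes are written. A minimal sketch, with illustrative variable names and console.log() standing in for print():

```javascript
// Literal and constructor forms of the same pattern.
const reLiteral = /test/;
const reObject = new RegExp("test");

// Both expose the pattern text through the `source` property.
console.log(reLiteral.source === reObject.source); // true

// In a string passed to RegExp(), a backslash must itself be escaped,
// so the digit class \d is written "\\d".
const reDigitsLiteral = /\d+/;
const reDigitsObject = new RegExp("\\d+");
console.log(reDigitsLiteral.source === reDigitsObject.source); // true
console.log(reDigitsObject.test("room 101")); // true
```

The constructor form is mainly useful when the pattern has to be assembled from strings at run time; for fixed patterns the literal is easier to read.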

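Note: Be careful to distinguish the forward-slash delimiters used for regexes from the comment delimiter //. The expression

retest = //

does not create an empty regular expression: JavaScript reads // as the start of a comment, leaving an incomplete assignment. To create an empty regular expression, write new RegExp("") or /(?:)/ instead.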

Regular expressions consist of two parts:

  • Pattern. The pattern is the sequence of characters that identifies the regular expression.
  • Flags. The flags are single-character indicators that modify how the pattern is applied. The three flags covered here are:
    - Global (g): The global flag indicates that the regular expression should be applied to all potential matches in a string rather than just the first. If the global flag is not set, only the first occurrence is matched.
    - Ignore Case (i): This flag indicates that the regular expression should match upper- and lower-case alphabetic characters interchangeably. If the ignoreCase flag is not set, the regular expression matches only text that has the same case as the pattern.
    - Multiline (m): Normally the ^ and $ anchors (described below) match only at the very start and end of the string. If the multiline flag is set, they also match at the start and end of each line, that is, immediately after and before a newline character.
You can set these flags in one of two ways: by putting the flag characters after the second forward slash of a regex literal, or by passing them as the second argument of the RegExp() constructor. For instance, to create a regular expression that will search through an entire file for all instances of the word "test" in any capitalization ("TEST," "Test," "test," etc.), your regular expression would look like:

var reTest = /test/gmi;

or

var reTest = new RegExp("test","gmi");

The corresponding properties (global, ignoreCase, and multiline) report which flags are set, but they are read-only. Assigning to them after the regex has been created has no effect (and throws a TypeError in strict mode):

var reTest = new RegExp("test");
reTest.global = true;
print(reTest.global);
=> false
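To see what each flag changes in practice, here is a minimal sketch on an illustrative three-line string (the sample text is an assumption, and console.log() stands in for print()):

```javascript
const text = "Test one\ntest two\nTESTING three";

// No flags: only the first case-sensitive match is found.
console.log(text.match(/test/)[0]); // test

// g + i: every occurrence, regardless of case.
console.log(text.match(/test/gi)); // [ 'Test', 'test', 'TEST' ]

// m: ^ matches at the start of each line, not just the start of the string.
console.log(text.match(/^test/gi));  // [ 'Test' ]
console.log(text.match(/^test/gim)); // [ 'Test', 'test', 'TEST' ]

// The flag properties report which flags a regex was created with.
const reFlags = /test/gi;
console.log(reFlags.global, reFlags.ignoreCase, reFlags.multiline); // true true false
```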

The simplest operation on a regular expression is its test() method. It compares the string passed to it against the pattern and returns true if the pattern matches anywhere in the string, and false otherwise. (In these examples, print() stands for whatever output function your environment provides, such as console.log().) For instance:

var reTest=/test/i;
print(reTest.test("Testament"));
=> true
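A few more calls make the behavior of test() concrete, including one detail worth knowing when the g flag is involved. This is a sketch on illustrative inputs, with console.log() standing in for print():

```javascript
const reCaseInsensitive = /test/i;
console.log(reCaseInsensitive.test("Testament")); // true
console.log(reCaseInsensitive.test("Contest"));   // true: a match anywhere counts
console.log(reCaseInsensitive.test("Tea set"));   // false

// With the g flag, the regex remembers where the last match ended
// (in its lastIndex property), so repeated calls on the same string
// can alternate between true and false.
const reGlobal = /test/g;
console.log(reGlobal.test("test")); // true  (lastIndex is now 4)
console.log(reGlobal.test("test")); // false (search resumed at index 4, then reset)
console.log(reGlobal.test("test")); // true
```

For simple yes-or-no checks it is usually safer to leave the g flag off the pattern.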

Beyond test(), the next most useful regular expression operation actually lives on the String object: the replace() method. It searches the string it is called on for matches of the regular expression passed as the first argument, then replaces those matches (all of them if the g flag is set, otherwise only the first) with the second argument.

For instance, suppose you wanted to suppress the appearance of all numbers in a credit card sequence and replace them with asterisk characters. You could use the following commands:

cc = "123-456-789";
reNum=/[0-9]/g;
print(cc.replace(reNum,"*"));
=> ***-***-***

Note that the replace() method doesn't alter the original string. Strings are immutable in JavaScript, so replace() returns a new string as its result (that is to say, the value in the variable cc remains the same).
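Both points, the new string and the role of the g flag, can be checked with a short sketch (console.log() stands in for print()):

```javascript
const cc = "123-456-789";

// With g, every digit is replaced; the result is a new string.
const masked = cc.replace(/[0-9]/g, "*");
console.log(masked); // ***-***-***
console.log(cc);     // 123-456-789 (the original is unchanged)

// Without g, only the first match is replaced.
const firstOnly = cc.replace(/[0-9]/, "*");
console.log(firstOnly); // *23-456-789
```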

The notation [0-9] is one of many abbreviations that make regexes at least notionally easier to work with. In this particular case, it matches any single character in the range of 0 to 9, i.e., any numeric digit. If you wanted to match all alphanumeric characters you'd combine three ranges: [0-9A-Za-z]. You could also use the pipe "|" character to indicate alternatives:

(0|1|2|3|4|5|6|7|8|9)

But obviously this is going to be more cumbersome. The pipe does come in handy, however, when you're trying to provide a range of potential values to be used for validation, such as a range of colors:

reColors = /^(red|blue|green|yellow|orange|purple|black|white)$/;
color="red";
print(reColors.test(color));
=> true;
color="gold";
print(reColors.test(color));
=> false;

The caret "^" and dollar "$" characters anchor the regular expression to the start of the string (before the first character) and to the end of the string (after the last character), so the whole string must match the pattern. Without them, test() would return true if the target sequence were found anywhere in the source string. Thus,

reColors1 = /^red$/;
color="red";
print(reColors1.test(color));
=> true;
color="barred";
print(reColors1.test(color));
=> false;
reColors1 = /red/;
color="red";
print(reColors1.test(color));
=> true;
color="barred";
print(reColors1.test(color));
=> true;
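When the list of allowed values lives in an array, the anchored alternation can be assembled at run time. The following is a minimal sketch under that assumption; the array contents and variable names are illustrative, and console.log() stands in for print():

```javascript
// Hypothetical list of allowed values; any array of plain words would do.
const allowedColors = ["red", "blue", "green", "yellow"];

// Join the words with the pipe and anchor the group with ^ and $.
const reAllowed = new RegExp("^(" + allowedColors.join("|") + ")$");
console.log(reAllowed.source);         // ^(red|blue|green|yellow)$

console.log(reAllowed.test("red"));    // true
console.log(reAllowed.test("gold"));   // false
console.log(reAllowed.test("barred")); // false: the anchors require a whole-string match

// The same alternation without anchors matches anywhere in the string.
const reLoose = new RegExp(allowedColors.join("|"));
console.log(reLoose.test("barred"));   // true
```

This only works as written when the words contain no characters that are special in a regular expression.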

There are numerous other specialized characters that are used with regular expressions. As with strings, these character sequences are indicated with an escaping backslash, and for the most part correspond to string notation (see Table 2.2).

In general, if a character has a specialized meaning in a regular expression, escaping it will cause the character itself to be matched instead, such as \( indicating a literal parenthesis character rather than the start of a group.
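The effect of escaping is easy to verify. The sketch below also includes a small helper, escapeRegExp, which is a hypothetical name introduced here for illustration and not part of the language; console.log() stands in for print():

```javascript
// An unescaped dot matches any character; an escaped dot matches only ".".
console.log(/a.c/.test("abc"));  // true
console.log(/a\.c/.test("abc")); // false
console.log(/a\.c/.test("a.c")); // true

// Escaped parentheses match literal parentheses instead of forming a group.
console.log(/\(\d\d\d\)/.test("(555)")); // true

// Hypothetical helper: put a backslash in front of every character that is
// special in a regex, so arbitrary text can be matched literally.
// In the replacement string, $& stands for the matched character.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

console.log(escapeRegExp("1+1=2 (really?)")); // 1\+1=2 \(really\?\)
```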

In addition to these characters, the regular expression library includes a number of operators to determine existence, repetition, and negation, as given in Table 2.3.

For instance, let's say you want to ensure that a given content block is a credit card number of the form 123-456-789. You could use a regular expression with the abbreviated forms to check not only the boundaries but the repetitions:

var cc = "123-456-789";
var reCC = /^\d{3}-\d{3}-\d{3}$/;
print(reCC.test(cc));
=> true
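A few failing inputs show what each part of that pattern contributes, followed by the other common repetition operators. The inputs are illustrative, and console.log() stands in for print():

```javascript
const reCC = /^\d{3}-\d{3}-\d{3}$/;

console.log(reCC.test("123-456-789"));  // true
console.log(reCC.test("12-456-789"));   // false: first group has only two digits
console.log(reCC.test("123-456-7890")); // false: $ rejects the extra digit
console.log(reCC.test("x123-456-789")); // false: ^ rejects the leading x

// Other repetition operators:
console.log(/^\d+$/.test("2006"));      // true:  + means one or more
console.log(/^\d*$/.test(""));          // true:  * means zero or more
console.log(/^colou?r$/.test("color")); // true:  ? means zero or one
console.log(/^\d{2,4}$/.test("12345")); // false: {2,4} means two to four
```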

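Postal codes are a little more complex, especially if you want to accept both American ZIP codes and Canadian postal codes. (British postcodes follow a different, more varied format that the pattern below does not cover.) If you have to check both in the same field, the regex might look something like: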

var rePostalCode = /^\d{5}(-\d{4})?$|^[a-z]\d[a-z](\-|\s)?\d[a-z]\d$/i;

This rather cryptic string can be broken down fairly handily into several component parts, as shown in Table 2.4:
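As a minimal sketch of that breakdown, the same pattern can be assembled from commented pieces and exercised against a few sample codes. The variable names usZip and caPostal and the sample values are illustrative assumptions; console.log() stands in for print():

```javascript
// US ZIP code: five digits, optionally followed by a hyphen and four more.
const usZip = "\\d{5}(-\\d{4})?";

// Canadian-style code: letter digit letter, an optional hyphen or space,
// then digit letter digit.
const caPostal = "[a-z]\\d[a-z](\\-|\\s)?\\d[a-z]\\d";

// Either alternative must span the whole string; i ignores letter case.
const rePostalCode = new RegExp("^" + usZip + "$|^" + caPostal + "$", "i");

console.log(rePostalCode.test("90210"));      // true
console.log(rePostalCode.test("90210-1234")); // true
console.log(rePostalCode.test("K1A 0B1"));    // true
console.log(rePostalCode.test("k1a0b1"));     // true
console.log(rePostalCode.test("9021"));       // false: only four digits
console.log(rePostalCode.test("K1A  0B1"));   // false: two spaces
```

Note that the ^ and $ anchors are repeated on both sides of the pipe, because the alternation splits the entire pattern, anchors included.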

While you can do straight validations with regular expressions (especially useful for forms processing), regexes are actually more powerful when combined with the String replace() method. The first argument of replace() can be a plain string to search for, but if a regular expression is supplied instead, you can take advantage of its considerably richer capabilities to achieve some nearly magical effects.

For instance, suppose you wanted to replace everything that looks like it might be an e-mail address with a mailto: link. You can use regexes to solve this problem quite easily:

msg = "For more information, please contact Kurt Cagle at [email protected] or " +
"Tom Generic at [email protected]";
reAtMail = /((?:[A-Z]\w+\s?)+)at\s((?:\w+[._-])*\[email protected](?:\w+\.)*\w+)/g;
linkedMsg = msg.replace(reAtMail,'<a href="mailto:$2">$1</a>');
print(linkedMsg);
=> For more information, please contact <a href="mailto:[email protected]">Kurt
Cagle </a> or <a href="mailto:[email protected]">Tom Generic </a>

This particular regular expression looks for the pattern "Name Name at [email protected]" and rewrites it as <a href="mailto:[email protected]">Name Name</a>. This illustrates both capturing groups (anything in parentheses) and non-capturing groups (anything in parentheses starting with ?:). Each capturing group is numbered from left to right by its opening parenthesis, and the replace() method's second parameter can reference the captured text as $1, $2, $3, and so on, as part of a string template to insert the matched text back into the resulting string.
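The same technique can be sketched end to end with made-up data. The addresses below are hypothetical (on the reserved example.com domain), and the e-mail portion of the pattern is a simplified assumption rather than a complete address validator; console.log() stands in for print():

```javascript
// Hypothetical addresses, used only for illustration.
const msg = "For more information, please contact Kurt Cagle at " +
  "kurt@example.com or Tom Generic at tom.generic@mail.example.com";

// Group 1: one or more capitalized words (the name).
// Group 2: a simplified, assumed e-mail pattern.
// The (?:...) groups repeat sub-patterns without being captured.
const reAtMail = /((?:[A-Z]\w+\s?)+)at\s((?:\w+[._-])*\w+@(?:\w+\.)*\w+)/g;

// $1 and $2 in the replacement refer to the two capturing groups.
const linked = msg.replace(reAtMail, '<a href="mailto:$2">$1</a>');
console.log(linked);

// A function can be passed instead of a template; it receives the whole
// match followed by each captured group. Here it trims the name.
const trimmed = msg.replace(reAtMail, (match, name, address) =>
  '<a href="mailto:' + address + '">' + name.trim() + "</a>");
console.log(trimmed);
```

The function form is worth the extra typing when the captured text needs cleaning up, such as the trailing space that the name group picks up before "at".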

Regular expressions are incredibly powerful for parsing and converting both text- and XML-based content and should be considered an indispensable part of any AJAX-based toolkit. Indeed, especially in validation types of applications, you can create libraries of commonly used regexes consolidated as a single object, such as:

var RegexLib = {
  reMail: /((?:\w+[._-])*\[email protected](?:\w+\.)*\w+)/g,
  reAtMail: /((?:[A-Z]\w+\s?)+)at\s((?:\w+[._-])*\[email protected](?:\w+\.)*\w+)/g,
  reDoubleQuote: /"([^"]*)"/g,
  reSingleQuote: /'([^']*)'/g
};

msg = "For more information, please contact Kurt Cagle at [email protected] or " +
"Tom Generic at [email protected]";
print(msg.replace(RegexLib.reAtMail,"<a href='mailto:$2'>$1</a>"));
=> For more information, please contact <a href='mailto:[email protected]'>Kurt
Cagle </a> or <a href='mailto:[email protected]'>Tom Generic </a>
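The quote patterns from such a library can be used in the same way. Here is a minimal sketch in the same spirit; the object name QuoteLib, the extra reDigits entry, and the sample sentence are hypothetical, and console.log() stands in for print():

```javascript
// A small library of reusable patterns; reDigits is a hypothetical extra entry.
const QuoteLib = {
  reDoubleQuote: /"([^"]*)"/g,
  reSingleQuote: /'([^']*)'/g,
  reDigits: /^\d+$/
};

// Convert double-quoted phrases to single-quoted ones using group $1.
const quoted = 'He said "hello" and then "goodbye".';
const requoted = quoted.replace(QuoteLib.reDoubleQuote, "'$1'");
console.log(requoted); // He said 'hello' and then 'goodbye'.

// Entries without the g flag are convenient for validation with test().
console.log(QuoteLib.reDigits.test("12345")); // true
console.log(QuoteLib.reDigits.test("12a45")); // false
```

Keeping validation patterns free of the g flag avoids surprises from the regex remembering its position between calls.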

More Stories By Kurt Cagle

Kurt Cagle is a developer and author, with nearly 20 books to his name and several dozen articles. He writes about Web technologies, open source, Java, and .NET programming issues. He has also worked with Microsoft and others to develop white papers on these technologies. He is the owner of Cagle Communications and a co-author of Real-World AJAX: Secrets of the Masters (SYS-CON books, 2006).
