Perl Regex: Regular Expressions In Perl

You can use regular expressions in Perl to check whether some text contains certain strings or patterns of text, or to replace part of a piece of text with something else.

Perl contains a powerful regular expression engine, catering to just about all your regular expression needs. Here are some simple examples to get you started.



Matching and Replacing Text in Perl Using Regular Expressions



Here's a simple program illustrating how to a) match and b) replace text in Perl using regular expressions.

use strict;
use warnings;

# Some text ...
my $text = "Let sleeping dogs lie.";

# Matching
if($text =~ /dog/) {
    print "Match!\n";
}
else {
    print "No match.\n";
}

# Replace 'dog' with 'cat'.
$text =~ s/dog/cat/;

print "$text\n";




Match!
Let sleeping cats lie.
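
Note that s/// on its own replaces only the first occurrence it finds. Here's a minimal sketch of that behaviour, using a made-up sentence and variable names that aren't part of the program above. It also shows that s/// returns the number of replacements it made, and that adding the g flag (covered in more detail further down) replaces every occurrence.

```perl
use strict;
use warnings;

# Hypothetical sample text containing 'dog' twice.
my $first_only = "A dog chased another dog.";
my $all        = $first_only;

# Without flags, s/// replaces only the first occurrence.
# It returns the number of substitutions made (false if none).
my $count_first = ($first_only =~ s/dog/cat/);

# With the g flag, every occurrence is replaced.
my $count_all = ($all =~ s/dog/cat/g);

print "$first_only ($count_first replaced)\n";
print "$all ($count_all replaced)\n";
```

This should print "A cat chased another dog. (1 replaced)" followed by "A cat chased another cat. (2 replaced)".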




Regex Pattern Matching in Perl



Often your criteria for matching strings are more complex than merely searching for a particular exact string. Let's look at some examples.

The following program also illustrates how to 'capture' the results of your match, so you can see what exactly was matched in your text.

use strict;
use warnings;

# Some text ...
my $text = "Let sleeping dogs lie.";

# We can use round brackets to 'capture'
# the matched text. You can use as many
# pairs of brackets as you like in your
# regex; the results appear in 
# $1, $2, $3 etc.
print "Regex (dog):\n";
$text =~ /(dog)/ and print "$1\n\n";

# Match everything from the first 's'
# to the last 'g'.
print "Regex (s.*g):\n";
$text =~ /(s.*g)/ and print "$1\n\n";

# . matches any character except a newline.
# * means "zero or more of the preceding
# item, as many as possible".
# Add a question mark (*?) if you want as
# few as possible.
print "Regex (s.*?g):\n";
$text =~ /(s.*?g)/ and print "$1\n\n";




Regex (dog):
dog

Regex (s.*g):
sleeping dog

Regex (s.*?g):
sleeping
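
The comments in the program mention $1, $2, $3 and so on, but only $1 is ever used. As a small sketch of several pairs of brackets in one regex (the variable names $first and $second are just for illustration), here is the same sentence with two captures. It uses \w, which matches a single "word" character.

```perl
use strict;
use warnings;

my $text = "Let sleeping dogs lie.";

# Two pairs of brackets: the first captures the word
# after 'Let ', the second captures the word after that.
my ($first, $second);
if ($text =~ /Let (\w+) (\w+)/) {
    # Copy $1 and $2 straight away; the next successful
    # match would overwrite them.
    ($first, $second) = ($1, $2);
    print "First: $first\n";
    print "Second: $second\n";
}
```

This should print "First: sleeping" and then "Second: dogs". Checking that the match succeeded before reading $1 and $2 matters, because after a failed match they still hold the values from the last successful one.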




Perl Regex Flags: Case Insensitive Matching and Multiple Matches



You can use the i flag with your regular expressions to do case-insensitive matching in Perl.

The g flag allows you to match multiple times with the same expression. In list context, a match with the g flag returns a list of all the matches (or of all the captured text, if you use round brackets), which you can assign to an array.

Let's look at an example.


use strict;
use warnings;

# Use q|| (single-quoting with | as the
# delimiter) to create a big multi-line string.
my $text = q|
My attorney had taken his shirt off and 
was pouring beer on his chest, to 
facilitate the tanning process. 
"What the hell are you yelling about?" 
he muttered, staring up at the sun 
with his eyes closed and covered 
with wraparound Spanish sunglasses. 
"Never mind," I said. 
"It's your turn to drive." 
I hit the brakes and aimed the 
Great Red Shark toward the shoulder 
of the highway. No point mentioning 
those bats, I thought. The poor 
bastard will see them soon enough. 
|;

# Match all occurrences of 'the'
my @matches = $text =~ /(the)/g;

# Display the matches.
print join(', ', @matches), "\n";

# Same again, but this time we'll
# add in the i flag (case insensitive).

# Match all occurrences of 'the'. @matches is
# already declared above, so we don't repeat 'my'.
@matches = $text =~ /(the)/ig;

# Display the matches.
print join(', ', @matches), "\n";




the, the, the, the, the, the, the, the
the, the, the, the, the, the, the, The, the
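
Notice that the last match in each line actually comes from the word 'them'. If you only want a count, or only whole words, here's a rough sketch using a shorter, made-up sentence (the variable names are hypothetical). It relies on the common "() =" idiom, which forces list context and then counts the matches, and on \b, which matches a word boundary.

```perl
use strict;
use warnings;

# Hypothetical sample text, shorter than the passage above.
my $sample = "The cat sat on the mat, then the other cat left.";

# Assigning the list of matches to an empty list, and that
# in turn to a scalar, gives the number of matches.
my $count_exact = () = $sample =~ /the/g;

# Same again, ignoring case.
my $count_any = () = $sample =~ /the/gi;

# \b on both sides restricts the match to the whole word
# 'the', so 'then' and 'other' no longer count.
my $count_word = () = $sample =~ /\bthe\b/gi;

print "$count_exact, $count_any, $count_word\n";
```

This should print "4, 5, 3": four lowercase occurrences (including those inside 'then' and 'other'), five when case is ignored, and three whole words.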




Some Useful Perl Regex Tips and Tricks



Here are some of the most common regular expression fragments that you're likely to use in Perl. If there's something you need to do that I haven't covered, feel free to leave a question in the comments.

Here you can see examples of:


  • Matching words

  • Matching HTML or XML tags

  • Removing newline characters

  • Matching characters at the start or end of a string

  • Matching digits



use strict;
use warnings;

# Use q|| (single-quoting with | as the
# delimiter) to create a big multi-line string.
my $text = q|
<text>
1. My attorney had taken his shirt off and 
was pouring beer on his chest, to 
facilitate the tanning process. 
"What the hell are you yelling about?" 
he muttered, staring up at the sun 
with his eyes closed and covered 
with wraparound Spanish sunglasses. 
2. "Never mind," I said. 
"It's your turn to drive." 
I hit the brakes and aimed the 
Great Red Shark toward the shoulder 
of the highway. No point mentioning 
those bats, I thought. 3. The poor 
bastard will see them soon enough. 
</text>

|;

# Match all words.
# \b matches a word boundary; \w matches a
# "word" character (letter, digit or underscore).
# (Matching words is half art, half science,
# so this simple solution is approximate.)
my @words = $text =~ /\b\w+\b/g;

print "Found " . @words . " words.\n";

# Match XML tags.
# [...] matches ONE character; possible
# characters are placed between the [].
# If we start with ^, it means, match
# one character EXCEPT one of the enclosed.
# + means 'at least one of the preceding'.
my @tags = $text =~ /(<[^<>]+>)/g;

print join(', ', @tags), "\n";

# Remove all newline characters.
$text =~ s/[\n\r\f]//g;

# Match the first 20 characters.
# ^ anchors to the start of the string.
# Without the g flag, the returned list
# of matches contains at most one match
# per pair of capturing brackets.
my ($match) = $text =~ /^(.{20})/;

print "First 20: $match\n";

# Remove whitespace (including newlines) from
# the start and end of the string.
# We'll use ^ and $ to anchor to the start
# and end respectively.
# | specifies alternative matches.
# \s already covers \n, \r and \f.
$text =~ s/^\s+|\s+$//g;

# Step through the text matching all digits.
# This illustrates how to step through text
# looking for matches, without having to
# place all the matches all at once in 
# an array.
# We use \d to match digits. We can equally well
# use [0-9].

while($text =~ /(\d+)/g) {
    print "$1\n";
}




Found 89 words.
<text>, </text>
First 20: <text>1. My attorney
1
2
3
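
To pull a couple of these fragments together, here's a small sketch that wraps the whitespace-trimming substitution in a reusable subroutine and then collects digits from the result. The trim subroutine and the sample string are made up for illustration; they aren't part of the program above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of the string with
# leading and trailing whitespace removed.
sub trim {
    my ($string) = @_;
    # ^\s+ matches whitespace at the start, \s+$ at the end.
    $string =~ s/^\s+|\s+$//g;
    return $string;
}

my $trimmed = trim("  \n 42 apples, 7 pears \n");

# Collect every run of digits in one go.
my @numbers = $trimmed =~ /(\d+)/g;

print "[$trimmed]\n";
print join(', ', @numbers), "\n";
```

This should print "[42 apples, 7 pears]" and then "42, 7". Because the subroutine works on a copy of its argument, the caller's original string is left untouched.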





For more detailed information about regular expressions in Perl, see Perl's own documentation, for example perldoc perlretut and perldoc perlre.