Perl Split: How To Split Strings In Perl

You can split strings into tokens in Perl using the split() function.

split PATTERN, TEXT

Here's an example; we'll split some text on the pipe | character.

my $text = "one|two|three";

# The slashes // are regular expression quotes.
# Note the backslash before the pipe.
# The pipe is a special character in regular
# expressions (it separates alternatives),
# so we need to escape it here.
my @tokens = split /\|/, $text;

# Display the tokens.
print join(', ', @tokens);




one, two, three
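To see why the backslash matters, here is a small sketch comparing an unescaped pipe with an escaped one. The variable names are only illustrative, and \Q...\E is shown as one possible way to escape a delimiter that is held in a variable.

```perl
use strict;
use warnings;

my $text = "one|two|three";

# Unescaped: /|/ means "nothing OR nothing", which matches the
# empty string, so split breaks the text into single characters.
my @chars = split /|/, $text;

# Escaped: the pipe is treated as a literal character.
my @tokens = split /\|/, $text;

# \Q...\E escapes regex metacharacters in an interpolated variable.
my $delimiter = '|';
my @quoted = split /\Q$delimiter\E/, $text;

print scalar(@chars), " pieces without the backslash\n";
print join(', ', @tokens), "\n";
print join(', ', @quoted), "\n";
```

Without the backslash you get one token per character, which is rarely what you want.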






Splitting on Spaces



To split on spaces, use the \s (whitespace) regex escape, followed by a plus symbol (+). Note that \s matches tabs and newlines as well as spaces. The plus matches one or more of the preceding whitespace characters. This ensures that consecutive spaces are treated as one.

my $text = "one  two three";

# The plus symbol ("one or more")
# enables us to treat consecutive
# spaces as one.
my @tokens = split /\s+/, $text;

# Display the tokens.
print join(', ', @tokens);





one, two, three
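One thing to watch for is text that starts with whitespace. The sketch below (with made-up variable names) compares /\s+/ with the special single-space string form of split, which skips leading whitespace.

```perl
use strict;
use warnings;

my $text = "  one  two three";

# /\s+/ treats the leading whitespace as a separator, so the
# first field is an empty string.
my @with_regex = split /\s+/, $text;

# A single-space string is a special case: leading whitespace is
# skipped, then the text is split on runs of whitespace.
my @with_space = split ' ', $text;

print scalar(@with_regex), " fields: ", join(', ', map { "[$_]" } @with_regex), "\n";
print scalar(@with_space), " fields: ", join(', ', map { "[$_]" } @with_space), "\n";
```

If your input may have leading whitespace, split ' ' is usually the simpler choice.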




Including the Stuff You Split On In the Matched Tokens



Sometimes you want to split a string up but you don't want to throw anything away. You want to include the stuff you're splitting on in the tokens. Here we split some text on XML tags, but we include the tags in the tokens.

use strict;
use warnings;

my $text = q| 
<one>some text</one> <two>contained</two>
<three>in tags</three>
|;

# (?= ... ) is a lookahead: it checks that the enclosed
# regular expression matches at this point, but does
# not include it in the match.
# With split, this means the matched stuff
# isn't thrown away.
# [^\/<>] means "any character except /, < or >".
my @sentences = split /(?=<[^\/<>]+>)/, $text;

foreach my $sentence(@sentences) {
    # Remove newlines for readability.
    # Replace them with spaces.
    $sentence =~ s/[\n\r\f]/ /g;
    
    # Trim whitespace from the start and end.
    # The g flag is needed so that both ends are trimmed.
    $sentence =~ s/^\s+|\s+$//g;
    print "$sentence\n";
}




<one>some text</one>
<two>contained</two>
<three>in tags</three>
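Another way to keep the stuff you split on is to put capturing parentheses in the pattern. In that case split returns each matched separator as its own list element, rather than leaving it attached to the following token as the lookahead does. Here's a minimal sketch; the sample text and variable names are invented for illustration.

```perl
use strict;
use warnings;

my $text = "one, two; three";

# The parentheses capture the separator (a comma or semicolon),
# so split returns it as a separate element. The trailing
# whitespace is outside the parentheses and is thrown away.
my @pieces = split /([,;])\s*/, $text;

print join('|', @pieces), "\n";
```

This prints one|,|two|;|three, with the separators sitting between the tokens they separated.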





... and of course you can use more complex regular expressions with split().