Perl CSV: Reading Comma Separated Values in Perl

A common task in Perl is reading files of comma separated values. While the exact form of the Perl program you use to read such files will naturally depend on exactly what you're trying to achieve, this task is sufficiently common that it's worth going over some of the basics in tutorial form.



Let's suppose we have the following data in a file named 'test.csv'. We'll assume the file is, as is typically the case, ASCII encoded, not UTF-8 or some other encoding.

firstname,surname,job
john,smith,plumber
mandy,brown,model
rob,binks,food artist
rachel,fish,architect




The Hello World Perl Program



The first thing we need is a basic Perl program that will serve as the starting point for our CSV reader.

use strict;
use warnings;

sub main 
{
    print "Hello World\n";
}

main();




The code above is a good structure to use for all your Perl programs.

We've added use strict to force us to declare all variables with my. This helps lessen mistakes due to typos. We've also added use warnings so that we are automatically warned about some potential problems.

Finally, we've declared a main() subroutine to act as a clearly-defined starting point for our program, and we've called main() to actually start the program.
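To see what use strict actually buys us, here is a small sketch that compiles a snippet containing a misspelled variable name and reports what Perl says about it. The strict_error() helper and the variable names are made up for this illustration; the only Perl feature relied on is eval with a string argument.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a snippet of Perl under "use strict" and
# return the error message, or an empty string if it compiled cleanly.
sub strict_error
{
    my ($snippet) = @_;

    # eval returns undef and sets $@ if the snippet fails to compile.
    my $ok = eval "use strict; $snippet; 1;";

    return $ok ? "" : $@;
}

sub main
{
    # $total is declared with my, but the second statement misspells it.
    my $error = strict_error('my $total = 1; $totl = 2');

    print "strict reported: $error" if $error;
}

main();
```

Without use strict, the misspelled variable would silently become a new global variable and the mistake would go unnoticed.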

Opening and Closing a File in Perl



Next we add some code to open the file.

use strict;
use warnings;

sub main 
{
    # Note: this could be a full file path
    my $filename = "test.csv";
    
    open(INPUT, $filename) or die "Cannot open $filename";
    
    close(INPUT);
}

main();




Firstly we store the file name in a variable for easy reference. We could have specified a full file path here, or a relative file path, but since the file is in the same directory as the script in this case, we can just use the file name.

To open the file we use the open() function. We pass the open function some text to be used as a file handle (INPUT here, but it could be anything you like -- uppercase letters are the usual convention), and the name of the file to open.

The or die() bit says that if the file can't be opened and open() consequently returns a value equivalent to 'false', we will call the die() function, which we also tell to display an appropriate message.

Finally we close the file; this isn't strictly necessary in this case since all files are closed when a Perl program exits.
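As an aside, many Perl programmers prefer a slightly different form of open(): three arguments, with the file handle stored in an ordinary my variable rather than a bare word. The following is only a sketch of that style, not a replacement for the code above. The open_for_reading() helper and the file name no-such-file.csv are made up for the illustration; $! is Perl's built-in variable holding the system's reason for the failure.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading and return the file handle.
sub open_for_reading
{
    my ($filename) = @_;

    # '<' means "read only"; $! holds the system's reason for any failure.
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!\n";

    return $fh;
}

sub main
{
    # Deliberately try a file that should not exist, to see the message.
    my $fh = eval { open_for_reading("no-such-file.csv") };

    if ($fh)
    {
        close($fh);
    }
    else
    {
        print "Error: $@";
    }
}

main();
```

Including $! in the message tells you why the open failed (a missing file, a permissions problem and so on), which the plain "Cannot open" message doesn't.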

It's best to run the program at this stage (and frequently thereafter) to check that no error message is produced, and hence that the file can be opened.

Reading the Header Line



Now we want to read off the top line of the file. In fact, we probably want to throw it away. But here we'll read it into a "scalar" (single value) variable, so that we can display it and hence double-check that the file really has been opened and everything's OK. At this stage you may encounter problems if your file's not ASCII encoded.

To read one line from a file, you can just write this:

<INPUT>;

Here INPUT is whatever text you're using for a file handle. The angle brackets read one "record" from the file, and by default a record means everything up to and including an invisible newline character; in other words, a single line.

We've refined this further by declaring a variable $line and setting it equal to what <INPUT> returns, then displaying it.

use strict;
use warnings;

sub main 
{
    # Note: this could be a full file path
    my $filename = "test.csv";
    
    open(INPUT, $filename) or die "Cannot open $filename";
    
    # Read the header line.
    my $line = <INPUT>;
    
    # Display the header, just to check things are working.
    print($line);
    
    close(INPUT);
}

main();




firstname,surname,job
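If your file turns out to be UTF-8 rather than ASCII, one common approach is to ask open() to decode the data as it's read, by adding an encoding layer to the mode. The sketch below shows the difference this makes. It assumes the three-argument form of open() and reads from a string in memory instead of a real file so that it can be run on its own; the first_line() helper and the sample names are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: read the first line of some in-memory data using
# the given open mode, and return it without its newline.
sub first_line
{
    my ($bytes, $mode) = @_;

    # Passing a reference to a string makes open() read from that string.
    open(my $fh, $mode, \$bytes) or die "Cannot open in-memory data: $!";
    my $line = <$fh>;
    close($fh);

    chomp($line);
    return $line;
}

sub main
{
    # Made-up sample: the UTF-8 bytes for two names with accented letters.
    my $bytes = "zo\xC3\xAB,m\xC3\xBCller\n";

    my $raw = first_line($bytes, '<');
    my $decoded = first_line($bytes, '<:encoding(UTF-8)');

    print "Length as raw bytes: ", length($raw), "\n";
    print "Length when decoded: ", length($decoded), "\n";
}

main();
```

Read as raw bytes, each accented letter counts as two characters, so the first length comes out as 12; once decoded, the line has the 10 characters you'd expect.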




Reading All the Lines of a File in a Loop



Reading off the header line is a good start, but let's assume we want to read all the lines of the file one by one. We'll start by just reading each line and displaying it.

To do this we use a 'while' loop. In the following code, we repeatedly set $line equal to each line of the file in turn until there are no more lines to read. At that point the angle bracket operator <> returns undef, $line is set to undef and the loop terminates.

use strict;
use warnings;

sub main 
{
    # Note: this could be a full file path
    my $filename = "test.csv";
    
    open(INPUT, $filename) or die "Cannot open $filename";
    
    # Read the header line.
    my $line = <INPUT>;
    
    
    # Read the lines one by one.
    while($line = <INPUT>)
    {
        # Just display the line for now.
        print($line);
    }
    
    
    
    close(INPUT);
}

main();




john,smith,plumber
mandy,brown,model
rob,binks,food artist
rachel,fish,architect
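One detail of the loop above is worth a closer look. When a 'while' condition consists only of an assignment from the angle bracket operator, Perl checks whether the value is defined rather than whether it is 'true', so a line containing just 0 doesn't end the loop early. Here is a small sketch that shows this, reading from a string in memory instead of a real file; the read_lines() helper and the sample data are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read every line from a file handle into a list.
sub read_lines
{
    my ($fh) = @_;
    my @lines;

    # Perl treats this condition as defined(my $line = <$fh>).
    while(my $line = <$fh>)
    {
        push(@lines, $line);
    }

    return @lines;
}

sub main
{
    # Made-up data; the last line is "0" with no newline after it.
    my $data = "john,smith\nmandy,brown\n0";

    # Passing a reference to a string makes open() read from that string.
    open(my $fh, '<', \$data) or die "Cannot open in-memory data: $!";
    my @lines = read_lines($fh);
    close($fh);

    print scalar(@lines), " lines read\n";
}

main();
```

This prints "3 lines read": the final 0 is read like any other line, and the loop only stops when there is nothing left.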





One small problem is that the lines we're reading still have the invisible newline character attached to the end of them. You can see this if you change the print() statement to display a quote before and after the line.

print("'$line'");




'john,smith,plumber
''mandy,brown,model
''rob,binks,food artist
''rachel,fish,architect
'




Here we've embedded the $line variable in a string, and put quotes just before and after it, so that the (inner) quotes are displayed. You can see that the first quote prints alright; then the second quote is wrapped to the second line due to the invisible newline character after 'plumber', and so on.

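We can get rid of the newline at the end of a string by using the chomp function. Unlike the similarly named chop function, which removes the last character whatever it is, chomp only removes a trailing newline and leaves the string alone if there isn't one.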

Change the while loop to the following:

# Read the lines one by one.
while($line = <INPUT>)
{
    chomp($line);
        
    # Just display the line for now.
    print("'$line'");
}




Now the output is:

'john,smith,plumber''mandy,brown,model''rob,binks,food artist''rachel,fish,architect'




No more newline, which is what we want.
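If you'd like to convince yourself that chomp only ever removes a newline, the following sketch tries it on one string that ends with a newline and one that doesn't. The strip_newline() helper is made up for the illustration; chomp itself returns the number of characters it removed.

```perl
use strict;
use warnings;

# Hypothetical helper: chomp a copy of the string and return both the
# result and the number of characters that chomp removed.
sub strip_newline
{
    my ($line) = @_;

    my $removed = chomp($line);

    return ($line, $removed);
}

sub main
{
    my ($first, $removed_first) = strip_newline("john,smith,plumber\n");
    my ($second, $removed_second) = strip_newline("john,smith,plumber");

    print "'$first' ($removed_first removed)\n";
    print "'$second' ($removed_second removed)\n";
}

main();
```

Both strings come out as 'john,smith,plumber'; the first reports 1 character removed and the second reports 0, so calling chomp on a line that has no newline is harmless.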

Splitting the Records



Next we can use the split() function to split the records into an array.

In the following code we've modified the 'while' loop to add a split(), telling it to split $line on commas and return the results into an array called @values.

# Read the lines one by one.
while($line = <INPUT>)
{
    chomp($line);
    
    my @values = split(',', $line);
}




In fact, we can go one better than this. We can declare a list of named variables and set them equal to the results of the split. Then we can reference the values by name.
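Here is a short sketch of both styles, an array and a list of named variables, using one line from our sample data. It also shows one thing to watch out for: by default split() drops empty values at the end of a line, unless you pass -1 as a third argument. The split_record() helper is made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line on commas. A limit of -1 tells
# split() to keep empty values at the end of the line.
sub split_record
{
    my ($line, $keep_empty) = @_;

    return $keep_empty ? split(',', $line, -1) : split(',', $line);
}

sub main
{
    # Into an array: values are referenced by position, starting from 0.
    my @values = split_record("rob,binks,food artist");
    print "Third value: $values[2]\n";

    # Into named variables: values are referenced by name.
    my ($firstname, $surname, $job) = split_record("rob,binks,food artist");
    print "Job: $job\n";

    # A line with an empty last value.
    my @dropped = split_record("rob,binks,");
    my @kept = split_record("rob,binks,", 1);
    print scalar(@dropped), " values by default, ", scalar(@kept), " with -1\n";
}

main();
```

The last line of output is "2 values by default, 3 with -1", so if your file can contain empty values at the end of a line, the -1 form is probably the one you want.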

The Complete Program



The final listing is a complete program that reads the CSV file, places the values from each line into some variables and displays the results for you to see.

There's a lot more to reading CSV files than this of course; you might need to clean the values, store them somehow, validate them, etc. And you might want to read lots of files, or even cross-reference files. But this is enough to get you started.



use strict;
use warnings;

sub main 
{
    # Note: this could be a full file path
    my $filename = "test.csv";
    
    open(INPUT, $filename) or die "Cannot open $filename";
    
    # Read the header line.
    my $line = <INPUT>;
    
    
    # Read the lines one by one.
    while($line = <INPUT>)
    {
        chomp($line);
    
        my ($firstname, $surname, $job) = split(',', $line);
    
        print "$firstname $surname works as a $job\n";
    }
    
    close(INPUT);
}

main();




john smith works as a plumber
mandy brown works as a model
rob binks works as a food artist
rachel fish works as a architect
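As one possible next step, instead of throwing the header line away we could use it to name the values in each record. The sketch below is just one way of doing that, and it goes slightly beyond what we've covered, since it stores each record in a hash and keeps a list of references to those hashes. It reads from a string in memory (the first few lines of our sample data) so that it can be run on its own, and the parse_csv() helper is made up for the illustration. Like the rest of this tutorial it assumes simple data with no quoted values or commas inside values.

```perl
use strict;
use warnings;

# Hypothetical helper: read CSV data from a file handle and return one
# hash reference per record, keyed by the names in the header line.
sub parse_csv
{
    my ($fh) = @_;

    # Read the header line and split it into column names.
    my $header = <$fh>;
    return () unless defined($header);
    chomp($header);
    my @columns = split(',', $header);

    my @records;

    while(my $line = <$fh>)
    {
        chomp($line);

        # A hash slice: pair each column name with the matching value.
        my %record;
        @record{@columns} = split(',', $line);

        push(@records, \%record);
    }

    return @records;
}

sub main
{
    my $data = "firstname,surname,job\njohn,smith,plumber\nmandy,brown,model\n";

    # Passing a reference to a string makes open() read from that string.
    open(my $fh, '<', \$data) or die "Cannot open in-memory data: $!";
    my @records = parse_csv($fh);
    close($fh);

    foreach my $record (@records)
    {
        print "$record->{firstname} $record->{surname}: $record->{job}\n";
    }
}

main();
```

The advantage of this approach is that the program no longer depends on the order of the columns in the file, only on their names.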