How To Read CSV Data Tutorial – Perl

This short tutorial shows an example of how to manipulate structured CSV data. The approach is not limited to CSV: handling any data with a regular structure works almost the same way. If a pattern holds across the entire data set, the data can be processed automatically by a program.

The script below assumes that the first row of the data holds the field names for all of the rows that follow, and that each later row holds values for those fields. It may be hard to describe in words, so imagine a spreadsheet that stores product information. The top row may contain SKU, Product Name, and so on. Each subsequent row then holds one unique product, with its values stored in the order defined by the first row.
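To make the idea concrete before reading the full script, here is a minimal sketch of pairing one data row with the field names from the first row. The sample values and the variable names (`@header`, `@values`, `%product`) are made up for illustration, and the hash slice is just one compact way to do the pairing, not the method the script itself uses.

```perl
use strict;
use warnings;

# Hypothetical sample data: field names from the first row,
# and the values from one later row, already split on commas.
my @header = ('SKU', 'Name', 'Price');
my @values = ('123', 'Pencil', '$2.00');

# A hash slice pairs each field name with the value at the same position.
my %product;
@product{@header} = @values;

print "$product{Name} costs $product{Price}\n";   # prints "Pencil costs $2.00"
```

The script that follows does the same pairing with an explicit loop, which leaves room to trim or validate each value along the way.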

Most of this tutorial is written into the script as comments. This is easier to follow because each comment sits right next to the code it explains, rather than in a separate description that you read before or after the source.


# Written by kevinhng86 @ FAI Hosting Solution.
# A copy of this script can be found on my blog @ http://kevinhng86.iblog.website

use strict;
use warnings;
package readCsvData;
sub main {
    # With CSV, each column of data is separated by a comma.
    # In this format, the first row gives the field names for all subsequent rows.
    # Thus, the first row supplies the key names for the values stored in the subsequent rows.
    # Knowing this, we can read the first row and store its comma-separated values in an array.
    # Each subsequent row is stored in a hash that pairs each value with the key name defined by the first row.
    # To put this in perspective, the example below shows a 3 x 3 CSV file on the left and how each row is stored on the right.
    # SKU, Name   ,Price          => array   ("SKU",          "Name",             "Price")
    # 123, Pencil ,$2.00          => hash    ("SKU" => "123", "Name" => "Pencil", "Price" => "$2.00")
    # 111, Pen    ,$3.00          => hash    ("SKU" => "111", "Name" => "Pen",    "Price" => "$3.00")
    # The extra spaces above are for illustration only.
    # From this we can see that in row two, SKU 123 has the name Pencil and is priced at $2.00.

    # The name of the file that this function is going to work with.
    my ($getfile) = @_;
    # This array stores the data from the CSV file. Each line of the file takes one index.
    my @output = ();
    # This hash stores the key and value pairs of one row, based on the first row.
    my %structure = ();
    # This counter is used to detect the first line.
    my $count = 0;
    # This array holds only the first line of data, which defines the field names for all rows.
    my @firstline = ();
    # This temporary array holds the values of each line after it is split on commas.
    my @line = ();
       
    # Open the file for reading with the file handle INPUT. If the file cannot be opened, the script dies with the system error message.
    open(INPUT, '<:encoding(UTF-8)', $getfile) or die "Cannot open $getfile: $!";
    # This while loop reads the file one row at a time. Each row is assigned to the $row variable.
        while (my $row = <INPUT>) {
            # If count is 0, this is the first row.
            if ($count == 0){
                # Remove the trailing newline from the row.
                chomp $row;
                # The row is then split on commas and each field is stored in an array.
                @firstline = split /,/, $row ;
                # Cycle through every index of the first line array to trim each field name.
                for (my $i = 0; $i < scalar(@firstline); $i++){
                    # Strip leading and trailing whitespace. Remove these two lines if the whitespace is meaningful.
                    $firstline[$i] =~ s/^\s+//;
                    $firstline[$i] =~ s/\s+$//;
                }
                # A copy of the first line array is then pushed onto the output array as an array reference.
                push @output, [@firstline]; 
            } else {
                # This block handles every row after the first.
                chomp $row;
                # Start each row with an empty hash so that values from the previous row cannot carry over into a shorter row.
                %structure = ();
                # Whatever we get from splitting the row is stored in the @line array.
                # The limit of -1 keeps trailing empty fields, so a row such as "123,Pencil," still yields three fields.
                @line = split /,/, $row, -1;
                # Cycle through the length of the line array.
                for (my $i = 0; $i < scalar(@line); $i++){
                    # In structured data the positions match, so the key for a value is the field name at the same position in the first line array.
                    my $key = $firstline[$i];
                    # Strip leading and trailing whitespace. Remove these two lines if the whitespace is meaningful.
                    $line[$i] =~ s/^\s+//;
                    $line[$i] =~ s/\s+$//;
                    # Store the value in the structure hash under that key.
                    $structure{$key} = $line[$i] ;
                }

                # When the hash is pushed onto the output array, it takes the next index position in line.
                # The {} around %structure builds a new anonymous hash, so each index of @output holds a reference to its own copy of the row.
                push @output, {%structure};
            }
            # Increase the count each time one line is read.
            # Besides detecting the first row, this count can support other features, such as reporting the line number of a bad row.
            $count = $count + 1;
        }
    # Close the file for reading once everything is done.
    close INPUT; 
    return @output; 
}
    
# To test this you need a CSV file named test.csv in the same directory as this script.
# The data must be in correct CSV format. Currently, this script only supports data separated by commas and cannot handle a comma contained within a field.
# This script is a demonstration sample and has no built-in function to check the format. That is for another day.
package main;
my @test = readCsvData::main("test.csv");
# $test[0] is a reference to the array of field names, so dereference it to print the names.
print join(", ", @{ $test[0] }), "\n";
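The script above cannot handle a comma inside a field. As a rough sketch of one way around that, the core module `Text::ParseWords` provides `parse_line`, which splits on a delimiter while respecting quoted sections. The helper name `split_csv_line` and the sample line are hypothetical and not part of the original script. Note that `parse_line` also treats single quotes and backslashes as special, so this is not a full CSV parser; a dedicated module such as `Text::CSV` from CPAN is likely the better choice for real data.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper, not part of the script above: split one CSV line
# into fields while honoring double-quoted fields that contain commas.
sub split_csv_line {
    my ($row) = @_;
    chomp $row;
    # parse_line(delimiter, keep, line): a false "keep" strips the quotes.
    my @fields = parse_line(',', 0, $row);
    for my $field (@fields) {
        # Empty fields may come back undefined, so normalize them first.
        $field = '' unless defined $field;
        # Trim leading and trailing whitespace.
        $field =~ s/^\s+|\s+$//g;
    }
    return @fields;
}

my @fields = split_csv_line(qq{123,"Pencil, HB",\$2.00\n});
print join(' | ', @fields), "\n";   # prints "123 | Pencil, HB | $2.00"
```

In the script above, a helper like this could stand in for the two `split /,/` calls, with the rest of the loop left unchanged.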

This post was written by Kevin and was first posted @ http://kevinhng86.iblog.website.
Original Post Name: "How To Read CSV Data Tutorial – Perl".
Original Post Link: http://kevinhng86.iblog.website/2017/01/19/howto-read-csv-data-tutorial-perl/.
