Just Another Blog

Are you thinking what I'm thinking?

Monday, July 24, 2006

CSV Parser for PHP

fgetcsv() sucks because it doesn't handle quoted newlines. So I wrote a simple CSV parser class for PHP myself, based on the JavaScript version that I wrote earlier (it doesn't handle quoted newlines either). Not prefect (e.g. newlines are standardized to \n, and string matching works only for ASCII or UTF-8), but does what I want it to do.

/**
 * CSV file parser
 * Currently the string matching doesn't work
 * if the output encoding is not ASCII or UTF-8
 */
class CsvFileParser
{
    var $delimiter;         // Field delimiter
    var $enclosure;         // Field enclosure character
    var $inputEncoding;     // Input character encoding
    var $outputEncoding;    // Output character encoding
    var $data;              // CSV data as 2D array

    /**
     * Constructor
     */
    function CsvFileParser()
    {
        $this->delimiter = ",";
        $this->enclosure = '"';
        $this->inputEncoding = "ISO-8859-1";
        $this->outputEncoding = "ISO-8859-1";
        $this->data = array();
    }

    /**
     * Parse CSV from file
     * @param   content     The CSV filename
     * @param   hasBOM      Using BOM or not
     * @return Success or not
     */
    function ParseFromFile( $filename, $hasBOM = false )
    {
        if ( !is_readable($filename) )
        {
            return false;
        }
        return $this->ParseFromString( file_get_contents($filename), $hasBOM );
    }

    /**
     * Parse CSV from string
     * @param   content     The CSV string
     * @param   hasBOM      Using BOM or not
     * @return Success or not
     */
    function ParseFromString( $content, $hasBOM = false )
    {
        $content = iconv($this->inputEncoding, $this->outputEncoding, $content );
        $content = str_replace( "\r\n", "\n", $content );
        $content = str_replace( "\r", "\n", $content );
        if ( $hasBOM )                                // Remove the BOM (first 3 bytes)
        {
            $content = substr( $content, 3 );
        }
        if ( $content[strlen($content)-1] != "\n" )   // Make sure it always end with a newline
        {
            $content .= "\n";
        }

        // Parse the content character by character
        $row = array( "" );
        $idx = 0;
        $quoted = false;
        for ( $i = 0; $i < strlen($content); $i++ )
        {
            $ch = $content[$i];
            if ( $ch == $this->enclosure )
            {
                $quoted = !$quoted;
            }

            // End of line
            if ( $ch == "\n" && !$quoted )
            {
                // Remove enclosure delimiters
                for ( $k = 0; $k < count($row); $k++ )
                {
                    if ( $row[$k] != "" && $row[$k][0] == $this->enclosure )
                    {
                        $row[$k] = substr( $row[$k], 1, strlen($row[$k]) - 2 );
                    }
                    $row[$k] = str_replace( str_repeat($this->enclosure, 2), $this->enclosure, $row[$k] );
                }

                // Append row into table
                $this->data[] = $row;
                $row = array( "" );
                $idx = 0;
            }

            // End of field
            else if ( $ch == $this->delimiter && !$quoted )
            {
                $row[++$idx] = "";
            }

            // Inside the field
            else
            {
                $row[$idx] .= $ch;
            }
        }

        return true;
    }
}

26 Comments:

Note that troll and spam comments will be deleted without any notification.

  • At 11/09/2006 09:31:00 pm, Anonymous Anonymous said…

    This was exactly the thing I was looking for. Thank you for saving me so much time!

     
  • At 11/23/2006 11:06:00 pm, Blogger jimeh said…

    hey, good job on your class :)... i used the processing concept of your class to write my own which handles multiline quotes, and quotes within quoted values. hope you don't mind :)

    http://zhuoqe.org/blog/2006/11/23/code-yard-and-php-csv-parser/

     
  • At 12/20/2007 01:25:00 am, Anonymous Anonymous said…

    Hi,

    you made a great job with that class.

    I have added some code to optionally change the output format and use associative arrays or even dynamically created objects (PHP5 required).

    I hope you don't mind and someone finds the changes useful :D

    View source code

    --- hippydream ---

     
  • At 12/20/2007 01:36:00 am, Blogger minghong said…

    Of course I don't mind…! ;-)

     
  • At 1/11/2008 01:39:00 am, Anonymous Anonymous said…

    Dear Sir:

    Hi. I have seen your posts on Wikipedia and think they are great. I know you are very knowledgable and hope you don't mind this question. I was wondering if you might have any idea where an American might buy software to translate English into bo-po-mo-fo pronunciation of Chinse. Does such a thing exist for Palm Pilots?

    Thanks
    Diane Michel

     
  • At 9/10/2008 06:19:00 am, Anonymous Anonymous said…

    Here is a much faster and simpler function to parse a CSV file. (I benchmarked the original class at around 9 seconds for a 3mb file with 10,000 lines. This function does the same file in 0.60 seconds).

    define('MAX_CSV_LINE_LENGTH', 7000);
    //Uses fgetcsv() to parse a CSV file.
    function csvParseFile($filename, $separator = "\t") {

    $fp = fopen($filename, 'r');
    if ($fp === false) {
    return false;
    }

    $result = array();
    while ($line = fgetcsv($fp, MAX_CSV_LINE_LENGTH, $separator)) {
    array_push($result, $line);
    }

    fclose($fp);

    return $result;
    }

    You can contact me at trusty_jim@hotmail.com

     
  • At 9/10/2008 09:36:00 am, Blogger minghong said…

    The point is that fgetcsv doesn't handle quoted newlines. So it's not about the speed…

     
  • At 3/02/2009 03:12:00 pm, Anonymous Anonymous said…

    Alternatively you could use this package at PEAR
    http://pear.php.net/package/File_CSV_DataSource

     
  • At 10/13/2009 07:55:00 am, Blogger Unknown said…

    Hi, firstly many thanks about your job! Secondly, i am a CS student from Greece and i need some help, if yoy pleased. In my application(not for commercial use), the tables and their fields will be created dynamically. I used csv's headers for that purpose an i added another field "id". After the creation of the table i need to import the data to table. I use the following script:

    $k=0;
    $j=1;

    //Insert Data to table
    foreach ($csv->data as $key => $row){
    $k=0;
    foreach ($row as $value) {
    echo "
    $value
    ";
    $sql= "INSERT INTO inouts ('id','$cols[$k]')
    VALUES ($j,$value)";

    // Execute query
    $result=mysql_query($sql,$link);

    if(mysql_error()){
    echo mysql_error();
    }
    $k++;
    }
    $j++;
    }

    And i get the following error:


    You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''id','DMUS')
    VALUES (1,Brighton)' at line 1


    4453
    You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''id','INPUT1')
    VALUES (1,4453)' at line 1


    5456
    You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''id','INPUT2')
    VALUES (1,5456)' at line 1

    etc...

    Thank you in advance, sorry about the huge post.

     
  • At 10/13/2009 10:23:00 am, Blogger minghong said…

    You don't have single quotes around the string values.

    e.g. VALUES (1,Brighton) should be VALUES (1,'Brighton')

    Also, you should use mysql_real_escape_string() to escape the string values, so as to avoid SQL syntax error and SQL injection.

     
  • At 10/13/2009 08:50:00 pm, Blogger Unknown said…

    Thank you for your reply, i did some changes as you proposed but i still get the same error.
    &j=1;
    foreach ($csv->data as $key => $row){
    $k=0;
    foreach ($row as $value) {


    $sql= "INSERT INTO inouts ('id','$colmns')
    VALUES (
    '". mysql_real_escape_string($j) ."',
    '". mysql_real_escape_string($value) ."' )";

    // Execute query
    $result=mysql_query($sql,$link);
    if(mysql_error()){
    echo mysql_error(); }
    $k++;
    }
    $j++;
    }

     
  • At 10/13/2009 08:55:00 pm, Blogger Unknown said…

    I am sorry i missed that :

    $colmns=mysql_real_escape_string($cols[$k]);

    before the $sql statement.

     
  • At 10/08/2010 11:50:00 pm, Blogger SimonSimCity said…

    This comment has been removed by the author.

     
  • At 10/08/2010 11:53:00 pm, Blogger SimonSimCity said…

    Hi, all

    Here's my solution of an mysql-importer:


    function prepare_db_strings( &$string ) {
    $string = mysql_real_escape_string( $string );
    }

    mysql_query("TRUNCATE `$tableName`;");
    $query = "INSERT INTO `$tableName` VALUES ";
    foreach ( $csvParser->data as $row ) {
    array_walk($row, 'prepare_db_strings');
    $query .= ' ("'.implode($row, '", "').'"),';
    }

    $query = substr($query, 0, -1);
    mysql_unbuffered_query($query);

     
  • At 10/11/2010 03:52:00 pm, Blogger SimonSimCity said…

    Hi, all

    I think fgetcsv() can handle quoted-newlines now ...

    I've tested the following csv-file:

    1997,Ford,E350,"ac, abs, moon",3000.00
    1999,Chevy,"Venture ""Extended Edition""","",4900.00
    1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
    1996,Jeep,Grand Cherokee,"MUST SELL!
    air, moon roof, loaded",4799.00


    Here's my php-code:

    <?php

    $res = fopen('test.csv', "r");

    while (($data = fgetcsv($res)) !== FALSE) {
    var_dump($data);
    }

    fclose($res);

    ?>

     
  • At 10/11/2010 03:59:00 pm, Blogger minghong said…

    Maybe it got improved… After all, it has been 4 years already.

     
  • At 10/11/2010 04:18:00 pm, Blogger SimonSimCity said…

    Hi, minghong

    Yes - my example (copied from wikipedia) works.
    I got a large csv-file and it crashed using fgetcsv() ...
    I've also tried your php-class. It took a long time but it worked!

     
  • At 4/02/2011 12:55:00 am, Blogger Unknown said…

    I've had limited results from this script to achieve what im looking for ... maybe someone can help. i need the php script to scan multiple csv files (preferably all in a folder), turn the data from each file into an array, then combine the data from each csv file and print it on the screen. i have a lot of csv files what aren't formatted properly but the data can be put into an array ... i'd like to use the data, fix it, and put everything back into 1 clean csv file.

     
  • At 4/02/2011 01:02:00 am, Blogger minghong said…

    That should be simple. Just use a PHP library/class/function for reading the file and implement the logic you want.

    Beside this class, you may like to give PHPExcel a try.

     
  • At 7/27/2011 10:51:00 pm, Blogger SimonSimCity said…

    Hi, all

    The only thing missing for me is a better handling of memory :)

    Since there's no better way (as far as I know) I'd add a new parameter where you can insert a function that should be called when the scripts has finished reading a row.

    Using this fix you can import allmost every csv-file using less memory.

    If you don't know how to do: Here's a nice hint:
    http://www.php.net/manual/de/function.call-user-func-array.php

    p.s. maybe you'll find an option to replace file_get_contents() with fgets() to use even less memory :)

     
  • At 8/17/2011 11:23:00 am, Anonymous Paid Critique said…

    what could be the problem if everytime you upload an image it always display

    You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near

    ??????

    Any help?

     
  • At 6/20/2012 06:05:00 am, Anonymous Christopher Stevens said…

    Six years later: This csv parser worked great for what I needed. Thanks minghong for posting!!

     
  • At 8/25/2013 01:59:00 am, Anonymous Thomas said…

    Is there a posibility to use semicolon istead of comma? i change it in the var $delimiter = ';'; but it doesnt work :(

     
  • At 12/24/2013 10:48:00 pm, Anonymous Anonymous said…

    check parsecsv-for-php:
    http://code.google.com/p/parsecsv-for-php/

     
  • At 8/12/2019 05:58:00 pm, Anonymous Eagle Focus(Hong Kong) Limited said…

    nice post

     
  • At 11/25/2020 10:21:00 am, Anonymous online erectile dysfunction prescription said…

    I'm now not sure where you are getting your information, however good topic. I must spend some time studying more or working out more. Thank you for wonderful info I used to be looking for this info for my mission.

     

Post a Comment

<< Home