Hello,

I need to upload a CSV file, but the upload limit is around 2 MB and the file is around 10 MB.

So my question is: what is the best way to split the data into separate CSV files (2 MB each) using PHP?

Thank you

1 solution

Two things come to mind immediately:

(1) With PHP you have access to zlib, and it's entirely possible to get compression ratios that would reduce 10 MB of plain text to 2 MB of compressed data, especially since a CSV file contains so much repeated, redundant information. If each element were a single byte separated by a comma and a space, the separators alone would account for two thirds (66%) of the file.
I just tried it on a 20.4 KB file: it compressed to 2.91 KB (using 7-Zip, set to gzip format at Ultra compression). A larger file could achieve an even better ratio, since there is more data available to match against pre-existing tokens.
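A minimal sketch of the zlib route, assuming the zlib extension is enabled (the receiving end would need to gunzip the file before use; 'Baltimore.csv' is just a demo name):

PHP
<?php
    // Minimal sketch: gzip-compress a CSV before upload (assumes zlib is enabled).
    $inputData  = file_get_contents('Baltimore.csv');
    $compressed = gzencode($inputData, 9);   // level 9 = maximum compression

    file_put_contents('Baltimore.csv.gz', $compressed);

    printf("Original: %d bytes, compressed: %d bytes (%.1f%%)\n",
        strlen($inputData), strlen($compressed),
        100 * strlen($compressed) / strlen($inputData));
?>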

(2) If that approach proves unfruitful, I'd simply chop the file into 2 MB chunks, append a letter or number to each file name, upload the parts, then glue them back together on the receiving end, e.g. file.csv.ae, file.csv.be, file.csv.ce, file.csv.de, file.csv.ee. The naming keeps the parts in the right order, and you can tell from the file name alone whether you've received all the parts and can stitch the file back together. You could also do this in JavaScript, should the architecture be changed or extended in the future.


[EDIT]
ChopFile.php
PHP
<?php

    $fileName = 'Baltimore.csv';
    $chunkSize = 2048;   // 2 KB for this demo - use 2*1024*1024 for 2 MB chunks

    $inputData = file_get_contents($fileName);
    $inputLength = strlen($inputData);
    $remaining = $inputLength;
    // ceil() avoids creating an empty extra file when the length is an exact multiple of the chunk size
    $numFiles = ceil($inputLength / $chunkSize);

    // 65 is the character code for 'A'
    $lastLetterInName = 65+($numFiles-1);

    printf("Input File: %s\n<br>", $fileName);
    printf("File Size: %d\n<br>", $inputLength);
    printf("Chunk Size: %d bytes\n<br>", $chunkSize);
    printf(" Num files: %d.\n<br>", $numFiles);


    $curFileSize = $chunkSize;

    for ($i=0; $i<$numFiles; $i++)
    {
        if ($remaining < $chunkSize)
            $curFileSize = $remaining;

        $curData = substr($inputData, ($i*$chunkSize), $curFileSize);

        // each part gets its own letter plus the final part's letter, e.g.
        // Baltimore.csv.AK .. Baltimore.csv.KK (single letters limit this to 26 parts)
        $curFileName = sprintf("%s.%c%c", $fileName, 65+$i, $lastLetterInName);
        $curFile = fopen($curFileName, 'wb');
        fwrite($curFile, $curData, $curFileSize);
        fclose($curFile);

        printf("file %2d. - (%d bytes) - %s\n<br>", $i, $curFileSize, $curFileName);

        $remaining -= $chunkSize;
    }
?>


Output:
Input File: Baltimore.csv 
File Size: 20959 
Chunk Size: 2048 bytes 
Num files: 11. 
file 0. - (2048 bytes) - Baltimore.csv.AK 
file 1. - (2048 bytes) - Baltimore.csv.BK 
file 2. - (2048 bytes) - Baltimore.csv.CK 
file 3. - (2048 bytes) - Baltimore.csv.DK 
file 4. - (2048 bytes) - Baltimore.csv.EK 
file 5. - (2048 bytes) - Baltimore.csv.FK 
file 6. - (2048 bytes) - Baltimore.csv.GK 
file 7. - (2048 bytes) - Baltimore.csv.HK 
file 8. - (2048 bytes) - Baltimore.csv.IK 
file 9. - (2048 bytes) - Baltimore.csv.JK 
file 10. - (479 bytes) - Baltimore.csv.KK 


Plus, of course, 11 files in the folder the script resides in.
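
And for the receiving end, a minimal sketch that glues the parts back together, assuming the naming scheme produced by ChopFile.php above:

PHP
<?php
    // GlueFile.php - minimal sketch. Assumes part names like Baltimore.csv.AK .. .KK
    $fileName = 'Baltimore.csv';

    $parts = glob($fileName . '.[A-Z][A-Z]');  // collect the two-letter part files
    sort($parts);                              // A..Z sorts alphabetically, restoring order

    // the second suffix letter of any part says how many parts to expect
    $expected = ord(substr($parts[0], -1)) - 64;   // 'K' -> 11
    if (count($parts) != $expected)
        die(sprintf("Missing parts: have %d of %d\n", count($parts), $expected));

    $output = fopen($fileName, 'wb');
    foreach ($parts as $part)
        fwrite($output, file_get_contents($part));
    fclose($output);

    printf("Rebuilt %s from %d parts.\n<br>", $fileName, count($parts));
?>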
 
Comments
AndyInUK 8-Nov-12 10:54am    
Thanks for your reply.

I don't think the first option is any good in this case, as I'm uploading to an external system and I don't think they will accept a compressed file.

The second suggestion seems appropriate in this situation, but I'm wondering how you would split an array into smaller pieces no larger than 2 MB. I know there is a function called array_chunk, but I need to split by file size rather than by the number of elements. The data is basically an array (queried from a database)...

Thank you
enhzflep 8-Nov-12 12:03pm    
Pleasure, mate. I suspected as much re: compression.

I'd just use fwrite to write a series of temporary files of whatever length is appropriate. If the input data is a single file, I'd open it and keep spitting out new files until the whole thing is cut into pieces. If, on the other hand, it's an array, I'd turn each row into a string and write those to files, chunk by chunk - something like the sketch below.

See my updated Solution for a code snippet and its output.
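For instance, a minimal sketch of the array case, splitting by byte size rather than element count ($rows here is a hypothetical stand-in for your query result):

PHP
<?php
    // Minimal sketch: write an array of rows to CSV files capped at $maxBytes each.
    $rows     = array(array('a', 'b', 'c'));   // hypothetical data from the database
    $maxBytes = 2 * 1024 * 1024;               // 2 MB cap per file

    $fileIndex = 0;
    $written   = $maxBytes;   // forces a new file on the first row
    $curFile   = null;

    foreach ($rows as $row)
    {
        $line = implode(',', $row) . "\n";

        if ($written + strlen($line) > $maxBytes)   // current chunk is full
        {
            if ($curFile) fclose($curFile);
            $curFile = fopen(sprintf('chunk_%02d.csv', $fileIndex++), 'wb');
            $written = 0;
        }

        fwrite($curFile, $line);
        $written += strlen($line);
    }
    if ($curFile) fclose($curFile);
?>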
AndyInUK 9-Nov-12 6:10am    
Thanks for the excellent solution. I will try this now. :)
AndyInUK 9-Nov-12 10:57am    
Hmmm, the solution is good for splitting the CSV file, but the problem is that I need the header (column names) added to each CSV split, as the receiving end concatenates the data according to the header.

Maybe it would be better to get the data from MySQL, split it into 2 MB pieces, write each piece to a CSV, and repeat until all the data has been processed.

What do you think?

Thanks
enhzflep 9-Nov-12 18:08pm    
You're welcome, sorry it wasn't quite up to the task.

Ahhh. Yes indeed - never even considered that you'd need each file to be 'stand-alone' as it were.

Yes, I think it may be better to get 2 MB chunks from MySQL. Naively, without knowing anything about the data, I'd take the maximum length of all the fields in a row and add the appropriate number of bytes for the delimiters. If, say, there were 10 varchar fields of 256 bytes each, separated by a comma and a space, I'd count each row as (10*256) + (9*strlen(", ")) = 2578 bytes. I'd divide 2 MB by that number, then adjust the SQL so that I select only that many rows at a time with the LIMIT keyword.
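A minimal sketch of that approach, where the connection details, table and column names are hypothetical and 2578 is the per-row estimate from above:

PHP
<?php
    // Minimal sketch: fetch a batch of rows at a time and write each batch to
    // its own CSV, repeating the header row in every file.
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

    $rowsPerFile = (int)((2 * 1024 * 1024) / 2578);  // 2 MB / estimated max row size
    $offset      = 0;
    $fileIndex   = 0;

    do
    {
        $stmt = $pdo->query("SELECT col1, col2, col3 FROM mytable LIMIT $rowsPerFile OFFSET $offset");
        $rows = $stmt->fetchAll(PDO::FETCH_NUM);

        if ($rows)
        {
            $curFile = fopen(sprintf('export_%02d.csv', $fileIndex++), 'wb');
            fwrite($curFile, "col1,col2,col3\n");    // header repeated in every chunk
            foreach ($rows as $row)
                fputcsv($curFile, $row);             // handles quoting/escaping
            fclose($curFile);
        }
        $offset += $rowsPerFile;
    } while (count($rows) == $rowsPerFile);
?>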
