From Excel XLS to CSV with PHP-ExcelReader


I just had to convert a lot of xls files to csv, too many to even contemplating doing it manually. I started looking for a solution and found a positive evaluation of PHP-ExcelReader.

And Panayotis was right, it does kick ass but I did experience some problems, the first was pretty silly, the example file could not have been upgraded to reflect where the library files currently are located so I got a lot of inclusion errors. However after fixing the paths the example.php file executed just fine.

The second issue was more insidious, I almost didn’t notice it. When parsing dates the library was one day off, one day too much on every date to be exact. That had to be taken care of. Note that my input format was Y-m-d in the Excel file, you might not get this problem if your format is different.

The code:

require_once 'Excel/reader.php';
error_reporting(E_ALL ^ E_NOTICE);

$handle = fopen("out.csv", "w+");

$split_stamp1 = strtotime('1991-01-02');
$split_stamp2 = strtotime('2003-04-24');

function stockSplit(&$item, $multi){
	foreach(array(2,3,4,5,6,7,9) as $field){
		if(is_numeric($item[$field] + 0))
			$item[$field] /= $multi;
	}
	if(is_numeric($item[10]))
		$item[10] *= $multi;
}

for($year = 1975; $year < 2009; $year++){
	$data = new Spreadsheet_Excel_Reader();
	$data->setOutputEncoding('ASCII');
	$data->read("SCC/$year.xls");
	$items = $data->sheets[0]['cells'];
	foreach($items as $item){
		if(preg_match('/^\d\d\/\d\d\/\d\d\d\d$/', $item[1]) && $item[10] != '-'){
			$arr 		= explode('/', $item[1]);
			$stamp 		= mktime(0, 0, 0, $arr[1], $arr[0], $arr[2]) - 86400;
			if($stamp < $split_stamp1)
				stockSplit($item, 100);
			else if($stamp < $split_stamp2 && $stamp > $split_stamp1)
				stockSplit($item, 10);
			$item[1] 	= date('Y-m-d', $stamp);
			fwrite($handle, implode(',', $item)."\n");
		}
	}
}
fclose($handle);

We’re working with historical trading data here, you know open, close, low, high, volume etc.

We first locate where the relevant data is ($data->sheets[0][‘cells’]), we then loop through each item, the array contains some meta data that we are not interested in, therefore we need to check each entry for a date as the first field. If we have a date there, and the field at position 10 is not -, we continue. Some dates did not contain information (the exchange probably lost it somehow, the “holes” only occurred in the early dates), if they didn’t the volume would have a dash instead of a number.

Next comes the date fix, we convert the date with mktime and subsctract 86400 seconds which is the amount of seconds we need to deduct to get to the day before. I would’ve used strtotime here if I could but it wasn’t able to manage dd/mm/yyyy properly, bummer.

The next if else clauses simply have to do with handling two stock splits correctly, it doesn’t really have anything to do with the Excel stuff.

Finally we re-save the correct date in Y-m-d format.

Note that PHP-ExcelReader has converted all the commas in numbers to dots already (common in Excel), that’s why we can implode with ‘,’.

Related Posts

Tags: , , , , ,