user3781528 user3781528 - 17 days ago 5
Bash Question

Extracting Excel to text on Linux server

I’ve been struggling with converting *.xls files generated by an instrument to text format on a Linux server. I’m unable to process the files with Spreadsheet::ParseExcel, unless, I manually open them, sign off on the security warning and save them. Otherwise, they are not recognized as Excel (tested with the sample code).

!/usr/bin/perl -w

use strict;
use Spreadsheet::ParseExcel;

my $parser = Spreadsheet::ParseExcel->new();
my $file = "/data/excel/matrix.xls";

my $workbook = $parser->parse($file);

if ( !defined $workbook ) {
print "can't find workbook!!!";
die $parser->error(), ".\n";
}

for my $worksheet ( $workbook->worksheets() ) {

my ( $row_min, $row_max ) = $worksheet->row_range();
my ( $col_min, $col_max ) = $worksheet->col_range();

for my $row ( $row_min .. $row_max ) {
for my $col ( $col_min .. $col_max ) {

my $cell = $worksheet->get_cell( $row, $col );
next unless $cell;

print "Row, Col = ($row, $col)\n";
print "Value = ", $cell->value(), "\n";
print "Unformatted = ", $cell->unformatted(), "\n";
print "\n";
}
}
}


I’ve tried changing the extension to *.prn and it lets me open the files manually without a warning but they are not recognized by the Spreadsheet::ParseExcel either.

The files contain 8 columns of data on the first sheet only. I would like to convert them to text files and used them to look up values in my Perl script. Here is some sample data in excel:

Gene Target Barcode1 Barcode2 Barcode3 Barcode4 Barcode5 Barcode6
MOTOR MOTOR_1 343 453 432 345 543 342
MYCN MYCN_2 342 98 87 876 54 765


My last option is to use VBA but I would reader stick to Perl/Shell code if possible. Is there a straightforward solution to this problem?

Thank you,

Answer

It's not particularly elegant, but you might try using the Linux command "strings" to extract just the printable characters from your spreadsheet file first. Then you could parse the output until you see the column headings, and the data should be after that.

Comments