Patrick B. Patrick B. - 3 months ago 7
Python Question

How to Import multiple CSV files then make a Master Table?

I am a research chemist and have carried out a measurement where I record 'signal intensity' vs 'mass-to-charge (m/z)' . I have repeated this experiment 15x, by changing a specific parameter (Collision Energy). As a result, I have 15 CSV files and would like to align/join them within the same range of m/z values and same interval values. Due to the instrument thresholding rules, certain m/z values were not recorded, thus I have files that cannot simply be exported into excel and copy/pasted. The data looks a bit like the tables posted below

Dataset 1: x | y Dataset 2: x | y
--------- ---------
0.0 5 0.0 2
0.5 3 0.5 6
2.0 7 1.0 9
3.0 1 2.5 1
3.0 4

Using matlab I started with this code:

%% Create a table for the set m/z range with an interval of 0.1 Da
mzrange = 50:0.1:620;
mzrange = mzrange';
mzrange = array2table(mzrange,'VariableNames',{'XThompsons'});

Then I manually imported 1 X/Y CSV (Xtitle=XThompson, Ytitle=YCounts) to align with the specified m/z range.

%% Join/merge the two tables using a common Key variable 'XThompson' (m/z value)
mzspectrum = outerjoin(mzrange,ReserpineCE00,'MergeKeys',true);

% Replace all NaN values with zero
mzspectrum.YCounts(isnan(mzspectrum.YCounts)) = 0;

At this point I am stuck because repeating this process with a separate file will overwrite my YCounts column. The title of the YCounts column doesnt matter to me as I can change it later, however I would like to have the table continue as such:

XThompson | YCounts_1 | YCounts_2 | YCounts_3 | etc...

How can I carry this out in Matlab so that this is at least semi-automated? I've had posted earlier describing a similar scenario but it turned out that it could not carry out what I need. I must admit that my mind is not of a programmer so I have been struggling with this problem quite a bit.

PS- Is this problem best executed in Matlab or Python?


I managed to create a solution to my problem based on learning through everyone's input and taking an online matlab courses. I am not a natural coder so my script is not as elegant as the geniuses here, but hopefully it is clear enough for other non-programming scientists to use.

Here's the result that works for me:

% Reads a directory containing *.csv files and corrects the x-axis to an evenly spaced (0.1 unit) interval.

% Create a matrix with the input x range then convert it to a table
prompt = 'Input recorded min/max data range separated by space \n(ex. 1 to 100 = 1 100): ';
inputrange = input(prompt,'s');
min_max = str2num(inputrange)
datarange = (min_max(1):0.1:min_max(2))';
datarange = array2table(datarange,'VariableNames',{'XAxis'});

files = dir('*.csv');
for q=1:length(files);

    % Extract each XY pair from the csvread cell and convert it to an array, then back to a table.
    data{q} = csvread(files(q).name,2,1); 
    data1 = data(q);
    data2 = cell2mat(data1);
    data3 = array2table(data2,'VariableNames',{'XAxis','YAxis'});

    % Join the datarange table and the intensity table to obtain an evenly spaced m/z range
    data3 = outerjoin(datarange,data3,'MergeKeys',true);
    data3.YAxis(isnan(data3.YAxis)) = 0;
    data3.XAxis = round(data3.XAxis,1);

    % Remove duplicate values
    data4 = sortrows(data3,[1 -2]);
    [~, idx] = unique(data4.XAxis);
    data4 = data4(idx,:);

    % Save the file as the same name in CSV without underscores or dashes
    filename = files(q).name;
    filename = strrep(filename,'_','');
    filename = strrep(filename,'-','');
    filename = strrep(filename,'.csv','');
    clear data data1 data2 data3 data4 filename