Vlad Vlad - 3 months ago 14
Bash Question

Intricate file/text manipulation

I need to perform some rather tricky file manipulation, but I am not good at coding and am stumped about where to even begin. Any help would be amazing, so thanks in advance. I would prefer Shell or Python, as those are the languages I have a cursory knowledge of, but if there is an easy solution in another language I am open to it.




I have two massive files containing related data, but their columns are not aligned, which makes it difficult to match the data. To complicate things further, the values differ after the decimal point, even though only the digits before the decimal point are of interest.

So what I need to do:


  • Read file1, column1, row1, but ignore all values after the decimal point.

  • Read file2 and search column1 for the value taken from file1, while again ignoring what comes after the decimal point.

  • Once the correlative value is found in file2, output both of these lines into a new file (file3) with the rest of the data from their respective lines.



That is step one, and if anyone could help me get there I'd be very grateful. The next step is to apply a loop to this process so that it moves on to file1, line2 and repeats the process.

Answer

You will need to learn Python better than you know it now. Here is an outline of what you will need to do. It is very typical of this kind of "file manipulation".

  1. Make a regular expression that will match lines from file1 and file2 (or two regular expressions if the files do not have the same format). Include capture groups in your regex for the parts that are important to you.
  2. Read file1 line by line.
  3. As each line is read, match it against your regex, extract the groups that are important, and store them in a hash (a dict in Python), keyed on the value you want to match on.
  4. Now read file2 line by line.
  5. As each line is read, match it with your regex, find groups that are important, and search the hash for a match.
  6. When you find a match, write both lines to file3.
  7. Go back to step 4 and repeat until file2 is exhausted.
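
As a rough illustration, the outline above could be sketched in Python along these lines. This is only a sketch under several assumptions that the question does not confirm: that the columns are separated by whitespace, that column 1 is a plain decimal number, and that file1 (or at least an index of it) fits in memory. The names `KEY_RE`, `integer_key`, `build_index`, `match_lines` and `main` are made up for this example; only file1, file2 and file3 come from the question.

```python
import re
from collections import defaultdict

# Assumed line format: optional leading whitespace, then a number in column 1,
# followed by whitespace or end of line. Group 1 captures only the digits
# before the decimal point.
KEY_RE = re.compile(r"^\s*(-?\d+)(?:\.\d*)?(?=\s|$)")


def integer_key(line):
    """Return the part of column 1 before the decimal point, or None."""
    match = KEY_RE.match(line)
    return match.group(1) if match else None


def build_index(lines):
    """Steps 2-3: map each key to every file1 line that carries it."""
    index = defaultdict(list)
    for line in lines:
        key = integer_key(line)
        if key is not None:  # skip lines that do not start with a number
            index[key].append(line.rstrip("\n"))
    return index


def match_lines(file1_lines, file2_lines):
    """Steps 4-7: yield (file1_line, file2_line) pairs that share a key."""
    index = build_index(file1_lines)
    for line in file2_lines:
        key = integer_key(line)
        for file1_line in index.get(key, []):
            yield file1_line, line.rstrip("\n")


def main(path1="file1", path2="file2", out_path="file3"):
    """Write each matching pair to the output file, file1's line first."""
    with open(path1) as f1, open(path2) as f2, open(out_path, "w") as out:
        for line1, line2 in match_lines(f1, f2):
            out.write(line1 + "\n" + line2 + "\n")
```

Calling `main()` from the directory that holds file1 and file2 would write the matched pairs to file3. Building the dictionary once means each line of file2 costs a single lookup instead of a rescan of file1, which also covers the "loop over every line of file1" step from the question. If a key appears on several lines of file1, every one of them is paired with the matching file2 line.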