dd_code dd_code - 1 year ago 71
C Question

Bash/C - Correct invalid values in a colum in the output of the script with the values from other output/file

(Revisited and reworked post. Thank you, guys, for the clarification.)

Hey Gyus,

I would like to ask you for the help with the problem I have encountered, which course of action is not really clear to me and besides some basic logical base I am struggling with the code implementation.

Specific example - output, showing particular logical volumes

// $1_1 $1_2 $1_3 (for an ease I will call columns like this to distiguish it from the ones below
ALASKA_VOL00009873 offline SB98MENO
FRANCE_ICSSI00964 online FRANCE //instead of SB91VMA3
NZELAND_VOL339643750 frozen NZELAND //instead of S6B1B1AQ
DONKEYKONG_ISCSI002194 offline SB99A95Z

Where the first column is the full name of the logical volume, second is the status (not important for us) and the third one should contain just a server names (in this case FRANCE and NZELAND should not be there, those names and the first part of the volume names are the names of the virtual servers on the "physical" one), therefore should be the ones I have typed after them - fixing these requires a new addition to the script, which takes output of the massive analysis command and format it in this form.

The steps I want would be placed after it - it would be as the addition to it, which would fix the possible issues in the third column (SERVER NAME), when invalid value would appear.

For this matter there would be always the file with the output from other script, where are correct names of the servers in the conjuction with the virtual names.

This file's content would have a form like this

// $2_1 $2_2

Where the first column is the name of the virtual server and the second one of the physical server.

In this case it is new bunch of actions to take - I have a few if any experience with selecting something from the file, which is outside the script, comparing particular values in the particular rows and columns and then to make some substitutions, so I would like to ask you for your help and advice.

I have the idea, how it should work - as my knowledge goes (please ignore no syntax, this is like structurogram)

It would be as an addition in the for loop

for (line counter, after each cycle it will move to the another line in the main output (first one), where)
if (the value in the $1_3 matches the value in any line in the $2_1)
"Value from the same line, but in the column $2_2" = "Actual scanned (by for) value in the particular line in the $1_3" // basically rewritting the incorrect value with the correct - there is the first problem I have - how to address all there different positions in the file, such as the value in the second column in the same row, where we found a match ????
continue (or break ???) //break the loop entirely

At the end, wanted output should be like

ALASKA_VOL00009873 offline SB98MENO
FRANCE_ICSSI00964 online SB91VMA3
NZELAND_VOL339643750 frozen S6B1B1AQ
DONKEYKONG_ISCSI002194 offline SB99A95Z

...I am sorry for the not ideal form of the original post, it was my first post.

Thank you again. :)



To give you an correct example of my output...so with my previous one

join -a1 -1 4 -2 1 <(echo "$VAR3") <(echo "$VAR2") | awk '{print $2" "$3" "$4" - "$5}'

I have received

$2 $3 $4 $5 <--this is not in output, it is just for simple orientation, which column is which
APPLE_ISCSI01 offline aggrB2 - EELN1723
GRAPEFRUIT_ISCSI13 offline aggr1 -
GRAPEFRUIT_ISCSI04 offline aggr1 - XX643863
WOLFVILLE_ISCSI48 offline aggr1 - A7S5D1DCY0
WOLFVILLE_ISCSI49 offline aggr1 - A7S5D1DCY0
WOLFVILLE_ISCSI50 offline aggr1 - A7S5D1DCY0
WOLFVILLE_ISCSI51 offline aggr1 - A7S5D1DCY0

The problem is that blank space at the $5 in the second row, actually

There are no tabs as well, I have just added them for it being more clear

Without awk, there is of course this column - which is the first and skipped when formatting


Basically, now I need just to replace the blank space in final output with this one XX643863 (second row).
Variables in the command are just those text files (sorted as well), but as I have mentioned, I wanted to avoid making and then removing the files.

I have tried your awk's - unfortunately no success. Is it possible with awk...or sed ?

Answer Source

Process which you described is a left join between two tables on a common key. In SQL, this could be expressed by query:

SELECT volumes.volume_name, volumes.volume_state, 
COALESCE(mapping.real_name, volumes.alias_name) 
FROM volumes LEFT JOIN mapping on volumes.alias_name = mapping.alias_name

actually, one of possible solutions of this for you is to load it in sqlite and perform that query to obtain result given you can obtain data in csv/tsv (tabs) format:

create table volumes(volume_name varchar, status varchar, alias_name varchar);
create table mapping(alias_name varchar, real_name varchar);

.mode csv

.import volumes.csv volumes
.import mapping.csv mapping

.output result.csv

SELECT volumes.volume_name, volumes.status, 
COALESCE(mapping.real_name, volumes.alias_name) 
FROM volumes LEFT JOIN mapping on volumes.alias_name = mapping.alias_name;

This can then be executed as a script using command cat script.sql|sqlite3.

Alternatives are: use join command together with sort and awk or writing a left join manually in bash or other language (awk or python).

Here is solution using sort, join and awk:

sort -k1 mapping > sorted_mapping
sort -k3 volumes > sorted_volumes

join -a1 -1 3 -2 1 sorted_volumes sorted_mapping

will give almost what you wanted:

NZELAND NZELAND_VOL339643750 frozen S6B1B1AQ
SB98MENO ALASKA_VOL00009873 offline
SB99A95Z DONKEYKONG_ISCSI002194 offline

awk can be used to format it properly:

join -a1 -1 3 -2 1 sorted_volumes sorted_mapping|awk '{print $2, $3, NF==3 ? $1 : $NF}'

This solution do so called sort-merge join.

First we sort both inputs using sort command. Then we use join command to match lines with equal keys (match 3rd column from volumes with 1st column from mapping). join outputs the key as first column so we had to manually re-arrange columns using awk to match desired output format. There is also condition (ternary operator) which detects if there was a match or not between volumes and mapping by checking columns count. If there was a match, we take column from mapping (last one), otherwise we take key (1st column).


I don't recommend it because it's dog sloooo-o-o-o-o-o-o-ow and is essentially re-invention of bicycle (doing manually what is already done by join command), but here is bash version of nested loop left join

Previously here was naive implementation of nested loop join in bash, but it was so sl-o-o-o-o-o-o-o-o-ow that I decided to replace it with this evidence of it's slowness instead:

$ wc -l volumes2.txt             
100 volumes2.txt
$ wc -l mapping2.txt             
100 mapping2.txt
$ time ./nested_loops.sh > output                              

./nested_loops.sh > output  8.01s user 28.79s system 121% cpu 30.276 total

Just don't do nested loop join in pure bash.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download