Replace specific fields in one file with fields from another file, selecting rows by matching other fields across the two files

I want to replace specific fields in a comma-separated file with fields from another file. The rows whose fields need replacing are found by matching other fields between the two files. For example:

cat test1.dat
12345,98765,aaaa,bbbbb
12346,98766,cccc,ddddd

cat test2.dat 
12345,something,something,something,something,something,98765,something,something,somethi
12345,something,something,something,something,something,45655,something,something,somethi
12346,something,something,something,something,something,98766,something,something,somethi
12346,something,something,something,something,something,44556,something,something,something

Output file

cat test3.dat
aaaa,something,something,something,something,something,bbbbb,something,something,somethi
12345,something,something,something,something,something,45655,something,something,somethi
cccc,something,something,something,something,something,ddddd,something,something,somethi
12346,something,something,something,something,something,44556,something,something,somethi

In the above example, the rule being applied is:

if ($1 of test2.dat == $1 of test1.dat && $7 of test2.dat == $2 of test1.dat)
then
    $1 of test2.dat = $3 of test1.dat
    $7 of test2.dat = $4 of test1.dat

test1.dat will have distinct values of $1 and $2

test2.dat will be a huge file having millions of records

I am looking for the fastest solution (iterating through test1.dat and checking each row of test2.dat is acceptable if that turns out to be fastest).

There are 3 answers

karakfa

awk to the rescue!

This assumes the first field of the first file (test1.dat) is unique.

$ awk 'BEGIN   {FS=OFS=","}       # set input and output delimiters
       NR==FNR {a[$1]=$0; next}   # first file: store each line, keyed on field 1
       $1 in a {split(a[$1],b);   # key found: split the stored line into b
                if($7==b[2])      # seventh field matches too: replace both values
                  {$7=b[4]; $1=b[3]}}1' test1.dat test2.dat

gives this

aaaa,something,something,something,something,something,bbbbb,something,something,somethi
12345,something,something,something,something,something,45655,something,something,somethi
cccc,something,something,something,something,something,ddddd,something,something,somethi
12346,something,something,something,something,something,44556,something,something,something
RavinderSingh13

The following awk command may help:

awk -F, 'FNR==NR{a[$1,$2]=$3;b[$1,$2]=$4;next} (($1,$7) in a){val=$7;$7=b[$1,$7];$1=a[$1,val]} 1' OFS=, test1.dat test2.dat

Output will be as follows.

aaaa,something,something,something,something,something,bbbbb,something,something,somethi
12345,something,something,something,something,something,45655,something,something,somethi
cccc,something,something,something,something,something,ddddd,something,something,somethi
12346,something,something,something,something,something,44556,something,something,something
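
For readers who find the one-liner dense, here is the same composite-key lookup spread over several lines with comments. It is only a sketch under the question's assumptions (comma-separated input, keys in fields 1 and 7 of test2.dat). The array names `new1` and `new7`, the scratch directory, and the shortened 8-field sample rows are illustrative and not part of the original answer.

```shell
#!/bin/sh
# Build small sample files in a scratch directory (illustrative only).
# The test2.dat rows are shortened to 8 fields; they are not the question's full rows.
dir=$(mktemp -d)
cat > "$dir/test1.dat" <<'EOF'
12345,98765,aaaa,bbbbb
12346,98766,cccc,ddddd
EOF
cat > "$dir/test2.dat" <<'EOF'
12345,s,s,s,s,s,98765,s
12345,s,s,s,s,s,45655,s
12346,s,s,s,s,s,98766,s
12346,s,s,s,s,s,44556,s
EOF

result=$(awk '
BEGIN { FS = OFS = "," }
# First file: remember the replacement values under the composite key ($1,$2).
NR == FNR { new1[$1, $2] = $3; new7[$1, $2] = $4; next }
# Second file: if ($1,$7) is a known key, swap in both replacements.
($1, $7) in new1 {
    key = $1 SUBSEP $7      # save the key before $1 and $7 are overwritten
    $1 = new1[key]
    $7 = new7[key]
}
{ print }
' "$dir/test1.dat" "$dir/test2.dat")

printf '%s\n' "$result"
rm -rf "$dir"
```

Saving the key first avoids the ordering subtlety in the one-liner, where `$7` has to be copied to `val` before it is overwritten. Each row of test2.dat costs one array lookup, however many pairs test1.dat holds.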
Bach Lien

Generate an awk script from test1.dat, then run that script against test2.dat. It may be faster than an awk program that does a two-dimensional array lookup. For example, this command:

awk -F, '{print "if($1==\""$1"\" && $7==\""$2"\"){$1=\""$3"\"; $7=\""$4"\"};"}' test1.dat

would generate the needed awk code, which looks like:

if($1=="12345" && $7=="98765"){$1="aaaa"; $7="bbbbb"};
if($1=="12346" && $7=="98766"){$1="cccc"; $7="ddddd"};

Test:

$ codes=$(awk -F, '{print "if($1==\""$1"\" && $7==\""$2"\"){$1=\""$3"\"; $7=\""$4"\"};"}' test1.dat)
$ echo "$codes" "print;"
if($1=="12345" && $7=="98765"){$1="aaaa"; $7="bbbbb"};
if($1=="12346" && $7=="98766"){$1="cccc"; $7="ddddd"}; print;

$ awk -F, -v OFS=, -f <(echo "{$codes" "print;}") test2.dat
aaaa,something,something,something,something,something,bbbbb,something,something,somethi
12345,something,something,something,something,something,45655,something,something,somethi
cccc,something,something,something,something,something,ddddd,something,something,somethi
12346,something,something,something,something,something,44556,something,something,something
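
Before timing anything, it is worth confirming that the generated-script approach and an array-lookup approach produce the same output. The sketch below does that on synthetic data. The row counts, the `n…`/`m…` replacement values, and the file names under the scratch directory are all made up for illustration. One thing to keep in mind: the generated program tests every pair from test1.dat against every row, so its per-row cost grows with the number of pairs, whereas an array lookup costs roughly the same per row.

```shell
#!/bin/sh
dir=$(mktemp -d)

# Synthetic lookup file: 200 distinct (field1, field2) pairs (made-up data).
awk 'BEGIN { for (i = 1; i <= 200; i++) printf "%d,%d,n%d,m%d\n", i, i + 1000, i, i }' > "$dir/test1.dat"

# Synthetic data file: 20000 rows; the odd-numbered rows match a pair from test1.dat.
awk 'BEGIN { for (i = 1; i <= 20000; i++) {
        k = i % 200 + 1
        printf "%d,x,x,x,x,x,%d,x\n", k, (i % 2 ? k + 1000 : k)
     } }' > "$dir/test2.dat"

# Approach 1: composite-key array lookup.
awk -F, -v OFS=, '
NR == FNR { a[$1, $2] = $3; b[$1, $2] = $4; next }
($1, $7) in a { k = $1 SUBSEP $7; $1 = a[k]; $7 = b[k] }
{ print }' "$dir/test1.dat" "$dir/test2.dat" > "$dir/lookup.out"

# Approach 2: generate one "if" statement per test1.dat row, then run that program.
{
    echo '{'
    awk -F, '{ printf "if ($1 == \"%s\" && $7 == \"%s\") { $1 = \"%s\"; $7 = \"%s\" }\n", $1, $2, $3, $4 }' "$dir/test1.dat"
    echo 'print }'
} > "$dir/generated.awk"
awk -F, -v OFS=, -f "$dir/generated.awk" "$dir/test2.dat" > "$dir/generated.out"

# Compare the two outputs and count how many rows were rewritten.
if cmp -s "$dir/lookup.out" "$dir/generated.out"; then same=yes; else same=no; fi
replaced=$(grep -c '^n' "$dir/lookup.out")
echo "outputs identical: $same, rows replaced: $replaced"

rm -rf "$dir"
```

To compare speed on real data, prefix each of the two awk runs with `time` in a shell that provides it, and use the actual test1.dat and test2.dat in place of the synthetic files.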