I have a CSV file like this
>cat test.in
1|aaa|bbb
1|ccc|ddd
2|xxx|yyy
2|www|zzz
2|www|ttt
and I want to split it in separate files:
- the naming convention should be
prefix.FIELD1.FIELD2.out - FIELD1 should not be in the output file
- every output file should have a header
Is there a neat way to do it in one go with awk?
So far I've managed to have awk create the output files but I can't make it add the header, so I just loop over the output files and add it afterwards
>cat script.sh
#!/bin/bash
FIELD_SEPARATOR="|"
OUTPUT_HEADER="Key|Value"
awk '{FS=OFS="'${FIELD_SEPARATOR}'"; print $2,$3> "prefix." $1 "." $2 ".out"}' test.in
# add the header to all the output files
echo $OUTPUT_HEADER > header
for filename in $(ls prefix.*.out 2>/dev/null); do
cat header $filename > $filename.tmp && mv $filename.tmp $filename
done
rm header
which gives the expected output
>ls prefix.*.out
prefix.1.aaa.out prefix.1.ccc.out prefix.2.www.out prefix.2.xxx.out
>cat prefix.1.aaa.out
Key|Value
aaa|bbb
>cat prefix.1.ccc.out
Key|Value
ccc|ddd
>cat prefix.2.www.out
Key|Value
www|zzz
www|ttt
>cat prefix.2.xxx.out
Key|Value
xxx|yyy
A simple way to do this in
awkis keep an array of the filenames created. If the filename isn't already in the array, output the header and then append your field-2 and field-3 as contents. A check that the number of fields is 3 helps ignore blank lines, etc.You can write your script as:
Or if you like long 1-liners:
Example Use/Output
Result:
with, e.g.
A single
awkcommand is all you need. Let me know if you have questions.Alternative for Ancient Sun SparcStation without
indx in arrRunning SunOS (5.10)Per your comment about the Old Sun SparcStation not implementing
indx in arrayin theawkthat is on the machine, an alternative is to build an indexed array and then scan the array to determine whether that output filename already exists and set a flag to indicate the status. This will be slightly less efficient than theindx in arrayapproach, but shouldn't be off by much.Since the version of
awkon SunOS (5.10) will not even response toawk --version, there is no telling how old it is. As mentioned, the last time I worked on a Sun SparkStation was somewhere in 1991-1993, great machines for the time before the internet:).Give the following a try:
(operation and output is the same)