Finding the identical lines in one sorted and one unsorted text file, keeping the order in the unsorted file


I have two text files, each containing one word per line. For reference, the first file contains a limited list of unique words, and the second is a longer file with many words, many of which recur.

My goal is to keep all the words in the second file which exist in the first file (the wordlist), and remove all the others. The words in the wordlist are sorted, but the issue I am having is that I want the words in the second file to stay in the order they are in, which is unsorted. As far as I know, this keeps me from using the 'comm' command, which seems to require sorted lists to find the lines common to the two files.

Is there another utility I can use which allows me to achieve my goals, or is there a way to use comm to actually output the common words in the order they appear in the second file?


There is 1 answer

KamilCuk On

Is there another utility I can use which allows me to achieve my goals

Yes, there are endless programming languages and utilities. Typically, in a Linux environment, one would write an awk script.
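A minimal sketch of what such an awk script might look like. The file names wordlist.txt and words.txt and the sample words are assumptions made up for illustration, not taken from the question. The idea is to read the wordlist first and remember each word as an array key, then print each line of the second file only if it is one of those keys, so the second file's order and its repeated words are kept.

```shell
#!/bin/sh
# Hypothetical sample data; file names and contents are assumptions.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/wordlist.txt"
printf '%s\n' cherry dog apple cherry egg banana apple > "$dir/words.txt"

# While reading the first file (NR == FNR), store each line as an array key.
# For the second file, print a line only if it is a stored key.
filtered=$(awk 'NR == FNR { seen[$0]; next } $0 in seen' \
  "$dir/wordlist.txt" "$dir/words.txt")

printf '%s\n' "$filtered"
rm -r "$dir"
```

With this approach the wordlist does not need to be sorted at all. One caveat: the NR == FNR test assumes the first file is not empty.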

is there a way to use comm to actually output the joint words in the order they appear in the second file?

No. But there's join. With it, you can number the lines of the file whose order you want to preserve using nl -w1, sort on the second field, join -o1.1,1.2 on the second field with the sorted other file, then re-sort on the line numbers and remove the line numbers with cut. Because of the line numbers, the original line order is preserved.
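The steps above can be sketched as a single pipeline. This is one possible reading of them, not necessarily the exact commands the answer had in mind. The file names and sample words are assumptions, and so are the explicit tab delimiter and the LC_ALL=C setting, which are added here so that sort and join agree on field boundaries and ordering.

```shell
#!/bin/sh
# Assumption: use the C locale so sort and join use the same collation.
# The wordlist must already be sorted under that same collation.
export LC_ALL=C

# Hypothetical sample data; file names and contents are assumptions.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/wordlist.txt"
printf '%s\n' cherry dog apple cherry egg banana apple > "$dir/words.txt"
tab=$(printf '\t')

# 1. nl -w1 prefixes each line with "N<TAB>".
# 2. sort by the word (field 2), as join requires.
# 3. join field 2 of the numbered stream with field 1 of the wordlist,
#    printing only the number and the word from the numbered stream.
# 4. sort numerically on the line number to restore the original order.
# 5. cut drops the line number.
joined=$(
  nl -w1 "$dir/words.txt" |
    sort -t "$tab" -k2,2 |
    join -t "$tab" -1 2 -2 1 -o 1.1,1.2 - "$dir/wordlist.txt" |
    sort -t "$tab" -k1,1n |
    cut -f2
)

printf '%s\n' "$joined"
rm -r "$dir"
```

Note that nl numbers only non-empty lines by default, which is fine for a file with one word per line.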