I have a gzipped file that I've split into 3 separate files: xaa, xab, xac. I make a fifo
mkfifo p1
and reassemble the files by reading from it, also calculating a checksum and unzipping the file in a pipe:
cat p1 p1 p1 | tee >(sha1sum > sha1sum_new.txt) | gunzip > output_file.txt
This works just fine if I feed the pipe from another terminal with
cat xaa > p1
cat xab > p1
cat xac > p1
but if I feed the pipe with a single line,
cat xaa > p1; cat xab > p1; cat xac > p1
the receiving pipeline hangs, no checksum is produced, and although an output file is produced, it is truncated - but by an amount smaller than the final file size.
Why is the behavior in the second case different from the first?
                        
Interesting question. As the other answer mentions, you have a race condition - and I am pretty sure of that. In fact, you have a race condition in both cases, but in the first one you're just lucky that it doesn't trigger, probably because each file is small enough to be read completely before you enter the next command line. Allow me to explain.
So, a little bit of background first:
cat opens each file you feed it as an argument sequentially, prints it to the output, and then closes the file and moves on to the next file. The exact details of whether cat opens each file sequentially or opens them all first and then writes each file sequentially may vary, but it's not relevant for the discussion. In both cases, you'll have a race condition.
The open(2) syscall will block on a FIFO / pipe until the other end is opened. So for example, if process pid1 opens the FIFO for reading, open(2) will block until, say, pid2 opens the FIFO for writing. In other words, opening a FIFO that has no active readers or writers implicitly synchronizes both processes and guarantees that a process will not read from a pipe that has no writer yet, or that a writer will not write to a pipe that has no reader yet. But as we will see, this will be problematic.
What's really happening
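Before walking through the two cases, the blocking open(2) behaviour from the background above can be checked in isolation. This is only a sketch: it assumes a timeout(1) that exits with status 124 on expiry (as GNU coreutils does), and the names demo_dir, fifo, lonely_status and paired_status are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical scratch location; not part of the original setup.
demo_dir=$(mktemp -d)
fifo="$demo_dir/p1"
mkfifo "$fifo"

# No writer exists, so opening the FIFO for reading blocks.
# timeout(1) gives up after 1 second and (assumed: GNU coreutils) exits with 124.
timeout 1 sh -c 'exec 3<"$1"' sh "$fifo"
lonely_status=$?

# Start a writer in the background: its open blocks until a reader shows up.
( exec 3>"$fifo" ) &

# Now the reader's open completes at once, because the other end is being opened too.
timeout 5 sh -c 'exec 3<"$1"' sh "$fifo"
paired_status=$?

wait
rm -rf "$demo_dir"
echo "reader alone: $lonely_status, reader with writer: $paired_status"
```

The first status shows the lonely reader stuck in open(2); the second shows that reader and writer release each other as soon as both ends are opened.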
When you do this:
cat xaa > p1
cat xab > p1
cat xac > p1
Things are really slow, because humans are slow. After you enter the first line, cat opens p1 for writing. The other cat is blocked on opening it for reading (or maybe not yet, but let's assume it is). Once both cat processes open p1 - one for writing, the other for reading - data starts to flow.
And then, before you even have the chance to enter the next command line (cat xab > p1), the whole file flows through the pipe and everyone is happy - the cat reader process sees an end of file on the pipe and calls close(2), and the cat writer finishes writing the file and closes p1. The cat reader moves on to the next file (which is p1 again), opens it, and blocks because no active writers have opened the FIFO yet.
Then, you, slow human, enter the next command line, which causes another cat writer process to open the FIFO, which unblocks the other cat that is waiting to open for reading, and everything happens again. And then again for the third command line.
When you put everything in one line in the shell, things happen way too fast.
Let's differentiate the 3 cat invocations. Call them cat1, cat2 and cat3, in the order they appear on the command line.
The shell executes each command sequentially, waiting for the previous command to finish before moving to the next one.
However, it might just be the case that cat1 finishes writing everything to p1 and exits, the shell moves on to cat2, which opens the FIFO and starts writing to p1 again, and the cat reader didn't have the chance to finish reading what cat1 wrote in the first place. The open in cat2 does not block, because the reader still has the FIFO open, and a reader only sees end of file once the pipe is empty and no writer has it open. So now the cat reader "thinks" it's still reading from the first file (the first p1), but at some point it starts reading the data that cat2 started pushing into the pipe (as if it was in the first p1). It has no way of knowing that the first "copy" of the data is over if cat2 is faster and opens the FIFO before the cat reader finishes reading what cat1 wrote.
Yes, subtle, but it's exactly what is happening.
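The merge of two writers into one stream can be forced deterministically. The following is a minimal sketch, not the original setup: the reader opens the FIFO and then sleeps on purpose, so that both writers finish before it reads anything. The names demo_dir, fifo and result are hypothetical, and printf stands in for the cat writers.

```shell
#!/bin/sh
# Hypothetical scratch location; not part of the original setup.
demo_dir=$(mktemp -d)
fifo="$demo_dir/p1"
mkfifo "$fifo"

# Slow reader: open the FIFO once on fd 3, wait, and only then drain it.
( exec 3<"$fifo"; sleep 1; cat <&3 > "$demo_dir/out" ) &

# Two writers back to back, like "cat xaa > p1; cat xab > p1".
# The second open does not block: the reader already holds the FIFO open.
printf 'first\n'  > "$fifo"
printf 'second\n' > "$fifo"

wait
result=$(cat "$demo_dir/out")
rm -rf "$demo_dir"

# Both chunks arrive through a single open of the FIFO, with a single end of file.
printf '%s\n' "$result"
```

The reader gets both chunks before its only end of file, so a reader that planned to open p1 a second time would now wait for a writer that never comes.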
Then, of course, input eventually comes to an end, and the cat reader will think that the first p1 is done and moves to the next p1, opening it and waiting for the next writer to open it. But there will never be a next writer! It blocks forever, and the whole pipeline is stalled forever.
How to fix it
The solution in the other answer solves the problem. You mentioned in the comments that it might not be enough for you because you don't control when and how a new writer opens and uses the pipe.
So I suggest this instead:
1. cat standard input to p1 in the background: cat > p1 &. When you're done, kill the background job.
2. Open the FIFO for reading only once, either with cat p1 | tee >(sha1sum ...) or using the method proposed in the other answer (tee >(...) < p1). After all, opening a FIFO once should be enough no matter how complex your system is; FIFOs by nature always give you the data in a first in first out fashion.
Keep the background cat writer running as long as you know that there's a chance of new files arriving / new writers opening the FIFO and using it. Don't forget to terminate the background job when you know that input is over.
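Here is a minimal sketch of the same idea for use inside a script. One assumption to flag: in a non-interactive shell a background command gets /dev/null as standard input, so a background cat > p1 & would exit right away there. The sketch therefore swaps in a held-open file descriptor (exec 3> on the FIFO) as the placeholder writer, which is a substitution and not the literal command above. The names demo_dir, fifo and result are hypothetical, and printf stands in for the real writers.

```shell
#!/bin/sh
# Hypothetical scratch location; not part of the original setup.
demo_dir=$(mktemp -d)
fifo="$demo_dir/p1"
mkfifo "$fifo"

# Single reader: opens the FIFO once and reads until every writer has closed it.
cat "$fifo" > "$demo_dir/out" &

# Placeholder writer: while fd 3 stays open, the reader cannot see end of file.
exec 3>"$fifo"

# Real writers come and go; none of them ends the stream.
printf 'part1 ' > "$fifo"
printf 'part2 ' > "$fifo"
printf 'part3\n' > "$fifo"

# Input is over: drop the placeholder so the reader finally gets end of file.
exec 3>&-
wait

result=$(cat "$demo_dir/out")
rm -rf "$demo_dir"
printf '%s\n' "$result"
```

Closing fd 3 plays the role of killing the background job: it is the one event that tells the reader the input is finished.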