Found an interesting interaction between pkill and ssh. Documenting it here for posterity:
$ ssh user@remote 'false'; echo $?
1
$ ssh user@remote 'false || echo "failed"'; echo $?
failed
0
$ ssh user@remote 'pkill -f "fake_process"'; echo $?
1
$ ssh user@remote 'pkill -f "fake_process" || echo "failed"'; echo $?
255
It seems like example #4 should have the same output as #2; both false and pkill -f "fake_process" exit with code 1 and have no output. However, #4 will always exit with code 255, even if the remote command explicitly calls exit 0. The docs for ssh state that code 255 just means "an error occurred" (super helpful).
Replacing the pkill command with (exit 1), ls fake_file, kill <non-existent PID>, etc. works as expected in every case. Additionally, when run locally (not through ssh), examples #2 and #4 match as expected.
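The problem appears to be that pkill is killing itself. Or rather, it is killing the shell that owns it.
First of all, it appears that ssh uses the remote user's shell to execute certain "complicated" commands. As far as I can tell, sshd always hands the command string to the user's shell via -c, but shells such as bash exec a single simple command directly, so a shell process only lingers for compound commands like the || example.
Second, it appears that pkill -f normally knows not to kill itself (otherwise every pkill -f command would terminate itself). But if run from a subshell, that logic fails: -f matches against the full command line, and the command line of the lingering shell contains the pattern text itself, so the shell gets matched and killed before it can run the rest of the command.
In my case, to fix this I just re-worked some of the code around my ssh/pkill so that I could avoid having a "complicated" remote command. Theoretically I think you could also do something like pgrep -f <cmd> | grep -v $$ | xargs kill.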
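A minimal sketch of that last idea, untested against a real remote host. The function names exclude_pid and kill_matching_except_self are hypothetical, chosen only for illustration. It differs from the one-liner above in two ways, both my own assumptions about what is safer: grep -x is used so that only a line that is exactly the shell's PID is dropped (plain grep -v 123 would also drop 1234), and a read loop replaces xargs so that nothing is run when no PIDs remain.

```shell
#!/bin/sh
# Drop every line on stdin that is exactly equal to the PID given as $1.
# grep -x matches whole lines only, so excluding 123 keeps 12 and 1234.
exclude_pid() {
    # grep exits 1 when no lines survive the filter; that is not an error here.
    grep -vx -- "$1" || true
}

# Hypothetical helper: send SIGTERM to every process whose full command line
# matches the pattern in $1, except the calling shell itself ($$).
kill_matching_except_self() {
    pgrep -f "$1" | exclude_pid "$$" | while read -r pid; do
        # The process may already have exited; ignore that case.
        kill "$pid" 2>/dev/null || true
    done
}
```

Inlined into a remote command, this might look like ssh user@remote 'pgrep -f "fake_process" | grep -vx "$$" | while read -r pid; do kill "$pid"; done'. The single quotes matter: they keep $$ from expanding locally, so it becomes the PID of the remote shell. This is a starting point rather than a race-free solution, since short-lived children forked by the shell for the pipeline could in principle still be matched.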