Nextflow filter entire tuple based on one value


I need some guidance on filtering out an entire tuple based on one of its values.

I have a channel of tuples (reads_ch):

[[id:S1, single_end:false], [/home/ubuntu/S1.R1.fastq.gz, /home/ubuntu/S1.R2.fastq.gz], /home/ubuntu/S1.txt, [PASSED: File S1_R1 is not corrupt., PASSED: File S1_R2 is not corrupt.]]
[[id:S2, single_end:false], [/home/ubuntu/S2.fastq.gz, /home/ubuntu/S2.R2.fastq.gz], /home/ubuntu/S2.txt, [PASSED: File S2_R1 is not corrupt., PASSED: File S2_R2 is not corrupt.]]
[[id:S3, single_end:false], [/home/ubuntu/S3.R1.fastq.gz, /home/ubuntu/S3.R2.fastq.gz], /home/ubuntu/S3.txt, [FAILED, FAILED]]

I am trying to filter this channel so that it keeps only the samples that did not fail. I've gotten as far as clearing it[3] for the failed sample, but I haven't figured out how to remove that sample entirely while keeping the structure of the remaining samples (S1, S2).

Goal:

[[id:S1, single_end:false], [/home/ubuntu/S1.R1.fastq.gz, /home/ubuntu/S1.R2.fastq.gz], /home/ubuntu/S1.txt, [PASSED: File S1_R1 is not corrupt., PASSED: File S1_R2 is not corrupt.]]
[[id:S2, single_end:false], [/home/ubuntu/S2.fastq.gz, /home/ubuntu/S2.R2.fastq.gz], /home/ubuntu/S2.txt, [PASSED: File S2_R1 is not corrupt., PASSED: File S2_R2 is not corrupt.]]

Attempt:

passing = reads_ch.map { meta, reads, outcome_file, outcome_status ->
    tuple(meta, reads, outcome_file, outcome_status.findAll { !it.contains('FAILED') })
}
passing.view()

[[id:S1, single_end:false], [/home/ubuntu/S1.R1.fastq.gz, /home/ubuntu/S1.R2.fastq.gz], /home/ubuntu/S1.txt, [PASSED: File S1_R1 is not corrupt., PASSED: File S1_R2 is not corrupt.]]
[[id:S2, single_end:false], [/home/ubuntu/S2.fastq.gz, /home/ubuntu/S2.R2.fastq.gz], /home/ubuntu/S2.txt, [PASSED: File S2_R1 is not corrupt., PASSED: File S2_R2 is not corrupt.]]
[[id:S3, single_end:false], [/home/ubuntu/S3.R1.fastq.gz, /home/ubuntu/S3.R2.fastq.gz], /home/ubuntu/S3.txt, []]

Answer by mribeirodantas (best answer):

Is the entire channel element supposed to be removed when a sample fails? If so, the solution is below.

Preparing a fake channel similar to yours:

Channel
  .of([
        [id:'S1', single_end:false],
        [file('/home/ubuntu/S1.R1.fastq.gz'), file('/home/ubuntu/S1.R2.fastq.gz')],
        file('/home/ubuntu/S1.txt'),
        ['PASSED: File S1_R1 is not corrupt.', 'PASSED: File S1_R2 is not corrupt.']
      ],
      [
        [id:'S2', single_end:false],
        [file('/home/ubuntu/S2.fastq.gz'), file('/home/ubuntu/S2.R2.fastq.gz')],
        file('/home/ubuntu/S2.txt'),
        ['PASSED: File S2_R1 is not corrupt.', 'PASSED: File S2_R2 is not corrupt.']
      ],
      [
        [id:'S3', single_end:false],
        [file('/home/ubuntu/S3.R1.fastq.gz'), file('/home/ubuntu/S3.R2.fastq.gz')],
        file('/home/ubuntu/S3.txt'),
        ['FAILED', 'FAILED']
      ])
  .set { my_ch }

Filtering:

my_ch
  // keep an element when at least one status entry does not contain 'FAILED'
  .filter { it[3].findAll { status -> !status.contains('FAILED') } }
  .view()

Output: only the S1 and S2 elements are emitted; S3 is filtered out.
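
The filter above presumably works because the closure's result is evaluated with Groovy truth, where a list counts as true only when it is non-empty. A minimal plain-Groovy sketch of that rule, outside of Nextflow (the variable names are placeholders chosen for illustration):

```groovy
// Status lists copied from the S1 and S3 elements above
def statusesS1 = ['PASSED: File S1_R1 is not corrupt.', 'PASSED: File S1_R2 is not corrupt.']
def statusesS3 = ['FAILED', 'FAILED']

// findAll returns the entries that do NOT contain 'FAILED'
def keptS1 = statusesS1.findAll { !it.contains('FAILED') }
def keptS3 = statusesS3.findAll { !it.contains('FAILED') }

// Groovy truth: a non-empty list is truthy, an empty list is falsy
assert keptS1.size() == 2
assert keptS1
assert keptS3.isEmpty()
assert !keptS3
```

So S1 yields a non-empty list and is kept, while S3 yields an empty list and is dropped.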

You can also do it like this:

my_ch
  .filter { it[3][0] != 'FAILED' && it[3][1] != 'FAILED' }
  .view()

This version drops the element if either status is exactly 'FAILED'. To drop it only when both are 'FAILED', replace && with ||.
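
If the status list can hold more than two entries, the two behaviours can be written without hard-coded indices. The following is a plain-Groovy sketch under that assumption, not part of the original answer; the closure names `noneFailed` and `somePassed` and the mixed status list are made up for illustration:

```groovy
// Hypothetical status lists shaped like it[3] in the channel elements
def allPassed = ['PASSED: File S1_R1 is not corrupt.', 'PASSED: File S1_R2 is not corrupt.']
def mixed     = ['PASSED: File SX_R1 is not corrupt.', 'FAILED']  // made-up mixed case
def allFailed = ['FAILED', 'FAILED']

// Strict: keep only when no entry contains 'FAILED' (any list length)
def noneFailed = { List<String> statuses -> statuses.every { !it.contains('FAILED') } }

// Lenient: keep when at least one entry does not contain 'FAILED'
def somePassed = { List<String> statuses -> statuses.any { !it.contains('FAILED') } }

assert noneFailed(allPassed) && somePassed(allPassed)
assert !noneFailed(mixed) && somePassed(mixed)
assert !noneFailed(allFailed) && !somePassed(allFailed)
```

In the pipeline, the strict variant would look something like my_ch.filter { it[3].every { s -> !s.contains('FAILED') } }. Note that every returns true for an empty list, so an element with no status entries would pass the strict check.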