Nextflow script for performing quality control and trimming the sequence

Question

Asked by Tnau on 31 March 2024 at 06:47 · 38 views

I am new to Nextflow. I am trying to build a mitochondrial DNA variant pipeline, and I am using FastQC and Trimmomatic to check read quality and trim low-quality sequences. I have written the script below; it executes but produces no output.

#!/usr/bin/env nextflow

params {
  fastq_dir = "/mnt/e/Bioinformatics_ppt_learning/mtDNA/nextflow_scripts/*.fastq.gz"
  fastqc_dir = "/mnt/e/Bioinformatics_ppt_learning/mtDNA/nextflow_scripts/fastqc_report"
  trimmed_dir = "/mnt/e/Bioinformatics_ppt_learning/mtDNA/nextflow_scripts/trimmed_fastq"
  trimmomatic_jar = "/mnt/e/Bioinformatics_ppt_learning/mtDNA/nextflow_scripts/trimmomatic-0.39.jar"
}

process FastQC {
  tag "Running FastQC on ${fastq}"
  publishDir "${fastqc_dir}/${fastq.baseName}"
  input: path fastq
  script:
    """
    fastqc -o ${fastqc_dir} ${fastq}
    """
}

process Trimmomatic {
  tag "Trimming ${fastq.baseName}"
  input:
    path read1 from FastQC.output

  output:
    file(joinPath(trimmed_dir, "${read1.baseName}_trimmed.fastq.gz"))

  script:
    """
    java -jar ${params.trimmomatic_jar} PE -threads 4 \
      ${read1} ${joinPath(trimmed_dir, "${read1.baseName}_trimmed.fastq.gz")} \
      ${joinPath(trimmed_dir, "${read1.baseName}_unpaired.fastq.gz")} \
      ${joinPath(trimmed_dir, "${read1.baseName}_unpaired.fastq.gz")}
    """
}

workflow {
  fastq_files = Channel.fromPath(params.fastq_dir)

  fastq_files.each {
    FastQC(fastq: it)
    Trimmomatic(read1: FastQC.output)
  }
}

**dthorbur** · Answer 1 · 2024-03-31T09:23:03+00:00

publishDir works by publishing the files matched by the process output declaration to the path provided. Your FastQC process has no output declaration, so Nextflow doesn't think there is anything to publish.

Also, unless you're using it for checkpointing, Trimmomatic doesn't need the output from FastQC. Both processes can read the same input channel and run in parallel.

Don't use joinPath or any absolute path in your processes. That's not what Nextflow is designed for, and it will often lead to errors. Each task runs in its own work directory, and the output declaration is matched against files in that directory, so an absolute path there tells Nextflow to look in your final output directory for a file the task was supposed to create. Write outputs to the task directory and use publishDir to copy them out.

The file qualifier is deprecated; use path instead. The Nextflow documentation is excellent. The learning curve is steep, but the docs are very good at describing how things work.

So here is an updated script:

process FastQC {
  tag "Running FastQC on ${sampleid}"

  // publishDir takes a target path plus named options; the option is 'mode', not 'move'
  publishDir "${params.fastqc_dir}/${fastq.baseName}", mode: 'move'

  input: 
    tuple val(sampleid), path(fastq)

  output:
    path("*.html")

  script:
    """
    fastqc ${fastq}
    """
}

process Trimmomatic {
  tag "Trimming ${sampleid}"

  // Copy the trimmed reads to the results directory once the task finishes
  publishDir "${params.trimmed_dir}", mode: 'copy'
  
  input:
    tuple val(sampleid), path(fastq)

  output:
    path("*_trimmed.fastq.gz")

  script:
    """
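    # One FASTQ file per task, so single-end (SE) mode is used; PE mode needs both mates.
    # The trimming steps below are example values; adjust them to your data.
    java -jar ${params.trimmomatic_jar} SE -threads 4 \
      ${fastq} ${sampleid}_trimmed.fastq.gz \
      SLIDINGWINDOW:4:20 MINLEN:36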
    """
}
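
Trimmomatic's PE mode expects two input files (forward and reverse reads) and writes four outputs, so it cannot be driven by a channel that emits one file at a time. If the reads are paired-end, a process along the following lines might be a starting point. This is only a sketch and not part of the original answer: the process name TrimmomaticPE, the reads variable, the _1/_2 file naming, and the trimming steps are all assumptions made for illustration, and it reuses the same params values as the processes above.

```nextflow
// Hypothetical paired-end variant; names and trimming steps are illustrative.
process TrimmomaticPE {
  tag "Trimming ${sampleid}"

  publishDir "${params.trimmed_dir}", mode: 'copy'

  input:
    // reads is a list of two files: [forward, reverse]
    tuple val(sampleid), path(reads)

  output:
    // Only the reads that are still paired after trimming are passed on
    tuple val(sampleid), path("${sampleid}_R{1,2}_paired.fastq.gz")

  script:
    """
    java -jar ${params.trimmomatic_jar} PE -threads 4 \
      ${reads[0]} ${reads[1]} \
      ${sampleid}_R1_paired.fastq.gz ${sampleid}_R1_unpaired.fastq.gz \
      ${sampleid}_R2_paired.fastq.gz ${sampleid}_R2_unpaired.fastq.gz \
      SLIDINGWINDOW:4:20 MINLEN:36
    """
}

workflow {
  // Assumed naming: <sample>_1.fastq.gz and <sample>_2.fastq.gz in params.fastq_dir
  read_pairs = Channel.fromFilePairs("${params.fastq_dir}/*_{1,2}.fastq.gz", checkIfExists: true)
  TrimmomaticPE(read_pairs)
}
```

Channel.fromFilePairs emits a sample ID together with the list of matching files, which is the shape the input tuple above expects.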

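In the workflow, you don't need to tell the processes to iterate over each element. A process runs once for every item its input channel emits; that is the default behaviour of the tool. I've added a few extras to the channel definition to make it more robust: it accepts several FASTQ extensions, pairs each file with a sample ID, and fails with a clear error if no files are found.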

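workflow {
  // params.fastq_dir is treated as a directory here; the glob is appended to it
  Channel
    .fromPath("${params.fastq_dir}/*.{fastq.gz,fq.gz,fastq,fq}")
    .ifEmpty { error "Cannot find any fastq files in ${params.fastq_dir}" }
    .map { it -> tuple( it.simpleName, it ) }
    .set { fastq_files }

  // Both processes read the same channel, so they run independently of each other
  FastQC(fastq_files)
  Trimmomatic(fastq_files)
}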
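
To see what the map step hands to the processes, it can help to print the channel. The snippet below is a small stand-alone sketch with two made-up file paths (they do not need to exist); it is meant only to show how simpleName differs from baseName when a file has a double extension such as .fastq.gz.

```nextflow
workflow {
  Channel
    .of('/data/sampleA.fastq.gz', '/data/sampleB.fq')   // hypothetical paths
    .map { file(it) }                                   // turn each string into a path object
    // simpleName drops every extension: sampleA.fastq.gz -> sampleA
    // baseName drops only the last one: sampleA.fastq.gz -> sampleA.fastq
    .map { f -> tuple(f.simpleName, f) }
    .view()                                             // print each (sample ID, file) tuple
}
```

Using simpleName as the sample ID keeps the names of the published files short, at the cost of collisions if two inputs differ only in their extensions.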

EDIT: Missed some of the absolute paths. Updated the input to be a tuple instead, since that is better at handling names, and adjusted the tags.
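
The updated script reads four values from params. A params { ... } block like the one at the top of the question is the syntax used in configuration files, so one option is to move it into a nextflow.config file next to the script. The fragment below is a sketch that reuses the paths from the question, with one assumed change: fastq_dir is a plain directory rather than a glob, because the channel definition appends the file pattern itself.

```nextflow
// nextflow.config (assumed to sit in the same directory as the pipeline script)
params {
  // Directory containing the input reads; the script adds the file glob
  fastq_dir       = "/mnt/e/Bioinformatics_ppt_learning/mtDNA/nextflow_scripts"

  // Where publishDir places the FastQC reports and the trimmed reads
  fastqc_dir      = "/mnt/e/Bioinformatics_ppt_learning/mtDNA/nextflow_scripts/fastqc_report"
  trimmed_dir     = "/mnt/e/Bioinformatics_ppt_learning/mtDNA/nextflow_scripts/trimmed_fastq"

  // Location of the Trimmomatic jar used in the script block
  trimmomatic_jar = "/mnt/e/Bioinformatics_ppt_learning/mtDNA/nextflow_scripts/trimmomatic-0.39.jar"
}
```

Any of these values can also be overridden on the command line with a double-dash option, for example --fastq_dir followed by another directory.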
