How to count comments using nom parser?


I'm trying to wrap my head around the nom crate. The simple problem I'm trying to solve is writing a parser that counts the comment lines in a file. There are two types of comments to parse:

  1. Single line comments using //
  2. Multi-line comments using /* ... */

Here is the code I have so far:

use nom::{
  IResult,
  branch::alt,
  bytes::complete::{is_not, tag, take_until},
  combinator::{map, not, value},
  multi::many0,
  sequence::{delimited, preceded},
};
// matches a single-line comment, returning the text after `//`
fn single_line_comment(s: &str) -> IResult<&str, &str> {
  preceded(tag("//"), is_not("\n\r"))(s)
}

// returns `1` for a single-line comment match
pub fn count_single_line_comment(i: &str) -> IResult<&str, usize> {
  value(1,
    single_line_comment
  )(i)
}

// matches on a multi-line comment
fn multi_line_comments(i: &str) -> IResult<&str, &str> {
  delimited(tag("/*"), take_until("*/"), tag("*/"))(i)
}

// returns the line count of a multi-line comment
fn count_multi_line_comments(i: &str) -> IResult<&str, usize> {
  map(multi_line_comments, |s| s.lines().count())(i)
}

// helper parser that matches on either of the two comment types
fn _count_comment_lines(s: &str) -> IResult<&str,usize> {
  alt((count_single_line_comment, count_multi_line_comments))(s)
}

// function I would like to write but I can't figure this part out
pub fn count_comment_lines(s: &str) -> IResult<&str, Vec<usize>> {
  many0(
    alt((
      _count_comment_lines,
      preceded(not(_count_comment_lines), _count_comment_lines),
    ))
  )(s)
}

My logic for the last parser is to match repeatedly on either a comment block, or on whatever leads up to a comment block (or the rest of the file). But it doesn't work: running it on my sample strings doesn't yield the values I want.

static SAMPLE1: &str = "//Hello there!\n//What's going on?";
static SAMPLE2: &str = "/* What's the deal with airline food?\nIt keeps getting worse and worse\nI can't take it anymore!*/";
static SAMPLE3: &str = " //Global Variable\nlet x = 5;\n/*TODO:\n\t// Add the number of cats as a variable\n\t//Shouldn't take too long\n*/";
static SAMPLE4: &str = "//First\n//Second//NotThird\n//Third";

In my head, I want to use take_until, but that doesn't take a parser, it takes a tag. I don't see any take_until equivalent that accepts a parser, so I'm thinking I have to rethink the combinator approach entirely. What do you suggest?

1 Answer

red-swan:

I figured out an answer:

use std::io;
use nom::{
  IResult,
  branch::alt,
  bytes::complete::{is_not, tag, take_until},
  character::complete::anychar,
  combinator::{value, eof, peek, opt},
  multi::{many0, many_till},
  sequence::{preceded, delimited, terminated},
};


fn get_input() -> String {
  let mut input = String::new();
  io::stdin()
    .read_line(&mut input)
    .expect("Failed to read line");
  input.trim().to_string()
}

// matches a single-line comment, returning a line count of 1
fn parse_single_line_comment(s: &str) -> IResult<&str, usize> {
  value(1,
    preceded(tag("//"), is_not("\n"))
  )(s)
}

// matches a multi-line comment and returns the number of lines it spans
fn parse_multi_line_comments(s: &str) -> IResult<&str, usize> {
  let (rest, body) = delimited(
    tag("/*"),
    take_until("*/"),
    tag("*/")
  )(s)?;

  Ok((rest, body.lines().count()))
}

fn parse_comment(s: &str) -> IResult<&str, usize> {
  alt((parse_single_line_comment, parse_multi_line_comments))(s)
}

// matches end of input (contributes 0 comment lines)
fn parse_end_of_file(s: &str) -> IResult<&str, usize> {
  value(0, eof)(s)
}

// consumes characters up to the next comment or end of input, without
// consuming the comment itself (peek matches but doesn't consume)
fn skip_not_comment(s: &str) -> IResult<&str, usize> {
  let (rest, _) = many_till(
    anychar,
    peek(alt((parse_comment, parse_end_of_file)))
  )(s)?;
  Ok((rest, 0))
}



// skip any leading non-comment text, then alternate comment / skip,
// summing the per-comment line counts
pub fn extract_comments(s: &str) -> IResult<&str, usize> {
  let (tail,_) = opt(skip_not_comment)(s)?;

  let (rest, nums) = 
    many0( 
      terminated(
        parse_comment,
        opt(skip_not_comment)
      )
    )(tail)?;

  Ok((rest, nums.iter().sum()))
}

I still think it can be improved, but it passes the following tests:

static SAMPLES: [&str;8] = 
  [
    "No comments here",
    "//Hello there!\n//General Kenobi",
    "/* What's the deal with airline food?\nIt keeps getting worse and worse\nI can't take it anymore!*/",
    " //Global Variable\nlet x = 5;\n/*TODO:\n\t// Add the number of cats as a variable\n\t//Shouldn't take too long\n*/\nlet c = 500;",
    "//First\n//Second//NotThird\n//Third",
    "x = 3*4 /* not 3*5 */",
    "/* foo */ /* unterminated comment",
    ""
  ];

#[test]
pub fn count_all_comments() {
  let counts: Vec<usize> = SAMPLES.iter().map(|s| extract_comments(s).unwrap().1).collect();
  assert_eq!(counts, vec![0, 2, 3, 4, 3, 1, 1, 0])
}

The trick here was to consume input until a comment is found (using many_till with peek, so the comment itself is not consumed), and then alternate between comment and non-comment until the end.