Regular expression not extracting the matched substring from the actual string


The following is the string from which I need to extract the markdown bullet points.

This is a paragraph with bulletpoints. * Bulletpoint1 * Bulletpoint2 * Bulletpoint3 and this is some another text.

I want to extract "* Bulletpoint1 * Bulletpoint2 * Bulletpoint3" as a substring of that string. The following is the code I use to extract it.

private List<String> extractMarkdownListUsingRegex(String markdownName) {
    String paragraphText = this.paragraph.getText();
    List<String> markdown = new ArrayList<String>();
    String regex = regexMap.get(markdownName);
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(paragraphText);
    while(matcher.find()) {
        markdown.add(matcher.group());
    }
    return markdown;
}

The regex is as follows:

regexMap.put("bulletpoints", "/\\d\\.\\s+|[a-z]\\)\\s+|(\\s\\*\\s+|[A-Z]\\.\\s+)|[IVX]+\\.\\s+/g");

The above code is extracting

[*, *, *]

instead of

[* Bulletpoint1, * Bulletpoint2, * Bulletpoint3]

Can anyone please guide me where I am going wrong in this regard?


There is 1 answer

prabu naresh On

The regex pattern you've provided,

/\\d\\.\\s+|[a-z]\\)\\s+|(\\s\\*\\s+|[A-Z]\\.\\s+)|[IVX]+\\.\\s+/g 

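has two problems when passed to Java's Pattern.compile. First, the leading / and trailing /g are JavaScript regex-literal syntax; Java treats them as literal characters, so the first alternative can only match after a slash and the last one only before "/g". Second, the alternative that actually matches your input, \\s\\*\\s+, covers only the asterisk and the whitespace around it, so matcher.group() returns just the marker and never the text that follows. A simpler pattern that also consumes the text after the asterisk gets closer to your desired result.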

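// Requires java.util.ArrayList, java.util.List, java.util.regex.Matcher and java.util.regex.Pattern.
private List<String> extractMarkdownListUsingRegex(String markdownName) {
    String paragraphText = this.paragraph.getText();
    List<String> markdown = new ArrayList<>();
    // Hard-coded for the bullet-point case, so markdownName is not used here.
    // An asterisk, whitespace, then every character that is not another asterisk.
    String regex = "\\*\\s+[^*]+";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(paragraphText);
    // Collect each full match (marker plus the text after it).
    while (matcher.find()) {
        markdown.add(matcher.group());
    }
    return markdown;
}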

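This pattern matches an asterisk, the whitespace after it, and then every character up to the next asterisk or the end of the string. It is not anchored to the start of a line, so it also works for bullet points written inline as in your example. Two caveats: each match keeps its trailing whitespace, and the last match runs to the end of the paragraph ("* Bulletpoint3 and this is some another text."). Call trim() on each match, and if every bullet point is a single word, replace [^*]+ with \S+ so the match stops at the first space.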
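To compare the two patterns on the sample string from the question, here is a small standalone sketch. The class name BulletPointExtractor and the constant names are made up for illustration, and the tighter pattern assumes that each inline bullet point is a single word, which holds for the sample but may not hold for your real data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original code base.
final class BulletPointExtractor {

    // Asterisk, whitespace, then one run of non-whitespace characters.
    // Assumption: every bullet point is a single word.
    static final Pattern SINGLE_WORD_BULLET = Pattern.compile("\\*\\s+\\S+");

    // The pattern from the answer above: everything up to the next asterisk.
    static final Pattern UNTIL_NEXT_ASTERISK = Pattern.compile("\\*\\s+[^*]+");

    // Returns every full match of the pattern in the text, in order.
    static List<String> extract(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "This is a paragraph with bulletpoints. * Bulletpoint1 * Bulletpoint2 "
                + "* Bulletpoint3 and this is some another text.";

        // Prints [* Bulletpoint1, * Bulletpoint2, * Bulletpoint3]
        System.out.println(extract(SINGLE_WORD_BULLET, text));

        // The last element here also contains the text after Bulletpoint3.
        System.out.println(extract(UNTIL_NEXT_ASTERISK, text));
    }
}
```

If bullet points can contain several words, there is no reliable way to tell where an inline bullet ends and the ordinary text resumes, so in that case it is probably easier to put each bullet point on its own line and match per line.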