Regular expression not extracting the matched substring from the actual string

Question

57 views Asked by Furqan Ahmed At 29 August 2023 at 08:53

The following is the string from which I need to extract the Markdown bullet points.

This is a paragraph with bulletpoints. * Bulletpoint1 * Bulletpoint2 * Bulletpoint3 and this is some another text.

I want to extract "* Bulletpoint1 * Bulletpoint2 * Bulletpoint3" as a substring of the original string. The following code is meant to extract it.

private List<String> extractMarkdownListUsingRegex(String markdownName) {
    String paragraphText = this.paragraph.getText();
    List<String> markdown = new ArrayList<String>();
    String regex = regexMap.get(markdownName);
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(paragraphText);
    while(matcher.find()) {
        markdown.add(matcher.group());
    }
    return markdown;
}

The regex is as follows:

regexMap.put("bulletpoints", "/\\d\\.\\s+|[a-z]\\)\\s+|(\\s\\*\\s+|[A-Z]\\.\\s+)|[IVX]+\\.\\s+/g");

The above code extracts

[*, *, *]

instead of

[* Bulletpoint1, * Bulletpoint2, * Bulletpoint3]

Can anyone please point out where I am going wrong?

**prabu naresh** · Answer 1 · 2023-08-29T08:57:44+00:00

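The regex pattern you've provided,

/\\d\\.\\s+|[a-z]\\)\\s+|(\\s\\*\\s+|[A-Z]\\.\\s+)|[IVX]+\\.\\s+/g

matches only the list markers themselves (here, the asterisk and the whitespace around it), not the text that follows each marker, which is why you get [*, *, *]. Note also that the leading / and the trailing /g are JavaScript regex-literal syntax; Java's Pattern.compile treats them as literal characters. Let's simplify the pattern to achieve your desired result.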

private List<String> extractMarkdownListUsingRegex(String markdownName) {
    String paragraphText = this.paragraph.getText();
    List<String> markdown = new ArrayList<>();
    String regex = "\\*\\s+[^*]+";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(paragraphText);
    while (matcher.find()) {
        markdown.add(matcher.group());
    }
    return markdown;
}

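This pattern matches an asterisk, one or more whitespace characters, and then everything up to the next asterisk or the end of the string. Each match therefore keeps its trailing whitespace, and the last match runs on to the end of the paragraph ("* Bulletpoint3 and this is some another text."). Trim the matches, and tighten the pattern if other text can follow the final bullet.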
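To make the behaviour concrete, here is a minimal, self-contained sketch that runs the pattern against the sample string from the question. The class name BulletExtractor, the extract helper, and the second pattern SINGLE_WORD_BULLET are hypothetical names introduced only for illustration. The second pattern assumes that each bullet label is a single run of non-whitespace characters, which holds for the sample but may not hold for your real data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original code.
class BulletExtractor {

    // Pattern from the answer: asterisk, whitespace, then everything up to the next asterisk.
    static final Pattern UP_TO_NEXT_ASTERISK = Pattern.compile("\\*\\s+[^*]+");

    // Assumed tighter variant: asterisk, whitespace, then one run of non-whitespace.
    static final Pattern SINGLE_WORD_BULLET = Pattern.compile("\\*\\s+\\S+");

    // Collects every match of the pattern in the text, trimming surrounding whitespace.
    static List<String> extract(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group().trim());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "This is a paragraph with bulletpoints. * Bulletpoint1 * Bulletpoint2 "
                + "* Bulletpoint3 and this is some another text.";

        // prints [* Bulletpoint1, * Bulletpoint2, * Bulletpoint3 and this is some another text.]
        System.out.println(extract(UP_TO_NEXT_ASTERISK, text));

        // prints [* Bulletpoint1, * Bulletpoint2, * Bulletpoint3]
        System.out.println(extract(SINGLE_WORD_BULLET, text));
    }
}
```

If your bullet labels can contain spaces, the single-word variant will cut them short; in that case you would need some other way to tell where the last bullet ends, such as a line break or a known delimiter in the paragraph text.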
