Regex result contains extra match/group with just a return

Question

58 views Asked by Jeanluca Scaljeri At 20 March 2024 at 13:30

I would like to match everything between start and end given the following string:

const test = `this is the start
a
b
c
e
f
end
g
h
`;

I have the following regex

const output = test.match(/start((.|\n)*)end/m);

Now, output[0] contains the whole matched string (including start and end), output[1] is the part I want (everything between start and end), and output[2] is only a newline (\n).

What I don't understand is where the second group (output[2]) comes from. Any suggestions?

There are 4 answers

Andrew Parks On 20 March 2024 at 13:42

As Pointy mentioned, you have two capturing groups.

If you use a named capturing group, this makes the code easier to write and understand:

const test = `this is the start
a
b
c
e
f
end
g
h
`;

console.log(test.match(/start(?<between>(.|\n)*)end/m).groups.between)

Ishan On 20 March 2024 at 17:54

Based on what you have described, I assume you want to match from 'start' to 'end', but your example is not quite doing that. So I am providing two examples you can choose from, depending on what you need.

Match from 'start' to the first 'end'. The ? makes [\s\S]+ non-greedy.

/(?<=start)[\s\S]+?(?=end)/

const test = `this is the start
a
b
c
e
f
end
g
end
h
`;

console.log(test.match(/(?<=start)[\s\S]+?(?=end)/))

Match from 'start' to the last 'end'. Without the ?, the quantifier is greedy and keeps going until the last 'end' is matched.

/(?<=start)[\s\S]+(?=end)/

const test = `this is the start
a
b
c
e
f
end
g
end
h
`;

console.log(test.match(/(?<=start)[\s\S]+(?=end)/))

I am using [\s\S], which means any whitespace or non-whitespace character, that is, any character at all. It is essentially what you had with (.|\n), but it also covers characters that neither . nor \n matches, such as \r.
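One case where the two patterns differ is input with Windows-style line endings. The answer does not name a specific scenario, so the string below is a hypothetical input chosen for illustration; it is a small sketch rather than an exhaustive comparison.

```javascript
// Hypothetical input using \r\n line endings (not from the question).
const crlf = "start\r\na\r\nend";

// "." does not match \r and \n does not either, so (.|\n)* cannot get past
// the first \r and the overall match fails.
const viaAlternation = crlf.match(/start((.|\n)*)end/);
console.log(viaAlternation); // null

// [\s\S] matches every character, including \r.
const viaClass = crlf.match(/start([\s\S]*)end/);
console.log(JSON.stringify(viaClass[1])); // "\r\na\r\n"
```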

I am also using lookahead and lookbehind, which are now widely supported in JS, so that the wanted text is at index 0 of the result.

The fourth bird On 21 March 2024 at 09:08

Your second capture group matches a newline, because repeating a capture group captures the value of the last iteration.

The last iteration of (.|\n)* matches the newline after the f character and before end.
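The "last iteration" behaviour can be seen in a short sketch. The input is a shortened, hypothetical version of the question's string, and the (\d)+ pattern is added only as a simpler illustration of the same rule.

```javascript
// Hypothetical input: same shape as the question's string, shortened.
const sample = "this is the start\na\nb\nend\ng\n";

const result = sample.match(/start((.|\n)*)end/);

// Group 1 (outer) spans everything between "start" and "end".
console.log(JSON.stringify(result[1])); // "\na\nb\n"
// Group 2 (inner) holds only what its last iteration matched.
console.log(JSON.stringify(result[2])); // "\n"

// The same rule on a simpler pattern: (\d)+ keeps only the final digit.
const digits = "123".match(/(\d)+/);
console.log(digits[0], digits[1]); // 123 3
```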

Notes

The * is greedy: it first matches all characters up to the end of the string, then backtracks until end can match, which is at the last occurrence of end in the string
You don't need the /m flag for multiline as the pattern has no anchors
If you don't want partial word matches you can use word boundaries like \bstart\b and \bend\b
If you want multiple matches, you can add the /g flag for global matching and make the quantifier non-greedy: [^]*?
The pattern (.|\n)* is very inefficient as you are optionally repeating a single character or a newline, leaving a lot of paths to backtrack into
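The notes on word boundaries and on /g with a non-greedy quantifier can be combined in one sketch. The input string and the variable names are hypothetical; matchAll is used here as one way to collect all matches with their groups, which is an assumption about what is wanted rather than part of the answer.

```javascript
// Hypothetical input: "weekend" and "restart" contain "end" and "start"
// as parts of longer words.
const text = "start one weekend two end, restart, start three end";

// With word boundaries, only the standalone words "start" and "end" count.
// [^]*? is lazy, so each match stops at the nearest standalone "end",
// and the g flag lets matchAll continue after each match.
const bounded = [...text.matchAll(/\bstart\b([^]*?)\bend\b/g)].map((m) => m[1]);
console.log(JSON.stringify(bounded)); // [" one weekend two "," three "]

// Without word boundaries, partial words such as "weekend" and "restart" match too.
const unbounded = [...text.matchAll(/start([^]*?)end/g)].map((m) => m[1]);
console.log(JSON.stringify(unbounded)); // [" one week",", start three "]
```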

You can use a single capture group, and in JavaScript you can write the pattern as:

start([^]*)end

const regex = /start([^]*)end/;
const str = `this is the start
a
b
c
e
f
end
g
h`;

const m = str.match(regex);
if (m) {
  console.log(m[1]);
}

Pointy On 20 March 2024 at 13:41 (accepted answer)

This part of your regular expression, ((.|\n)*), creates two capturing groups. The outer group collects everything matched by the repeated inner group. The inner group contains only the last single character it matched.

Note that you'd probably be better off with a slightly different regular expression, to avoid the odd effect of the greedy * collecting too many characters in the groups before backtracking takes over.
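The answer does not say which alternative it has in mind, so the two variants below are assumptions: a non-capturing inner group with a lazy quantifier, and the s (dotAll) flag. Both leave a single capture group in the result. The input is a hypothetical string with two occurrences of end.

```javascript
// Hypothetical input with two "end" occurrences.
const input = "this is the start\na\nb\nend\ng\nend\n";

// (?:...) groups without capturing, so only the outer group appears in the result.
// The lazy *? stops at the first "end" instead of running to the end of the string.
const nonCapturing = input.match(/start((?:.|\n)*?)end/);
console.log(nonCapturing.length); // 2 (whole match plus one group)
console.log(JSON.stringify(nonCapturing[1])); // "\na\nb\n"

// With the s (dotAll) flag, "." also matches line terminators,
// so the alternation is not needed at all.
const dotAll = input.match(/start(.*?)end/s);
console.log(JSON.stringify(dotAll[1])); // "\na\nb\n"
```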
