I have a pipe-delimited list of phrases. I would like to remove sequential duplicates using a regex replace/substitution. For example:
dog|cat|cat woman|cat woman|dog|dog
cat|cat|catman|catman|catman|cat woman|cat woman|dog|dogman|doggy
would be transformed into
dog|cat|cat woman|dog
cat|catman|cat woman|dog|dogman|doggy
I am stuck. So far, I have the pattern
((^|\|)([^\|]+))\1+ with a substitution of $1. But that clearly does not work, because the output is
dog|cat woman|cat woman|dog
cat|catman|catman|cat woman|dogman|doggy
Thanks for your help
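You can set boundaries on the left and the right to prevent partial matches when using the capture group and the backreference. In your attempt, nothing stops the backreference from matching only the start of the next item, so |cat is also found at the beginning of |cat woman and |dog at the beginning of |dogman.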
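To make the partial-match problem concrete, here is a small sketch that runs the pattern from the question against both sample lines. Python is an assumption on my part (the question does not name a regex flavor), and the helper name `attempt` is made up for illustration. Python writes the replacement as `\1` rather than `$1`.

```python
import re

# The pattern from the question: the leading delimiter sits inside group 1,
# and nothing forces the repeated part to end at an item boundary.
ATTEMPT = re.compile(r"((^|\|)([^\|]+))\1+")


def attempt(line: str) -> str:
    """Apply the question's pattern, replacing each match with group 1."""
    return ATTEMPT.sub(r"\1", line)


lines = [
    "dog|cat|cat woman|cat woman|dog|dog",
    "cat|cat|catman|catman|catman|cat woman|cat woman|dog|dogman|doggy",
]

for line in lines:
    # "|cat" is matched against the start of "|cat woman" and "|catman",
    # so part of the following item is swallowed by the replacement.
    print(attempt(line))
```

Run this way, the sketch reproduces the two unwanted lines shown in the question.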
If a lookbehind assertion is supported:
(?<![^|\n])([^|\n]+)(?:\|\1)+(?![^|\n])
The pattern matches:
(?<![^|\n])  Negative lookbehind, assert that what is directly to the left is not any char except | or a newline
([^|\n]+)  Capture group 1, match 1 or more times any char except | or a newline (excluding the newline prevents crossing lines)
(?:\|\1)+  Repeat 1 or more times matching | and the backreference to group 1
(?![^|\n])  Negative lookahead that asserts that what is directly to the right is not any char except | or a newline
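In the replacement you can use capture group 1 (written $1 or \1, depending on the regex flavor).
Output
dog|cat|cat woman|dog
cat|catman|cat woman|dog|dogman|doggy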
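As a quick way to try the approach described above, here is a minimal Python sketch. Python is only an assumption (any flavor with lookbehind support should behave similarly), and the function name `dedupe` is hypothetical. The pattern is the four parts from the breakdown joined together.

```python
import re

# Lookbehind and lookahead act as item boundaries: the match must start right
# after a "|", a newline, or the start of the text, and must end right before
# a "|", a newline, or the end of the text.
PATTERN = re.compile(r"(?<![^|\n])([^|\n]+)(?:\|\1)+(?![^|\n])")


def dedupe(text: str) -> str:
    """Collapse runs of identical adjacent items into a single item."""
    return PATTERN.sub(r"\1", text)


sample = (
    "dog|cat|cat woman|cat woman|dog|dog\n"
    "cat|cat|catman|catman|catman|cat woman|cat woman|dog|dogman|doggy"
)

print(dedupe(sample))
```

Because newlines are excluded from the character classes, both lines can be processed in one call without an item on one line being compared to an item on the next.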
With thanks to Casimir et Hippolyte for the great improvement.