Java regular expression for masked alphanumeric codes


I need help finding or building a regular expression that masks part of an alphanumeric code with *.

I tried it with this expression, but it doesn't work correctly when it has zeros in the middle of the string:

(?<=[^0].{3})\w+(?=\w{4})

Live samples: https://www.regexplanet.com/share/index.html?share=yyyyf47wp3r

Input                Output
0001113033AA55608981 0001113*********8981
23456237472347823923 2345************3923
00000000090000000000 0000000009000***0000
09008000800060050000 09008***********0000
AAAABBBBCCCCDDDDEEEE AAAA************EEEE
0000BBBBCCCCDDDDEEEE 0000BBBB********EEEE

The rules are:

  1. The first 4 characters starting at the first non-zero character, and the last 4 characters, must be displayed.
  2. Leading zeros are ignored, but not removed or replaced.

There are 4 answers

2
Wiktor Stribiżew On BEST ANSWER

You can capture the initial zeros and the 4 alphanumeric chars right after them in one group, the middle part in a second group, and the last 4 alphanumeric chars in a third group, then replace each char in the second group only.

Here is an example (Java 11 compliant):

String text = "0001113033AA55608981";
Matcher mr = Pattern.compile("^(0*\\w{4})(.*)(\\w{4})$").matcher(text);
text = mr.replaceFirst(m -> m.group(1) + "*".repeat(m.group(2).length()) + m.group(3));
System.out.println(text); // => 0001113*********8981

See the Java demo.

The regex matches

  • ^ - start of string
  • (0*\w{4}) - Group 1: zero or more 0 chars and then any four alphanumeric/underscore chars
  • (.*) - Group 2: any zero or more chars other than line break chars (replace with \w* if you only allow "word" chars)
  • (\w{4}) - Group 3: four "word" chars
  • $ - end of string.
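
To make the group breakdown above concrete, here is a minimal sketch that prints what each of the three groups captures for a few of the question's samples. The class name MaskGroupsDemo and the helper split are hypothetical, introduced only for this illustration; the pattern is the one from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MaskGroupsDemo {
    // Same pattern as in the answer: leading zeros + 4 chars, middle part, last 4 chars.
    static final Pattern MASK = Pattern.compile("^(0*\\w{4})(.*)(\\w{4})$");

    // Hypothetical helper: joins the three captured groups with '|',
    // or returns null when the input does not match at all.
    static String split(String text) {
        Matcher m = MASK.matcher(text);
        if (!m.matches()) {
            return null;
        }
        return m.group(1) + "|" + m.group(2) + "|" + m.group(3);
    }

    public static void main(String[] args) {
        String[] samples = {
            "0001113033AA55608981", // 0001113|033AA5560|8981
            "00000000090000000000", // 0000000009000|000|0000
            "AAAABBBBCCCCDDDDEEEE"  // AAAA|BBBBCCCCDDDD|EEEE
        };
        for (String s : samples) {
            System.out.println(s + " -> " + split(s));
        }
    }
}
```

Only the middle group is replaced with asterisks, so its length determines how many * characters appear in the result.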

Java 8 compliant version:

String text = "0001113033AA55608981";
Matcher m = Pattern.compile("^(0*\\w{4})(.*)(\\w{4})$").matcher(text);
StringBuffer result = new StringBuffer();
while (m.find()) {
    m.appendReplacement(result, m.group(1) + String.join("", Collections.nCopies(m.group(2).length(), "*")) + m.group(3));
}
m.appendTail(result);
System.out.println(result.toString());

See the Java code demo online.

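If you only need the masked string, you can skip the replacement API: the pattern is anchored at both ends, so a single matches() call is enough: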

String text = "0001113033AA55608981";
Matcher m = Pattern.compile("^(0*\\w{4})(.*)(\\w{4})$").matcher(text);
String result = "";
if (m.matches()) {
    result = m.group(1) + String.join("", Collections.nCopies(m.group(2).length(), "*")) + m.group(3);
}
System.out.println(result);
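
The matches() approach can be wrapped in a small reusable method. The sketch below is one possible shape, not part of the original answer: the class name CodeMasker and the method mask are hypothetical, the middle group uses \w* (the stricter alternative mentioned in the breakdown), and returning the input unchanged when it does not match is an assumption about the desired behavior for short or invalid codes.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CodeMasker {
    // Compiled once; \w* in the middle restricts the whole code to "word" chars.
    private static final Pattern MASK = Pattern.compile("^(0*\\w{4})(\\w*)(\\w{4})$");

    // Hypothetical helper: masks the middle of the code with '*'.
    static String mask(String text) {
        Matcher m = MASK.matcher(text);
        if (!m.matches()) {
            return text; // assumption: leave non-matching input untouched
        }
        char[] stars = new char[m.group(2).length()];
        Arrays.fill(stars, '*'); // Java 8 compatible alternative to "*".repeat(n)
        return m.group(1) + new String(stars) + m.group(3);
    }

    public static void main(String[] args) {
        System.out.println(mask("0001113033AA55608981")); // 0001113*********8981
        System.out.println(mask("09008000800060050000")); // 09008***********0000
        System.out.println(mask("1234567"));              // 1234567 (too short to match)
    }
}
```

Compiling the pattern once in a static field avoids recompiling it on every call.
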
0
logi-kal On

A good regex would be (?<=[^0]\w{3}\w*)\w(?=\w{4}).

Unfortunately, in Java you can't use unbounded lookbehinds, but you can use

(?<=[^0]\w{3,14})\w(?=\w{4})

if you know that your string is at most 20 characters long (14 = 20 minus the first character, minus the one you're matching, minus the last 4 ones).

See a demo here.

Clearly this value can be computed dynamically, e.g.:

for (String text : Arrays.asList("0001113033AA55608981",
                                 "23456237472347823923",
                                 "00000000090000000000",
                                 "09008000800060050000",
                                 "AAAABBBBCCCCDDDDEEEE",
                                 "0000BBBBCCCCDDDDEEEE")) {
    System.out.println(text.replaceAll("(?<=[^0]\\w{3," + (text.length()-6) + "})\\w(?=\\w{4})", "*"));
}

(demo)

0
The fourth bird On

In Java you could use a finite quantifier in a lookbehind assertion.

(?<=^0{0,100}[^\W0]\w{3,100})\w(?=\w{4,}$)

The pattern matches:

  • (?<= Positive lookbehind, assert that to the left is
    • ^0{0,100} Match 0-100 zeros from the start of the string (the 100 is arbitrary)
    • [^\W0] Match a word character other than a zero
    • \w{3,100} Match 3-100 word characters (where the 100 is again arbitrary)
  • ) Close the lookbehind
  • \w Match a single word character
  • (?=\w{4,}$) Positive lookahead, assert 4 or more word characters to the end of the string

See a Java demo and a regex demo

String[] strings = {
    "0001113033AA55608981",
    "23456237472347823923",
    "00000000090000000000",
    "09008000800060050000",
    "AAAABBBBCCCCDDDDEEEE",
    "0000BBBBCCCCDDDDEEEE"
};

// Compile the pattern once, outside the loop.
Pattern p = Pattern.compile("(?<=^0{0,100}[^\\W0]\\w{3,100})\\w(?=\\w{4,}$)");

for (String s : strings) {
    Matcher m = p.matcher(s);
    System.out.println(m.replaceAll("*"));
}

Output

0001113*********8981
2345************3923
0000000009000***0000
09008***********0000
AAAA************EEEE
0000BBBB********EEEE
0
Andrei Odegov On

No replacement callback is needed, only a "pure" regex with the replacement string $1*.

(?:^(0*[^0].{3})|\G).(?=.{4,}$)

The regex can be broken down as follows.

(?:              group, but do not capture:
----------------------------------------------------------
  ^                the beginning of a "line"
----------------------------------------------------------
  (                group and capture to \1:
----------------------------------------------------------
    0*              '0' (0 or more times (matching
                    the most amount possible))
----------------------------------------------------------
    [^0]             any character except '0'
----------------------------------------------------------
    .{3}             any character except \n (3 times)
----------------------------------------------------------
  )                end of \1
----------------------------------------------------------
 |                OR
----------------------------------------------------------
  \G               where the previous match ended
----------------------------------------------------------
)                end of grouping
----------------------------------------------------------
.                any character except \n
----------------------------------------------------------
(?=              look ahead to see if there is:
----------------------------------------------------------
  .{4,}            any character except \n (at least 4
                   times (matching the most amount
                   possible))
----------------------------------------------------------
  $                before an optional \n, and the end of a
                   "line"
----------------------------------------------------------
)                end of look-ahead

See a Java demo.

import java.util.regex.*;

class Rextester {
  public static void main(String[] asd){
    final String rx = "(?:^(0*[^0].{3})|\\G).(?=.{4,}$)";
    final String text = "0001113033AA55608981\n"
                      + "23456237472347823923\n"
                      + "00000000090000000000\n"
                      + "09008000800060050000\n"
                      + "AAAABBBBCCCCDDDDEEEE\n"
                      + "0000BBBBCCCCDDDDEEEE\n"
                      + "00000000000000000000";
    final String subs = "$1*";
    final Pattern p = Pattern.compile(rx, Pattern.MULTILINE);
    final Matcher m = p.matcher(text);
    final String res = m.replaceAll(subs);
    System.out.println(res);
  }  
}

Output.

0001113*********8981
2345************3923
0000000009000***0000
09008***********0000
AAAA************EEEE
0000BBBB********EEEE
00000000000000000000
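
One edge case may be worth checking before reusing this pattern on single strings. In Java, \G also matches at the start of the input on the first match attempt, so an all-zero code passed on its own (rather than as the last line of a multi-line text, as in the demo) appears to get masked through the \G branch. The sketch below illustrates this; the class name AnchorEdgeCase and the GUARDED variant with (?!^) are assumptions added here, not part of the original answer.

```java
class AnchorEdgeCase {
    // Pattern from the answer.
    static final String ORIGINAL = "(?:^(0*[^0].{3})|\\G).(?=.{4,}$)";
    // Hypothetical variant: (?!^) stops \G from matching at the very start of the input.
    static final String GUARDED = "(?:^(0*[^0].{3})|(?!^)\\G).(?=.{4,}$)";

    // Applies the regex to a single code, using the answer's replacement string.
    static String mask(String regex, String text) {
        return text.replaceAll(regex, "$1*");
    }

    public static void main(String[] args) {
        String zeros = "00000000000000000000";
        System.out.println(mask(ORIGINAL, zeros)); // ****************0000
        System.out.println(mask(GUARDED, zeros));  // 00000000000000000000
        // Regular input behaves the same with both patterns.
        System.out.println(mask(GUARDED, "0001113033AA55608981")); // 0001113*********8981
    }
}
```

Whether an all-zero code should stay unmasked is not stated in the question's rules, so treat the guarded variant as one possible choice.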