I am creating a function that strips illegal wildcard patterns from an input string. The ideal solution should use a single regular expression, if at all possible.
The illegal wildcard patterns are: %% and %_%. Each instance of those should be replaced with %.
Here's the rub: I'm fuzz testing the function by running it against various inputs to try to break it.
It works for the most part, but it fails on more complicated inputs.
The following inputs should return an empty string (not an exhaustive list):
The following inputs should return % (not an exhaustive list):
There will also be cases where the input contains other characters, like:
I have tried using several different patterns and my tests are failing.
String input = "%_%%%%_%%%_%";

// old method:
public static String ancientMethod1(String input){
    if (input == null)
        return "";
    return input.replaceAll("%_%", "").replaceAll("%%", ""); // Output: ""
}

// Attempt 1:
// Doesn't quite work right.
// "A%%" is returned as "A%%" instead of "A%"
public static String newMethod1(String input) {
    String result = input;
    while (result.contains("%%") || result.contains("%_%"))
        result = result.replaceAll("%%","%").replaceAll("%_%","%");
    if (result.equals("%"))
        return "";
    return input;
}

// Attempt 2:
// Succeeds, but I would like to simplify this:
public static String newMethod2(String input) {
    if (input == null)
        return "";
    String illegalPattern1 = "%%";
    String illegalPattern2 = "%_%";
    String result = input;
    while (result.contains(illegalPattern1) || result.contains(illegalPattern2)) {
        result = result.replace(illegalPattern1, "%");
        result = result.replace(illegalPattern2, "%");
    }
    if (result.equals("%") || result.equals("_"))
        return "";
    return result;
}
Here's a more complete defined example of how I'm using this: https://gist.github.com/sometowngeek/697c839a1bf1c9ee58be283b1396cf2e
This regular expression string matches all your examples:
"%(?:_?%)+"
It matches strings consisting of a '%' character followed by one or more sequences consisting of zero or one '_' character and one '%' character (close to literal translation), which is another way of saying what I did in comments: "a sequence of '%' and '_' characters, beginning and ending with '%', and not containing two consecutive '_' characters".
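As a rough sketch of how that pattern might be plugged into the method from the question, something like the following could work. The class name WildcardCleaner and the method name collapseWildcards are made up for illustration, and the null handling is copied from newMethod2. The final special case in newMethod2 (returning "" when the result is just "%" or "_") is deliberately left out here, so add it back if you need that behavior.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the question.
class WildcardCleaner {

    // A '%' followed by one or more groups of an optional '_' and a '%'.
    private static final Pattern ILLEGAL_WILDCARDS = Pattern.compile("%(?:_?%)+");

    // Collapses every run of illegal wildcard patterns into a single '%'.
    // Assumption: null input yields "", as in newMethod2 from the question.
    static String collapseWildcards(String input) {
        if (input == null) {
            return "";
        }
        return ILLEGAL_WILDCARDS.matcher(input).replaceAll("%");
    }

    public static void main(String[] args) {
        System.out.println(collapseWildcards("A%%"));          // A%
        System.out.println(collapseWildcards("%_%%%%_%%%_%")); // %
        System.out.println(collapseWildcards("foo%_%bar"));    // foo%bar
        System.out.println(collapseWildcards("%__%"));         // %__% (two consecutive '_' do not match)
    }
}
```

Because the match is greedy and starts at the leftmost '%', a single pass should be enough: the '%' left behind by a replacement cannot join with its neighbors to form a new illegal pattern, so no loop is needed.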