Welcome to Regular Expression in Java, also called Java Regex. When I started programming, Java regular expressions were a nightmare for me. This tutorial aims to help you master regular expressions in Java, and I will also come back here to refresh my own Java regex knowledge.
A regular expression in Java defines a pattern for a String. Regular expressions can be used to search, edit or manipulate text. Regular expressions are not language-specific, but they differ slightly from language to language. Java’s regular expression syntax is most similar to Perl’s. The Java regex classes are in the java.util.regex package, which contains three classes:
- Pattern: a Pattern object is the compiled version of the regular expression. The Pattern class has no public constructor; we create a Pattern object by passing the regular expression to its public static compile method.
- Matcher: the Java regex engine object that matches the input String against the Pattern object. The Matcher class has no public constructor either; we get a Matcher object from the Pattern object’s matcher method, which takes the input String as an argument. We then call the matches method, which returns a boolean indicating whether the input String matches the regex pattern.
- PatternSyntaxException: thrown if the regular expression syntax is not correct.
Let’s have a look at a Java regex example program.
package com.journaldev.util;

import java.util.regex.*;

public class PatternExample {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(".xx.");
        Matcher matcher = pattern.matcher("MxxY");
        System.out.println("Input String matches regex - " + matcher.matches());
        // bad regular expression
        pattern = Pattern.compile("*xx*");
    }
}
When we run this Java regex example program, we get the following output.
Input String matches regex - true
Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
*xx*
^
    at java.util.regex.Pattern.error(Pattern.java:1924)
    at java.util.regex.Pattern.sequence(Pattern.java:2090)
    at java.util.regex.Pattern.expr(Pattern.java:1964)
    at java.util.regex.Pattern.compile(Pattern.java:1665)
    at java.util.regex.Pattern.<init>(Pattern.java:1337)
    at java.util.regex.Pattern.compile(Pattern.java:1022)
    at com.journaldev.util.PatternExample.main(PatternExample.java:13)
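PatternSyntaxException is an unchecked exception, so the compiler does not force you to handle it. When the regex comes from user input or configuration, it may be worth catching it. The following is a minimal sketch of that idea; the class name SafeCompileExample and the helper tryCompile are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SafeCompileExample {

    // Hypothetical helper: returns the compiled pattern, or null if the regex is malformed.
    static Pattern tryCompile(String regex) {
        try {
            return Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            // getPattern(), getDescription() and getIndex() report what went wrong and where.
            System.out.println("Invalid regex \"" + e.getPattern() + "\": "
                    + e.getDescription() + " near index " + e.getIndex());
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCompile(".xx.") != null); // true: valid pattern
        System.out.println(tryCompile("*xx*") != null); // false: '*' has nothing to repeat
    }
}
```

Returning null keeps the sketch short; in real code you might prefer to rethrow with a clearer message or return an Optional.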
Since Java regular expressions revolve around String, the String class was extended in Java 1.4 with a matches method that does regex pattern matching. Internally it uses the Pattern and Matcher classes to do the processing, but it reduces the lines of code. The Pattern class also has a static matches method that takes a regex and an input String as arguments and returns a boolean result after matching them. So the code below works fine for matching an input String against a regular expression in Java.
String str = "bbb";
System.out.println("Using String matches method: "+str.matches(".bb"));
System.out.println("Using Pattern matches method: "+Pattern.matches(".bb", str));
So if your requirement is just to check whether the input String matches the pattern, you can save time and lines of code by using the simple String matches method. Use the Pattern and Matcher classes only when you need to manipulate the input String or reuse the pattern. Note that the pattern defined by a regex is applied to the String from left to right, and once a source character is used in a match, it can’t be reused. For example, the regex “121” will match “31212142121” only twice, as “_121____121”.
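The left-to-right, no-reuse behaviour can be observed with Matcher.find(). The sketch below collects the start index of every match; the class NonOverlappingMatchExample and the helper matchStarts are hypothetical names, not part of the original tutorial code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonOverlappingMatchExample {

    // Hypothetical helper: collects the start index of every match that find() reports.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // "121" occurs at indexes 1, 3 and 8, but the occurrence at index 3
        // overlaps the first match, so find() skips it.
        System.out.println(matchStarts("121", "31212142121")); // [1, 8]
    }
}
```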
In the examples below, each pair is (regex, input) and the result is what matches() returns, so the whole input has to match the regex.
Regular Expression | Description | Example |
---|---|---|
. | Matches any single character | (“..”, “a%”) – true; (“..”, “.a”) – true; (“..”, “a”) – false |
^aaa | Matches the regex aaa at the beginning of the line | (“^a.c.”, “abcd”) – true; (“^a”, “ac”) – false |
aaa$ | Matches the regex aaa at the end of the line | (“..cd$”, “abcd”) – true; (“a$”, “a”) – true; (“a$”, “aca”) – false |
[abc] | Matches any one of the letters a, b or c. [] are known as character classes. | (“^[abc]d.”, “ad9”) – true; (“[ab].d$”, “bad”) – true; (“[ab]x”, “cx”) – false |
[abc][12] | Matches a, b or c followed by 1 or 2 | (“[ab][12].”, “a2#”) – true; (“[ab]..[12]”, “acd2”) – true; (“[ab][12]”, “c2”) – false |
[^abc] | When ^ is the first character in [], it negates the class and matches any character except a, b or c | (“[^ab][^12].”, “c3#”) – true; (“[^ab]..[^12]”, “xcd3”) – true; (“[^ab][^12]”, “c2”) – false |
[a-e1-8] | Matches any character in the range a to e or 1 to 8 | (“[a-e1-3].”, “d#”) – true; (“[a-e1-3]”, “2”) – true; (“[a-e1-3]”, “f2”) – false |
xx\|yy | Matches regex xx or yy | |
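A few of these rows can be checked directly with Pattern.matches(). The sketch below prints each check in the same (regex, input) notation as the table; the class BasicPatternsExample and the helper row are hypothetical names chosen for this illustration.

```java
import java.util.regex.Pattern;

class BasicPatternsExample {

    // Hypothetical helper: formats one check in the table's (regex, input) - result notation.
    static String row(String regex, String input) {
        return "(\"" + regex + "\", \"" + input + "\") - " + Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Pattern.matches() requires the whole input to match the regex.
        System.out.println(row("^a.c.", "abcd"));      // true
        System.out.println(row("a$", "aca"));          // false: "a$" covers one character only
        System.out.println(row("[ab].d$", "bad"));     // true
        System.out.println(row("[^ab][^12].", "c3#")); // true
        System.out.println(row("[a-e1-3]", "f2"));     // false
        System.out.println(row("xx|yy", "yy"));        // true
        System.out.println(row("xx|yy", "xy"));        // false: neither alternative matches
    }
}
```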
Java regex also has some metacharacters, which act as shorthand for common matching patterns.
Regular Expression | Description |
---|---|
\d | Any digit, short for [0-9] |
\D | Any non-digit, short for [^0-9] |
\s | Any whitespace character, short for [ \t\n\x0B\f\r] |
\S | Any non-whitespace character, short for [^\s] |
\w | Any word character, short for [a-zA-Z_0-9] |
\W | Any non-word character, short for [^\w] |
\b | A word boundary |
\B | A non-word boundary |
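To see these shorthands in action, the sketch below counts how often each one is found in a sample string. The class ShorthandClassesExample, the helper countMatches and the sample input are all hypothetical and chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShorthandClassesExample {

    // Hypothetical helper: counts how many times the regex is found anywhere in the input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "Order 66 shipped to bay_7";
        // In a Java string literal the backslash itself must be escaped, so \d is written "\\d".
        System.out.println(countMatches("\\d", input));       // 3 digits: 6, 6, 7
        System.out.println(countMatches("\\s", input));       // 4 spaces
        System.out.println(countMatches("\\w+", input));      // 5 words: Order, 66, shipped, to, bay_7
        System.out.println(countMatches("\\bto\\b", input));  // 1: "to" as a whole word
        System.out.println(countMatches("\\Bto\\B", "stop")); // 1: "to" in the middle of a word
    }
}
```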
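There are two ways to use metacharacters as ordinary characters in regular expressions: precede the metacharacter with a backslash (\), or enclose it within \Q (which starts the quote) and \E (which ends it).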
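Here is a minimal sketch of escaping, assuming the two ways meant here are a backslash escape and \Q...\E quoting. It also uses Pattern.quote(), which builds a quoted literal pattern for you. The class EscapeMetacharExample and the helper matchesLiteral are hypothetical names.

```java
import java.util.regex.Pattern;

class EscapeMetacharExample {

    // Hypothetical helper: treats every character of 'literal' as an ordinary character.
    static boolean matchesLiteral(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        // Unescaped, '.' matches any character.
        System.out.println(Pattern.matches("1.5", "1x5"));         // true
        // 1. Backslash escape: "\\." in Java source is the regex \. (a literal dot).
        System.out.println(Pattern.matches("1\\.5", "1x5"));       // false
        System.out.println(Pattern.matches("1\\.5", "1.5"));       // true
        // 2. \Q...\E quotes everything in between.
        System.out.println(Pattern.matches("\\Q1.5*\\E", "1.5*")); // true
        // Pattern.quote() does the quoting for an arbitrary string.
        System.out.println(matchesLiteral("a+b", "a+b"));          // true
        System.out.println(matchesLiteral("a+b", "aab"));          // false: '+' is literal here
    }
}
```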
Java regex quantifiers specify the number of occurrences of a character to match against.
Regular Expression | Description |
---|---|
X? | X occurs once or not at all |
X* | X occurs zero or more times |
X+ | X occurs one or more times |
X{n} | X occurs exactly n times |
X{n,} | X occurs n or more times |
X{n,m} | X occurs at least n times but not more than m times |
Java regex quantifiers can also be used with character classes and capturing groups. For example, [abc]+ means a, b or c, one or more times. (abc)+ means the group “abc” one or more times. We will discuss capturing groups next.
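First, a short sketch of the quantifiers from the table, applied to single characters, a character class and a group. The class QuantifierExample and the PIN rule (4 to 6 digits) are hypothetical and exist only to illustrate X{n,m}.

```java
import java.util.regex.Pattern;

class QuantifierExample {

    // Hypothetical rule for illustration: a PIN is 4 to 6 digits.
    private static final Pattern PIN = Pattern.compile("\\d{4,6}");

    static boolean isValidPin(String input) {
        return PIN.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("colou?r", "color"));  // true: 'u' is optional
        System.out.println(Pattern.matches("a*", ""));            // true: zero occurrences allowed
        System.out.println(Pattern.matches("a+", ""));            // false: at least one 'a' needed
        System.out.println(Pattern.matches("\\d{3}", "1234"));    // false: exactly three digits
        System.out.println(Pattern.matches("\\d{2,}", "1234"));   // true: two or more digits
        System.out.println(Pattern.matches("[abc]+", "cabba"));   // true
        System.out.println(Pattern.matches("(abc)+", "abcabc"));  // true
        System.out.println(Pattern.matches("(abc)+", "abcab"));   // false: last group incomplete
        System.out.println(isValidPin("12345"));                  // true
        System.out.println(isValidPin("123"));                    // false
    }
}
```

Compiling the pattern once into a constant, as done for PIN, avoids recompiling the regex on every call.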
Capturing groups in Java regular expressions are used to treat multiple characters as a single unit. You can create a group using (). The portion of the input String that matches the capturing group is saved into memory and can be recalled using a backreference. You can use the matcher.groupCount method to find out the number of capturing groups in a Java regex pattern. For example, ((a)(bc)) contains 3 capturing groups: ((a)(bc)), (a) and (bc). You can use a backreference in the regular expression with a backslash (\) followed by the number of the group to be recalled. Capturing groups and backreferences can be confusing, so let’s understand this with an example.
System.out.println(Pattern.matches("(\\w\\d)\\1", "a2a2")); //true
System.out.println(Pattern.matches("(\\w\\d)\\1", "a2b2")); //false
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B2AB")); //true
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B3AB")); //false
In the first example, at runtime the first capturing group is (\w\d), which evaluates to “a2” when matched against the input String “a2a2” and is saved in memory. So \1 refers to “a2”, and hence the statement returns true. For the same reason, the second statement prints false. Try to work through this scenario for statements 3 and 4 yourself. :)
Now we will look at some important methods of the Pattern and Matcher classes.
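Before turning to those methods, the group numbering and the backreference statements above can be inspected with Matcher.group(int). This is a sketch only; the class CapturingGroupExample and the helper capturedGroups are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapturingGroupExample {

    // Hypothetical helper: returns the text captured by groups 1..n,
    // or an empty list if the whole input does not match.
    static List<String> capturedGroups(String regex, String input) {
        List<String> groups = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (matcher.matches()) {
            for (int i = 1; i <= matcher.groupCount(); i++) {
                groups.add(matcher.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Groups are numbered by their opening parenthesis, left to right.
        System.out.println(capturedGroups("((a)(bc))", "abc")); // [abc, a, bc]
        // Statement 3: group 1 is "AB", group 2 is "B2", so \2\1 must be "B2AB".
        System.out.println(capturedGroups("(AB)(B\\d)\\2\\1", "ABB2B2AB")); // [AB, B2]
        // Statement 4: \2 expects "B2" but the input continues with "B3".
        System.out.println(capturedGroups("(AB)(B\\d)\\2\\1", "ABB2B3AB")); // []
    }
}
```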
- We can create a Pattern object with flags. For example, Pattern.CASE_INSENSITIVE enables case-insensitive matching.
- The Pattern class also provides a split(CharSequence input) method that is similar to the String class split() method.
- The Pattern class toString() method returns the regular expression String from which the pattern was compiled.
- The Matcher class has start() and end() index methods that show precisely where the match was found in the input string.
- The Matcher class also provides the String manipulation methods replaceAll(String replacement) and replaceFirst(String replacement).
Let’s look at these Java regex methods in a simple example program.
package com.journaldev.util;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExamples {

    public static void main(String[] args) {
        // using pattern with flags
        Pattern pattern = Pattern.compile("ab", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher("ABcabdAb");
        // using Matcher find(), group(), start() and end() methods
        while (matcher.find()) {
            System.out.println("Found the text \"" + matcher.group()
                    + "\" starting at " + matcher.start()
                    + " index and ending at index " + matcher.end());
        }

        // using Pattern split() method
        pattern = Pattern.compile("\\W");
        String[] words = pattern.split("one@two#three:four$five");
        for (String s : words) {
            System.out.println("Split using Pattern.split(): " + s);
        }

        // using Matcher.replaceFirst() and replaceAll() methods
        pattern = Pattern.compile("1*2");
        matcher = pattern.matcher("11234512678");
        System.out.println("Using replaceAll: " + matcher.replaceAll("_"));
        System.out.println("Using replaceFirst: " + matcher.replaceFirst("_"));
    }
}
The output of the above Java regex example program is:
Found the text "AB" starting at 0 index and ending at index 2
Found the text "ab" starting at 3 index and ending at index 5
Found the text "Ab" starting at 6 index and ending at index 8
Split using Pattern.split(): one
Split using Pattern.split(): two
Split using Pattern.split(): three
Split using Pattern.split(): four
Split using Pattern.split(): five
Using replaceAll: _345_678
Using replaceFirst: _34512678
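The program above does not exercise toString(). The sketch below shows it together with flags(), and also the one-line String equivalents of the replace methods. The class PatternInfoExample and the helper describe are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

class PatternInfoExample {

    // Hypothetical helper: summarizes a compiled pattern.
    static String describe(Pattern pattern) {
        boolean ignoreCase = (pattern.flags() & Pattern.CASE_INSENSITIVE) != 0;
        return pattern.toString() + " (case-insensitive: " + ignoreCase + ")";
    }

    public static void main(String[] args) {
        // toString() returns the regex the pattern was compiled from.
        System.out.println(describe(Pattern.compile("ab", Pattern.CASE_INSENSITIVE))); // ab (case-insensitive: true)
        System.out.println(describe(Pattern.compile("1*2")));                          // 1*2 (case-insensitive: false)
        // String offers one-line equivalents that compile the regex on every call.
        System.out.println("11234512678".replaceAll("1*2", "_"));   // _345_678
        System.out.println("11234512678".replaceFirst("1*2", "_")); // _34512678
    }
}
```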
That’s all for regular expressions in Java. Java regex seems hard at first, but after working with it for some time, it becomes easy to learn and use.
You can check out the complete code and more regular expression examples from our GitHub Repository.
sorry number sequence should not be more than 2.
- Shaheed
This tutorial is very helpful. I have one question how to represent a regex of allowing alphanumeric but number sequence should not be more than 3. Ex: Bas12 -> valid Bas123ba-> Invalid
- Shaheed
HI, May I know how to change the string value e.g ‘1000’ to ‘$1,000.00’ or ‘1000%’ or ‘$100’ using regular expression?
- jb
All your blogs on Java are to the point, with very good exceptional cases. I would be happy to buy a Java book if you publish one, or a PDF version of your blogs, with an index and ordered for new learners.
- Radhika Patel
I want to select files of type xlsx that contain the numbers 0-3 in their names, for example inp1.xlsx, inp2.xlsx, etc. Can anyone tell me what RE I should write? Thanks!
- Ruby
I understood all except the below statements. Could someone please explain these? System.out.println(Pattern.matches("(\\w\\d)\\1", "a2a2")); //true System.out.println(Pattern.matches("(\\w\\d)\\1", "a2b2")); //false System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B2AB")); //true System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B3AB")); //false
- Ram
How can I verify that an entered amount is in Indian currency format? Suppose the user enters an amount like 1,999.00, 111.00, 99.00, 9.00 or 1,11,11,111.00; any of these inputs should be valid. If the user enters 111,111,111.00 or another country’s currency format, it should be invalid. How do I write a regex for it?
- Rupesh
Hi everyone, I am new to Java. I need to generate a random regular expression. Then the user-entered string will be checked against the randomly generated regular expression. This is a backward process of what is normally done. What would be the best way to generate a random regex? A sample code and an overview of steps would be very useful. Thanks a lot. - a Java beginner
- Tyler
Hi How can I use boolean operator to search for two words? If I enter - It AND Master in netbean’s command line it should find these two words in file and return the urllist where these two words are found in the file.
- LR
hi all please tell me how we can add optional fields in our particular format. for eg. my number may or may not contain ‘-’.
- rohit