Java Regular Expressions – Learn its Classes and Interface with Coding Examples
We have already discussed Strings in our article on Java Strings, where we performed various operations and manipulations on them. But that is not all we can do with Strings.
We can also search, manipulate, pattern-match and edit text. In this article, we will discuss Regular Expressions in Java, which are used to define search patterns in a String.
Java Regular Expressions
A regular expression is a sequence of characters, written in a particular syntax, that defines a search pattern which can be matched against a String or a set of Strings.
In Java, regular expressions let us define patterns that help with many operations on text, such as searching, processing, editing, pattern matching and manipulation, as well as validating input such as email addresses and passwords.
Regular expressions are not specific to one language, but their syntax differs slightly from language to language. Java's regular expression syntax is very similar to Perl's and is easy to learn. A regular expression is also called a regex for short.
In Java, regular expressions are provided by the package java.util.regex, which has been part of standard Java (Java SE) since Java 1.4. This Java Regex API (Application Programming Interface) is used to define patterns for searching and manipulating Strings.
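To make the validation use case concrete, here is a minimal sketch of a password check. The rule it enforces is a hypothetical example chosen for illustration, not a recommendation: at least eight characters, at least one digit, one lowercase and one uppercase letter, and no whitespace. The class name PasswordCheck is also our own. The Pattern and Matcher classes it uses are covered later in this article.

```java
import java.util.regex.Pattern;

public class PasswordCheck {
    // Hypothetical rule, for illustration only:
    // (?=.*\d)     at least one digit
    // (?=.*[a-z])  at least one lowercase letter
    // (?=.*[A-Z])  at least one uppercase letter
    // \S{8,}       eight or more characters, none of them whitespace
    private static final Pattern PASSWORD =
            Pattern.compile("(?=.*\\d)(?=.*[a-z])(?=.*[A-Z])\\S{8,}");

    // matches() requires the whole input to fit the pattern, so no ^ or $ anchors are needed
    public static boolean isValid(String password) {
        return PASSWORD.matcher(password).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Secret123")); // true
        System.out.println(isValid("secret123")); // false: no uppercase letter
        System.out.println(isValid("Sec 12345")); // false: contains a space
        System.out.println(isValid("Sh0rt"));     // false: fewer than 8 characters
    }
}
```

The pattern is compiled once and stored in a static field, so it is not recompiled on every call.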
The package java.util.regex provides three classes and one interface for working with regular expressions. Before describing them, let us look at the metacharacters used to build patterns.
Metacharacters of Java Regular Expressions
Some commonly used metacharacters, predefined character classes and boundary matchers are:
Metacharacter | Description |
. | Any character (by default it does not match line terminators unless the DOTALL flag is set) |
\d | Any digit: [0-9] |
\D | Any non-digit: [^0-9] |
\s | Any whitespace character: [ \t\n\x0B\f\r] |
\S | Any non-whitespace character: [^\s] |
\w | Any word character: [a-zA-Z_0-9] |
\W | Any non-word character: [^\w] |
\b | A word boundary |
\B | A non-word boundary |
In Java string literals, every backslash must be doubled, so the regex \d is written as "\\d" in source code.
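The following sketch tries several of the metacharacters from the table on one sample sentence. The sample text and the helper findAll() are our own illustration. The helper simply collects every match that Matcher.find() reports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetacharacterDemo {
    // Returns every substring of input matched by regex, in order, using Matcher.find()
    public static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Order 66 shipped to room_12 at 9am";
        // "\\d" in Java source is the regex \d (the backslash is doubled)
        System.out.println(findAll("\\d+", text));     // [66, 12, 9]
        System.out.println(findAll("\\w+", text));     // [Order, 66, shipped, to, room_12, at, 9am]
        System.out.println(findAll("\\bt\\w*", text)); // [to] (words starting with 't')
        System.out.println(findAll("a.", text));       // [at, am] ('.' matches any one character)
        System.out.println(text.split("\\s+").length); // 7 (split around runs of whitespace)
    }
}
```

Note that \w also matches the underscore, so room_12 counts as a single word. Also, "at" is not matched by \bt\w* because there is no word boundary between 'a' and 't'.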
The three classes in Java Regex are:
Class | Description |
java.util.regex.Pattern | A compiled representation of a regular expression, used to create or define patterns |
java.util.regex.Matcher | Interprets a pattern and performs match operations against an input string |
java.util.regex.PatternSyntaxException | An unchecked exception thrown to indicate a syntax error in a regular-expression pattern |
And there is one interface:
Interface | Description |
java.util.regex.MatchResult | Represents the result of a match operation for a regular expression |
We will discuss each of the classes and interface in detail along with their methods and syntax.
Classes in Java Regular Expressions
1. java.util.regex.Pattern class
The Pattern class is used to define or create regular expressions (patterns). An instance of this class is a compiled representation of a regular expression. The Pattern class has no public constructor.
Instead, we call its public static method compile(), passing the regular expression as an argument. compile() returns the corresponding Pattern object.
Methods of Pattern class
1.1. static Pattern compile(String regex):
This method compiles the specified regular expression into a pattern.
1.2. static Pattern compile(String regex, int flags):
This method is like the one above but takes an additional flags argument (for example, Pattern.CASE_INSENSITIVE). It compiles the given regular expression into a pattern with the given flags. Several flags can be combined with the bitwise OR operator |.
1.3. int flags():
This method has no parameters and returns the match flags of a pattern.
1.4. Matcher matcher(CharSequence input):
It creates a matcher that will match the given input against this pattern.
1.5. static boolean matches(String regex, CharSequence input):
It compiles the given regular expression and attempts to match the entire input against it. It is equivalent to Pattern.compile(regex).matcher(input).matches().
1.6. String pattern():
This method is used to return the regular expression from which we compiled this pattern.
1.7. static String quote(String s):
It returns a literal pattern String for the specified String, so that metacharacters in it (such as . or *) are matched literally.
1.8. String[ ] split(CharSequence input):
It splits the given input sequence around matches of this pattern.
1.9. String[ ] split(CharSequence input, int limit):
It splits the given input sequence around matches of this pattern. The limit parameter controls how many times the pattern is applied and therefore affects the length of the resulting array.
1.10. String toString():
It returns the string representation of this pattern, which is the regular expression from which it was compiled.
Code to understand the Pattern class and its methods:
package com.techvidvan.regularexpressions;

import java.util.regex.*;

public class PatternClassDemo {
    public static void main(String[] args) {
        // Using compile(), matcher() and matches() together
        // "." matches any single character
        boolean match1 = Pattern.compile("v.d").matcher("vid").matches();
        System.out.println(match1);

        // Using the static Pattern.matches() method
        // ".." matches exactly two characters
        boolean match2 = Pattern.matches("Te..", "Tech");
        System.out.println(match2);

        // The text "Java" matches the pattern "Ja.."
        System.out.println(Pattern.matches("Ja..", "Java"));

        // The text "TechVid" does not match "TechV." because matches() must consume the whole input
        System.out.println(Pattern.matches("TechV.", "TechVid"));

        String str = "bbb";
        System.out.println("Using the String matches method: " + str.matches(".bb"));
        System.out.println("Using Pattern matches method: " + Pattern.matches("b.b", str));
    }
}
Output:
true
true
true
false
Using the String matches method: true
Using Pattern matches method: true
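The PatternClassDemo program above uses compile(), matcher() and matches(), but not the other methods listed. The following sketch, with made-up inputs, shows flags(), pattern(), toString(), quote() and both split() overloads. The expected output is given in the comments.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PatternMethodsDemo {
    public static void main(String[] args) {
        // compile(regex, flags): several flags can be combined with the | operator
        Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

        // flags() returns a bit mask; test a single flag with &
        System.out.println((p.flags() & Pattern.CASE_INSENSITIVE) != 0); // true
        // pattern() and toString() both return the source regex
        System.out.println(p.pattern());  // java
        System.out.println(p.toString()); // java
        System.out.println(p.matcher("JAVA").matches()); // true, because case is ignored

        // quote(): makes '.' (and any other metacharacter) match literally
        String literal = Pattern.quote("1.5");
        System.out.println(Pattern.matches("1.5", "1x5"));   // true: '.' matches any character
        System.out.println(Pattern.matches(literal, "1x5")); // false
        System.out.println(Pattern.matches(literal, "1.5")); // true

        // split() without and with a limit
        Pattern comma = Pattern.compile(",");
        System.out.println(Arrays.toString(comma.split("a,b,c,d")));    // [a, b, c, d]
        System.out.println(Arrays.toString(comma.split("a,b,c,d", 2))); // [a, b,c,d]
    }
}
```

With a limit of 2, the pattern is applied only once, so everything after the first comma stays in the last element.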
2. java.util.regex.Matcher class
A Matcher object is the engine that performs match operations for a pattern against an input string. The same matcher can be used repeatedly, for example to find multiple occurrences of the pattern in the input text.
Like the Pattern class, Matcher has no public constructor. You obtain a Matcher object by invoking the matcher() method on a Pattern object.
Methods of Matcher class
2.1. int start():
It returns the start index of the previous match, for example the one found by find().
2.2. int end():
It returns the offset after the last character of the previous match, that is, the index one past the end of the match.
2.3. boolean find():
It attempts to find the next subsequence of the input that matches the pattern. Calling it repeatedly finds successive occurrences.
2.4. boolean find(int start):
It resets the matcher and then attempts to find the next subsequence that matches the pattern, starting at the specified index.
2.5. String group():
This method returns the input subsequence matched by the previous match.
2.6. int groupCount():
It returns the number of capturing groups in this matcher's pattern.
2.7. boolean matches():
It attempts to match the entire input against the pattern.
2.8. String replaceFirst(String replacement):
It replaces the first subsequence of the input that matches the pattern with the given replacement string.
2.9. String replaceAll(String replacement):
It replaces every subsequence of the input that matches the pattern with the given replacement string.
Code to understand the Matcher class and its methods:
package com.techvidvan.regularexpressions;

import java.util.regex.*;

public class MatcherClassDemo {
    public static void main(String[] args) {
        // Case-sensitive searching
        // Create the pattern "Tech" to be searched for
        Pattern pattern = Pattern.compile("Tech");
        // Search for the pattern in "TechJavatechVidvan"
        Matcher match = pattern.matcher("TechJavatechVidvan");
        // Print the start and end indexes of each match
        // (end() returns the index after the last matched character, hence the -1)
        System.out.println("Case Sensitive Searching:");
        while (match.find())
            System.out.println("Pattern found from " + match.start() + " to " + (match.end() - 1));

        // Case-insensitive searching
        // "te*" matches a 't' followed by zero or more 'e' characters, ignoring case
        Pattern pattern1 = Pattern.compile("te*", Pattern.CASE_INSENSITIVE);
        Matcher match1 = pattern1.matcher("TechJavatechVidvan");
        System.out.println("\nCase Insensitive Searching:");
        while (match1.find())
            System.out.println("Pattern found from " + match1.start() + " to " + (match1.end() - 1));

        // Splitting the String around every non-word character (\W)
        String text = "Tech@VidVan#Tutorial&Of%Java";
        Pattern pattern2 = Pattern.compile("\\W");
        String[] result = pattern2.split(text);
        System.out.println("\nSplitting the String around special characters:");
        for (String temp : result)
            System.out.println(temp);

        // Replacing the String
        System.out.println("\nReplacing the Strings with other String:");
        String regex = "Python";
        String inputString = "TechVidvan Python Tutorial. " + "It is a Python Tutorial";
        String replaceString = "Java";
        // Get a Pattern object
        Pattern pattern3 = Pattern.compile(regex);
        // Get a Matcher object for the input
        Matcher m = pattern3.matcher(inputString);
        System.out.println("Using replaceFirst() Method");
        System.out.println(m.replaceFirst(replaceString));
        // The matcher still works on the original input, so replaceAll() starts from it again
        System.out.println("\nUsing replaceAll() Method");
        System.out.println(m.replaceAll(replaceString));
    }
}
Output:
Case Sensitive Searching:
Pattern found from 0 to 3

Case Insensitive Searching:
Pattern found from 0 to 1
Pattern found from 8 to 9

Splitting the String around special characters:
Tech
VidVan
Tutorial
Of
Java

Replacing the Strings with other String:
Using replaceFirst() Method
TechVidvan Java Tutorial. It is a Python Tutorial

Using replaceAll() Method
TechVidvan Java Tutorial. It is a Java Tutorial
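The MatcherClassDemo program above does not use capturing groups, groupCount(), find(int) or Matcher.matches(). The following sketch shows them, using a hypothetical year-month pattern and sample text. It also uses group(int), which returns the text captured by one pair of parentheses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherGroupsDemo {
    public static void main(String[] args) {
        // Two capturing groups: (\d{4}) captures a year, (\d{2}) captures a month
        Pattern datePattern = Pattern.compile("(\\d{4})-(\\d{2})");
        Matcher m = datePattern.matcher("Released 2014-03, updated 2021-09");

        // groupCount() counts the capturing groups in the pattern, not the matches
        System.out.println("groupCount() = " + m.groupCount());

        // find() moves to each match in turn; group(n) returns what group n captured
        while (m.find()) {
            System.out.println("match " + m.group() + " year=" + m.group(1) + " month=" + m.group(2));
        }

        // find(int) resets the matcher and searches from index 20 onwards
        if (m.find(20)) {
            System.out.println("from index 20: " + m.group() + " at " + m.start());
        }

        // matches() succeeds only when the whole input fits the pattern
        System.out.println(datePattern.matcher("2014-03").matches());
        System.out.println(datePattern.matcher("Released 2014-03").matches());
    }
}
```

This should print groupCount() = 2, the two matches with their year and month, "from index 20: 2021-09 at 26", and then true followed by false.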
3. java.util.regex.PatternSyntaxException class
This class is an unchecked exception, thrown to indicate a syntax error in a regular-expression pattern.
Methods of PatternSyntaxException class
3.1. String getDescription():
It returns the description of the error.
3.2. int getIndex():
It returns the approximate index of the error within the pattern, or -1 if the index is not known.
3.3. String getMessage():
It returns a multi-line string containing the description of the syntax error and its index, the erroneous regular-expression pattern, and a visual indication of the error index within the pattern.
3.4. String getPattern():
It returns the erroneous regular-expression pattern.
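A minimal sketch of catching PatternSyntaxException and reading its details follows. The unclosed character class "[a-z" is a deliberately broken example, and isValidRegex() is a hypothetical convenience method. The exact wording of the description and message comes from the JDK and may differ between versions, so it is not reproduced here.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SyntaxErrorDemo {
    // Returns true if regex compiles, false if Pattern.compile() rejects it
    public static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        try {
            // The character class "[a-z" is never closed, so compile() throws
            Pattern.compile("[a-z");
        } catch (PatternSyntaxException e) {
            System.out.println("Description: " + e.getDescription());
            System.out.println("Index: " + e.getIndex());
            System.out.println("Pattern: " + e.getPattern());
            // getMessage() combines the details above into one multi-line message
            System.out.println(e.getMessage());
        }
        System.out.println(isValidRegex("[a-z]+")); // true
        System.out.println(isValidRegex("[a-z"));   // false
    }
}
```

PatternSyntaxException is unchecked, so the compiler does not force a try/catch. Catching it is mainly useful when the regex comes from user input.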
Interface in Java Regular Expressions
The java.util.regex package provides one interface: MatchResult.
MatchResult Interface:
This interface represents the result of a match operation against a regular expression. It lets us query match boundaries, groups and group boundaries, but it does not allow the result to be modified.
Methods of MatchResult interface
1. int end():
It returns the index after the last character matched.
2. int end(int group):
It returns the offset after the last character of the subsequence captured by the specified group during this match.
3. String group():
This method returns the input subsequence matched by the previous match.
4. String group(int group):
It returns the input subsequence captured by the specified group during the previous match operation.
5. int start():
It returns the start index of the match.
6. int start(int group):
It returns the start index of the subsequence captured by the given group during this match.
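The Matcher class implements MatchResult, and Matcher.toMatchResult() returns a snapshot of the current match that is unaffected by later find() calls. The sketch below stores such snapshots in a list and then reads them through the MatchResult methods listed above. The key=value pattern, the sample input and the collect() helper are our own illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchResultDemo {
    // Stores a snapshot of every match using Matcher.toMatchResult()
    public static List<MatchResult> collect(Pattern pattern, String input) {
        List<MatchResult> results = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            results.add(m.toMatchResult()); // unaffected by the next find() call
        }
        return results;
    }

    public static void main(String[] args) {
        // Group 1 captures the key, group 2 captures the value
        Pattern kv = Pattern.compile("(\\w+)=(\\d+)");
        for (MatchResult r : collect(kv, "width=80 height=24")) {
            System.out.println(r.group() + " [" + r.start() + ", " + r.end() + ")"
                    + " key=" + r.group(1) + " starts at " + r.start(1)
                    + ", value=" + r.group(2) + " ends at " + r.end(2));
        }
    }
}
```

For the first match, start() is 0 and end() is 8, because end() points one past the last matched character.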
Summary
Regular expressions are very helpful for matching patterns against Strings and manipulating them. They are also useful for validating input, such as email addresses and passwords.
In this article, we were able to see how regular expressions assist in pattern matching and performing many operations on the String. We covered its core classes and interfaces along with their methods and Java codes for better understanding.
This article will surely help you to build up your concepts in Regular Expressions.