Regex: how to match an optional character

Written By M Ibrahim

Quick Fix: Put a question mark after the optional character or character class. For example, [A-Z]? matches zero or one uppercase letter.

The Problem:

You have two strings in the same fixed layout, and one field in that layout is optional: a single uppercase letter (A-Z) that may or may not appear after the leading five digits. The regex must also capture the digits and uppercase letters around that field. Here is the regex you’re using:

/^([0-9]{5})+.*? ([A-Z]{1}) +.*? +([A-Z]{1})([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3}) +([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})/

Help modify the regex to successfully match the optional character and the rest of the string.

The Solutions:

Solution 1: Use Optional Character Matching

To make the character optional, use ([A-Z]?) in place of ([A-Z]{1}), and \s+ in place of the " +.*? +" separators. The ? makes the preceding element match zero or one time. The {1} is redundant in any case, because an unquantified element already matches exactly once.

The improved regex:
^([0-9]{5})+\s+([A-Z]?)\s+([A-Z])([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3})\s+([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})

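This regex matches lines with or without the optional character. Because \s+ appears on both sides of ([A-Z]?), a line without the letter still needs at least two whitespace characters between the five digits and the next field.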

Additionally, you can simplify the regex further by replacing [0-9] with \d, which represents digits in most regex dialects:

^(\d{5})+\s+([A-Z]?)\s+([A-Z])(\d{3})(\d{3})([A-Z]{3})([A-Z]{3})\s+([A-Z])\d{3}(\d{4})(\d{2})(\d{2})
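As a rough check of the \d version, here is a minimal Python sketch. The spacing of the sample lines is an assumption: the gaps are built with " " * n so their widths are explicit, and the names SOLUTION_1 and optional_letter are made up for this illustration.

```python
import re

# The \d form of the Solution 1 pattern, split across two literals for width.
SOLUTION_1 = re.compile(
    r"^(\d{5})+\s+([A-Z]?)\s+([A-Z])(\d{3})(\d{3})([A-Z]{3})([A-Z]{3})"
    r"\s+([A-Z])\d{3}(\d{4})(\d{2})(\d{2})"
)

# Assumed spacing: fields separated by runs of spaces.
WITH_LETTER = (
    "20000" + " " * 3 + "K" + " " * 3 + "Q511195DREWBT" + " " * 3 + "E00078748521"
)
WITHOUT_LETTER = "30000" + " " * 7 + "K601220PLOPOH" + " " * 3 + "Z00054878524"
# Same line with every gap reduced to one space.
SINGLE_SPACED = "30000 K601220PLOPOH Z00054878524"


def optional_letter(line):
    """Return group 2 ('' when the letter is absent), or None if no match."""
    m = SOLUTION_1.match(line)
    return None if m is None else m.group(2)


print(repr(optional_letter(WITH_LETTER)))     # 'K'
print(repr(optional_letter(WITHOUT_LETTER)))  # ''
print(optional_letter(SINGLE_SPACED))         # None
```

The last call returns None because each \s+ needs at least one whitespace character of its own, and the single-spaced line has only one between the digits and the K.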

Solution 2: Use the ‘?’ Quantifier

To make the character optional, add a `?` quantifier after it. The `?` quantifier means the character is optional and can occur zero or one times. So, the updated regular expression would be:

/^([0-9]{5})+.*? ([A-Z]?) +.*? +([A-Z]{1})([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3}) +([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})/

With this change, the regular expression will match the letter if it’s present; otherwise, it will continue matching the rest of the string.
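To see what "zero or one times" means in practice, here is a small Python sketch on a deliberately simplified pattern (two digits, an optional letter, two digits). The pattern and the sample strings are invented for illustration. It also shows a detail worth knowing: with the ? inside the group the capture is an empty string when the letter is missing, while with the ? outside the group the capture is None.

```python
import re

# Quantifier inside the group: the group always participates in the match.
INSIDE = re.compile(r"^(\d{2})([A-Z]?)(\d{2})$")
# Quantifier on the group: the group is skipped when the letter is missing.
OUTSIDE = re.compile(r"^(\d{2})([A-Z])?(\d{2})$")


def groups(pattern, text):
    """Return the tuple of captured groups, or None if there is no match."""
    m = pattern.match(text)
    return None if m is None else m.groups()


print(groups(INSIDE, "12A34"))   # ('12', 'A', '34')
print(groups(INSIDE, "1234"))    # ('12', '', '34')
print(groups(OUTSIDE, "1234"))   # ('12', None, '34')
print(groups(INSIDE, "12AB34"))  # None, since ? allows at most one letter
```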

Solution 3: Make the capturing group optional

To match an optional character in a regular expression, you can use the question mark (?) quantifier. This quantifier indicates that the preceding element is optional and can occur zero or one time. In your case, you can make the character ([A-Z]{1}) optional by adding the question mark:

([A-Z]{1})? +.*? +

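This lets the regular expression match strings with or without the single letter after the leading five digits. Be aware that the lazy .*? on either side of the group can absorb the letter, so a successful match does not guarantee that the optional group captured it.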

Alternatively, you can make the entire part optional by enclosing it in parentheses and adding the question mark outside the parentheses:

(([A-Z]{1}) +.*? +)?

This also matches both strings. Here the letter and the separator that follows it are skipped together, so when the letter is absent nothing more is required between the leading separator and the rest of the pattern.
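The two variants can be compared side by side with a short Python sketch. The sample lines use assumed spacing (gaps built with " " * n), and the names HEAD, TAIL, and check are made up for this illustration. Under these assumptions both variants match both lines, but only the second one actually captures the letter.

```python
import re

HEAD = r"^([0-9]{5})+.*? "
TAIL = (
    r"([A-Z]{1})([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3})"
    r" +([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})"
)

# Variant 1: only the letter group is optional (the letter is group 2).
GROUP_OPTIONAL = re.compile(HEAD + r"([A-Z]{1})? +.*? +" + TAIL)
# Variant 2: letter plus its separator are optional (the letter is group 3).
SEGMENT_OPTIONAL = re.compile(HEAD + r"(([A-Z]{1}) +.*? +)?" + TAIL)

# Assumed spacing: fields separated by runs of spaces.
WITH_LETTER = (
    "20000" + " " * 3 + "K" + " " * 3 + "Q511195DREWBT" + " " * 3 + "E00078748521"
)
WITHOUT_LETTER = "30000" + " " * 7 + "K601220PLOPOH" + " " * 3 + "Z00054878524"


def check(pattern, line, letter_group):
    """Return (matched, captured letter or None)."""
    m = pattern.match(line)
    if m is None:
        return (False, None)
    return (True, m.group(letter_group))


print(check(GROUP_OPTIONAL, WITH_LETTER, 2))       # (True, None)
print(check(GROUP_OPTIONAL, WITHOUT_LETTER, 2))    # (True, None)
print(check(SEGMENT_OPTIONAL, WITH_LETTER, 3))     # (True, 'K')
print(check(SEGMENT_OPTIONAL, WITHOUT_LETTER, 3))  # (True, None)
```

In the first variant the leading ".*? " stops at the first space, the optional group then sees another space and is skipped, and the second .*? swallows the K. If you need the letter's value and not just a successful match, the second variant (or the \s+ pattern from Solution 1) is the safer choice for input spaced like this.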

Here is a breakdown of the full regular expression with the first variant applied:

  • ^: Start of the string
  • ([0-9]{5}): Match 5 digits
  • +: One or more occurrences of the preceding group
  • .*?: Zero or more occurrences of any character, non-greedily
  • " ": A single literal space
  • ([A-Z]{1})?: Optional match of a single uppercase letter
  • " +": One or more spaces
  • .*?: Zero or more occurrences of any character, non-greedily
  • " +": One or more spaces
  • ([A-Z]{1}): Match a single uppercase letter
  • ([0-9]{3}): Match 3 digits
  • ([0-9]{3}): Match 3 digits
  • ([A-Z]{3}): Match 3 uppercase letters
  • ([A-Z]{3}): Match 3 uppercase letters
  • " +": One or more spaces
  • ([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2}): Match an uppercase letter followed by 3 digits (not captured), then 4 digits, 2 digits, and 2 digits

The pattern has no $ anchor, so any text after the final two digits is ignored.

This regular expression should match both of the strings you provided, regardless of whether or not the single letter is present after the starting 5 digits.
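If you want the breakdown to live next to the pattern itself, one option in Python is re.VERBOSE, which ignores whitespace and allows comments inside the regex. The sketch below restates the same pattern that way. The literal spaces are written as [ ] because verbose mode would otherwise drop them. The sample lines use assumed spacing, and the names ANNOTATED and fields are invented for this example.

```python
import re

ANNOTATED = re.compile(
    r"""
    ^                # start of the string
    ([0-9]{5})+      # one or more blocks of five digits
    .*?[ ]           # as few characters as possible, then one space
    ([A-Z]{1})?      # optional single uppercase letter
    [ ]+             # one or more spaces
    .*?              # as few characters as possible
    [ ]+             # one or more spaces
    ([A-Z]{1})       # one uppercase letter
    ([0-9]{3})       # three digits
    ([0-9]{3})       # three digits
    ([A-Z]{3})       # three uppercase letters
    ([A-Z]{3})       # three uppercase letters
    [ ]+             # one or more spaces
    ([A-Z])[0-9]{3}  # one uppercase letter, then three uncaptured digits
    ([0-9]{4})       # four digits
    ([0-9]{2})       # two digits
    ([0-9]{2})       # two digits
    """,
    re.VERBOSE,
)

# Assumed spacing: fields separated by runs of spaces.
WITH_LETTER = (
    "20000" + " " * 3 + "K" + " " * 3 + "Q511195DREWBT" + " " * 3 + "E00078748521"
)
WITHOUT_LETTER = "30000" + " " * 7 + "K601220PLOPOH" + " " * 3 + "Z00054878524"


def fields(line):
    """Return all captured groups as a tuple, or None if there is no match."""
    m = ANNOTATED.match(line)
    return None if m is None else m.groups()


print(fields(WITH_LETTER)[2:])
# ('Q', '511', '195', 'DRE', 'WBT', 'E', '7874', '85', '21')
print(fields(WITHOUT_LETTER)[2:])
# ('K', '601', '220', 'PLO', 'POH', 'Z', '5487', '85', '24')
```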

Solution 4: Simpler regex for a specific case

When the input is delimited by a forward slash, a much simpler regex is enough. Consider the following regex:

(.*)\/(([^\?\n\r])*)

To break down the regex:

  • (.*): This part matches everything before the last forward slash on the line, capturing it as Group 1.

  • \/: This part matches the literal forward slash character ‘/’.

  • (([^\?\n\r])*): This part matches a run of characters that are not a question mark (?), newline (\n), or carriage return (\r). The outer parentheses capture the whole run as Group 2; the inner parentheses form Group 3, which holds only the last character of the run.

Applied to the input strings from this problem, however, the regex finds nothing, because it requires a literal forward slash and neither string contains one:

  • Input: 20000 K Q511195DREWBT E00078748521
  • Match: none (the string contains no "/")
  • Input: 30000 K601220PLOPOH Z00054878524
  • Match: none (the string contains no "/")

This pattern is therefore useful only for slash-delimited input, where Group 1 receives the text before the last slash and Group 2 the text after it, up to the first question mark or line break. For the fixed-layout strings above, use one of the earlier solutions.
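A quick way to check what this pattern needs is to run it. The Python sketch below does so. The slash-delimited string "docs/page?x=1" is a made-up example, and split_at_slash is a hypothetical helper name.

```python
import re

SLASH = re.compile(r"(.*)\/(([^\?\n\r])*)")


def split_at_slash(text):
    """Return (group 1, group 2), or None when the text contains no '/'."""
    m = SLASH.match(text)
    return None if m is None else (m.group(1), m.group(2))


# The sample string from this problem has no slash, so nothing matches.
print(split_at_slash("20000 K Q511195DREWBT E00078748521"))  # None

# A hypothetical slash-delimited string does match.
print(split_at_slash("docs/page?x=1"))  # ('docs', 'page')
```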