How can I output only captured groups with sed? – Sed

Written By M Ibrahim

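Quick Fix: Use sed's -n option to suppress automatic printing together with the p flag of the s command, so that only lines where the substitution succeeded are printed, and put only the captured groups in the replacement.

Example:

echo "This is a sample 123 text and some 987 numbers" | sed -En 's/[a-zA-Z ]*([0-9]+)[a-zA-Z ]*([0-9]+)[a-zA-Z ]*/\1 \2/p'

This will output:

123 987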

The Problem:

Is there a way to make sed output only the captured groups?

Given the following input and pattern:

Input:

This is a sample 123 text and some 987 numbers

Pattern:

/([\d]+)/

Expected Output:

123
987

The Solutions:

Solution 1: Capturing Groups with sed

To output only captured groups using `sed`, you can combine the following techniques:

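  • Identify capture groups: Use parentheses within your regular expression to capture the desired groups.
  • Consume unwanted text: Match the text you don't want in the output outside the groups, for example with negated bracket expressions such as [^[:digit:]]*, so that the substitution removes it.
  • Print desired output: Use backreferences (\1, \2 and so on) in the replacement to output the captured groups.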

Example:

To extract and output only the numbers "123" and "987" from the input:

echo "This is a sample 123 text and some 987 numbers" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'

Explanation:

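  • -r enables extended regular expressions (GNU sed; GNU sed also accepts -E, which is the spelling BSD and macOS sed use).
  • -n suppresses the default printing of every line, so only lines printed by the p flag appear.
  • The regular expression:
    • Matches the runs of non-digits (e.g., letters and spaces) before, between, and after the two groups of digits, so they are consumed but not captured.
    • Captures two groups of digits: ([[:digit:]]+).
  • The replacement string (\1 \2) outputs the captured groups, separated by a space.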
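To see what -n and the p flag contribute, here is a small sketch that feeds the same GNU sed command two lines, where the second line (made up for this illustration) contains no digits. The substitution fails on that line, so nothing is printed for it:

```shell
# Two input lines; only the first contains two groups of digits.
# -r (GNU sed): extended regular expressions; -n: no automatic printing.
# The p flag prints the pattern space only when the substitution succeeded.
printf 'This is a sample 123 text and some 987 numbers\nno numbers here\n' |
  sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'
# Output: 123 987
```

Dropping -n would print the first result twice (once by p, once automatically) and also echo the unmatched second line unchanged.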

Generalizing for multiple matches:

For an unspecified number of matches, you can use grep with the following options:

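  • -P: Enables Perl-compatible regular expressions (a GNU grep option), which provide \d.
  • -o: Outputs only the matched portions, each on its own line. Note that this is the whole match, not an individual capture group.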

Example:

echo "This is a sample 123 text and some 987 numbers" | grep -Po '\d+'
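If grep's -P option is not available, a similar one-number-per-line result can be approximated with sed alone. This is a sketch that goes beyond the original answer: it assumes GNU sed (for -E, \n in the replacement, and \n inside a bracket expression), and extract_numbers is a hypothetical function name.

```shell
# Hypothetical helper: print every run of digits on its own line.
# Requires GNU sed: \n in the replacement and [^\n] are GNU extensions.
extract_numbers() {
  # 1. Delete lines that contain no digits at all.
  # 2. Replace each "non-digits + digits" with "digits + newline".
  # 3. Remove the last newline and any trailing non-digit text.
  # 4. Print what is left.
  sed -E -n -e '/[0-9]/!d' \
            -e 's/[^0-9]*([0-9]+)/\1\n/g' \
            -e 's/\n[^\n]*$//' \
            -e 'p'
}

echo "This is a sample 123 text and some 987 numbers" | extract_numbers
```

For the sample line, step 2 produces "123", "987" and " numbers" on separate lines, and step 3 drops the trailing " numbers", leaving 123 and 987.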

Solution 2: Sed with Escaped Parentheses

To output only captured groups with sed using escaped parentheses (basic regular expressions, sed's default), follow these steps:

  1. Use escaped parentheses: Surround the desired group with \( ... \) to capture it and store it in a numbered buffer.
  2. Reference the captured group: Use \NUMBER in the replacement, where NUMBER is the group's position counting opening parentheses from the left, starting at 1 (\1 through \9).
  3. Replace the entire line: Make the regular expression also match the text before and after the groups, so that the whole line is replaced by the \NUMBER references. Combined with -n and the p flag, only the captured text is printed.

Example:

Input:

This is a sample 123 text and some 987 numbers

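Command:

sed -n 's/[^0-9]*\([0-9][0-9]*\)[^0-9]*\([0-9][0-9]*\).*/\1 \2/p'

Output:

123 987

In this example, each \([0-9][0-9]*\) captures a run of digits. sed has no \d shorthand and basic regular expressions have no standard +, so [0-9][0-9]* stands in for [\d]+. The surrounding [^0-9]* and .* consume the rest of the line, so the replacement \1 \2 replaces the entire line with the two captured groups, and -n together with the p flag prints only that result.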
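Because each escaped group is stored in its own numbered buffer, the replacement can also reorder the groups. A small sketch with basic regular expressions and the references swapped:

```shell
# Basic regular expressions: groups are written \( ... \),
# and [0-9][0-9]* means "one or more digits".
echo "This is a sample 123 text and some 987 numbers" |
  sed -n 's/[^0-9]*\([0-9][0-9]*\)[^0-9]*\([0-9][0-9]*\).*/\2 \1/p'
# Output: 987 123
```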

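Solution 3: Use grep

You can use grep, where -o prints each match on its own line, -E enables extended regular expressions, and -w only accepts matches that form whole words:

grep -Eow "[0-9]+" file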
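The -w option matters when digits are glued to letters. In this sketch with made-up input, 123 is skipped because it is part of the word abc123, while 456 stands alone; without -w, both numbers are printed.

```shell
# -o: print each match on its own line; -E: extended regex; -w: whole words only.
echo "abc123 456" | grep -Eow '[0-9]+'   # prints: 456
echo "abc123 456" | grep -Eo '[0-9]+'    # prints: 123 and 456, one per line
```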

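Solution 4: Sed capture groups only

Using the `s/…/…/gp` command together with the -n option, sed prints a line only if a substitution was made on it. Note that p prints the whole line after substitution, so the regular expression must also match the text around the groups; any text that no match covers stays in the output.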
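A sketch of that caveat, using a made-up input line: the g flag replaces every match, but the text after the last number is never matched and survives, so a second substitution trims it here. This sketch assumes every input line contains at least one number; a line without digits would be printed as an empty line by the second command.

```shell
line='x 1 y 2 z'
# g replaces every "non-digits + digits" match, but the trailing " z"
# is not part of any match, so it remains in the printed line.
echo "$line" | sed -En 's/[^0-9]*([0-9]+)/\1 /gp'                 # prints: 1 2  z
# A second substitution removes the unmatched tail before printing.
echo "$line" | sed -En 's/[^0-9]*([0-9]+)/\1 /g; s/[^0-9]*$//p'  # prints: 1 2
```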

Solution 5: Give up and use Perl

Since sed does not cut it, let's just throw in the towel and use Perl. At least Perl is part of the LSB, while grep's GNU extensions are not 🙂

  • Print the entire matching part, no matching groups or lookbehind needed:

    cat <<EOS | perl -lane 'print m/\d+/g'
    a1 b2
    a34 b56
    EOS
    

    Output:

    12
    3456
    
  • Single match per line, often structured data fields:

    cat <<EOS | perl -lape 's/.*?a(\d+).*/$1/g'
    a1 b2
    a34 b56
    EOS
    

    Output:

    1
    34
    

    With lookbehind:

    cat <<EOS | perl -lane 'print m/(?<=a)(\d+)/'
    a1 b2
    a34 b56
    EOS
    
  • Multiple fields:

    cat <<EOS | perl -lape 's/.*?a(\d+).*?b(\d+).*/$1 $2/g'
    a1 c0 b2 c0
    a34 c0 b56 c0
    EOS
    

    Output:

    1 2
    34 56
    
  • Multiple matches per line, often unstructured data:

    cat <<EOS | perl -lape 's/.*?a(\d+)|.*/$1 /g'
    a1 b2
    a34 b56 a78 b90
    EOS
    

    Output:

    1
    34 78
    

    With lookbehind:

    cat <<EOS | perl -lane 'print m/(?<=a)(\d+)/g'
    a1 b2
    a34 b56 a78 b90
    EOS
    

    Output:

    1
    3478