Quick Fix: Use the `p` flag of the `s` command in `sed`, together with `-n`, to print only the captured groups.
Example:
echo "This is a sample 123 text and some 987 numbers" | sed -En 's/[a-zA-Z ]*([0-9]+)[a-zA-Z ]*([0-9]+)[a-zA-Z ]*/\1 \2/p'
This will output:
123 987
The Problem:
Provide a solution using sed that only outputs captured groups.
Given the following input and pattern:
Input:
This is a sample 123 text and some 987 numbers
Pattern:
/([\d]+)/
Expected Output:
123
987
The Solutions:
Solution 1: Capturing Groups with sed
To output only captured groups using `sed`, you can combine the following techniques:
- Identify capture groups: Use parentheses within your regular expression to capture the desired groups.
- Exclude unwanted text: Make the pattern also match the text you do not want in the output (for example with a negated class such as `[^[:digit:]]`), so that the substitution replaces it.
- Print desired output: Use back references to output the captured groups.
Example:
To extract and output only the numbers "123" and "987" from the input:
echo "This is a sample 123 text and some 987 numbers" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'
Explanation:
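- `-r` enables extended regular expressions (GNU sed; `-E` is the equivalent option accepted by both GNU and BSD sed).
- `-n` suppresses default line printing.
- The regular expression:
  - Matches the non-digits (e.g., letters and spaces) before, between, and after the groups of digits, so that they are replaced away.
  - Captures two groups of digits: `([[:digit:]]+)`.
- The replacement string (`\1 \2`) outputs the captured groups.
- The trailing `p` flag prints the line after a successful substitution.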
Generalizing for multiple matches:
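For an unspecified number of matches, you can use `grep` with the following options:
- `-P`: Enables Perl-compatible regular expressions (a GNU grep extension).
- `-o`: Outputs only the matched portion of each line, one match per output line. This is the whole match, not an individual capture group.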
Example:
echo "This is a sample 123 text and some 987 numbers" | grep -Po '\d+'
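Because `-o` prints the whole match rather than a capture group, one way to keep only part of a match is a lookbehind. The following is a minimal sketch, assuming GNU grep built with PCRE support; the sample input and the `input` and `result` variable names are hypothetical and chosen only for illustration.

```shell
# Hypothetical sample input: numbers prefixed by "a" or "b".
input='a1 b2 a34 b56'

# (?<=a) requires an "a" immediately before the digits without making it
# part of the match, so -o prints only the digits that follow an "a".
result=$(printf '%s\n' "$input" | grep -Po '(?<=a)\d+')

# Prints 1 and 34, one per line.
printf '%s\n' "$result"
```

The lookbehind plays the role that a capture group plays in the sed version: the context is required but excluded from the output.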
Solution 2: Sed with Escaped Parentheses
To output only captured groups with sed using escaped parentheses, follow these steps:
- Use escaped parentheses: Surround the desired captured group with `\( ... \)` to capture it and store it in a numbered buffer.
- Reference captured group: Use `\NUMBER` to refer to the captured group in the replacement pattern, where `NUMBER` is the number of the group (starting from 1).
- Replace entire line with captured group: Replace the entire line with the captured group using the replacement pattern `\NUMBER`.
Example:
Input:
This is a sample 123 text and some 987 numbers
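Pattern (basic regular expression; sed has no `\d`, so digits are written as `[0-9]`):
[^0-9]*\([0-9][0-9]*\)[^0-9]*\([0-9][0-9]*\).*
Replacement Pattern:
\1 \2
Output:
123 987
In this example, each `\([0-9][0-9]*\)` captures one run of digits, and the replacement pattern `\1 \2` references the two captured groups. Because the pattern matches the entire line, the entire line is replaced with the captured groups, resulting in the output of only the numbers.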
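The steps above can be sketched as a single runnable command. This is a minimal sketch under stated assumptions: the line contains exactly two numbers of interest, digits are written as `[0-9]` because sed does not understand `\d`, and the `input` and `result` variable names are hypothetical.

```shell
input='This is a sample 123 text and some 987 numbers'

# Basic regular expressions: \( ... \) captures, \1 and \2 refer back.
# The pattern spans the whole line, so everything outside the groups is dropped.
# -n plus the p flag prints only lines where the substitution succeeded.
result=$(printf '%s\n' "$input" |
  sed -n 's/[^0-9]*\([0-9][0-9]*\)[^0-9]*\([0-9][0-9]*\).*/\1 \2/p')

# Prints: 123 987
printf '%s\n' "$result"
```

Writing `[0-9][0-9]*` instead of `[0-9]+` keeps the command within basic regular expression syntax, so no `-E` or `-r` option is needed.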
Solution 3: Using grep
You can use grep (`-o` prints only the matched text, `-w` restricts matches to whole words):
grep -Eow "[0-9]+" file
Solution 4: Sed capture groups only
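Using the `s/…/…/gp` command together with the `-n` option, you can output only captured groups. With `-n`, sed prints a line only when a substitution succeeded; if the pattern also matches the unwanted text around each group and the replacement consists only of back references, the printed line contains the captured parts without the original text.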
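As an illustration of the `gp` approach for any number of matches per line, here is a minimal sketch. It assumes a sed that accepts `-E` (GNU and BSD sed do); the trailing `tr` step and the `input` and `result` variable names are additions for this example, not part of the original answer.

```shell
input='This is a sample 123 text and some 987 numbers'

# Each substitution consumes the non-digits around one run of digits and
# leaves "<digits> " behind; g repeats it along the line, p prints the result.
# tr then turns the separating spaces into newlines, squeezing duplicates.
result=$(printf '%s\n' "$input" |
  sed -En 's/[^0-9]*([0-9]+)[^0-9]*/\1 /gp' |
  tr -s ' ' '\n')

# Prints 123 and 987, one per line.
printf '%s\n' "$result"
```

Using a space as the separator and converting it afterwards avoids relying on `\n` in the replacement, which is not handled the same way by every sed.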
Solution 5: Give up and use Perl
Since `sed` does not cut it, let's just throw in the towel and use Perl; at least it is part of LSB, while the `grep` GNU extensions are not 🙂
- Print the entire matching part, no matching groups or lookbehind needed:
cat <<EOS | perl -lane 'print m/\d+/g'
a1 b2
a34 b56
EOS
Output:
12
3456
- Single match per line, often structured data fields:
cat <<EOS | perl -lape 's/.*?a(\d+).*/$1/g'
a1 b2
a34 b56
EOS
Output:
1
34
With lookbehind:
cat <<EOS | perl -lane 'print m/(?<=a)(\d+)/'
a1 b2
a34 b56
EOS
- Multiple fields:
cat <<EOS | perl -lape 's/.*?a(\d+).*?b(\d+).*/$1 $2/g'
a1 c0 b2 c0
a34 c0 b56 c0
EOS
Output:
1 2
34 56
- Multiple matches per line, often unstructured data:
cat <<EOS | perl -lape 's/.*?a(\d+)|.*/$1 /g'
a1 b2
a34 b56 a78 b90
EOS
Output:
1
34 78
With lookbehind:
cat <<EOS | perl -lane 'print m/(?<=a)(\d+)/g'
a1 b2
a34 b56 a78 b90
EOS
Output:
1
3478