RH134: Red Hat System Administration II
Efficiently complete system administration tasks by matching text patterns
Regular Expression (Regex) = A sequence of characters that defines a search pattern for matching text
Regex-aware tools: grep / egrep, sed, awk, vim, less / more
Literal characters match themselves exactly
Metacharacters have special meaning: . ^ $ * + ? { } [ ] \ | ( )
BRE (Basic Regular Expressions): default for grep, sed
+ ? { } | ( ) are literal
\+ \? \{ \} \| \( \) for special meaning
# BRE - escape for special meaning
grep 'ab\+c' file
grep '\(ab\)\+' file
ERE (Extended Regular Expressions): grep -E, egrep, awk
+ ? { } | ( ) are special
# ERE - cleaner syntax
grep -E 'ab+c' file
grep -E '(ab)+' file
💡 Recommendation: Use ERE (grep -E) for cleaner, more readable patterns
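As a quick check of the difference, the sketch below runs the same pattern in both dialects against a few made-up strings. The sample input is an assumption for illustration, and the `\+` form relies on GNU grep behavior rather than strict POSIX BRE.

```shell
# Sample input (made up for illustration)
sample=$(printf '%s\n' 'ac' 'abc' 'abbc' 'ab+c')

# BRE: an unescaped + is a literal plus sign
bre_literal=$(printf '%s\n' "$sample" | grep 'ab+c')

# BRE with the GNU escape: \+ means one or more of the preceding element
bre_escaped=$(printf '%s\n' "$sample" | grep 'ab\+c')

# ERE: + is a quantifier without any escaping
ere=$(printf '%s\n' "$sample" | grep -E 'ab+c')

printf '%s\n' "BRE 'ab+c' matched:" "$bre_literal"
printf '%s\n' "BRE 'ab\\+c' matched:" "$bre_escaped"
printf '%s\n' "ERE 'ab+c' matched:" "$ere"
```

The unescaped BRE pattern only finds the line with a literal plus sign, while the escaped BRE and the ERE pattern both find abc and abbc.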
. matches any single character (except newline)
# Find lines containing 'c', any single character, then 't'
grep 'c.t' /usr/share/dict/words
# cat, cot, cut, c@t, and also longer words such as scatter
# Use '^c.t$' to match three-letter words only
# Match any three characters between slashes
grep '/.../' /etc/passwd
# Matches the /bin/ in /bin/bash
# Be careful - an unescaped dot matches ANY character!
echo "192.168.1.1" | grep '192.168.1.1' # Matches, but each . would accept any character
echo "192x168y1z1" | grep '192.168.1.1' # Also matches!
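A minimal sketch of the fix, using the same two strings: escape the dots, or skip regex entirely with fixed-string mode. The variable names are made up for the demonstration.

```shell
# The two strings from the example above
lines=$(printf '%s\n' '192.168.1.1' '192x168y1z1')

# Unescaped dots: both lines match
loose=$(printf '%s\n' "$lines" | grep -c '192.168.1.1')

# Escaped dots: only the real address matches
strict=$(printf '%s\n' "$lines" | grep -c '192\.168\.1\.1')

# Fixed-string mode: the pattern is taken literally, no escaping needed
fixed=$(printf '%s\n' "$lines" | grep -cF '192.168.1.1')

echo "unescaped: $loose  escaped: $strict  fixed: $fixed"
```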
^ matches start of line
$ matches end of line
# Find users with bash shell
grep 'bash$' /etc/passwd
# Find comment lines (starting with #)
grep '^#' /etc/ssh/sshd_config
# Find empty lines
grep '^$' /etc/ssh/sshd_config
# Find lines with ONLY "root"
grep '^root$' /etc/group
\ removes special meaning from the next character
# Match a literal dot (IP address)
grep '192\.168\.1\.1' /etc/hosts
# Match a literal asterisk
grep '\*\*\*' logfile
# Match a dollar sign
grep '\$HOME' script.sh
# Match a caret
grep '\^' file
# Match a backslash itself
grep '\\' /etc/fstab
⚠️ Shell Quoting: Use single quotes to prevent shell expansion!
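grep '$HOME' script.sh # Single quotes: regex sees $HOME ($ is an end anchor in ERE, so escape it)
grep "$HOME" script.sh # Double quotes: the shell expands $HOME before grep runs
grep "\$HOME" script.sh # Regex sees: $HOME (the shell removes the backslash)
grep '\$HOME' script.sh # Safest: regex sees \$HOME and matches a literal $HOME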
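To see what actually reaches a command under each quoting style, this sketch passes the same text to a small helper. The helper `show_arg` and the variable `target` are made up for the demonstration; `target` stands in for $HOME.

```shell
# Hypothetical helper: prints its first argument exactly as the command receives it
show_arg() { printf '%s\n' "$1"; }

# Hypothetical variable standing in for $HOME
target=/srv/data

single=$(show_arg '$target')    # single quotes: no expansion
double=$(show_arg "$target")    # double quotes: the variable is expanded
escaped=$(show_arg "\$target")  # backslash stops the expansion, then is removed

echo "single:  $single"
echo "double:  $double"
echo "escaped: $escaped"
```

Only the single-quoted form passes a backslash through untouched, which is why patterns such as '\$HOME' are best written in single quotes.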
[abc] matches any ONE character from the set
# Match lines containing a vowel
grep '[aeiou]' /etc/passwd
# Match any digit
grep '[0123456789]' file
# Match "Root" or "root"
grep '[Rr]oot' /etc/passwd
# Match file names containing "log" followed by a digit
ls /var/log | grep 'log[0-9]'
[a-z] matches any character in the range
| Range | Matches | Example |
|---|---|---|
| [a-z] | Lowercase letters | a, b, c, ... z |
| [A-Z] | Uppercase letters | A, B, C, ... Z |
| [0-9] | Digits | 0, 1, 2, ... 9 |
| [a-zA-Z] | All letters | Any letter |
| [a-zA-Z0-9] | Alphanumeric | Letters and digits |
| [0-9a-fA-F] | Hexadecimal | 0-9, a-f, A-F |
# Find lines starting with uppercase letter
grep '^[A-Z]' /etc/services
[^abc] matches any character NOT in the set
# Find lines NOT starting with # (non-comments)
grep '^[^#]' /etc/ssh/sshd_config
# Find lines containing non-alphanumeric characters
grep '[^a-zA-Z0-9]' passwords.txt
# Find non-printable characters
grep '[^[:print:]]' file
⚠️ Note: ^ means negation only when it's the first character inside [ ]
| Class | Equivalent | Matches |
|---|---|---|
| [[:alpha:]] | [a-zA-Z] | Alphabetic characters |
| [[:digit:]] | [0-9] | Digits |
| [[:alnum:]] | [a-zA-Z0-9] | Alphanumeric |
| [[:space:]] | [ \t\n\r\f\v] | Whitespace |
| [[:lower:]] | [a-z] | Lowercase |
| [[:upper:]] | [A-Z] | Uppercase |
| [[:punct:]] | - | Punctuation |
| [[:print:]] | - | Printable characters |
# Find lines with digits (locale-safe)
grep '[[:digit:]]' /var/log/messages
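The classes and negated sets above can be tried on a handful of made-up lines. This is a small sketch; the sample text is an assumption chosen to show one case per pattern.

```shell
# Sample lines (made up); note the empty second line
sample=$(printf '%s\n' 'alpha' '' 'beta42' 'GAMMA' '#comment')

# Lines containing at least one digit
with_digit=$(printf '%s\n' "$sample" | grep '[[:digit:]]')

# Lines starting with an uppercase letter
upper_start=$(printf '%s\n' "$sample" | grep '^[[:upper:]]')

# ^[^#] needs at least one character, so it skips empty lines as well as comments
active=$(printf '%s\n' "$sample" | grep '^[^#]')

printf '%s\n' "digit: $with_digit" "upper: $upper_start" "active:" "$active"
```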
Specify how many times the preceding element should match
* matches the preceding element zero or more times
# Match "color" or "colour" (also "colouur": * allows any number of u)
grep 'colou*r' file
# Match "error:" followed by any number of spaces
grep 'error: *' logfile # "error:" followed by zero or more spaces
# Match anything (greedy!)
grep '.*' file # Matches entire line
# Common BRE idiom: 'ss*' means one or more 's'
grep 'ss*' /etc/passwd # One or more 's'
+ = one or more
? = zero or one
# ERE: Match one or more digits (must use -E)
grep -E '[0-9]+' /var/log/messages
# ERE: Optional 's' for plural
grep -E 'files?' file
# BRE equivalent (escaped; \+ and \? are GNU extensions)
grep '[0-9]\+' /var/log/messages
grep 'files\?' file
{n,m} matches between n and m times (inclusive)
| Syntax | Meaning | Example |
|---|---|---|
| {3} | Exactly 3 times | [0-9]{3} = "123" |
| {2,4} | 2 to 4 times | a{2,4} = "aa", "aaa", "aaaa" |
| {2,} | 2 or more times | x{2,} = "xx", "xxx", ... |
| {0,3} | 0 to 3 times | y{0,3} = "", "y", "yy", "yyy" |
# Match US ZIP codes (5 digits)
grep -E '^[0-9]{5}$' zipcodes.txt
# Match ZIP+4 format (5 digits, hyphen, 4 digits)
grep -E '^[0-9]{5}-[0-9]{4}$' zipcodes.txt
# Match 2-4 letter words
grep -E '\b[a-zA-Z]{2,4}\b' document.txt
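The ZIP patterns can be checked against a few made-up candidates. This sketch also combines both formats with an optional group and shows why the anchors matter; the sample values are assumptions.

```shell
# Candidate values (made up for illustration)
zips=$(printf '%s\n' '12345' '1234' '123456' '12345-6789' '12345-678')

# Exactly five digits
five_digit=$(printf '%s\n' "$zips" | grep -E '^[0-9]{5}$')

# Five digits, hyphen, four digits
zip_plus4=$(printf '%s\n' "$zips" | grep -E '^[0-9]{5}-[0-9]{4}$')

# Either format: the +4 part is an optional group
either=$(printf '%s\n' "$zips" | grep -E '^[0-9]{5}(-[0-9]{4})?$')

# Without anchors, {5} also matches inside longer or malformed values
unanchored=$(printf '%s\n' "$zips" | grep -cE '[0-9]{5}')

printf '%s\n' "five digits: $five_digit" "ZIP+4: $zip_plus4"
echo "lines matched without anchors: $unanchored"
```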
⚠️ Quantifiers are greedy by default - they match as much as possible
<b>bold</b> and <b>more</b>
# Problem: greedy matching
echo '<b>first</b> <b>second</b>' | grep -o '<b>.*</b>'
# Returns: <b>first</b> <b>second</b>
# Solution: negated character class
echo '<b>first</b> <b>second</b>' | grep -oE '<b>[^<]*</b>'
# Returns: <b>first</b>
# <b>second</b>
| matches either the expression before OR after
# Match error or warning
grep -E 'error|warning' /var/log/messages
# Match multiple file extensions
ls | grep -E '\.jpg|\.png|\.gif'
# Match different log levels
grep -E 'ERROR|WARN|FATAL' application.log
# BRE requires escape
grep 'error\|warning' /var/log/messages
( ) groups expressions for quantifiers and alternation
# Repeat a group
grep -E '(na)+' lyrics.txt # "na", "nana", "nanana"
# Optional group
grep -E 'http(s)?://' urls.txt # http:// or https://
# Group with alternation
grep -E '(Mon|Tues|Wednes|Thurs|Fri)day' calendar.txt
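Grouping also controls how far an alternation reaches. In this sketch (sample lines made up), the ungrouped pattern reads as "starts with error OR ends with warning", while the grouped one requires the whole line to be one of the two words.

```shell
# Sample lines (made up for illustration)
lines=$(printf '%s\n' 'error' 'warning' 'error: disk full' 'no warning')

# Alternation splits the whole pattern: ^error  OR  warning$
ungrouped=$(printf '%s\n' "$lines" | grep -cE '^error|warning$')

# Parentheses limit the alternation, so both anchors apply to each branch
grouped=$(printf '%s\n' "$lines" | grep -E '^(error|warning)$')

echo "ungrouped matches $ungrouped lines"
printf '%s\n' "grouped matches:" "$grouped"
```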
\1, \2 reference previously matched groups
# Find repeated words
grep -E '\b([a-z]+)\s+\1\b' document.txt
# Matches: "the the", "is is", etc.
# Find lines where first and last word are the same
grep -E '^([a-zA-Z]+)\b.*\b\1$' file
# Match HTML tags with matching close tags
grep -E '<([a-z]+)>.*</\1>' file.html
# Find duplicate lines (uniq -d does the work; no regex needed)
sort file | uniq -d
💡 Use Case: Finding duplicate words, validating paired elements, data consistency checks
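A short sketch of the repeated-word use case on made-up text. It assumes GNU grep and GNU sed, since `\b`, `\s`, and back-references in ERE are GNU extensions rather than POSIX features.

```shell
# Sample text (made up for illustration)
text=$(printf '%s\n' 'this is is a test' 'nothing repeated here' 'the the end')

# Lines containing the same word twice in a row
dupes=$(printf '%s\n' "$text" | grep -E '\b([a-z]+)\s+\1\b')

# Collapse each doubled word to a single copy
cleaned=$(printf '%s\n' "$text" | sed -E 's/\b([a-z]+)\s+\1\b/\1/g')

printf '%s\n' "lines with repeats:" "$dupes" "after cleanup:" "$cleaned"
```

The word boundaries keep the group aligned to whole words, so "this is" is not mistaken for a repeat of "is".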
The primary tool for regex searching in Linux
grep: Basic Regular Expressions (default)
grep -E: Extended Regular Expressions
grep -F: Fixed strings (no regex)
| Option | Description | Example |
|---|---|---|
| -i | Case insensitive | grep -i 'error' |
| -v | Invert match | grep -v '^#' |
| -c | Count matching lines | grep -c 'pattern' |
| -n | Show line numbers | grep -n 'TODO' |
| -l | List filenames only | grep -l 'main' *.c |
| -o | Only matching part | grep -oE '[0-9]+' |
| -r | Recursive search | grep -r 'config' /etc |
| -w | Whole word match | grep -w 'is' |
# Show 3 lines BEFORE match
grep -B3 'error' /var/log/messages
# Show 3 lines AFTER match
grep -A3 'error' /var/log/messages
# Show 3 lines before AND after (context)
grep -C3 'error' /var/log/messages
# Combine with other options
grep -B2 -A2 -n 'Exception' application.log
# Example output (two lines of context before and after the match):
May 15 10:23:45 server process[1234]: Starting operation
May 15 10:23:46 server process[1234]: Loading config
May 15 10:23:47 server process[1234]: error: config not found
May 15 10:23:48 server process[1234]: Falling back to defaults
May 15 10:23:49 server process[1234]: Continuing...
# Find failed SSH logins
grep -E 'Failed password|authentication failure' /var/log/secure
# Extract IP addresses from log
grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' access.log
# Find active config lines (not comments, not empty)
grep -v '^#' /etc/ssh/sshd_config | grep -v '^$'
# Better: combine with ERE
grep -vE '^#|^$' /etc/ssh/sshd_config
# Count error types
grep -oE 'error [0-9]+' log | sort | uniq -c | sort -rn
# Find files containing pattern
grep -rl 'TODO' --include='*.py' ./src/
# Multiple patterns from file
grep -f patterns.txt logfile
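For the pattern-file form, a minimal sketch with temporary files. The file names and log lines are made up; each line of the pattern file is treated as a separate pattern, and a line matches if any of them does.

```shell
# Work in a scratch directory so no real files are touched
workdir=$(mktemp -d)

# Hypothetical pattern file: one pattern per line
printf '%s\n' 'Failed password' 'authentication failure' > "$workdir/patterns.txt"

# Hypothetical log excerpt
printf '%s\n' \
  'sshd: Failed password for root' \
  'sshd: Accepted publickey for alice' \
  'su: authentication failure' > "$workdir/auth.log"

# Count lines matching any pattern in the file
hits=$(grep -c -f "$workdir/patterns.txt" "$workdir/auth.log")
echo "suspicious lines: $hits"

rm -rf "$workdir"
```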
sed applies text transformations using regular expressions
# Basic syntax
sed 's/pattern/replacement/' file
sed 's/pattern/replacement/g' file # Global (all occurrences)
# In-place editing
sed -i 's/old/new/g' file # Modifies file directly
sed -i.bak 's/old/new/g' file # Creates backup first
⚠️ Critical: sed -i modifies files directly! Always test first or create backups.
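One possible safe workflow, sketched on a throwaway file: preview the change on standard output first, then edit in place with a backup suffix. The file name and contents are hypothetical, and `-i.bak` is used as in the example above (GNU sed).

```shell
# Scratch directory with a hypothetical config file
workdir=$(mktemp -d)
conf="$workdir/app.conf"
printf '%s\n' 'port=8080' 'host=old.example.com' > "$conf"

# Step 1: preview only, the file is not modified
preview=$(sed 's/old\.example\.com/new.example.com/' "$conf")

# Step 2: apply in place, keeping the original as app.conf.bak
sed -i.bak 's/old\.example\.com/new.example.com/' "$conf"

current=$(cat "$conf")
backup=$(cat "$conf.bak")
printf '%s\n' "now:" "$current" "backup:" "$backup"

rm -rf "$workdir"
```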
# Basic substitution
sed 's/error/ERROR/' logfile
# Global substitution (all occurrences on line)
sed 's/old/new/g' file
# Case insensitive
sed 's/error/ERROR/gi' file
# Delete matching lines
sed '/pattern/d' file
# Delete empty lines
sed '/^$/d' file
# Delete comments
sed '/^#/d' /etc/config
# Multiple operations
sed -e 's/foo/bar/g' -e 's/baz/qux/g' file
# Using different delimiter (useful for paths)
sed 's|/usr/local|/opt|g' file
# Swap first two fields (colon-separated)
sed 's/\([^:]*\):\([^:]*\)/\2:\1/' /etc/passwd
# ERE syntax (cleaner)
sed -E 's/([^:]*):([^:]*)/\2:\1/' /etc/passwd
# Reformat date: MM/DD/YYYY to YYYY-MM-DD
sed -E 's|([0-9]{2})/([0-9]{2})/([0-9]{4})|\3-\1-\2|g' dates.txt
# Add prefix to captured content
sed -E 's/^([0-9]+)/ID: \1/' file
# Surround matches with tags
sed -E 's/([0-9]{3}-[0-9]{4})/PHONE:\1:PHONE/g' contacts.txt
# Remove duplicate words
sed -E 's/\b([a-z]+)\s+\1\b/\1/g' document.txt
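The date rewrite can be verified on a single made-up line. Each parenthesized group is numbered left to right, so month, day, and year arrive as \1, \2, and \3.

```shell
# Sample line (made up) with two US-style dates
input='Due 03/15/2024 and 12/01/2023'

# \1 = month, \2 = day, \3 = year; | is the delimiter so / needs no escaping
reformatted=$(printf '%s\n' "$input" |
  sed -E 's|([0-9]{2})/([0-9]{2})/([0-9]{4})|\3-\1-\2|g')

echo "$reformatted"
```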
# Apply only to line 5
sed '5s/old/new/' file
# Apply to lines 5-10
sed '5,10s/old/new/' file
# Apply from line 5 to end
sed '5,$s/old/new/' file
# Apply to lines matching pattern
sed '/^#/s/old/new/' file
# Apply between two patterns
sed '/START/,/END/s/old/new/' file
# Delete from pattern to end of file
sed '/pattern/,$d' file
# Print only lines 10-20
sed -n '10,20p' file
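A small sketch of address ranges on made-up input. The START and END markers are taken from the example above; the substitution only touches lines from the first marker through the second.

```shell
# Sample input (made up): "old" appears before, inside, and after the block
doc=$(printf '%s\n' 'old 1' 'START' 'old 2' 'END' 'old 3')

# Substitute only between the START and END lines (inclusive)
ranged=$(printf '%s\n' "$doc" | sed '/START/,/END/s/old/new/')

# Print only lines 2 through 4
middle=$(printf '%s\n' "$doc" | sed -n '2,4p')

printf '%s\n' "ranged:" "$ranged" "lines 2-4:" "$middle"
```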
awk combines regex pattern matching with field processing
# Basic syntax
awk '/pattern/ { action }' file
# Print lines matching pattern
awk '/error/' /var/log/messages
# Print specific fields from matching lines
awk '/error/ { print $1, $5 }' /var/log/messages
# Field separator
awk -F: '/root/ { print $1, $7 }' /etc/passwd
# Match at beginning of line
awk '/^root/' /etc/passwd
# Match at end of line
awk '/bash$/' /etc/passwd
# Match specific field
awk -F: '$7 ~ /bash/' /etc/passwd # Field 7 contains "bash"
awk -F: '$7 == "/bin/bash"' /etc/passwd # Field 7 equals exactly
# Negation
awk -F: '$7 !~ /nologin/' /etc/passwd # Field 7 doesn't contain
# Complex conditions
awk -F: '$3 >= 1000 && $7 ~ /bash/' /etc/passwd
# Multiple patterns
awk '/start/,/end/' file # Range between patterns
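The field-matching forms can be tried on a passwd-style sample without touching the real file. The four accounts below are made up for illustration.

```shell
# Sample in /etc/passwd format (accounts are made up)
passwd_sample=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/bash' \
  'bob:x:1001:1001:Bob:/home/bob:/sbin/nologin')

# Regular users (UID >= 1000) whose shell ends in "bash"
regular_bash=$(printf '%s\n' "$passwd_sample" |
  awk -F: '$3 >= 1000 && $7 ~ /bash$/ { print $1 }')

# Accounts whose shell field does not contain "nologin"
can_login=$(printf '%s\n' "$passwd_sample" |
  awk -F: '$7 !~ /nologin/ { print $1 }')

echo "regular bash users: $regular_bash"
printf '%s\n' "login shells:" "$can_login"
```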
# Sum values in a column
awk '{ sum += $1 } END { print sum }' numbers.txt
# Average of matching lines
awk '/error/ { count++; sum += $NF } END { if (count) print sum/count }' log
# Extract unique values
awk -F: '{ print $7 }' /etc/passwd | sort -u
# Format output
awk -F: '{ printf "%-15s %s\n", $1, $7 }' /etc/passwd
# Count pattern occurrences by category
awk '/error/ { errors++ } /warning/ { warnings++ }
END { print "Errors:", errors, "Warnings:", warnings }' log
# Process Apache logs - count requests per IP
awk '{ ips[$1]++ } END { for (ip in ips) print ip, ips[ip] }' access.log
Frequently used regex patterns for system administration
# Simple IP pattern (also matches invalid IPs like 999.999.999.999)
grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' log
# Extract IPs from Apache log
awk '{ print $1 }' access.log | grep -oE '[0-9.]+' | sort -u
# Count connections per IP
grep -oE '^[0-9.]+' access.log | sort | uniq -c | sort -rn | head
# Find specific subnet
grep -E '192\.168\.[0-9]+\.[0-9]+' /var/log/messages
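When the simple pattern is too permissive, one common approach is to spell out the 0-255 range for each octet. The sketch below is one way to write it, not the only one; the variable names and test values are made up.

```shell
# One octet, 0-255: 250-255, 200-249, 100-199, or 0-99 without a leading zero
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

# Four octets separated by literal dots, anchored to the whole line
ipv4="^${octet}(\.${octet}){3}\$"

# Candidate values (made up for illustration)
candidates=$(printf '%s\n' '192.168.1.1' '10.0.0.255' '999.999.999.999' '256.1.1.1' '1.2.3')

valid=$(printf '%s\n' "$candidates" | grep -E "$ipv4")
printf '%s\n' "valid addresses:" "$valid"
```

The pattern is anchored, so it suits validating one value per line; for extracting addresses from free text with -o, the anchors would need to be replaced with something like word boundaries.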
# Basic email pattern
grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' contacts.txt
# URL pattern
grep -oE 'https?://[^[:space:]]+' document.txt
# Domain extraction from URL
grep -oE 'https?://[^/]+' urls.txt | sed 's|https\?://||'
# Find mailto links in HTML
grep -oE 'mailto:[^"]+' page.html
# Validate URL format
if [[ "$URL" =~ ^https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} ]]; then
    echo "Valid URL"
fi
💡 Note: RFC-compliant email/URL validation is complex. These patterns work for common cases.
# ISO date: YYYY-MM-DD
grep -E '[0-9]{4}-[0-9]{2}-[0-9]{2}' file
# US date: MM/DD/YYYY
grep -E '[0-9]{2}/[0-9]{2}/[0-9]{4}' file
# Syslog timestamp: Mon DD HH:MM:SS
grep -E '^[A-Z][a-z]{2} [ 0-9][0-9] [0-9]{2}:[0-9]{2}:[0-9]{2}' /var/log/messages
# 24-hour time: HH:MM:SS
grep -oE '([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]' logfile
# Extract today's log entries
grep "^$(date '+%b %e')" /var/log/messages
# Find entries in time range
awk '/10:00:00/,/11:00:00/' /var/log/messages
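The range form above only starts and stops when lines containing exactly 10:00:00 and 11:00:00 exist. As an alternative sketch, awk can compare the time field as a string, which works for any timestamps in a fixed HH:MM:SS format. The log lines below are made up and assume the syslog layout where the time is field 3.

```shell
# Sample syslog-style lines (made up); field 3 is the HH:MM:SS timestamp
log=$(printf '%s\n' \
  'May 15 09:59:58 server app: early' \
  'May 15 10:00:03 server app: first' \
  'May 15 10:45:10 server app: second' \
  'May 15 11:00:01 server app: late')

# String comparison works because the format is fixed-width and zero-padded
in_window=$(printf '%s\n' "$log" |
  awk '$3 >= "10:00:00" && $3 < "11:00:00" { print $NF }')

printf '%s\n' "entries between 10:00 and 11:00:" "$in_window"
```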
# Find HTTP error codes (4xx, 5xx)
grep -E '" [45][0-9]{2} ' access.log
# Extract error messages
grep -oE 'error: [^,]+' application.log
# Find slow queries (1000 ms or more: four or more digits)
grep -E 'query_time=[0-9]{4,}' mysql.log
# Match stack traces
grep -A20 'Exception' java.log
# Find repeated failed logins
grep 'Failed password' /var/log/secure |
grep -oE 'from [0-9.]+' |
sort | uniq -c | sort -rn
# Parse key=value pairs
grep -oE 'user=[^[:space:]]+' audit.log
Methodology for complex regex construction
Goal: Match Apache log entries with 404 errors
# Step 1: Match literal example
grep '404' access.log
# Step 2: Match the error code in context (status field)
grep '" 404 ' access.log
# Step 3: Add flexibility for any 4xx error
grep -E '" 4[0-9]{2} ' access.log
# Step 4: Extract relevant fields
grep -E '" 4[0-9]{2} ' access.log | awk '{ print $1, $7, $9 }'
# Step 5: Further refinement - get requested URLs
grep -E '" 4[0-9]{2} ' access.log |
awk '{ print $7 }' |
sort | uniq -c | sort -rn
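The steps can be replayed on a small made-up log to see why each refinement matters. The entries below follow the common Apache log layout and are assumptions for illustration; note the /404.html request that fools the first step.

```shell
# Sample access log (made up); the fourth line has "404" in the URL, not the status
log=$(printf '%s\n' \
  '192.168.1.100 - - [10/May/2024:10:15:30 +0000] "GET /page.html HTTP/1.1" 200 1234' \
  '192.168.1.101 - - [10/May/2024:10:15:31 +0000] "GET /missing.html HTTP/1.1" 404 209' \
  '192.168.1.102 - - [10/May/2024:10:15:32 +0000] "GET /missing.html HTTP/1.1" 404 209' \
  '192.168.1.100 - - [10/May/2024:10:15:33 +0000] "GET /404.html HTTP/1.1" 200 512' \
  '192.168.1.103 - - [10/May/2024:10:15:34 +0000] "POST /admin HTTP/1.1" 403 199')

step1=$(printf '%s\n' "$log" | grep -c '404')                # literal: over-matches
step2=$(printf '%s\n' "$log" | grep -c '" 404 ')             # status field only
step3=$(printf '%s\n' "$log" | grep -cE '" 4[0-9]{2} ')      # any 4xx status

# Most frequently requested URL among 4xx responses, as "count url"
top=$(printf '%s\n' "$log" | grep -E '" 4[0-9]{2} ' |
  awk '{ print $7 }' | sort | uniq -c | sort -rn | head -n 1 |
  awk '{ print $1, $2 }')

echo "step 1: $step1  step 2: $step2  step 3: $step3"
echo "top 4xx URL: $top"
```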
| Mistake | Example | Solution |
|---|---|---|
| Unescaped dots | 192.168.1.1 | 192\.168\.1\.1 |
| Using shell glob syntax | grep *.txt file | grep '\.txt' file |
| BRE vs ERE confusion | grep 'a+' file | grep -E 'a+' file |
| Missing quotes | grep $var file | grep "$var" file |
| Greedy matching | <.*> | <[^>]*> |
| Case sensitivity | grep 'Error' log | grep -i 'error' log |
# Test pattern interactively with color highlighting
grep --color=always 'pattern' file | less -R
# Show what's matching with -o
echo "test string here" | grep -oE 'pattern'
# Debug by building up pattern
grep 'simple' file # Start here
grep 'simp.e' file # Add complexity
grep -E 'simp.e+' file # Add more
# Count matches vs lines
grep -c 'pattern' file # Lines containing match
grep -o 'pattern' file | wc -l # Total matches
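The difference between the two counts shows up as soon as one line holds more than one match. A minimal sketch on made-up text:

```shell
# Sample text (made up): the first line contains the word twice
text=$(printf '%s\n' 'error error' 'ok' 'error')

# -c counts lines that contain at least one match
line_count=$(printf '%s\n' "$text" | grep -c 'error')

# -o prints every match on its own line, so wc -l counts matches
match_count=$(printf '%s\n' "$text" | grep -o 'error' | wc -l | tr -d ' ')

echo "matching lines: $line_count, total matches: $match_count"
```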
# Perl-compatible regex for testing (if available)
grep -P '(?<=prefix)pattern(?=suffix)' file
💡 Online Tools: regex101.com, regexr.com for interactive testing
Analyze a web server access log:
# Sample log format:
# 192.168.1.100 - - [10/May/2024:10:15:30 +0000] "GET /page.html HTTP/1.1" 200 1234
# Start with: /var/log/httpd/access_log or generate test data
Practice: Use regex daily - every log file is an opportunity!
| Syntax | Meaning |
|---|---|
| . | Any character |
| ^ | Start of line |
| $ | End of line |
| \ | Escape |
| [] | Character class |
| [^] | Negated class |
| * | Zero or more |
| + | One or more |
| ? | Zero or one |
| {n} | Exactly n |
| {n,m} | n to m times |

| Class | Matches |
|---|---|
| [[:alpha:]] | Letters |
| [[:digit:]] | Digits |
| [[:alnum:]] | Alphanumeric |
| [[:space:]] | Whitespace |

| Option | Description |
|---|---|
| -E | Extended regex |
| -i | Case insensitive |
| -v | Invert match |
| -o | Only matching |
| -c | Count |
man grep | man sed | man awk
grep -E '.*' your_questions.txt
Regular Expressions for Text Matching