File Handling and Text Processing

File Handling and Text Processing in Bash Scripts

Objectives:

  • Understand how to read from and write to files in Bash.

  • Learn to manipulate text and extract information using common tools.

  • Explore practical examples relevant to cybersecurity.


Chapter Outline:

5.1 File Handling in Bash

Bash provides several ways to handle files, including reading, writing, and appending data.

5.1.1 Reading from a File

You can read a file line by line using a while loop:

while IFS= read -r line; do
    echo "$line"
done < input_file.txt
  • IFS=: Sets the Internal Field Separator to an empty value for this read command only, which prevents leading and trailing whitespace from being trimmed.

  • read -r: Reads a line from the file without interpreting backslashes.
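One edge case worth knowing: if the last line of a file has no trailing newline, read returns a non-zero status for it, so the loop above silently skips that line. The following is a minimal sketch of a common workaround. The helper name print_lines and the temporary file are assumptions made for illustration only.

```shell
# print_lines is a hypothetical helper name used only for this illustration.
print_lines() {
    # read fails at end of file but still stores a partial last line in
    # "$line"; the extra -n test lets the loop body process that line too.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done < "$1"
}

# Demo: a scratch file whose last line has no trailing newline.
tmp_file=$(mktemp)
printf '  indented line\nlast line without newline' > "$tmp_file"
print_lines "$tmp_file"
rm -f "$tmp_file"
```

Using printf '%s\n' instead of echo avoids surprises when a line starts with a dash or contains backslashes.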

5.1.2 Writing to a File

You can write to a file using redirection:

echo "This is a new line." >> output_file.txt
  • >>: Appends to the file, creating it if it does not exist. Use > instead to overwrite the file; any existing content is discarded.
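The difference between the two operators is easiest to see side by side. The sketch below is illustrative only: write_demo and demo_file are hypothetical names, and the file is a scratch file created with mktemp.

```shell
# write_demo is a hypothetical helper; $1 is the path of a scratch file.
write_demo() {
    echo "first line"  > "$1"     # > creates the file, or empties it first
    echo "second line" >> "$1"    # >> keeps existing content and appends
    echo "replacement" > "$1"     # > again: both earlier lines are gone

    # Group several commands and redirect them once, rather than
    # repeating the redirection on every line.
    {
        echo "report header"
        echo "report body"
    } >> "$1"
}

demo_file=$(mktemp)
write_demo "$demo_file"
cat "$demo_file"
rm -f "$demo_file"
```

After the function runs, the file holds three lines: the replacement line followed by the two lines written by the grouped block.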

5.2 Text Processing Tools

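Bash scripts commonly rely on external command-line tools for text processing, such as grep, awk, sed, and cut. These are separate programs rather than part of Bash itself, but they are available on virtually every Unix-like system.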
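Since cut is mentioned above but not covered in the subsections that follow, here is a minimal sketch of it. It assumes colon-separated input laid out like /etc/passwd. The sample records and the helper name list_users_and_shells are made up for illustration.

```shell
# list_users_and_shells is a hypothetical helper: it prints fields 1 and 7
# of colon-separated input read from standard input.
list_users_and_shells() {
    # -d sets the field delimiter, -f selects which fields to keep.
    cut -d ':' -f 1,7
}

# Sample records invented for this demo (same layout as /etc/passwd).
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' |
    list_users_and_shells
```

cut works well for fixed, single-character delimiters. For fields separated by runs of whitespace, awk is usually the easier choice.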

5.2.1 Using grep

grep is used to search for specific patterns in files.

grep "ERROR" log_file.txt
  • This command prints every line of log_file.txt that contains "ERROR". The match is case-sensitive by default.
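A few standard grep options come up constantly in log analysis. The sketch below runs them against a scratch file. The variable name sample_log and the log lines themselves are invented for this illustration.

```shell
# sample_log is a scratch file holding invented log lines; the trap removes
# it when the shell exits.
sample_log=$(mktemp)
trap 'rm -f "$sample_log"' EXIT
printf '%s\n' \
    '10:00:01 INFO service started' \
    '10:00:05 ERROR disk quota exceeded' \
    '10:00:09 error: retrying write' \
    '10:00:12 WARN slow response' > "$sample_log"

grep -c "ERROR" "$sample_log"        # -c: print the number of matching lines
grep -i -n "error" "$sample_log"     # -i: ignore case, -n: show line numbers
grep -v "INFO" "$sample_log"         # -v: print lines that do NOT match
grep -E "ERROR|WARN" "$sample_log"   # -E: extended regex, here "either word"
```

Note that the case-sensitive search counts only one line, while the -i variant also picks up the lowercase "error:" entry.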

5.2.2 Using awk

awk is a programming language designed for pattern scanning and processing.

awk '{print $1, $3}' data_file.txt
  • This command prints the first and third whitespace-separated fields of each line in data_file.txt.
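awk becomes more useful once a field separator, a filter pattern, and an END block are added. The following sketch assumes comma-separated records of the form user,action,bytes. The variable name sample_data and the records are invented for illustration.

```shell
# sample_data is a scratch file with invented records (user,action,bytes);
# the trap removes it when the shell exits.
sample_data=$(mktemp)
trap 'rm -f "$sample_data"' EXIT
printf '%s\n' \
    'alice,login,120' \
    'bob,upload,4096' \
    'alice,upload,2048' > "$sample_data"

# -F sets the field separator; the condition before { } filters records.
awk -F ',' '$2 == "upload" {print $1, $3}' "$sample_data"

# Accumulate a value across all records and print it once in the END block.
awk -F ',' '{total += $3} END {print "total bytes:", total}' "$sample_data"
```

The first command prints only the two upload records. The second adds up the third field of every record and prints a single total.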

5.2.3 Using sed

sed is a stream editor for filtering and transforming text.

sed 's/old_text/new_text/g' input_file.txt > output_file.txt
  • This command replaces all occurrences of old_text with new_text in the contents of input_file.txt and saves the result to output_file.txt. The input file itself is not modified.
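Beyond global substitution, sed can print or delete selected lines. The sketch below shows a few common forms. The variable name sample_text and the file contents are invented for illustration. Every command writes to standard output and leaves the file untouched.

```shell
# sample_text is a scratch file with invented content; the trap removes it
# when the shell exits.
sample_text=$(mktemp)
trap 'rm -f "$sample_text"' EXIT
printf '%s\n' \
    '# settings' \
    'a=old_text b=old_text' \
    'mode=old_text' > "$sample_text"

sed 's/old_text/new_text/' "$sample_text"   # no g flag: first match per line only
sed -n '/^mode=/p' "$sample_text"           # -n with p: print only matching lines
sed '/^#/d' "$sample_text"                  # d: delete lines starting with #

# Chain several expressions with -e; they are applied in order to each line.
sed -e 's/old_text/new_text/g' -e '/^#/d' "$sample_text"
```

Comparing the first command with the last one shows what the g flag does: without it, the second old_text on the same line is left unchanged.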

5.3 Real-World Example: Log Analysis Script

Here’s a simple script to analyze a log file and extract error messages:

#!/bin/bash

# Function to display usage message
usage() {
    echo "Usage: $0 <log_file>"
    exit 1
}

# Check that exactly one argument (the log file) is provided
if [ $# -ne 1 ]; then
    echo "Error: Expected exactly one argument (the log file)."
    usage
fi

log_file="$1"
error_log="error_report.txt"

# Check if the specified file exists and is a regular file
if [ ! -f "$log_file" ]; then
    echo "Error: Log file '$log_file' not found or is not a regular file."
    exit 1
fi

# Check if the log file is readable
if [ ! -r "$log_file" ]; then
    echo "Error: Log file '$log_file' is not readable."
    exit 1
fi

# Extract ERROR lines and save them to error_report.txt
grep "ERROR" "$log_file" > "$error_log"

# grep exits with a non-zero status when no line matched (or on an error)
if [ $? -ne 0 ]; then
    echo "No ERROR entries found in the log file."
    rm -f "$error_log"   # do not leave an empty report behind
    exit 0
fi

# Count the number of errors found
error_count=$(wc -l < "$error_log")

echo "Total errors found: $error_count"
echo "Error report saved to $error_log"

Detailed Explanation:

  1. Usage Function:

    • A usage() function prints the correct way to run the script and exits if the required input is missing. This helps users understand how to use the script.

  2. Argument Validation:

    • if [ $# -ne 1 ]; then: Checks if exactly one argument (the log file) is passed. If not, the script informs the user and calls the usage() function.

  3. File Validation:

    • if [ ! -f "$log_file" ]; then: Verifies that the file exists and is a regular file.

    • if [ ! -r "$log_file" ]; then: Ensures the file is readable to avoid permission issues.

  4. Grep and Error Handling:

    • grep "ERROR" "$log_file" > "$error_log": Searches for "ERROR" in the provided log file and writes matching lines to error_report.txt.

    • if [ $? -ne 0 ]; then: Checks the exit status of grep. grep returns a non-zero status when no line matched (and also when it encountered an error), so in that case the script reports that no "ERROR" lines were found and exits.

  5. Counting Errors:

    • error_count=$(wc -l < "$error_log"): Counts the number of lines in error_report.txt, i.e., the number of errors found.

    • Output: The script prints the total error count and informs the user where the report has been saved.

Quick Recap of the Log Analysis Script:

  1. File Check: The script checks that the log file exists and is readable. If not, it prints a message and exits.

  2. Grep Usage: It uses grep to extract lines containing "ERROR" from the log file passed as the first argument and saves them to error_report.txt.

  3. Count Errors: It counts the number of lines in error_report.txt (i.e., the number of errors found) and prints the result.
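The script treats any non-zero exit status from grep as "no matches". grep actually distinguishes two cases: status 1 means no line matched, while a status greater than 1 signals an error such as an unreadable file. The following is a minimal sketch of how the check could be split. The helper extract_errors is hypothetical and not part of the script above.

```shell
# extract_errors is a hypothetical helper: it writes the ERROR lines of the
# log file ($1) to the report file ($2) and reports what happened.
extract_errors() {
    grep "ERROR" "$1" > "$2"
    status=$?    # capture immediately; any later command overwrites $?
    case $status in
        0)  echo "Errors found: $(wc -l < "$2" | tr -d ' ')" ;;
        1)  echo "No ERROR entries found."
            rm -f "$2" ;;                 # avoid leaving an empty report
        *)  echo "grep failed with status $status" >&2
            rm -f "$2"
            return "$status" ;;
    esac
}

# Demo with an invented three-line log in a scratch file.
demo_log=$(mktemp)
printf 'INFO ok\nERROR one\nERROR two\n' > "$demo_log"
extract_errors "$demo_log" "$demo_log.report"
rm -f "$demo_log" "$demo_log.report"
```

Piping the wc output through tr strips the leading spaces that some wc implementations add, so the count prints cleanly.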

5.4 Summary

In this chapter, you learned about:

  • File handling in Bash, including reading and writing files.

  • Common text processing tools (grep, awk, sed).

  • A practical example of log analysis using these techniques.

Exercises:

  1. Write a script that reads a CSV file and calculates the average of a specified column.

  2. Create a log parsing script that identifies the top 5 most frequent error messages from a log file.

