Forward Apache Logs to OpenSearch via Logstash

Published on September 30, 2024
Rahul Shettigar and easha

Introduction

Effective web server log management is crucial for maintaining your website’s performance, troubleshooting issues, and gaining insights into user behavior. Apache is one of the most popular web servers. It generates access and error logs that contain valuable information. To efficiently manage and analyze these logs, you can use Logstash to process and forward them to DigitalOcean’s Managed OpenSearch for indexing and visualization.

In this tutorial, we will guide you through installing Logstash on a Droplet, configuring it to collect your Apache logs, and sending them to Managed OpenSearch for analysis.

Prerequisites

  1. One or more Droplets with the Apache web server installed.
  2. A Managed OpenSearch cluster.

Step 1 - Installing Logstash

Logstash can be installed from binary archives or from the package repositories. The package repositories are generally recommended because they make management and updates easier.

In this section, we’ll guide you through installing Logstash on your Droplet using either the APT or the YUM package manager, depending on your distribution.

First, identify the operating system:

cat /etc/os-release
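If you prefer to script the choice between the two install paths below, the ID and ID_LIKE fields of that file can be mapped to a package family. The following is a minimal sketch; detect_pkg_family is a hypothetical helper name, and the list of distribution IDs is an assumption that covers only the systems this tutorial mentions.

```shell
# Sketch: map the ID / ID_LIKE fields of an os-release file to a package family.
# detect_pkg_family is a hypothetical helper, not a standard command.
detect_pkg_family() {
  os_release_file="${1:-/etc/os-release}"
  # Source the file in a subshell so its variables do not leak into the caller.
  (
    . "$os_release_file"
    case " ${ID:-} ${ID_LIKE:-} " in
      *" debian "*|*" ubuntu "*) echo "apt" ;;
      *" rhel "*|*" centos "*|*" fedora "*) echo "yum" ;;
      *) echo "unknown" ;;
    esac
  )
}
```

Calling detect_pkg_family with no argument reads /etc/os-release on the current Droplet and prints apt, yum, or unknown.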

For APT-Based Systems (Ubuntu/Debian)

Download and install the Public Signing Key:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic-keyring.gpg

You may need to install the apt-transport-https package on Debian before proceeding:

sudo apt-get install apt-transport-https

Save the repository definition to /etc/apt/sources.list.d/elastic-8.x.list:

echo "deb [signed-by=/usr/share/keyrings/elastic-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-8.x.list

Use the echo method described above to add the Logstash repository. Do not use add-apt-repository, as it also adds a deb-src entry, and Elastic does not provide a source package. If you have added the deb-src entry, you will see an error like the following:

Unable to find expected entry 'main/source/Sources' in Release file (Wrong sources.list entry or malformed file)

To fix this, delete the deb-src entry from the /etc/apt/sources.list file, and the installation should work as expected.

Update the package index and install Logstash:

sudo apt-get update && sudo apt-get install logstash

For YUM-Based Systems (CentOS/RHEL)

Download and install the public signing key:

sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

Add the following to the /etc/yum.repos.d/logstash.repo file. You can use tee to create or overwrite the file:

sudo tee /etc/yum.repos.d/logstash.repo > /dev/null <<EOF
[logstash-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF

Your repository is ready for use. You can install Logstash with:

sudo yum install logstash

For further information, please refer to the Installing Logstash guide.

Step 2 - Configuring Logstash to Send Logs to OpenSearch

A Logstash pipeline consists of three main stages: input, filter, and output. Each stage is implemented by plugins. You can use the bundled plugins, install community plugins, or create your own.

  • Input: This stage collects data from various sources. Logstash supports numerous input plugins to handle data sources like log files, databases, message queues, and cloud services.

  • Filter: This stage processes and transforms the data collected in the input stage. Filters can modify, enrich, and structure the data to make it more useful and easier to analyze.

  • Output: This stage sends the processed data to a destination. Destinations can include databases, files, and data stores like OpenSearch.
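To see the three stages in isolation before working with Apache logs, you could write a throwaway pipeline that reads from standard input and prints to standard output. This is only a sketch: the file name, the write_demo_pipeline helper, and the added stage field are illustrative and are not part of the Apache pipeline built later in this tutorial.

```shell
# Sketch: write a minimal three-stage pipeline to a scratch file.
# write_demo_pipeline, the file name, and the "stage" field are illustrative only.
write_demo_pipeline() {
  target="$1"
  cat > "$target" <<'EOF'
input {
  stdin { }                                            # input: read events from standard input
}
filter {
  mutate { add_field => { "stage" => "filtered" } }    # filter: enrich each event
}
output {
  stdout { codec => rubydebug }                        # output: print the structured event
}
EOF
}

write_demo_pipeline "${TMPDIR:-/tmp}/demo_pipeline.conf"
```

Once Logstash is installed, piping a line into /usr/share/logstash/bin/logstash -f /tmp/demo_pipeline.conf should print that line as a structured event with the extra field attached.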

Step 3 - Installing the OpenSearch Output Plugin

The OpenSearch output plugin can be installed by running the following command:

sudo /usr/share/logstash/bin/logstash-plugin install logstash-output-opensearch

More information can be found in the logstash-output-opensearch plugin repository.
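To confirm that the plugin was installed, you can look for it in the output of logstash-plugin list. The sketch below wraps that check in a small function; has_opensearch_plugin and the LOGSTASH_PLUGIN_BIN override variable are hypothetical names, and the default path assumes a package-based install.

```shell
# Sketch: report whether the OpenSearch output plugin is installed.
# LOGSTASH_PLUGIN_BIN is a hypothetical override for non-default install paths.
has_opensearch_plugin() {
  plugin_bin="${LOGSTASH_PLUGIN_BIN:-/usr/share/logstash/bin/logstash-plugin}"
  # "logstash-plugin list" prints one installed plugin name per line;
  # -x requires the whole line to match the plugin name.
  "$plugin_bin" list 2>/dev/null | grep -qx 'logstash-output-opensearch'
}
```

For example, has_opensearch_plugin && echo "plugin installed" prints the message only when the plugin name appears in the list.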

Now let’s create a pipeline.

Create a new file called apache_pipeline.conf in /etc/logstash/conf.d/ and copy the following contents into it:

input {
  file {
    path => "/var/log/apache2/access.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    tags => "apache_access"
  }

  file {
    path => "/var/log/apache2/error.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    tags => "apache_error"
  }
}

filter {
  if "apache_access" in [tags] {
    grok {
      match => { "message" => "%{HTTPD_COMBINEDLOG}" }
    }
    mutate {
      remove_field => [ "message", "[log][file][path]", "[event][original]" ]
    }
  } else {
    grok {
      match => { "message" => "%{HTTPD24_ERRORLOG}" }
    }
  }
}

output {
  if "apache_access" in [tags] {
    opensearch {
      hosts    => "https://<OpenSearch-Hostname>:25060"
      user     => "doadmin"
      password => "<your_password>"
      index    => "apache_access"
      ssl_certificate_verification => true
    }
  } else {
    opensearch {
      hosts    => "https://<OpenSearch-Hostname>:25060"
      user     => "doadmin"
      password => "<your_password>"
      index    => "apache_error"
      ssl_certificate_verification => true
    }
  }
}

Replace <OpenSearch-Hostname> with your OpenSearch server’s hostname and <your_password> with your OpenSearch password.
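Before starting the service, it can help to let Logstash parse the file and report syntax errors. The sketch below uses the --config.test_and_exit flag for that; check_pipeline_config and the LOGSTASH_BIN override variable are hypothetical names, and the default paths assume a package-based install.

```shell
# Sketch: syntax-check one pipeline file without starting the service.
# LOGSTASH_BIN is a hypothetical override; the default matches the package install path.
check_pipeline_config() {
  conf_file="${1:-/etc/logstash/conf.d/apache_pipeline.conf}"
  logstash_bin="${LOGSTASH_BIN:-/usr/share/logstash/bin/logstash}"
  if [ ! -x "$logstash_bin" ]; then
    echo "logstash binary not found at $logstash_bin" >&2
    return 2
  fi
  # --path.settings points at the directory holding logstash.yml;
  # --config.test_and_exit parses the configuration and exits without running it.
  "$logstash_bin" --path.settings /etc/logstash -f "$conf_file" --config.test_and_exit
}
```

Running the function as the logstash user (for example through sudo -u logstash) keeps any files it creates owned by the service account rather than by root.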

Let’s break down the above configuration.

  • INPUT: Configures the source of events. This pipeline uses the file input plugin.

  • path => "/var/log/apache2/access.log": The path to the Apache access log file that Logstash reads.

    Make sure that the Logstash service account has read access to the input path. The paths shown are the Debian/Ubuntu defaults; on CentOS/RHEL, Apache usually writes to /var/log/httpd/access_log and /var/log/httpd/error_log.

  • start_position => "beginning": Where Logstash starts reading a file it has not seen before. "beginning" processes the file from the first line instead of only picking up new lines at the end.

  • sincedb_path => "/dev/null": The path to the sincedb file, which Logstash uses to record its current position in each log file so that it can resume where it left off after a restart or failure. Pointing it at /dev/null discards that position, so Logstash re-reads each file from the beginning on every restart. This is convenient for testing but can index duplicate events; remove the setting to use the default sincedb location if you want positions to persist.

  • tags => "apache_access": Assigns a tag to events read from this input. Tags identify events so that later stages can treat them differently. This pipeline checks the tag in both the filter and output stages.

  • FILTER: Processes the events.

    Starting with conditionals:

    (if "apache_access" in [tags]):

    This checks whether the tag apache_access is present in the [tags] field of the incoming event. The conditional selects the appropriate grok filter for Apache access or error logs.

  • Grok Filter (for Apache Access Logs):

    grok {
        match => { "message" => "%{HTTPD_COMBINEDLOG}" }
    }
    

    %{HTTPD_COMBINEDLOG} is a predefined grok pattern that parses the Apache combined access log format. It extracts fields such as the client IP address, timestamp, HTTP method, URI, and status code from the message field of incoming events.

  • Mutate Filter Remove (optional): After the access log line is parsed, the remove_field option of the mutate filter drops fields that are no longer needed, such as the raw message that the parsed fields now duplicate.

    mutate {
        remove_field => [ "message","[log][file][path]","[event][original]" ]
    }
    
  • Else Condition: The else block is executed if the apache_access tag is not present in [tags]. This else block contains another GROK filter for Apache error logs.

    grok {
        match => { "message" => "%{HTTPD24_ERRORLOG}" }
    }
    

    This grok filter %{HTTPD24_ERRORLOG} parses messages that match the Apache error log format. It extracts fields relevant to error logs like timestamp, log level, error message, etc.

    GROK patterns can be found at: https://github.com/logstash-plugins/logstash-patterns-core/tree/main/patterns.

  • OUTPUT: The output plugin sends events to a particular destination.

    The output block uses the same if conditional as the filter block:

    if "apache_access" in [tags] {}

    The conditional routes events to two separate OpenSearch indexes: apache_access for access logs and apache_error for everything else.

    Let’s look at the options of the OpenSearch output plugin:

    hosts                        => "https://XXX:25060"   # Your OpenSearch hostname
    user                         => "doadmin"             # Your OpenSearch username
    password                     => "XXXXX"               # Your OpenSearch password
    index                        => "apache_error"        # Index name in OpenSearch
    ssl_certificate_verification => true                  # Enables SSL certificate verification
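Because the file input can only read logs that the Logstash service account is allowed to open, a quick permission check may save some debugging later. The sketch below reports which of the given files the current user cannot read; unreadable_logs is a hypothetical helper name.

```shell
# Sketch: list which of the given log files the current user cannot read.
# unreadable_logs is a hypothetical helper name.
unreadable_logs() {
  status=0
  for log_file in "$@"; do
    if [ ! -r "$log_file" ]; then
      echo "not readable: $log_file"
      status=1
    fi
  done
  return "$status"
}
```

To test as the service account rather than as your own user, a one-off check such as sudo -u logstash test -r /var/log/apache2/access.log works as well. On Debian and Ubuntu the Apache logs are commonly readable by the adm group, so adding the logstash user to that group is one possible fix; treat that as an assumption to verify on your own system.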
    

Step 4 - Start Logstash

Once the pipeline is configured, enable and start the Logstash service, then check its status:

sudo systemctl enable logstash.service
sudo systemctl start logstash.service
sudo systemctl status logstash.service
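Logstash runs on the JVM and can take a little while to come up, so a short polling loop is sometimes handier than checking the status by hand. The following is a sketch only; wait_for_logstash and the WAIT_INTERVAL variable are hypothetical names, and the retry count and interval are arbitrary defaults.

```shell
# Sketch: poll systemd until the Logstash unit reports "active".
# wait_for_logstash and WAIT_INTERVAL are hypothetical names; defaults are arbitrary.
wait_for_logstash() {
  attempts="${1:-10}"
  while [ "$attempts" -gt 0 ]; do
    # "systemctl is-active" prints the unit state, such as active, inactive, or failed.
    if [ "$(systemctl is-active logstash.service 2>/dev/null)" = "active" ]; then
      echo "logstash is active"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "${WAIT_INTERVAL:-3}"
  done
  echo "logstash did not become active" >&2
  return 1
}
```

Note that an active unit only means the process is running; the pipeline itself may still be starting or may have failed to connect, which the Logstash log will show.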

Step 5 - Troubleshooting

Check Connectivity

You can verify that Logstash can connect to OpenSearch by testing connectivity:

curl -u your_username:your_password -X GET "https://your-opensearch-server:25060/_cat/indices?v"

Replace your-opensearch-server with your OpenSearch server’s hostname, and your_username and your_password with your OpenSearch credentials.
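When the response is hard to interpret, looking at the HTTP status code alone can narrow the problem down. The sketch below is one way to do that; OS_HOST, OS_USER, and OS_PASS are hypothetical environment variable names for your own values, and the status explanations are a rough guide rather than an exhaustive list.

```shell
# Sketch: print only the HTTP status code returned by the cluster.
# OS_HOST, OS_USER and OS_PASS are hypothetical variable names for your own values.
opensearch_http_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -u "${OS_USER}:${OS_PASS}" \
    "https://${OS_HOST}:25060/_cat/indices?v"
}

# Translate a status code into a rough diagnosis.
explain_status() {
  case "$1" in
    200) echo "connected and authenticated" ;;
    401) echo "reachable, but the credentials were rejected" ;;
    000) echo "no HTTP response (DNS, firewall, or TLS problem)" ;;
    *)   echo "unexpected HTTP status: $1" ;;
  esac
}
```

After exporting the three variables, explain_status "$(opensearch_http_status)" prints a one-line summary. Keeping the password in a variable also avoids typing it directly into the command.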

Data Ingestion

Ensure that data is properly indexed in OpenSearch:

curl -u your_username:your_password -X GET "https://your-opensearch-server:25060/<your-index-name>/_search?pretty"

Replace your-opensearch-server with your OpenSearch server’s hostname, your_username and your_password with your OpenSearch credentials, and <your-index-name> with the name of the index (apache_access or apache_error).
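If you only want to know whether documents are arriving, the _count endpoint returns a much smaller response than a full search. The sketch below extracts the number from that response with sed; index_doc_count and the OS_HOST, OS_USER, and OS_PASS variables are hypothetical names, and the parsing assumes the compact JSON that the endpoint returns by default.

```shell
# Sketch: print the number of documents in an index using the _count endpoint.
# OS_HOST, OS_USER and OS_PASS are hypothetical variable names for your own values.
index_doc_count() {
  index_name="$1"
  curl -s -u "${OS_USER}:${OS_PASS}" \
    "https://${OS_HOST}:25060/${index_name}/_count" |
    # Pull the digits that follow "count": out of the compact JSON response.
    sed -n 's/.*"count":\([0-9][0-9]*\).*/\1/p'
}
```

Running index_doc_count apache_access a few times while generating traffic against Apache should show the number increasing if ingestion is working.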

Firewall and Network Configuration

Ensure firewall rules and network settings allow traffic between Logstash and OpenSearch on port 25060.
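A plain TCP connection test from the Droplet can separate network problems from authentication or TLS problems. The sketch below uses the /dev/tcp feature of bash together with timeout; port_open is a hypothetical helper name and the five-second limit is an arbitrary choice.

```shell
# Sketch: succeed if a TCP connection to host:port can be opened within 5 seconds.
# port_open is a hypothetical helper name; the timeout value is arbitrary.
port_open() {
  host="$1"
  port="$2"
  # bash treats /dev/tcp/HOST/PORT in a redirection as a TCP connection attempt;
  # timeout caps how long we wait when packets are silently dropped.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

For example, port_open your-opensearch-server 25060 && echo reachable prints the message only when the connection succeeds. If it fails, check the trusted sources configured for the cluster as well as any firewall on the Droplet.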

Logs

The logs for Logstash can be found at /var/log/logstash/logstash-plain.log
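Since that file also contains routine informational messages, filtering for warnings and errors can make problems easier to spot. The sketch below does so with grep; logstash_problems is a hypothetical helper name, and the pattern assumes the default log layout, in which the level appears in brackets after the timestamp.

```shell
# Sketch: show the most recent warning and error lines from the Logstash log.
# logstash_problems is a hypothetical helper; the pattern assumes the default log layout.
logstash_problems() {
  log_file="${1:-/var/log/logstash/logstash-plain.log}"
  max_lines="${2:-20}"
  # Levels appear as a bracketed field after the timestamp, e.g. "][ERROR]" or "][WARN ]".
  grep -E '\]\[(WARN ?|ERROR|FATAL)\]' "$log_file" | tail -n "$max_lines"
}
```

Reading the log may require sudo, depending on the permissions of /var/log/logstash on your system.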

For more details, refer to the Logstash Troubleshooting documentation.

Conclusion

In this guide, we walked through setting up Logstash to collect and forward Apache logs to OpenSearch. Here’s a quick recap of what we covered:

Installing Logstash: We covered how to use either the APT or the YUM package manager, depending on your Linux distribution, to install Logstash on your Droplet.

Configuring Logstash: We installed the OpenSearch output plugin and created a pipeline configuration file that parses Apache access and error logs and sends them to OpenSearch.

Verifying in OpenSearch: We queried the OpenSearch API with curl to confirm that the indexes exist and that your logs are being indexed properly.

With these steps completed, you should now have a functional setup where Logstash collects Apache logs and sends them to OpenSearch.
