Elasticsearch: Index Template with Apache Access Example

Mike Milano

Elasticsearch will take any data you throw at it, but without a predefined index, it will try to figure out the data type on its own, which can result in both inefficient storage and querying. It can also limit ways to use your data in some cases.

Consider HTTP status codes, for example. These values are always integers, but if they reach Elasticsearch as strings, which is common when they are parsed out of a log line, Elasticsearch will treat them as text unless you explicitly define your index.

As text, whenever you search on this field, Elasticsearch will run its analysis functions, even though a status code almost never needs them. Furthermore, you would not be able to run numeric queries against this field, e.g. status > 400.
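To make the numeric-query point concrete, here is a minimal Python sketch that builds the body of a range query. It assumes the status code is stored in a numeric field named response (the name used in the template later in this article); the helper name status_above is made up for this example.

```python
import json

def status_above(code, field="response"):
    """Build a range query matching documents whose numeric status is > code."""
    # "field" defaults to the field name used in the template in this article.
    return {"query": {"range": {field: {"gt": code}}}}

# Body for a search request such as: GET access-*/_search
print(json.dumps(status_above(400), indent=2))
```

A comparison like this is only meaningful as a number when the field is mapped with a numeric type.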

This is where mapping comes into play. While Elasticsearch is a document store and not a relational database (RDB), you can still define types on properties, which tell Elasticsearch how the data should be indexed and stored. A mapping is roughly what the RDB world calls a schema.

Mappings are defined as part of an index.

Dynamic Indexing

By default, if an index does not exist, Elasticsearch will automatically create one. If you don't define your index before you start receiving data, you will not have control over the data types.
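As a rough illustration of why this matters, the sketch below approximates the documented default dynamic-mapping rules in plain Python. It is a simplification and not Elasticsearch's actual implementation: date detection and arrays are left out, and the function name guess_type is made up for this example. The key point is that a status code arriving as the JSON string "404" ends up as text, while the JSON number 404 ends up as long.

```python
def guess_type(value):
    """Approximate the default dynamic mapping for a single JSON value."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        # Numeric detection is off by default, so "404" stays a string type.
        return "text"
    if value is None:
        return None  # null values do not add a field
    raise TypeError("value type not covered by this sketch")

doc = {"response": "404", "bytes": 5120, "verb": "GET"}
for name, value in doc.items():
    print(name, "->", guess_type(value))
```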

Dynamic indexing is pretty nice. For example, let's say you have several projects receiving access logs and you want each project to have its own index. This lays the foundation for the kind of safe scaling that Elasticsearch is so good at.

To specify mappings, though, you would need to create an index for each project before you start sending data.

Laborious and error-prone, right?

It doesn't have to be if you use index templates.

Elasticsearch Index Templates

Index templates let you define an index structure that is applied to any new index whose name matches a pattern defined in the template.

For example, let's say you push data to a nonexistent index: access-mysite. If an index template exists with the pattern access-*, the new index will inherit the settings defined in that template.
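The matching itself is a simple wildcard comparison. The sketch below imitates it in Python for illustration only: it treats * as "any run of characters" and everything else literally, which is my understanding of how template patterns behave. The function name matches_template is hypothetical.

```python
import re

def matches_template(pattern, index_name):
    """Return True if index_name matches a pattern where * is the only wildcard."""
    # Escape every literal piece, then join the pieces with ".*" for each *.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, index_name) is not None

print(matches_template("access-*", "access-mysite"))  # True
print(matches_template("access-*", "error-mysite"))   # False
```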

Apache Access Index Template

Although an index can have many types, we've heard (via official training) that Elasticsearch may be going in the direction of one type per index in the future. Because of this, we choose to model with 1 type per index.

The following template defines our access indices. Most of it will apply to any Apache log, but we add project and env fields based on our input source in Logstash. The method we use to push logs is outlined here.

The other non-standard field is request_id. This is a string provided by Acquia in the logs that lets you relate an access request to entries in other logs.

Below is the full API call you can make (or paste into Kibana) to create an Apache access template.

PUT _template/access
{
  "template": "access*",
  "settings": {
    "index": {
      "number_of_shards": "4",
      "number_of_replicas": "0",
      "refresh_interval": "10s"
    }
  },
  "mappings": {
    "access": {
      "_meta": {
        "version": "1.0"
      },
      "_all": {
        "enabled": false
      },
      "dynamic": false,
      "properties": {
        "@timestamp": {
          "type": "date",
          "doc_values": true
        },
        "@version": {
          "type": "keyword",
          "doc_values": true
        },
        "type": {
          "type": "keyword"
        },
        "project": {
          "type": "keyword"
        },
        "env": {
          "type": "keyword"
        },
        "host": {
          "type": "keyword"
        },
        "vhost": {
          "type": "keyword"
        },
        "domain": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "auth": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "httpversion": {
          "type": "keyword"
        },
        "verb": {
          "type": "keyword"
        },
        "path": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "referrer": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "response": {
          "type": "short"
        },
        "request_time": {
          "type": "long"
        },
        "bytes": {
          "type": "long"
        },
        "request_id": {
          "type": "keyword"
        },
        "ip": {
          "type": "ip",
          "doc_values": true
        },
        "agent": {
          "type": "text"
        },
        "geoip": {
          "type": "object",
          "dynamic": true,
          "properties": {
            "ip": {
              "type": "ip",
              "doc_values": true
            },
            "location": {
              "type": "geo_point",
              "doc_values": true
            },
            "latitude": {
              "type": "float",
              "doc_values": true
            },
            "longitude": {
              "type": "float",
              "doc_values": true
            }
          }
        }
      }
    }
  }
}
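If you would rather create the template from a script than from Kibana, a request along these lines should work. This is a minimal sketch using only Python's standard library; the base URL http://localhost:9200, the function names, and the shortened body are assumptions for illustration, and nothing is sent until you call send yourself.

```python
import json
import urllib.request

def build_template_request(base_url, name, body):
    """Prepare (but do not send) a PUT _template/<name> request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_template/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def send(request):
    """Send the prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

# Shortened body for illustration; use the full template from this article.
body = {
    "template": "access*",
    "mappings": {"access": {"properties": {"response": {"type": "short"}}}},
}
request = build_template_request("http://localhost:9200", "access", body)
print(request.get_method(), request.full_url)
```

Calling send(request) against a running cluster would then create the template. Authentication and TLS are left out of this sketch.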

 

If you ever want to retrieve a list of templates, you can run the following API call.

GET /_template

 

In Summary

I wanted to write this article because, when I was learning about modeling, it would have helped me significantly to simply see a working template, complete with a defined index, for a standard use case such as Apache access logs.

Indices, mappings, and index templates are essential foundations of Elasticsearch. I encourage you to explore the docs to learn more.

Tags:  Elasticsearch
