SE465

A7 Automated Testing Results Analysis

Due: April 7th, 11:59 pm

Introduction

In this assignment, you will use the Elastic Stack (ELK) to analyze a web server access log.

You will learn:

  • How logs are collected and shipped.
  • How logs are aggregated, filtered, and transformed. 
  • How to search and analyze logs by using Elasticsearch.
  • How to manage and visualize the result by using Kibana.

1. Why Log Analysis?

Log analysis (or system and network log analysis) is the art and science of making sense of computer-generated records (also called log or audit-trail records).

Typical reasons include compliance with security policies, system troubleshooting, security incident response, and understanding online user behaviour.

2. What is the ELK stack?

ELK is the acronym for three open-source tools: Elasticsearch, Logstash, and Kibana. Elasticsearch is a search and analytics engine. Logstash is a server-side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to a “stash” like Elasticsearch. Kibana lets users visualize the data in Elasticsearch with charts and graphs. Figure 2 shows how the ELK stack works.

Since Logstash depends on the JVM to run, it is quite memory-hungry. To address this, a family of lightweight shippers called Beats was created to ship logs from different places to Logstash for aggregation, filtering, and enrichment. Different Beats serve different purposes: Filebeat ships log files, Packetbeat ships network data, Metricbeat ships server metrics, and so on.
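For illustration, a minimal Filebeat configuration that ships an Apache access log to Logstash could look like the sketch below; the log path and the Logstash host are placeholders, not values taken from this assignment:

# filebeat.yml -- minimal sketch; adjust the path and host to your environment
filebeat.inputs:
  - type: log                        # tail plain-text log files
    paths:
      - /var/log/apache2/access.log  # hypothetical access-log location

output.logstash:
  hosts: ["localhost:5044"]          # default port of the Logstash Beats input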

The figure below demonstrates how the Elastic Stack works with Beats.

Environment Setup

1. Prerequisites

To make configuration easier, we will run the ELK stack in Docker. Download the ELK stack project from docker-elk. You need to install Docker on your computer first; after installing it, assign at least 4 GB of RAM to the containers. More details can be found on the GitHub page of the docker-elk project.

Docker installation instructions (macOS): https://docs.docker.com/desktop/install/mac-install/

2. Start the ELK Stack

Note: To avoid unexpected issues, Windows users should run the command line as administrator.

  • Unzip the project file and change the current working directory to it.
  • Under the project root directory, run:
docker-compose up setup
docker-compose up
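
The first command runs the one-off setup service that initializes the built-in users; the second starts Elasticsearch, Logstash, and Kibana in the foreground. Once everything starts cleanly, you can optionally run the stack in the background instead (standard docker-compose flags, not specific to this project):

docker-compose up -d    # start the containers detached
docker-compose ps       # confirm the containers are running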

If you run the stack in the foreground, you should see output like below:

  • Verify the installation of the ELK stack. Open http://localhost:5601/ in your browser; the username is elastic and the password is changeme. You should see a page like below:

Congratulations! You have successfully installed and started the ELK stack.
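
You can also check from the command line that Elasticsearch itself is responding (assuming the default port and the elastic/changeme credentials used by docker-elk):

curl -u elastic:changeme http://localhost:9200

A small JSON document describing the cluster (name, version, tagline) means Elasticsearch is up.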

Explore the Data

1. Import data and generate the index

  • On the home page, click the Search button and then the Upload a file link, and choose the target log file for analysis. You can download the sample data from here.
  • Click Override Settings; you can override some default settings here if you want.
  • Click Import. Name the index and then click Advanced; we will use some processors to further parse the logs.

  • Modify the Ingest pipeline field to add two new processors, geoip and user_agent, after the grok processor (be careful with the commas). The geoip processor extracts country, city, and postal-code information from the IP address, and user_agent extracts operating-system and browser information from the logs.
   {
     "grok": {
       "field": "message",
       "patterns": [
         "%{COMBINEDAPACHELOG}"
       ],
       "ecs_compatibility": "v1"
     }
   },
   {
     "geoip": {
       "field": "source.address",
       "target_field": "geoip",
       "ignore_missing": true
     }
   },
   {
     "user_agent": {
       "field": "user_agent.original",
       "target_field": "user_agent",
       "ignore_missing": true
     }
   },
  • Click Import again and you should see a webpage like below.
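
If the imported fields look wrong, one way to debug the processors outside the upload UI is Elasticsearch's _simulate endpoint. The sketch below runs the three processors above against a single made-up Apache log line (the IP address and request are invented for illustration):

curl -u elastic:changeme -H 'Content-Type: application/json' \
  -X POST 'http://localhost:9200/_ingest/pipeline/_simulate?pretty' -d '
{
  "pipeline": {
    "processors": [
      { "grok": { "field": "message", "patterns": ["%{COMBINEDAPACHELOG}"], "ecs_compatibility": "v1" } },
      { "geoip": { "field": "source.address", "target_field": "geoip", "ignore_missing": true } },
      { "user_agent": { "field": "user_agent.original", "target_field": "user_agent", "ignore_missing": true } }
    ]
  },
  "docs": [
    { "_source": { "message": "8.8.8.8 - - [01/Mar/2020:19:00:05 -0500] \"GET /index.html HTTP/1.1\" 200 1043 \"-\" \"Mozilla/5.0\"" } }
  ]
}'

The response shows the document after the processors have run, so you can confirm that the geoip and user_agent fields are populated before importing the full file.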

2. View the index in Discover 

Click the link View index in Discover as shown in the above figure. On the Discover page, you can search the data based on your requirements by using KQL (Kibana Query Language). For example, as demonstrated in the following figure, you can fetch the documents that correspond to HTTP GET requests by using http.request.method : "GET".
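
A few more KQL queries in the same style (the field names assume the ECS-style grok output and the geoip target field configured earlier; check the field list in Discover if a query returns nothing):

http.response.status_code : 404
http.request.method : "GET" and geoip.country_name : "Canada"
not http.request.referrer : "-"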

If you want to know more details about the usage of KQL, you can check the official documentation.

3. Data Visualizer 

In the side menu, click Visualize Library and then Create new visualization. Click Lens, and you will see the following page. You can add some of the available fields on the left and select the type of graph used to visualize these fields.

4. Pie chart 

Clear the layers on the above page, select Pie in the dropdown list, and create a pie chart for the request referrer field by following the configuration indicated in the following figure.

We can further split this pie chart with sub-aggregations. Modify the above pie chart to show only the top 5 items, then click Add or drag-and-drop a field. Select the field source.address and follow the configurations below to create a stacked pie chart that shows the top 5 IP addresses for each of the top 5 request referrers.

5. Line chart 

Select Line from the dropdown list. Select the Date histogram as the horizontal axis.

For the vertical axis, select Count as the function and http.request.referrer as the field.

You will get a line chart of http.request.referrer.

You can make further configurations to create a line chart of the top 3 request referrers. Click Add or drag-and-drop a field and follow the configurations below to show the trend of the top 3 request referrers (the time frame is set to ).

6. Bar chart 

Similar to the steps for creating a pie chart and a line chart, you can also generate a bar chart like the one below.

Assignment #7: report

Note: There should be 17,279 records imported in total; make sure you don't miss any data. For questions 1-4, set the time frame as: Mar 1, 2020 @ 19:00:05.000→Mar 2, 2020 @ 18:59:55.000

  1. How many different kinds of HTTP response status codes are there in the data, and what are the proportions of each? Give the answer (0.5%) and a pie chart (0.5%) to visualize the result. (Take a whole screenshot of your browser that clearly shows the required plots and also shows the URL input area; same for the following questions.)

There are 2 different kinds of HTTP response status codes (200 and 404) and the proportions of each are:

  • 200 (33.11%)
  • 404 (66.89%)

Include screenshot

  2. What are the names of the top 5 countries in the records and the top 3 browser names in each of these top 5 countries? Give the answer in descending order (0.5%) and a stacked pie chart to show the results (0.5%). 

  3. What are the names of the top 3 cities in each continent of the earth in the records? Give the answer in descending order (0.5%) and a stacked bar chart that demonstrates the results (0.5%).

  4. Generate one line chart (set the minimum interval of the X-axis to 30 min) that only shows the records in Europe and Asia (two lines in one chart) (1%). Give all the minimum time intervals (e.g., 19:00-19:30) where the number of records from Europe equals that from Asia (intersections of the two lines) (1%). Tips: you may need filters when creating the chart; see the filter sketch after this list.

  5. Download the CSV data from here, upload it to Kibana, and generate a line chart that displays the trend of converted (converted=1) accesses from the control group and the treatment group (two lines in one chart). Describe the steps of creating such a line chart with screenshots and give the chart (set the minimum interval to 12 hours; the local time period is Jan 2, 2017 @ 13:42:05.378→Jan 24, 2017 @ 13:41:54.460) (1%).
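
For question 4, the per-continent filters might look like the KQL expressions below (geoip.continent_name is the field name the geoip processor typically produces under its target field; verify the exact name against your index):

geoip.continent_name : "Europe"
geoip.continent_name : "Asia"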
