6.3. Statistics

6.3.1. Introduction

This tutorial shows how to collect traffic and compute statistics on the collected data.

6.3.2. How-to

This tutorial introduces two Haka script files, stats_on_exit and stats_interactive, which can be run with the hakapcap tool as follows:

$ cd <haka_install_path>/share/haka/sample/stats
$ hakapcap <script_stat_file> <pcap_file>

Note

This tutorial uses a pre-processed pcap file derived from the DARPA dataset, which can be retrieved from the MIT website. Some packets were filtered out to keep the capture at a reasonable size; the resulting file can be downloaded from the Resources section of the Haka website.

6.3.3. Collecting data

Before computing statistics, we first need to collect data. This is the purpose of the stats.lua file, which starts by creating a global stats table. This table is updated with HTTP information whenever a new HTTP response is received. An entry in the stats table consists of the following fields:

  • ip: source ip
  • method: http request method (get, post, etc.)
  • host: http host
  • resource: normalized path
  • referer: referer header
  • useragent: user-agent header
  • status: response status code
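
As an illustration, a single entry could look like the plain Lua table below. Only the field names come from stats.lua; the values are invented for the example, and describe is a hypothetical helper that is not part of the tutorial scripts. The status is written as a string because the queries later in this tutorial compare it against '404'.

```lua
-- Hypothetical entry: field names follow stats.lua, values are invented.
local entry = {
    ip = '192.0.2.10',
    method = 'GET',
    host = 'www.example.com',
    resource = '/index.html',
    referer = '',
    useragent = 'Mozilla/2.0',
    status = '200',
}

-- Hypothetical helper (not part of the tutorial): format an entry on one line.
local function describe(e)
    return string.format('%s %s %s%s -> %s',
        e.ip, e.method, e.host, e.resource, e.status)
end

print(describe(entry))
```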

Note

Although the security rule is evaluated whenever an HTTP response is received, we still have access to the HTTP request fields and the IP header fields.

--------------------------
-- Loading dissectors
--------------------------

require('protocol/ipv4')
require('protocol/tcp')
local http = require('protocol/http')

local tbl = require('stats_utils')

-- Each entry of stats table will store info
-- about http request/response (method, host,
-- resource, status, etc.)
local stats = tbl.new()

--------------------------
-- Setting next dissector
--------------------------

http.install_tcp_rule(80)

--------------------------
-- Recording http info
--------------------------

haka.rule{
    hook = http.events.response,
    eval = function (http, response)
        local request = http.request
        local split_uri = request.split_uri:normalize()
        local entry = {}
        entry.ip = tostring(http.flow.srcip)
        entry.method = request.method
        entry.resource = split_uri.path or ''
        entry.host = split_uri.host or ''
        entry.useragent = request.headers['User-Agent'] or ''
        entry.referer = request.headers['Referer'] or ''
        entry.status = response.status
        table.insert(stats, entry)
    end
}

return stats

6.3.4. Stats utilities

This section introduces the stats utilities developed for this tutorial. More precisely, it shows how to create the global stats table and how to run basic stats operations on the created table.

stats_utils.new() → table
Returns:
  • table (stats) – New stats table.

Create the stats table.

object stats_utils.stats
<stats>:list()

Print column names of stats table.

<stats>:dump([nb])
Parameters:
  • nb (number) – Number of entries to display.

Print nb entries of stats table.

<stats>:top(column_name[, nb])
Parameters:
  • column_name (string) – Column to query.
  • nb (number) – Number of entries to display.

Dump the top 10 values of the given column. The output is limited to nb entries if nb is provided.
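
A ranking of this kind can be sketched in standalone Lua as follows. This is not the stats_utils implementation: the top function below is a hypothetical stand-in that returns its result instead of printing it, and the sample entries are invented. It assumes that the ranking counts how often each value of the column occurs.

```lua
-- Hypothetical stand-in for <stats>:top(); not the stats_utils code.
local function top(entries, column_name, nb)
    nb = nb or 10
    -- Count how many entries share each value of the column.
    local counts = {}
    for _, entry in ipairs(entries) do
        local value = entry[column_name]
        if value ~= nil then
            counts[value] = (counts[value] or 0) + 1
        end
    end
    -- Turn the counts into a list so that it can be sorted.
    local ranking = {}
    for value, count in pairs(counts) do
        table.insert(ranking, { value = value, count = count })
    end
    -- Most frequent first; ties are ordered by value to keep the result stable.
    table.sort(ranking, function(a, b)
        if a.count ~= b.count then return a.count > b.count end
        return tostring(a.value) < tostring(b.value)
    end)
    -- Keep at most nb rows.
    local result = {}
    for i = 1, math.min(nb, #ranking) do
        result[i] = ranking[i]
    end
    return result
end

-- Invented sample data.
local sample = {
    { resource = '/a', status = '404' },
    { resource = '/b', status = '200' },
    { resource = '/a', status = '404' },
    { resource = '/c', status = '404' },
}

for _, row in ipairs(top(sample, 'resource', 2)) do
    print(row.value, row.count)
end
```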

<stats>:select_table(column_tab[, where])
Parameters:
  • column_tab (table) – List of column names to select.
  • where (function) – Filter function called for each table line.

Select specific columns from the table. Optionally, filter the table lines with the where function.
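
The selection and filtering step can be sketched in standalone Lua as shown below. Again, this is a hypothetical stand-in rather than the stats_utils code, and the sample data is invented. The sketch assumes that where is called with the complete entry before the columns are projected; the real utility may behave differently. The result is itself a stats-like table, which is what allows calls to be chained.

```lua
-- Hypothetical stand-in for <stats>:select_table(); not the stats_utils code.
local Stats = {}
Stats.__index = Stats

local function new()
    return setmetatable({}, Stats)
end

function Stats:select_table(column_tab, where)
    local selected = new()
    for _, entry in ipairs(self) do
        -- Without a where function, every line is kept.
        if not where or where(entry) then
            -- Copy only the requested columns.
            local row = {}
            for _, name in ipairs(column_tab) do
                row[name] = entry[name]
            end
            table.insert(selected, row)
        end
    end
    return selected
end

-- Invented sample data.
local stats = new()
table.insert(stats, { ip = '192.0.2.1', resource = '/a', status = '404' })
table.insert(stats, { ip = '192.0.2.2', resource = '/b', status = '200' })
table.insert(stats, { ip = '192.0.2.3', resource = '/a', status = '404' })

local not_found = stats:select_table({'ip', 'resource'},
    function(elem) return elem.status == '404' end)

for _, row in ipairs(not_found) do
    print(row.ip, row.resource)
end
```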

6.3.5. Dumping stats

The first Haka script (stats_on_exit) shows how to use the above API.

local stats = require('stats')

-- Run some stats at exit on collected http info
haka.rule{
    hook = haka.events.exiting,
    eval = function ()
        print("top 10 (by default) of useragent header")
        stats:top('useragent')
        print("")

        print("select columns 'ip', 'method' and 'resource' from the stats table")
        stats:select_table({'ip', 'method', 'resource'}):dump(5)
        print("")

        print("list of source ip using 'Mozilla/2.0' as user-gent'")
        stats:select_table({'ip', 'useragent'},
            function(elem) return elem.useragent:find('Mozilla/2.0') end):dump(5)
        print("")

        print("top ten of http resources that generated the most 404 status error")
        stats:select_table({'resource', 'status'},
            function(elem) return elem.status == '404' end):top('resource')
        print("")
    end
}

The script outputs some statistics on the collected HTTP traffic after parsing all packets in the provided pcap file (i.e. at Haka exit). Below is an excerpt of the output generated when running the Haka script on the DARPA pcap file:

...
list of source ip using 'Mozilla/2.0' as user-gent'
| ip             | useragent                 |
| 172.16.117.132 | Mozilla/2.01 (Win3.1; I;) |
| 172.16.117.132 | Mozilla/2.01 (Win3.1; I;) |
| 172.16.117.132 | Mozilla/2.01 (Win3.1; I;) |
| 172.16.117.132 | Mozilla/2.01 (Win3.1; I;) |
| 172.16.117.132 | Mozilla/2.01 (Win3.1; I;) |
... 12762 remaining entries
...
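
Note that the where function of the 'Mozilla/2.0' query relies on string.find, which treats its argument as a Lua pattern, so the '.' matches any character. The standalone sketch below, which uses an invented user-agent string alongside the one from the output above, shows how the optional plain argument of string.find restricts the search to a literal substring.

```lua
local needle = 'Mozilla/2.0'
local real = 'Mozilla/2.01 (Win3.1; I;)'  -- taken from the output above
local odd = 'Mozilla/2x01'                -- invented, for illustration only

-- Pattern search: '.' is a wildcard, so both strings match.
local pattern_real = real:find(needle) ~= nil
local pattern_odd = odd:find(needle) ~= nil

-- Plain search (fourth argument set to true): only the literal text matches.
local plain_real = real:find(needle, 1, true) ~= nil
local plain_odd = odd:find(needle, 1, true) ~= nil

print(pattern_real, pattern_odd)  -- true   true
print(plain_real, plain_odd)      -- true   false
```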

6.3.6. Getting stats in interactive mode

The second script (stats_interactive) fills the stats table with HTTP information (thanks to the stats.lua script) and then launches the interactive mode after parsing all packets in the provided pcap file. Statistics are then available through the stats variable.

stats = require('stats')
local color = require('color')

haka.rule{
    hook = haka.events.exiting,
    eval = function()
        debug.interactive.enter(color.green .. color.bold .. ">  " .. color.clear,
            color.green .. color.bold .. ">> " .. color.clear,
            "entering interactive mode for playing statistics\nStatistics are available through 'stats' variable. Run" ..
            "\n\t- stats:list() to get the list of column names" ..
            "\n\t- stats:top(column) to get the top 10 of selected field" ..
            "\n\t- stats:dump(nb) to dump 'nb' entries of stats table" ..
            "\n\t- stats:select_table({column_1, column_2, etc.}, cond_func)) to select some columns and filter them based on 'cond_func' function" ..
            "\n\nExamples:" ..
            "\n\t- stats:top('useragent')" ..
            "\n\t- stats:select_table({'resource', 'status'}, function(elem) return elem.status == '404' end):top('resource')")
    end
}

Below is the hakapcap output when entering the interactive mode:

 entering interactive session: entering interactive mode for playing statistics
 Statistics are available through 'stats' variable. Run
     - stats:list() to get the list of column names
     - stats:top(column) to get the top 10 of selected field
     - stats:dump(nb) to dump 'nb' entries of stats table
     - stats:select_table({column_1, column_2, etc.}, cond_func)) to select some columns and filter them based on 'cond_func' function
 Examples:
     - stats:top('useragent')
     - stats:select_table({'resource', 'status'}, function(elem) return elem.status == '404' end):top('resource')
>

Note

Press CTRL-D to leave the interactive mode.