5.3. Statistics

5.3.1. Introduction

This tutorial shows how to collect traffic and compute statistics on the collected data.

5.3.2. How-to

This tutorial introduces two Lua script files, stats_on_exit and stats_interactive, which can be run with the hakapcap tool as follows:

$ cd <haka_install_path>/share/haka/sample/stats
$ hakapcap <pcap_file> <script_stat_file>

Note

In this tutorial we use a pre-processed pcap file derived from the DARPA dataset, which can be retrieved from the MIT website. We filtered out some packets to obtain a capture of reasonable size, which you can download from the Resources section of the Haka website.

5.3.3. Collecting data

Before computing statistics, we first need to collect data. This is the purpose of the stats.lua file, which starts by creating a global stats table. This table is updated with HTTP information whenever a new HTTP response is received. An entry in the stats table consists of the following fields:

  • ip: source IP address
  • method: HTTP request method (GET, POST, etc.)
  • host: HTTP host
  • resource: normalized path
  • referer: Referer header
  • useragent: User-Agent header
  • status: response status code
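
For illustration, a single entry could look like the Lua table below. This is only a sketch: the ip and useragent values are borrowed from the sample output shown later in this tutorial, while every other value is invented and does not come from the DARPA capture. Storing status as a string is also an assumption.

```lua
-- Hypothetical entry of the stats table. Only the field names come from
-- the list above; host, resource, method and status are made-up values.
local entry = {
    ip = '172.16.117.132',                    -- source IP, kept as a string
    method = 'GET',
    host = 'www.example.com',
    resource = '/index.html',
    referer = '',                             -- empty when the header is absent
    useragent = 'Mozilla/2.01 (Win3.1; I;)',
    status = '200',                           -- assumed to be a string
}

-- Print the fields in a fixed order
local columns = {'ip', 'method', 'host', 'resource', 'referer', 'useragent', 'status'}
for _, name in ipairs(columns) do
    print(name, entry[name])
end
```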

Note

Although the security rule hooks on http-response, we still have access to the HTTP request fields (via the request accessor) and to the IP header fields (through the connection accessor).

--------------------------
-- Loading dissectors
--------------------------

require('protocol/ipv4')
require('protocol/tcp')
require('protocol/http')

local tbl = require('stats_utils')

-- Each entry of stats table will store
-- info about http request/response (method,
-- host, resource, status, etc.)
local stats = tbl.new()

--------------------------
-- Setting next dissector
--------------------------

haka.rule{
    hooks = { 'tcp-connection-new' },
    eval = function(self, pkt)
        local tcp = pkt.tcp
        if tcp.dstport == 80 then
            pkt.next_dissector = "http"
        end
    end
}

--------------------------
-- Recording http info
--------------------------

haka.rule{
    hooks = { 'http-response' },
    eval = function (self, http)
        local conn = http.connection
        local response = http.response
        local request = http.request
        local split_uri = request:split_uri():normalize()
        local entry = {}
        entry.ip = tostring(conn.srcip)
        entry.method = request.method
        entry.resource = split_uri.path or ''
        entry.host = split_uri.host or ''
        entry.useragent = request.headers['User-Agent'] or ''
        entry.referer = request.headers['Referer'] or ''
        entry.status = response.status
        table.insert(stats, entry)
    end
}

return stats
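
The `or ''` fallbacks in the rule above matter for the filters applied later: a missing header would otherwise be stored as nil, and calling a string method on nil raises an error. The following sketch shows the difference in plain Lua. It assumes nothing about Haka itself; an ordinary table stands in for request.headers.

```lua
-- Stand-in for request.headers: a plain table without a 'Referer' key.
local headers = { ['User-Agent'] = 'Mozilla/2.01 (Win3.1; I;)' }

local entry = {}
entry.useragent = headers['User-Agent'] or ''
entry.referer = headers['Referer'] or ''   -- header absent: falls back to ''

-- String methods are safe on the empty string: find() simply returns nil.
print(entry.referer:find('example'))

-- Calling a method on a nil field raises an error instead, caught here
-- with pcall so that the sketch keeps running.
local ok = pcall(function() return headers['Referer']:find('example') end)
print(ok)
```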

5.3.4. Stats utilities

This section introduces the stats utilities developed for this tutorial. More precisely, it shows how to create the global stats table and how to run basic statistical operations on it.

stats_utils.new()

Create the stats table.

class stats_utils.stats

list(self)

Print the column names of the stats table.

dump(self[, nb])

Print nb entries of the stats table.

top(self, column_name[, nb])

Dump the top 10 values of the given column. If nb is provided, limit the output to nb entries.

select_table(self, column_tab[, where])

Select specific columns from the stats table. Optionally, filter the entries with the where function.
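
To make the semantics of these operations concrete, here is a minimal sketch of how such a table could be written in plain Lua. It is an illustrative re-implementation, not the stats_utils module shipped with Haka: the name Stats is hypothetical, dump is omitted, and list and top return their results instead of printing them so that they are easy to check.

```lua
-- Illustrative sketch only; NOT the stats_utils module shipped with Haka.
local Stats = {}
Stats.__index = Stats

local function new()
    return setmetatable({}, Stats)
end

-- Return the column names of the first entry, sorted alphabetically.
function Stats:list()
    local names = {}
    for name in pairs(self[1] or {}) do
        names[#names + 1] = name
    end
    table.sort(names)
    return names
end

-- Count the occurrences of each value found in 'column' and return up to
-- 'nb' (default 10) {value, count} records, most frequent first.
function Stats:top(column, nb)
    nb = nb or 10
    local counts, order = {}, {}
    for _, entry in ipairs(self) do
        local value = entry[column]
        if value ~= nil then
            if not counts[value] then
                counts[value] = 0
                order[#order + 1] = value
            end
            counts[value] = counts[value] + 1
        end
    end
    table.sort(order, function(a, b)
        if counts[a] ~= counts[b] then return counts[a] > counts[b] end
        return tostring(a) < tostring(b)   -- deterministic tie-break
    end)
    local result = {}
    for i = 1, math.min(nb, #order) do
        result[i] = { value = order[i], count = counts[order[i]] }
    end
    return result
end

-- Keep only 'columns' of the entries accepted by 'where' (all if omitted).
function Stats:select_table(columns, where)
    local selected = new()
    for _, entry in ipairs(self) do
        if not where or where(entry) then
            local row = {}
            for _, name in ipairs(columns) do
                row[name] = entry[name]
            end
            selected[#selected + 1] = row
        end
    end
    return selected
end

-- Usage with three made-up entries
local s = new()
table.insert(s, { ip = '10.0.0.1', resource = '/a', status = '404' })
table.insert(s, { ip = '10.0.0.2', resource = '/a', status = '404' })
table.insert(s, { ip = '10.0.0.1', resource = '/b', status = '200' })

local top = s:select_table({'resource', 'status'},
    function(elem) return elem.status == '404' end):top('resource')
print(top[1].value, top[1].count)
```

Because select_table returns a table of the same kind, its result can be chained with top, which mirrors how the calls are combined in the scripts of this tutorial.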

5.3.5. Dumping stats

The first Lua script (stats_on_exit) shows how to use the above API.

local stats = require('stats')

-- Run some stats at exit on collected
-- http info
haka.on_exit(function ()
    print("top 10 (by default) of useragent header")
    stats:top('useragent')
    print("")

    print("select columns 'ip', 'method' and 'resource' from the stats table")
    stats:select_table({'ip', 'method', 'resource'}):dump(5)
    print("")

    print("list of source ip using 'Mozilla/2.0' as user-agent")
    stats:select_table({'ip', 'useragent'},
        function(elem) return elem.useragent:find('Mozilla/2.0') end):dump(5)
    print("")

    print("top ten of http resources that generated the most 404 status error")
    stats:select_table({'resource', 'status'},
        function(elem) return elem.status == '404' end):top('resource')
    print("")
end)

The script outputs some statistics on the collected HTTP traffic after parsing all packets in the provided pcap file (i.e. at Haka exit). Below is an excerpt of the output generated when running the Lua script on the DARPA pcap file:

...
list of source ip using 'Mozilla/2.0' as user-agent
| ip             | useragent                 |
| 172.16.117.132 | Mozilla/2.01 (Win3.1; I;) |
| 172.16.117.132 | Mozilla/2.01 (Win3.1; I;) |
| 172.16.117.132 | Mozilla/2.01 (Win3.1; I;) |
| 172.16.117.132 | Mozilla/2.01 (Win3.1; I;) |
| 172.16.117.132 | Mozilla/2.01 (Win3.1; I;) |
... 12762 remaining entries
...
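
Two details of the where functions used above are worth checking in plain Lua. First, string.find interprets its argument as a Lua pattern, in which '.' matches any character; passing 1 and true as extra arguments requests a literal search. Second, a string never compares equal to a number, so comparing against '404' only works if status is stored as a string, which is assumed here from the sample script. The entry below is made up for this sketch.

```lua
-- Made-up entry for illustration
local elem = { useragent = 'Mozilla/2x0 (test)', status = '404' }

-- '.' is a pattern wildcard, so this also matches 'Mozilla/2x0'
print(elem.useragent:find('Mozilla/2.0'))           -- 1   11

-- init = 1 and plain = true: search for the literal text only
print(elem.useragent:find('Mozilla/2.0', 1, true))  -- nil

-- Strings and numbers never compare equal in Lua
print(elem.status == '404')   -- true
print(elem.status == 404)     -- false
```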

5.3.6. Getting stats in interactive mode

The second script (stats_interactive) fills the stats table with HTTP information (using the stats.lua script) and then launches interactive mode after parsing all packets in the provided pcap file. Statistics are then available through the stats variable.

stats = require('stats')
local color = require('color')

haka.on_exit(function()
    haka.debug.interactive.enter(color.green .. color.bold .. ">  " .. color.clear,
        color.green .. color.bold .. ">> " .. color.clear,
        "entering interactive mode for playing statistics\nStatistics are available through 'stats' variable. Run" ..
        "\n\t- stats:list() to get the list of column names" ..
        "\n\t- stats:top(column) to get the top 10 of selected field" ..
        "\n\t- stats:dump(nb) to dump 'nb' entries of stats table" ..
        "\n\t- stats:select_table({column_1, column_2, etc.}, cond_func) to select some columns and filter them based on 'cond_func' function" ..
        "\n\nExamples:" ..
        "\n\t- stats:top('useragent')" ..
        "\n\t- stats:select_table({'resource', 'status'}, function(elem) return elem.status == '404' end):top('resource')")
end)

Below is the hakapcap output when entering interactive mode:

 entering interactive session: entering interactive mode for playing statistics
 Statistics are available through 'stats' variable. Run
     - stats:list() to get the list of column names
     - stats:top(column) to get the top 10 of selected field
     - stats:dump(nb) to dump 'nb' entries of stats table
     - stats:select_table({column_1, column_2, etc.}, cond_func) to select some columns and filter them based on 'cond_func' function
 Examples:
     - stats:top('useragent')
     - stats:select_table({'resource', 'status'}, function(elem) return elem.status == '404' end):top('resource')
>

Note

Press CTRL-D to leave interactive mode.