HoneyPot: Dionaea Visualisasi Data

From OnnoWiki
Jump to navigation Jump to search

data visualisation

new files Presenting data in a human compatible way is a problem, rumors say at this stage of evolution pictures work best. Therefore some hints how to create graphs using the dionaea logsql sqlite database. sqlite

As dionaea embeds sqlite, it is rather easy to extract the data. the 'uniq files' query

We want to graph the number of new uniq files a day, I used the following query to make sqlite create the dataset which can be used to graph.

SELECT strftime('%Y-%m-%d',connection_timestamp,'unixepoch','localtime') AS date, MAX(downloads.download) AS total, COUNT(downloads.download) AS today, ( SELECT COUNT(DISTINCT x.download_md5_hash) FROM downloads AS x NATURAL JOIN connections AS c WHERE strftime('%Y-%m-%d',c.connection_timestamp,'unixepoch','localtime') <= strftime('%Y-%m-%d',connections.connection_timestamp,'unixepoch','localtime') ) AS uniq, ( SELECT COUNT(*) FROM ( SELECT a.download_md5_hash FROM downloads AS a JOIN connections AS b ON(a.connection = b.connection) GROUP BY a.download_md5_hash HAVING strftime('%Y-%m-%d',MIN(b.connection_timestamp),'unixepoch','localtime') = strftime('%Y-%m-%d',MAX(connections.connection_timestamp),'unixepoch','localtime') ) )AS uniq_this_day, ( SELECT COUNT(*) FROM ( SELECT MIN(a.download) AS download FROM downloads AS a JOIN connections AS b ON(a.connection = b.connection) GROUP BY a.download_md5_hash HAVING strftime('%Y-%m-%d',MIN(b.connection_timestamp),'unixepoch','localtime') = strftime('%Y-%m-%d',connections.connection_timestamp,'unixepoch','localtime') ) AS newdownloads NATURAL JOIN downloads WHERE download_url LIKE 'logxmpp://%' )AS uniq_this_day_via_logxmpp FROM downloads NATURAL JOIN connections GROUP BY date ORDER BY date DESC;

uniqfiles.sql performance

The queries performance is horrible, for each day we run 3 subqueries, 2 of these subqueries have another subquery. The grouping by the generated date-string is suboptimal too, still the query takes about a second on a moderate sized database. Running the query on the paris and berlin databases took ~90s for berlin, and ~679s for paris. When working with really large databases, it might be usefull to create a temp table with the data which is required in the representation which is used. create the dataset

sqlite3 /opt/dionaea/var/dionaea/logsql.sqlite .timer on .output uniqfiles.txt .read uniqfiles.sql CPU Time: user 1.850000 sys 0.000000

gnuplot

As the “new” files distribution is more or less random, the graphs look like a saw, which makes them hard to get, therefore I decided to smooth the graphs for these numbers, rendering the graph inexact, but giving a good overview of the tendency. plot new uniq files

new files

set terminal png size 600,120 nocrop butt font "/usr/share/fonts/truetype/ttf-liberation/LiberationSans-Regular.ttf" 8 set output "newfiles.png" set xdata time set timefmt "%Y-%m-%d" set format x "%b %d" set ylabel "total" set y2label "new" set y2tics set datafile separator "|" plot 'uniqfiles.txt' using 1:4 title "uniq" with lines, "" using 1:6 smooth bezier title "new uniq" with lines axes x1y2, "" using 1:5 smooth bezier title "new xmpp" with lines axes x1y2

plot total files

total files

set terminal png size 800,160 nocrop butt font "/usr/share/fonts/truetype/ttf-liberation/LiberationSans-Regular.ttf" 8 set output "totalfiles.png" set xdata time set timefmt "%Y-%m-%d" set format x "%b %d" set ylabel "total" set y2label "downloads" set y2tics set datafile separator "|" plot 'uniqfiles.txt' using 1:2 title "total" with lines, "" using 1:3 smooth bezier title "downloads" with lines axes x1y2



Referensi