HoneyPot: Dionaea SQL Logging
logging
Nepenthes had awful logging, huge logfiles, pretty useless for most people. Some people even started writing parsers for the logfiles to extract&convert the usefull information for use in a database. For dionaea, I decided to stick with awful logging to textfiles, but provide a useful alternative which is easy to setup and maintain, feature rich and allows retrieving information in a useful way, so you don't have to grep. Therefore, SQLite is used to write usefull information down to disk in the logsql.py script. I know, SQLite is not PostgreSQL, PostgreSQL is superior in many ways, but it requires some more steps to setup, where SQLite just works out of the box. SQLite does not support concurrency, but as dionaea does not access the database simulaneaously, there were no problems with database-concurrency. On the other hand, if it works with SQLite, it will work with PostgreSQL too, all you'll have to do is adjust some things.
The definition of useful information is undefined, therefore I decided to go for things I want to see for now:
connections exploits malware offers malware downloads
SQLite rocks
After adjusting the code, to be able to indicate a relation between connections, I had to recompile dionaea's python to support sqlite, and updated the docs. After verifying 'import sqlite3' worked, I started hacking the code to log incidents to the database. Once some things got into the database, I wrote some queries to make use of the data. attacked ports
Which ports got the most connections incoming:
SELECT COUNT(local_port) AS hitcount, local_port AS port FROM connections WHERE connection_type = 'accept' GROUP BY local_port HAVING COUNT(local_port) > 10
hitcount port 75 80 995 135 3982 445 29 1957 attacks over a day
Distribution of the attacks over a day:
SELECT ROUND((connection_timestamp%(3600*24))/3600) AS hour, COUNT(*) FROM connections WHERE connection_parent IS NULL GROUP BY ROUND((connection_timestamp%(3600*24))/3600);
hour hits 0.0 20 1.0 140 2.0 220 3.0 352 4.0 299 5.0 42 6.0 200 12.0 283 13.0 302 14.0 227 15.0 647 16.0 351 17.0 264 18.0 274 19.0 398 20.0 285 21.0 304 22.0 329 23.0 142 24.0 229
As I do not have enough data for a single day yet, we miss the hours 7-11. popular malware downloads
Which files got downloaded most:
SELECT COUNT(download_md5_hash), download_md5_hash FROM downloads GROUP BY download_md5_hash ORDER BY COUNT(download_md5_hash) DESC
count(download_md5_hash) download_md5_hash 38 df51e3310ef609e908a6b487a28ac068 19 14a09a48ad23fe0ea5a180bee8cb750a 6 32b0a00ffb8799a7782a19ee99a0214f 5 6e0a868563ee18fe6569796c7ce169be 3 2fa0e36b36382b74e6e6a437ad664a80 3 329c9d3163d9bf8f4bb9f0d1b846b6aa 3 4f4f2ebc395b0fbea708acfc3930ec6d 2 1d419d615dbe5a238bbaa569b3829a23 1 27f96b0496502104cf0bfc01391e1408 1 6c9d56535401bbd9052f499d05816ced 1 a8640b9698542689880b7619dbde1293 busy attackers
Which host attacked us most:
SELECT COUNT(remote_host), remote_host FROM connections WHERE connection_type = 'accept' GROUP BY remote_host ORDER BY COUNT(remote_host) DESC LIMIT 10;
COUNT(remote_host) remote_host 1655 10.204.202.23 420 10.2.101.193 234 10.246.93.128 224 10.208.119.223 120 10.54.151.201 120 10.129.95.105 120 10.174.16.255 120 10.234.207.36 120 10.133.39.52 120 10.31.104.74 how often did an attacker ask us to download a file
Count of parent connections remote ip address for each file offerd:
SELECT count(*), download_md5_hash, remote_host FROM connections NATURAL JOIN downloads GROUP BY download_md5_hash,remote_host ORDER BY download_md5_hash DESC
count(*) download_md5_hash remote_host 9 df51e3310ef609e908a6b487a28ac068 10.252.215.142 13 df51e3310ef609e908a6b487a28ac068 10.241.224.0 1 df51e3310ef609e908a6b487a28ac068 10.224.252.47 7 df51e3310ef609e908a6b487a28ac068 10.142.138.33 2 df51e3310ef609e908a6b487a28ac068 10.145.185.98 1 df51e3310ef609e908a6b487a28ac068 10.18.166.63 2 df51e3310ef609e908a6b487a28ac068 10.87.54.69 3 df51e3310ef609e908a6b487a28ac068 10.179.162.134 1 a8640b9698542689880b7619dbde1293 10.72.12.70 2 6e0a868563ee18fe6569796c7ce169be 10.92.48.170 1 6e0a868563ee18fe6569796c7ce169be 10.234.235.205 1 6e0a868563ee18fe6569796c7ce169be 10.10.88.64 1 6e0a868563ee18fe6569796c7ce169be 10.1.82.35 1 6c9d56535401bbd9052f499d05816ced 10.184.18.233 1 4f4f2ebc395b0fbea708acfc3930ec6d 10.193.184.162 1 4f4f2ebc395b0fbea708acfc3930ec6d 10.214.246.53 1 4f4f2ebc395b0fbea708acfc3930ec6d 10.221.108.13 6 32b0a00ffb8799a7782a19ee99a0214f 10.224.252.47 1 329c9d3163d9bf8f4bb9f0d1b846b6aa 10.179.215.73 1 329c9d3163d9bf8f4bb9f0d1b846b6aa 10.141.47.91 1 329c9d3163d9bf8f4bb9f0d1b846b6aa 10.76.228.187 3 2fa0e36b36382b74e6e6a437ad664a80 10.197.169.135 1 27f96b0496502104cf0bfc01391e1408 10.29.109.144 2 1d419d615dbe5a238bbaa569b3829a23 10.65.34.231 1 14a09a48ad23fe0ea5a180bee8cb750a 10.65.34.231 2 14a09a48ad23fe0ea5a180bee8cb750a 10.29.109.144 4 14a09a48ad23fe0ea5a180bee8cb750a 10.241.224.0 1 14a09a48ad23fe0ea5a180bee8cb750a 10.224.252.47 2 14a09a48ad23fe0ea5a180bee8cb750a 10.142.138.33 3 14a09a48ad23fe0ea5a180bee8cb750a 10.193.67.162 2 14a09a48ad23fe0ea5a180bee8cb750a 10.145.185.98 3 14a09a48ad23fe0ea5a180bee8cb750a 10.87.54.69 1 14a09a48ad23fe0ea5a180bee8cb750a 10.179.162.134 internal bugfixing
Due to a bug in the logging (I'll take care, just to prevent complains for early adaptors), the next query required a fix:
UPDATE connections SET connection_tree = connection WHERE connection_tree IS NULL
distinct attackers per file
Number of distinct attackers per distinct malware file:
SELECT download_md5_hash, COUNT(DISTINCT tree.remote_host) FROM downloads NATURAL JOIN connections AS parent JOIN connections AS tree ON (parent.connection == tree.connection_tree) GROUP BY download_md5_hash;
download_md5_hash COUNT(DISTINCT tree.remote_host) 14a09a48ad23fe0ea5a180bee8cb750a 1 27f96b0496502104cf0bfc01391e1408 1 2fa0e36b36382b74e6e6a437ad664a80 1 329c9d3163d9bf8f4bb9f0d1b846b6aa 3 32b0a00ffb8799a7782a19ee99a0214f 1 4f4f2ebc395b0fbea708acfc3930ec6d 3 6c9d56535401bbd9052f499d05816ced 1 6e0a868563ee18fe6569796c7ce169be 4 a8640b9698542689880b7619dbde1293 1 df51e3310ef609e908a6b487a28ac068 8 multiple infections/attacking hosts offering different malware files
Now, lets see if we have a host offering more than one malicious file:
SELECT tree.remote_host, COUNT(DISTINCT download_md5_hash) FROM downloads NATURAL JOIN connections AS parent JOIN connections AS tree ON (parent.connection == tree.connection_tree) GROUP BY tree.remote_host HAVING COUNT(DISTINCT download_md5_hash) > 1;
remote_host COUNT(DISTINCT download_md5_hash) 10.224.252.47 2
We have one, lets see which files …
SELECT DISTINCT download_url, offer_url, download_md5_hash FROM downloads NATURAL JOIN offers NATURAL JOIN connections AS parent JOIN connections AS tree ON (parent.connection == tree.connection_tree) WHERE tree.remote_host = '10.224.252.47';
download_url offer_url download_md5_hash fxp://10.224.252.47/ssms.exe tfxp://10.224.252.47/ssms.exe 32b0a00ffb8799a7782a19ee99a0214f fxp://10.224.252.47/ssms.exe tfxp://10.224.252.47/ssms.exe df51e3310ef609e908a6b487a28ac068
Obviously there is a bug, reporting download urls as ftp urls, when they were tftp urls initially … But, the 2 different files come from the same location, so it is likely the tftp transfer broke. popular download locations
Number of downloads by location:
SELECT COUNT(*), download_url FROM downloads GROUP BY download_url ORDER BY COUNT(*) DESC;
As mentioned previously, tftp downloads get reported as ftp downloads … COUNT(*) download_url 13 fxp:://10.241.224.0/ssms.exe 9 fxp:://10.252.215.142/ssms.exe 7 fxp:://10.224.252.47/ssms.exe 7 fxp:://10.142.138.33/ssms.exe 5 hxxp:://10.181.184.21/i/1i2.zip 3 fxp:://10.197.169.135/ssms.exe 3 fxp:://10.179.162.134/ssms.exe 3 fxp:://1:1@10.241.224.0:64459/ssms.exe 3 fxp:://1:1@10.87.54.69:42751/ssms.exe 3 hxxp:://zonetech.info/58.exe 3 hxxp:://zonetech.info/61.exe 2 fxp:://10.145.185.98/ssms.exe 2 fxp:://10.87.54.69/ssms.exe 2 fxp:://1:1@10.65.34.231:8218/ssms.exe 2 fxp:://1:1@10.193.67.162:61062/ssms.exe 1 fxp:://10.65.34.231/ssms.exe 1 fxp:://10.29.109.144/ssms.exe 1 fxp:://10.18.166.63/ssms.exe 1 fxp:://1:1@10.29.109.144:52275/ssms.exe 1 fxp:://1:1@10.29.109.144:53336/ssms.exe 1 fxp:://1:1@10.241.224.0:43153/ssms.exe 1 fxp:://1:1@10.224.252.47:38794/ssms.exe 1 fxp:://1:1@10.142.138.33:12654/ssms.exe 1 fxp:://1:1@10.142.138.33:64375/ssms.exe 1 fxp:://1:1@10.193.67.162:41851/ssms.exe 1 fxp:://1:1@10.145.185.98:10721/ssms.exe 1 fxp:://1:1@10.145.185.98:16110/ssms.exe 1 fxp:://1:1@10.179.162.134:10901/ssms.exe 1 hxxp:://gutgal.com/sh.php 1 hxxp:://privcash.cc/r2.exe addressed dcerpc calls
Interested which dcerpc calls get attacked most?
SELECT COUNT(*), dcerpc_uuid, dcerpc_opnum FROM dcerpcs GROUP BY dcerpc_uuid, dcerpc_opnum ORDER BY COUNT(*) DESC;
COUNT(*) dcerpc_uuid dcerpc_opnum 1002 12345778-1234-abcd-ef00-0123456789ac 62 511 3919286a-b10c-11d0-9ba8-00c04fd92ef5 9 131 000001a0-0000-0000-c000-000000000046 4 34 4d9f4ab8-7d1c-11cf-861e-0020af6e7c57 0 30 367abb81-9844-35f1-ad32-98f038001003 27 21 a0010000-0000-0000-c000-000000000046 4 12 4b324fc8-1670-01d3-1278-5a47bf6ee188 31 11 78573412-3412-cdab-ef00-0123456789ac 62 4 8d9f4e40-a03d-11ce-8f69-08003e30051b 54 1 6a281939-0cb1-d011-9ba8-00c04fd92ef5 9 1 c84f324b-7016-d301-1278-5a47bf6ee188 31 fingerprints
Using p0f, dionaea can collect fingerprints for incoming attacks, lets play with the data … operating system genre
Count of operating system genre:
SELECT COUNT(*), p0f_genre FROM p0fs GROUP BY p0f_genre ORDER BY COUNT(*) DESC;
COUNT(*) p0f_genre 4491 Windows 419 3 Linux
Windows dominates, 10% unknown … operating system
Split by operating system version:
SELECT COUNT(*), p0f_genre, p0f_detail FROM p0fs GROUP BY p0f_genre, p0f_detail ORDER BY COUNT(*) DESC;
COUNT(*) p0f_genre p0f_detail 1539 Windows XP/2000 (RFC1323+, w+, tstamp-) 1457 Windows 2000 SP2+, XP SP1+ (seldom 98) 709 Windows XP SP1+, 2000 SP3 615 Windows 2000 SP4, XP SP1+ 419 120 Windows XP/2000 (RFC1323+, w, tstamp-) 49 Windows 2000 SP4, XP SP1+ (2) 3 Linux 2.6 (newer, 3) 2 Windows 2003 (1)
Obviously we are missing Windows Vista and maybe even Windows 7, I'm sure at least Vista got it's cut on the attacks, but as p0f fingerprints are rather outdated, there is nothing I can do about it. But, once there are more recent fingerprints, identifying Vista and Windows 7, they will show up. ports attacked from linux hosts
But, lets see which ports got hit by the Linux hosts:
SELECT COUNT(*), local_port FROM connections NATURAL JOIN p0fs WHERE p0f_genre = 'Linux' GROUP BY local_port;
COUNT(*) local_port 3 80
So, something was accessing the webserver on port 80. attacks are trees
Given the possibility to improve a horrible situation from the beginning, I wanted to be able to correlate a malware download with the initiating connection. Unfortunately there can be more than one connection involved before downloading the malware, as an example take an exploit, which spawns a bindshell, which accepts a connection, which triggers a download via ftp. So, you have at least 3 connections, and all belong to each other hierarchical,
connection to exploitable service #1 bindshell listening #2 (parent #1) bindshell accept #3 (parent #2) malware offer (parent #3) malware download (parent #3)
So, as I wanted to store the information in rdbms, namely SQLite, I knew rdbms do not support hierarchical structures that good, you can store information hierarchical, but querying is hard. For example to get the initiating connection for the malware download in previous example you'd have to walk the parents until there is no parent left. Thats no real problem, if you can access the database with a cursor, but SQLite does not support cursors. Therefore, it sounded reasonable to store the id of the upper-most parent for each connection.
connection to exploitable service #1 bindshell listening #2 (parent #1, tree #1) bindshell accept #3 (parent #2, tree #1) malware offer (parent #3, tree #1) malware download (parent #3, tree #1)
I'll try to explain how to accomplish this in python.
The first snippet resolves a python dbi cursor result, so you get a list of dicts, and can access the items by name instead of an index. It will break if you have multiple columns with the same name, but as long as you take care of forming your queries well, it works fine.
def resolve_result(resultcursor): names = [resultcursor.description[x][0] for x in range(len(resultcursor.description))] resolvedresult = [ dict(zip(names, i)) for i in resultcursor] return resolvedresult
Next, some formatting to print a connection, depending on type:
def print_connection(c, indent): if c['connection_type'] == 'accept': print("%*s connection %i %s %s %s %s:%i <- %s:%i" % ( indent, " ", c['connection'], c['connection_protocol'], c['connection_transport'], c['connection_type'], c['local_host'], c['local_port'], c['remote_host'], c['remote_port']) ) elif c['connection_type'] == 'connect': print("%*s connection %i %s %s %s %s:%i -> %s/%s:%i" % ( indent, " ", c['connection'], c['connection_protocol'], c['connection_transport'], c['connection_type'], c['local_host'], c['local_port'], c['remote_hostname'], c['remote_host'], c['remote_port']) ) elif c['connection_type'] == 'listen': print("%*s connection %i %s %s %s %s:%i" % ( indent, " ", c['connection'], c['connection_protocol'], c['connection_transport'], c['connection_type'], c['local_host'], c['local_port']) )
Now, we open the database and retrieve the root-connections. Root-connections initiate an attack, so they have no parent-connection.
dbh = sqlite3.connect("/tmp/test.sqlite") cursor = dbh.cursor()
result = cursor.execute("SELECT * from connections WHERE connection_tree = connection OR connection_tree IS NULL ") connections = resolve_result(result)
Now we can iterate through the connections, and print each connections information:
for c in connections: connection = c['connection'] print_connection(c, 1)
But we want to print the child-connections too, and the possible child-child-connection, and possible child-child-child …, so we will use recursion:
def recursive_print(cursor, connection, indent): result = cursor.execute("SELECT * from connections WHERE connection_parent = ?", (connection, )) connections = resolve_result(result) for c in connections: if c['connection'] == connection: continue print_connection(c, indent) recursive_print(cursor, c['connection'], indent+2)
and change the loop on all root connections to print the child-connections too:
for c in connections: connection = c['connection'] print_connection(c, 1) recursive_print(cursor, c['connection'], 2)
This already gives good results:
connection 610 smbd tcp accept 10.69.53.52:445 <- 10.65.34.231:2010 connection 611 remoteshell tcp listen 10.69.53.52:1957 connection 612 remoteshell tcp accept 10.69.53.52:1957 <- 10.65.34.231:2135 connection 613 ftpctrl tcp connect 10.69.53.52:37065 -> 10.65.34.231/None:8218 connection 614 ftpdata tcp listen 10.69.53.52:62087 connection 615 ftpdata tcp accept 10.69.53.52:62087 <- 10.65.34.231:2308
After adding code to print all information we have for each connection, it looks like this:
connection 610 smbd tcp accept 10.69.53.52:445 <- 10.65.34.231:2010 dcerpc request: uuid '3919286a-b10c-11d0-9ba8-00c04fd92ef5' opnum 9 p0f: genre:'Windows' detail:'XP SP1+, 2000 SP3' uptime:'-1' tos: dist:'11' nat:'0' fw:'0' profile: [{'return': '0x7c802367', 'args': [, 'CreateProcessA'], 'call': 'GetProcAddress'}, ...., {'return': '0', 'args': ['0'], 'call': 'ExitThread'}] service: bindshell://1957 connection 611 remoteshell tcp listen 10.69.53.52:1957 connection 612 remoteshell tcp accept 10.69.53.52:1957 <- 10.65.34.231:2135 p0f: genre:'Windows' detail:'XP SP1+, 2000 SP3' uptime:'-1' tos: dist:'11' nat:'0' fw:'0' offer: fxp://1:1@10.65.34.231:8218/ssms.exe download: 1d419d615dbe5a238bbaa569b3829a23 fxp://1:1@10.65.34.231:8218/ssms.exe connection 613 ftpctrl tcp connect 10.69.53.52:37065 -> 10.65.34.231/None:8218 connection 614 ftpdata tcp listen 10.69.53.52:62087 connection 615 ftpdata tcp accept 10.69.53.52:62087 <- 10.65.34.231:2308 p0f: genre:'Windows' detail:'XP SP1+, 2000 SP3' uptime:'-1' tos: dist:'11' nat:'0' fw:'0'
So, if you want to have a gui for the honeypot, you could use the internal http service and script the webinterface in python, within the honeypot itself, or create static html pages using a cron job and serve them, or push them somewhere else.