github.com-0xsha-sweetie-data_-_2020-02-24_07-37-49
- Publication date
- 2020-02-24
This repo contains Logstash-collected logs from various honeypots
Sweetie data
This repo contains data from various honeypots, mostly gathered with the awesome T-Pot! Want to know what malicious actors are up to? Do you believe data is the only source of truth? There you have it. Put on your Sherlock hat and find the crime. This repo contains three months of data, from December 2019 to February 2020.
Who can use this data
- Security researchers
- Malware analysts
- Threat intelligence companies
- Universities
- Data scientists
- Anyone else interested
Motivation
This research was a side project, mainly motivated by a desire to understand the current state of attacks in the wild. But as an individual, I have minimal resources and time, and I can't afford to scale and maintain the setup, so I decided to take the servers down and share the data with the community. ♥
How to use it
Folder structure
Here is the list of honeypots and analyzers used during this experiment.
- adbhoney
- cowrie
- dionaea
- elasticpot
- heralding
- medpot
- p0f
- suricata
- tanner
Each honeypot has a log folder. Most of the logs are JSON or SQLite. Some honeypots contain other data, such as sample files.
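For the SQLite logs, a quick first step is to see which tables a database holds and how many rows each one has. The following is a minimal sketch using only Python's standard `sqlite3` module; the function name `summarize_sqlite` is hypothetical, and nothing here assumes a particular honeypot's schema.

```python
import sqlite3


def summarize_sqlite(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names come from the database itself, so quote them defensively.
        return {
            name: conn.execute(
                'SELECT COUNT(*) FROM "{}"'.format(name.replace('"', '""'))
            ).fetchone()[0]
            for name in tables
        }
    finally:
        conn.close()
```

Pass it the path of any SQLite file found in a honeypot's log folder; the table names it reports are a reasonable starting point for writing real queries.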
Payloads
As mentioned, some honeypots also collect files, for example, adbhoney and cowrie. You can find file archives in the root directory of each honeypot.
File samples:
```bash
380c4553681d76dca812fd679068ff42645363cf3aef11afe036252051725c7a.raw: ELF 32-bit MSB executable, Motorola m68k, 68020, version 1 (SYSV), statically linked, stripped
3c0ac166b8511744430f4869b744beeef873c9a3c857e8d6607262a8d156f796.raw: ELF 64-bit MSB executable, MIPS, MIPS64 version 1 (SYSV), statically linked, stripped
590dbe0f8c6977d808cdc66d6e46cb6579c0d42d520a74c8a27210d3b97d9930.raw: ELF 32-bit MSB executable, SPARC, version 1 (SYSV), statically linked, stripped
608ee011537005f368c9731f4c4dee6a247b620cde52908ed0678df28c617971.raw: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=ba88e16fed564b3e4d7aba0787c6fbab52471e50, stripped
615b1640e5ce651bfab71ee6be1244183ae244576a9eca3073dfe444eba072ad.raw: ELF 32-bit LSB executable, ARM, version 1 (ARM), statically linked, stripped
63946c28efa919809c03be75a3937c4be80589a9df79cd1be72037d493b70857.raw: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=0c9b76185c23d668c7b4f1bdba94dfb94a9bed7a, stripped
755286a4739343aa7f64227bcad34384df8d1602ac175b94a44068d51f237eb7.raw: ELF 32-bit LSB executable, MIPS, MIPS-I version 1 (SYSV), statically linked, stripped
76ae6d577ba96b1c3a1de8b21c32a9faf6040f7e78d98269e0469d896c29dc64.raw: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=0af1f8be964f83d69ec4163415260349fa6cede8, stripped
7a48c93c5cb63a09505a009260d1cca8203285e0c1c6ff5b0df9cbb470820865.raw: Java archive data (JAR)
7a656791b445fff02ac6e9dd1081cc265db935476a9ee71139cb6aef52102e2b.raw: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=53abe9912786eea2bd09f4af4d634454777556e5, stripped
9d8bf69ebedb94061469734f1486c0da01c1e566bf7be83ce3779aa1a0b54371.raw: ELF 32-bit LS
```
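The listing above shows samples built for several CPU architectures. If the `file` utility is not at hand, the same basic facts can be read straight from the ELF header. This is a rough sketch, not a replacement for `file`: the helper names are hypothetical, the machine table covers only a handful of `e_machine` values from the ELF specification, and the sample is only read, never executed.

```python
import struct

# Subset of e_machine values from the ELF specification.
ELF_MACHINES = {
    2: "SPARC",
    3: "x86",
    4: "Motorola m68k",
    8: "MIPS",
    40: "ARM",
    62: "x86-64",
    183: "AArch64",
}


def describe_elf(header):
    """Summarize the first 20 bytes of a file, or return None if not ELF."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    bits = {1: 32, 2: 64}.get(header[4])      # EI_CLASS
    endian = {1: "<", 2: ">"}.get(header[5])  # EI_DATA: 1 = LSB, 2 = MSB
    if bits is None or endian is None:
        return None
    # e_machine is a 16-bit field at offset 18, stored in the file's byte order.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return {
        "bits": bits,
        "byte_order": "LSB" if endian == "<" else "MSB",
        "machine": ELF_MACHINES.get(machine, "unknown ({})".format(machine)),
    }


def describe_sample(path):
    """Read only the header of a sample; the file is never executed."""
    with open(path, "rb") as handle:
        return describe_elf(handle.read(20))
```

Calling `describe_sample` on each `.raw` file and tallying the `machine` values gives a quick picture of which platforms the collected payloads target.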
You can use the VirusTotal API to scan the samples in bulk.
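One way to approach a bulk lookup is sketched below. It assumes that each sample's file name is the SHA-256 digest of its content (the names in the listing are 64 hex characters, but this is an assumption worth verifying), and that you have your own VirusTotal API key. It targets the VirusTotal API v3 file report endpoint and only prepares the requests; the helper names are hypothetical.

```python
import re
import urllib.request

VT_FILE_ENDPOINT = "https://www.virustotal.com/api/v3/files/{}"
SHA256_NAME = re.compile(r"^([0-9a-f]{64})\.raw$")


def hashes_from_names(file_names):
    """Extract the SHA-256 digest from names shaped like '<sha256>.raw'."""
    found = []
    for name in file_names:
        match = SHA256_NAME.match(name)
        if match:
            found.append(match.group(1))
    return found


def build_lookup_request(sha256, api_key):
    """Prepare (but do not send) a file-report request for one hash."""
    return urllib.request.Request(
        VT_FILE_ENDPOINT.format(sha256),
        headers={"x-apikey": api_key},
    )
```

Each prepared request can be sent with `urllib.request.urlopen`. Looking samples up by hash avoids uploading the binaries, and a real run should pause between requests to stay within the rate limits of your API plan.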
Visualization
Like T-Pot, you can use the Elastic Stack and a Kibana dashboard.
Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack so you can do anything from tracking query load to understanding the way requests flow through your apps.
It does a fantastic job of making sense of this data, but the data is also very detailed, so for the best results you may have to roll your own analyzer.
Extra mile
For example, here is a script I wrote to extract possible web application exploits from the Suricata logs. It uses pandas to read the large JSON file, keeps only the entries that contain an HTTP record, and then keeps those whose URL is something other than "/".
```python
# (C) 2020 0xSha me@0xsha.io
#
# $Id: suricata_http_path_filter.py Sun Feb 23 20:51:13 +07 2020 0xSha $
#
import json

import pandas as pd


def list_to_dict(lst):
    # Pair up a flat [key, value, key, value, ...] list into a dict.
    it = iter(lst)
    dic_result = dict(zip(it, it))
    return dic_result


results = []
# eve.json holds one JSON event per line, hence lines=True.
df = pd.read_json('/data/suricata/log/eve.json', lines=True)
filtered_df = df[df['http'].notnull()]

f = pd.DataFrame(filtered_df['http'])
for _, row in f.iterrows():
    http = row.to_dict()['http']
    # Keep only requests that ask for something other than the root path.
    if "url" in http and http['url'] != "/":
        results.append(http)

# Deduplicate by serializing each record as a sorted list of [key, value] pairs.
sorted_results = [sorted(d.items()) for d in results]
unique_results = list(map(json.loads, set(map(json.dumps, sorted_results))))

with open("/suricata_http_paths.json", "w") as suricata_out_http:
    for item in unique_results:
        concat_list = [j for i in item for j in i]
        # Write one JSON object per line.
        suricata_out_http.write(json.dumps(list_to_dict(concat_list)) + "\n")
```
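The script above loads the whole log into a data frame, which can be heavy for very large files. As an alternative sketch, the same filter can be applied one line at a time with only the standard library. This is an illustration of the idea, not the author's method; the function names are hypothetical, and it assumes one JSON event per line with an optional `http` object.

```python
import json


def iter_http_records(lines):
    """Yield the 'http' object of each event whose URL is not just '/'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt lines
        http = event.get("http") if isinstance(event, dict) else None
        # A missing URL is treated like "/" and dropped, as in the script above.
        if isinstance(http, dict) and http.get("url", "/") != "/":
            yield http


def unique_http_records(lines):
    """Deduplicate records by their canonical (sorted-key) JSON form."""
    seen = set()
    unique = []
    for http in iter_http_records(lines):
        key = json.dumps(http, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(http)
    return unique
```

Because an open file is itself an iterable of lines, `unique_http_records(open_file_handle)` streams the log; only the deduplicated records are held in memory.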
The output is a cleaned JSON file. Here is an example of an interesting line.
```json
{"hostname": "...", "http_content_type": "text/html", "http_method": "GET", "http_port": 80, "http_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36", "length": 3348, "protocol": "HTTP/1.1", "status": 404, "url": "/index.php?s=/Index/\\think\\app/invokefunction&function=call_user_func_array&vars[0]=md5&vars[1][]=HelloThinkPHP"}
```
As we can see, we successfully extracted an exploit attempt against ThinkPHP.
Findings
There is too much data, and there are endless possibilities for extraction and analysis, but here are a few things that come to mind when I try to draw conclusions.
- The number of malicious packets transferred per day is unbelievable.
- Fortunately, a big chunk of malicious actors are script kiddies, but somehow they still score in 2020.
- Some of the very first kinds of computer attacks, like brute forcing, are still a thing in 2020 when it comes to protocols like VNC and SQL Server.
- Mixing security, machine learning, and data science can bring "real" next-generation defense results.
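To put numbers behind the brute-force observation, one option is to count the credentials that attackers try. The sketch below does this for cowrie (SSH/Telnet) rather than VNC or SQL Server. It assumes the logs follow cowrie's JSON format, with an `eventid` of `cowrie.login.failed` and `username` and `password` fields; check these names against the actual files before relying on them. The function name is hypothetical.

```python
import json
from collections import Counter


def count_failed_logins(lines):
    """Count (username, password) pairs across failed-login events."""
    attempts = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or corrupt lines
        if not isinstance(event, dict):
            continue
        # Assumed cowrie event name; adjust if the logs use a different one.
        if event.get("eventid") == "cowrie.login.failed":
            attempts[(event.get("username"), event.get("password"))] += 1
    return attempts
```

Calling `count_failed_logins(open_file_handle).most_common(20)` lists the twenty most frequently tried credential pairs.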
How to contribute
- Open a pull request and share your own logs.
- Share it with whomever you believe can use it.
- Do the extra work and share your findings with the community ♥
Any ideas?
- me [at] 0xsha.io
To restore the repository download the bundle
wget https://archive.org/download/github.com-0xsha-sweetie-data_-_2020-02-24_07-37-49/0xsha-sweetie-data_-_2020-02-24_07-37-49.bundle
and run: git clone 0xsha-sweetie-data_-_2020-02-24_07-37-49.bundle
Source: https://github.com/0xsha/sweetie-data
Uploader: 0xsha
Upload date: 2020-02-24
- Addeddate
- 2020-02-24 15:47:27
- Identifier
- github.com-0xsha-sweetie-data_-_2020-02-24_07-37-49
- Originalurl
- https://github.com/0xsha/sweetie-data
- Pushed_date
- 2020-02-24 07:37:49
- Scanner
- Internet Archive Python library 1.8.1
- Uploaded_with
- iagitup - v1.6.2
- Year
- 2020