New codebase, Spectre/Meltdown and thousands of surprise hosts
15 January 2018
It’s done, the new code is live! It took me about two weeks to get it running properly, and I saw some weird things along the way that I’ll share here.
First, the code. Sequentially (“vertically”) performing all actions per host works nicely. But because the bots do a lot of things in parallel, they still hit the flood detection on the infamous trigger-happy firewall of a specific customer. That is now solved by pushing configs to the bots that limit the number of simultaneous scans, grouped by subnet where possible. Scaling is then done by simply adding more bots/nodes.
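To make the idea of per-subnet scan limits concrete, here is a minimal sketch in Python. It is not the actual ShadowTrackr bot code: the function names, the /24 grouping, and the limit values are all assumptions chosen for illustration. It caps the number of simultaneous scans overall and per subnet using asyncio semaphores.

```python
import asyncio
import ipaddress
from collections import defaultdict


def subnet_key(host: str, prefix: int = 24) -> str:
    """Return the enclosing subnet of a host, e.g. '10.0.0.0/24'."""
    return str(ipaddress.ip_network(f"{host}/{prefix}", strict=False))


async def run_scans(hosts, scan, max_total=8, max_per_subnet=2, prefix=24):
    """Run scan(host) for every host while respecting both limits.

    max_total and max_per_subnet are hypothetical config values that
    would be pushed to a bot; they are not real ShadowTrackr settings.
    """
    total = asyncio.Semaphore(max_total)
    per_subnet = defaultdict(lambda: asyncio.Semaphore(max_per_subnet))

    async def limited(host):
        # Take the subnet slot first, so hosts waiting on a busy subnet
        # do not occupy one of the global slots while they wait.
        async with per_subnet[subnet_key(host, prefix)]:
            async with total:
                return await scan(host)

    # gather() returns results in the same order as the input hosts.
    return await asyncio.gather(*(limited(h) for h in hosts))
```

With limits like these, a single bot stays under a firewall's flood threshold for any one subnet, and throughput is raised by adding more bots rather than by raising the limits.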
A lot of technical debt is paid, and I fixed quite a few annoying little bugs. I know I shouldn’t, but I just could not resist coding some new functionality. Call it lack of discipline or maybe necessary for motivation, but the result is that ShadowTrackr now checks all your domains against logs of issued SSL certificates (Certificate Transparency logs). You get notified when a new certificate is issued, and when you enter a new domain name, all its subdomains are pulled from the logs and checked.
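As a rough illustration of pulling subdomains out of Certificate Transparency data, the sketch below assumes the certificate records have already been fetched from some CT log search service. The record layout (a dict with a `dns_names` list) and the function name are hypothetical, not the format of any particular service or of ShadowTrackr itself.

```python
def subdomains_from_ct(domain, cert_records):
    """Collect the names under `domain` that appear in certificate records.

    cert_records is assumed to be a list of dicts, each with a
    'dns_names' list holding the names the certificate was issued for.
    """
    domain = domain.lower().rstrip(".")
    found = set()
    for record in cert_records:
        for name in record.get("dns_names", []):
            name = name.lower().rstrip(".")
            # A wildcard entry like *.dev.example.com still reveals
            # that dev.example.com exists.
            if name.startswith("*."):
                name = name[2:]
            # Match the domain itself or a true subdomain of it, so that
            # notexample.com is not mistaken for part of example.com.
            if name == domain or name.endswith("." + domain):
                found.add(name)
    return sorted(found)
```

Each name returned this way can then be queued for the same checks as a manually entered host.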
There is a ShadowTrackr test system, but it doesn’t have the scale of the real system. Besides that, there is always unexpected behaviour with real-life internet data. When I released the new code in production, the systems took a major performance hit. I found some minor bugs, but the CPUs still ran at 100%. Then I found out that two problems unrelated to the code were causing this.
The first was Spectre/Meltdown, and after resizing some machines (thank you cloud!) the situation improved. Something still felt off, and after some digging I found the second problem: one client with a /16 subnet had about 4200 new hosts. These hosts all suddenly appeared and disappeared within just a few days. The hosts had no open ports, no associated URLs and didn’t appear in any passive source. Also, the client had not installed 4200 new hosts. In the end we removed them from the systems. Although we’re not sure where they came from, I suspect that some firewall or proxy had a little party and decided to answer a standard subnet SYN scan with RST packets for 4200 non-existent hosts.
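The criteria above (no open ports, no URLs, no passive sightings) are easy to turn into a sanity check. The following is only a sketch under assumed names: the `Host` record and its fields are hypothetical and do not reflect the real ShadowTrackr data model.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host record; field names are illustrative only."""
    ip: str
    open_ports: list = field(default_factory=list)
    urls: list = field(default_factory=list)
    passive_sources: list = field(default_factory=list)


def is_phantom(host: Host) -> bool:
    """True when nothing besides the discovery scan vouches for the host."""
    return not (host.open_ports or host.urls or host.passive_sources)


def phantom_hosts(hosts):
    """Return the IPs of hosts that look like scan artefacts."""
    return [h.ip for h in hosts if is_phantom(h)]
```

A sudden spike in the number of such hosts within one subnet would be a reasonable trigger for a manual review before they are counted as real assets.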