Brand new API and SIEM feed
20 January 2018
The number of clients is steadily growing, but the first one will always remain special. They’ve been supportive from the start, provided valuable feedback and even put up with me when their timeline was full of crap again because the bots triggered the flood detection on the corporate firewall. So when they ask for a feed for their SIEM, they get a feed for their SIEM.
The SIEM in question, ArcSight, normally ingests messages in Common Event Format (CEF), and this format is supported by quite a few security products. Their SIEM guy said that JSON would also be fine. JSON is used a lot for APIs and goes well with the Elastic Stack. That last one is popular with security people for a reason, and if you’re not familiar with it yet it should be your next stop after this. I decided to support both formats.
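For those who haven’t seen CEF before: it’s a pipe-delimited header followed by key=value extensions, while the JSON variant is just a flat document. The event below is purely illustrative; the field names and values are made up and are not the actual feed schema.

CEF:0|ShadowTrackr|ShadowTrackr|1.0|new_host|New host discovered|5|dst=203.0.113.10 dhost=www.example.com msg=Host appeared during subnet scan

{"event": "new_host", "ip": "203.0.113.10", "host": "www.example.com", "severity": 5, "message": "Host appeared during subnet scan"}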
There are several options (see the API documentation) available to tweak your feed, but I figure the most common use case will be to periodically pull all new events. This is as simple as hitting the endpoint URL. The feed endpoint remembers what you have already pulled and will only send you the difference since then.
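As a rough sketch of that pattern in Python: the endpoint URL, API key and parameter names below are placeholders (check the API documentation for the real ones), but the polling loop itself is what matters.

import time
import requests

API_KEY = "your-api-key"                        # placeholder
FEED_URL = "https://shadowtrackr.com/api/feed"  # placeholder URL, see the API docs

def pull_feed(fmt="json"):
    # The feed endpoint keeps track of what you already pulled,
    # so a plain GET returns only the events since your last call.
    r = requests.get(FEED_URL, params={"key": API_KEY, "format": fmt}, timeout=30)
    r.raise_for_status()
    return r.json() if fmt == "json" else r.text  # CEF would come back as plain text

while True:
    for event in pull_feed("json"):
        print(event)          # hand the event to your SIEM here
    time.sleep(15 * 60)       # poll every 15 minutes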
I’m sure the API will expand over time, but if you have a cool idea and need something right now, just contact me.
New codebase, spectre/meltdown and thousands of surprise hosts
15 January 2018
It’s done, the new code is live! It took me about two weeks to get it running properly, and I saw some weird things along the way that I’ll share here.
First, the code. Sequentially (“vertically”) performing all actions per host works nicely. Because the bots do a lot of things in parallel, they still hit the flood detection on the infamous trigger-happy firewall of a specific customer. That is now solved by pushing configs to the bots that limit the number of simultaneous scans, where possible grouped by subnet. Scaling is then done by simply adding more bots/nodes.
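The actual bot config isn’t something I’ll publish here, but the idea of capping simultaneous scans per subnet can be sketched in a few lines of Python. The limit of 2 and the grouping by /24 are assumptions for illustration only.

import ipaddress
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from threading import BoundedSemaphore

MAX_PER_SUBNET = 2   # assumed limit, pushed to the bot via its config
targets = ["198.51.100.5", "198.51.100.6", "203.0.113.17"]  # example IPs

# One semaphore per /24, so a single bot never hammers one firewall.
subnet_locks = defaultdict(lambda: BoundedSemaphore(MAX_PER_SUBNET))

def scan_host(ip):
    subnet = ipaddress.ip_network(f"{ip}/24", strict=False)
    with subnet_locks[subnet]:
        pass  # run the actual checks and scans for this host here

with ThreadPoolExecutor(max_workers=32) as pool:
    list(pool.map(scan_host, targets))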
A lot of technical debt has been paid off, and I fixed quite a few annoying little bugs. I know I shouldn’t, but I just could not resist coding some new functionality. Call it lack of discipline, or maybe necessary for motivation, but the result is that ShadowTrackr now checks all your domains against logs of issued SSL certificates (Certificate Transparency logs). You get notified when a new certificate is issued, and all subdomains are pulled from the logs and checked when you enter a new domain name.
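If you want to poke at Certificate Transparency logs yourself, one public source is crt.sh. The little sketch below pulls subdomains out of its JSON output; this is my own illustration, not how ShadowTrackr queries the logs internally, and the output format is as observed at the time of writing.

import requests

def ct_subdomains(domain):
    # Ask crt.sh for certificates issued for the domain;
    # the %. wildcard also matches subdomains.
    r = requests.get("https://crt.sh/",
                     params={"q": f"%.{domain}", "output": "json"},
                     timeout=60)
    r.raise_for_status()
    names = set()
    for entry in r.json():
        for name in entry["name_value"].splitlines():
            names.add(name.lstrip("*."))
    return sorted(names)

print(ct_subdomains("example.com"))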
There is a ShadowTrackr test system, but it doesn’t have the scale of the real system. And besides that, there is always unexpected behaviour with real life internet data. When I released the new code in production, the systems took a major performance hit. I found some minor bugs, but CPUs still ran at 100%. Then I found out two problems unrelated to the code were causing this.
The first was Spectre/Meltdown, and after resizing some machines (thank you cloud!) the situation improved. Something still felt off, and after some digging I found that one client with a /16 subnet had about 4200 new hosts. These hosts all suddenly appeared and disappeared within just a few days. They had no open ports, no associated URLs and didn’t appear in any passive source. Also, the client had not installed 4200 new hosts. In the end we removed them from the systems. Although we’re not sure where this came from, I suspect that some firewall or proxy had a little party and decided to answer a standard subnet SYN scan with RST packets for 4200 non-existent hosts.
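To make that RST theory a bit more concrete: in a standard SYN scan a SYN/ACK means an open port, and a RST means a closed port on a live host. So a middlebox that answers every probe with a RST makes a whole range of empty addresses look alive. A minimal probe with scapy (illustrative only, and it needs root for raw sockets) would look something like this.

from scapy.all import IP, TCP, sr1

def probe(ip, port=80):
    # Send a single SYN and interpret the reply.
    reply = sr1(IP(dst=ip) / TCP(dport=port, flags="S"), timeout=2, verbose=0)
    if reply is None:
        return "no answer (filtered, or host really down)"
    if reply.haslayer(TCP):
        flags = str(reply[TCP].flags)
        if "S" in flags and "A" in flags:
            return "SYN/ACK: open port, host up"
        if "R" in flags:
            return "RST: closed port, but the host looks alive"
    return "other response"

print(probe("203.0.113.10"))  # example address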
The big vertical refactor
25 November 2017
At the request of some organisations I added some code to scan servers for a particular problem. These organisations are marked as critical infrastructure, and they often have access to material that is in a responsible disclosure procedure before it’s available to the general public. If you ever have this problem along with PoC code (any language will do), I'm interested :-)
I’m happy to do this of course, but the old flood detection problem reared its ugly head again. This problem often occurs with larger organisations that run multiple servers on the same subnet behind the same firewall. If the firewall has flood detection enabled, a ShadowTrackr node that hits multiple IPs in the same subnet behind that firewall will be blocked. The blocking usually lasts only 5 minutes, but that’s enough to generate a lot of useless messages on the timeline. It’s a bit like shitposting on Twitter.
So far I’ve done scans horizontally, just like Censys (pdf alert). The solution for flood detection so far was a fancy algorithm that divided all checks and scans over the worker nodes in such a way that no node would ever hit multiple IPs in the same subnet range within 5 seconds, the default flood interval setting found in most big corporate firewalls. With more custom scans being added, this solution no longer works, and the fancy algorithm would become a drag on the database while we’re scaling up too.
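For context, the core constraint that algorithm has to satisfy, never hitting two IPs in the same subnet within 5 seconds from the same node, can be sketched as a simple per-subnet schedule. This is my simplified illustration, not the production code, and the /24 grouping is an assumption.

import ipaddress
from collections import defaultdict

FLOOD_INTERVAL = 5  # seconds, the typical corporate firewall default

def schedule(ips):
    # Assign each IP a start offset so that probes within one /24
    # are at least FLOOD_INTERVAL seconds apart.
    next_free = defaultdict(float)   # subnet -> earliest allowed start time
    plan = []
    for ip in ips:
        subnet = ipaddress.ip_network(f"{ip}/24", strict=False)
        start = next_free[subnet]
        plan.append((start, ip))
        next_free[subnet] = start + FLOOD_INTERVAL
    return sorted(plan)

for start, ip in schedule(["198.51.100.1", "198.51.100.2", "203.0.113.9"]):
    print(f"t+{start:.0f}s scan {ip}")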
I’ve been wanting to refactor some parts of the code to offload more work from the database to the nodes anyway, and this seems like a good time to do it. This is what I’m spending most of my time on now, together with more vertically based host checks and scans. It’ll take some time to do properly, which means I’ll have to put some other ideas on hold for now. Bugs get priority of course, so please keep sending those.