The company I work for probably doesn’t see as much traffic as Reddit, but we provide services via the web in the US and roughly 15 other countries. We use Akamai for CDN, security, etc., and one of the things they provide is raw logs of every request made to our sites. That generates a lot of data, which we feed into Splunk for analysis, debugging, etc.
One of the nicer things Akamai does in their logs is flag whether they believe a request came from a bot, and if so, which bot. They can identify over 1000 individual bots, and can also detect traffic from new/unknown ones. There is a LOT of bot activity on the internet these days, and much of it originates from cloud providers like AWS, where it’s clear a machine is making the request and not a human.
If we had a legal request for logs, I’d have to look at the data to see how to respond. If Akamai showed a lot of bot activity from consumer ISP IPs, then I’d likely include that data in an effort to show that end users may be victims of botnets. But if the bot activity was mostly originating from cloud providers etc., then I probably wouldn’t include it. Let the lawyers try to figure out from the raw data what traffic originated from humans vs bots.
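The triage I’m describing could be sketched roughly like this: take the log entries that carry a bot flag and bucket them by whether the source IP falls in a known cloud-provider range or not. This is a minimal illustration, not how our pipeline actually works; the field names (`ip`, `bot`) and the CIDR blocks are made up for the example (real cloud ranges are published by AWS, GCP, etc.).

```python
import ipaddress

# Illustrative cloud-provider CIDR blocks only -- the real lists are
# published by the providers themselves and change over time.
CLOUD_RANGES = [
    ipaddress.ip_network("3.0.0.0/8"),     # example AWS-style range
    ipaddress.ip_network("34.64.0.0/10"),  # example GCP-style range
]

def origin(ip: str) -> str:
    """Classify a source IP as 'cloud' or 'other' (consumer ISP, etc.)."""
    addr = ipaddress.ip_address(ip)
    return "cloud" if any(addr in net for net in CLOUD_RANGES) else "other"

def split_bot_traffic(entries):
    """Partition bot-flagged log entries by origin.

    Each entry is a dict with hypothetical fields:
      'ip'  - source IP string
      'bot' - bot name assigned by the CDN, or None for presumed-human traffic
    """
    buckets = {"cloud": [], "other": []}
    for entry in entries:
        if entry.get("bot"):  # only keep traffic the CDN flagged as a bot
            buckets[origin(entry["ip"])].append(entry)
    return buckets

# Toy log sample: two bot hits (one cloud, one consumer-ISP) and one human hit.
logs = [
    {"ip": "3.15.1.2",    "bot": "example-crawler"},
    {"ip": "98.10.20.30", "bot": "example-crawler"},
    {"ip": "98.10.20.31", "bot": None},
]
buckets = split_bot_traffic(logs)
```

Here the "other" bucket (bot traffic from non-cloud IPs) is the part that would support the botnet-victim argument; the "cloud" bucket is the part I’d be inclined to leave out.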