Recently, AWS announced the availability of access logs for Elastic Load Balancers. The key point from the announcement: tick a checkbox, and the access logs magically appear in an S3 bucket of your choice, neatly organized by time, with unique suffixes where necessary to avoid naming conflicts:
With file names looking something like:
The contents of the files? For HTTP connections, every line is:
<ISO-8601 timestamp> <elb-name> <requesting-ip:port> .. blah blah.. "GET http://<hostname>:<port>/?QUERY_STRING HTTP/1.1"
That’s interesting. And how long can that query string in the log be? I ran a little test using a Node.js script. The test script ran out of memory after successfully making a request with over 16,777,216 characters in the query string. And by “successfully”, I mean that I had a log file delivered to my S3 bucket with that query string present in it.
Are you thinking what I’m thinking?
If you set up an AWS Elastic Load Balancer without any instances behind it, it will return 503 (Service Unavailable) errors. But… *drumroll* …it will still log those requests, query string included. Log collection infrastructure just became a whole lot simpler.
If you’re interested in testing this out, I built a thing: drifter-rsyslog. It is an rsyslog Node.js plugin that takes messages via stdin and turns them into HTTP GET requests, with each message URI-encoded into the query string (it’s brand new, probably with bugs, but it’s a start). If you don’t like Node.js, it should be straightforward to build something similar.
Happy logging! And if you’d like me to set up this sort of collection infrastructure for you, let me know.