Introduction
Recently, I had to play detective in our AWS account.
A resource was there (a Cognito User Pool), plain as day, but nobody could remember where it came from. And of course, there were no tags to help us (if only they had read my post on why tagging your AWS resources is a must).
My mission, should I choose to accept it: find out who created it. The problem? The event was about four months old.
My first instinct was to turn to AWS CloudTrail. And there, I hit my first wall: the event history is only viewable for the last 90 days. Dead end.
Luckily, I knew our CloudTrail logs were archived in an S3 bucket. My first thought was to manually download the archives for the right month, unzip dozens of JSON files, and hit Ctrl+F while praying for a miracle. Let’s just say it was neither efficient, pleasant, nor fast.
So, I wondered whether there was a simpler way to search through this pile of logs, and I finally found the perfect solution: Amazon Athena.
Querying your S3 logs with Athena
For those unfamiliar, Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon S3 using standard SQL. Basically, you can run queries on files (JSON, CSV, etc.) as if they were in a traditional database. No more downloading anything!
The idea, then, is to “map” our CloudTrail logs stored in S3 to a table in Athena. To do this, we use a single CREATE EXTERNAL TABLE query. Following the AWS documentation on the subject, I ran the following query in the Athena console.
This query creates a table and uses a very handy feature called “partition projection.” This allows Athena to infer the location of the logs based on the date, without having to manually manage partitions. This is very convenient when the structure is standardized, as is the case with AWS CloudTrail.
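As a rough sketch, a trimmed-down statement of this kind could look like the one below. It is modeled on the CloudTrail example in the AWS documentation, but it is not a verbatim copy: the table name cloudtrail_logs_pp, the bucket name, the account ID, the region, and the start date of the range are all placeholders, and only a subset of the CloudTrail fields is declared (fields that are not declared are simply not queryable).

```sql
-- Sketch only: table name, bucket, account ID, region and date range are placeholders.
-- Only a subset of the CloudTrail record fields is declared here.
CREATE EXTERNAL TABLE cloudtrail_logs_pp (
    eventVersion STRING,
    userIdentity STRUCT<
        type: STRING,
        principalId: STRING,
        arn: STRING,
        accountId: STRING,
        userName: STRING
    >,
    eventTime STRING,
    eventSource STRING,
    eventName STRING,
    awsRegion STRING,
    sourceIpAddress STRING,
    userAgent STRING,
    errorCode STRING,
    errorMessage STRING,
    requestParameters STRING,
    responseElements STRING,
    requestId STRING,
    eventId STRING,
    readOnly STRING,
    eventType STRING,
    recipientAccountId STRING
)
-- The partition column is never stored in the files; it is derived from the S3 path.
PARTITIONED BY (`timestamp` STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://my-cloudtrail-bucket/AWSLogs/123456789012/CloudTrail/eu-west-1'
TBLPROPERTIES (
    'projection.enabled' = 'true',
    'projection.timestamp.type' = 'date',
    'projection.timestamp.format' = 'yyyy/MM/dd',
    'projection.timestamp.interval' = '1',
    'projection.timestamp.interval.unit' = 'DAYS',
    'projection.timestamp.range' = '2024/01/01,NOW',
    'storage.location.template' = 's3://my-cloudtrail-bucket/AWSLogs/123456789012/CloudTrail/eu-west-1/${timestamp}'
);
```

The storage.location.template property is what ties a partition value such as 2024/03/15 to the matching year/month/day folder that CloudTrail writes in S3, which is why no partitions have to be registered by hand.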
Warning: Don’t forget to replace the s3://... URLs with the exact path to the S3 bucket where your CloudTrail logs are stored, and to adjust the projection.timestamp.range property to the period you’re interested in.
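If the table already exists and only the period needs to change, the range can probably be updated in place rather than by recreating the table. A minimal sketch, assuming the table is named cloudtrail_logs_pp (a placeholder) and that the logs of interest start at the beginning of 2024:

```sql
-- Sketch only: table name and start date are placeholders.
-- The range is "start,end" in the same format as projection.timestamp.format;
-- NOW keeps the upper bound moving with the current date.
ALTER TABLE cloudtrail_logs_pp SET TBLPROPERTIES (
    'projection.timestamp.range' = '2024/01/01,NOW'
);
```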
Investigation Time: Finding the Information
Once the table is created (which only takes a few seconds), the hardest part is over! My investigation could finally begin.
I was looking for who had created a Cognito UserPoolClient on a specific date. So my SQL query looked like this:
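A query along these lines would do the job. This is a sketch under a few assumptions: the table is named cloudtrail_logs_pp, the date is a made-up example, and the event is recorded under the Cognito API action name CreateUserPoolClient with cognito-idp.amazonaws.com as the event source.

```sql
-- Sketch only: table name and date are placeholders.
SELECT
    eventtime,
    eventname,
    useridentity.arn AS user_arn
FROM cloudtrail_logs_pp
WHERE eventsource = 'cognito-idp.amazonaws.com'
  AND eventname = 'CreateUserPoolClient'
  -- Filtering on the partition column limits the scan to that day's folder.
  AND "timestamp" = '2024/03/15'
ORDER BY eventtime;
```

Filtering on the partition column matters for both speed and cost, since Athena then reads only the objects under that day's prefix instead of the whole archive.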
In just a few seconds, Athena scanned the logs for the requested day and returned the result.
I had the exact time, the event, and most importantly, the ARN of the user who performed the action. Mission accomplished!
The Verdict… and the Culprit
The funniest part of this story?
After setting up this solution and finding the information so easily, I discovered that the “culprit” who had created this resource four months ago… was me. I had completely forgotten!
Jokes aside, this experience confirmed one thing for me: taking a few minutes to set up Athena on your CloudTrail logs is an incredibly worthwhile investment. You’re giving yourself a long-term auditing and search capability that will save you hours of manual searching the day you really need it. Don’t be like me; don’t wait until you’re stuck to set it up!