Searching through logs

Recently, I needed to check out which endpoints in a specific version of our api were being hit/ if they were being hit at all. This involves searching through the paper trail logs, which was an interesting process so I thought I’d write about it.

First, I want to say, there is a better path than the one I took. It wasn’t until after I did all this log downloading and searching that I discovered the paper trail CLI. I haven’t had a chance to take a deep dive into it yet, but I assume using it is better than downloading 30 days of log files and than combining them into 1 log file.

Which is what I did first. I had to look at 30 days worth of logs, so I went to the paper trail heroku add-on and downloaded all 30 days worth of logs. Then I used cat *.tsv >merged.tsv to put all of the 30 files into 1 file. After that, I checked which endpoints I needed to look at by just looking at the routes file. Luckily, there are a pretty finite number of endpoints for the specific version I was looking at. I think, if I was dealing with significantly more endpoints, i’d need to figure out a better systems for ensuring that I’ve checked all the endpoints.

Then, for the simple endpoints (ie- /livestreams or /livestreams.json for the index endpoint or /livestreams/ for the show endpoints) I just ran a simple grep -c command which greps for that path and then counts it. For example cat FILE_PATH | grep -c /v1/livestreams.json.

When I got to a slightly more complex endpoint, for example, one that had a path like /v1/livestreams/something_unique_here/watched I couldn’t just grep for the endpoint, I needed to use a grep compatible regex to find the number of results that had “watched” at the end. I did this by searching cat FiLE_PATH | grep -c ‘/watched\b’ which looks only for the last bit of the url.

And that’s it. Pretty simple but I hadn’t handled searching through large log files or any sort of complex grepping before.
comments powered by Disqus