Wildcard domain serverless applications

Adding Licenses to Code Projects

I’ve been getting back into personal programming projects, especially as I get more familiar with Python and NodeJS. Hoping that some of the stuff I create is actually useful, I wanted to make sure it was appropriately licensed. GitHub makes it easy to attach a license when creating a new repository, but sometimes (a) I forget to set a license when starting, (b) I’m not immediately sure which license to pick, or (c) I decide to change the license later.

None of these problems are particularly difficult to manage, but I wanted an easier solution. I stumbled across https://mit-license.org, which is a cool little PHP/Heroku project that hosts online versions of the MIT license. Fork the repo, add your user JSON doc, submit a pull request, and you’ve got a hosted license you can simply link to from your LICENSE file. Pretty cool.

But the repo maintainer states it’s $7 per month for hosting. In the age of serverless, that’s insanity. Especially considering how infrequent requests to that site ought to be. So I thought, “why can’t this be serverless, and reduce that cost?”

So I rewrote the whole thing in Python, created a Lambda function, and hosted it with CloudFront and API Gateway.

https://github.com/angrychimp/code-license

The whole thing is backed by a DynamoDB table, which is a bit nicer since you can’t scrape email addresses off the JSON files anymore. I also added some options to allow for other licenses to be specified (as of this writing, I’ve added MIT and Apache-2.0, but more will soon follow).
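
For the curious, the lookup itself is about as simple as serverless gets. Here’s a minimal boto3 sketch of the idea; the table and attribute names are placeholders I made up, not the actual schema in the repo:

import boto3

# Placeholder table/attribute names; the real schema lives in the repo
table = boto3.resource("dynamodb").Table("code-license-users")

def get_user_record(username):
    """Fetch a registered user's license settings (name, year, license type)."""
    response = table.get_item(Key={"username": username})
    return response.get("Item")  # None if the user isn't registered

From there, the Lambda function just renders the appropriate license template with those values.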

User-based wildcard CNAMEs

One of the hardest problems I had to solve was mimicking the user-subdomain concept. After getting the code all written up and setting up my API Gateway, I discovered that while you can set a custom CNAME for an API Gateway, it doesn’t support wildcard CNAMEs. I’m not entirely sure why, but I’ve got to believe someone has reasons. (Maybe the API Gateway team hasn’t caught up to wildcard/SNI support which CloudFront added a few years ago?)

After a day and a half of brainstorming, I came up with a solution. Using CloudFront, I was able to write a very small and straightforward Lambda@Edge function that rewrites a request URI. So if you type “https://angrychimp.code-license.org” into your address bar, CloudFront rewrites that to “https://{api-gateway-domain}/u/angrychimp”. Other URI parts are preserved so that we can provide the correct content. Then I cache the response for a week, since it should be largely static.
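
The rewrite itself is only a few lines. Here’s a rough Python sketch of a viewer-request handler doing the same thing (the actual function in the repo may differ in its details):

def handler(event, context):
    # CloudFront viewer-request event: grab the incoming request object
    request = event["Records"][0]["cf"]["request"]

    # Host header as the viewer typed it, e.g. "angrychimp.code-license.org"
    host = request["headers"]["host"][0]["value"]
    user = host.split(".")[0]

    # Prefix the path with /u/{user}, preserving the rest of the URI
    uri = request["uri"]
    request["uri"] = "/u/" + user + ("" if uri == "/" else uri)

    # Returning the modified request sends it on to the API Gateway origin
    return request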

This isn’t an ideal implementation, since the Lambda@Edge function still runs on every request. (Maybe I’ll go back and see if I can convert it from a Viewer Request function to an Origin Request function, which would only run on cache misses and so improve caching and cut my invocation count.) But for now this is way better.

End result?

In the end, whether this translates to actual cost savings depends largely on request volume. I had asked the original project’s maintainer what the monthly request volume looked like, but didn’t wait for a response before starting my project.

My next steps are to implement a process to request new user records (though I need to make sure I allow for a simple approval process, and avoid spam/bot requests), and figure out a way to allow people to update their existing registrations. Once done, I’ll throw this out on Reddit and see what happens. I think I can handle 10-50K requests monthly and still keep things under $7, since a lot of this should happen under the AWS free tier. We’ll see how it goes.

Then I need to figure out how to implement CI/CD on the code. It’ll be hard with the CloudFront piece, but AWS SAM provides some cool automated testing tools, and I’ve been looking for a fun project to use for learning how SAM works.

More Alfred/Screen Sharing fun

Following the last post, I’ll comment that I use Screen Sharing very regularly. Each time I open it in macOS 10.13, I need to start typing before it auto-completes my last session. To save a few seconds, I wrote a quick Alfred workflow that scans my recent Screen Sharing profile history (located in a user Library folder) and pattern matches as I type.

Recent Screen Sharing Profile.alfredworkflow

Now I can just open Alfred, type “screen + <space> + <keyword>”, and it’ll bring up matching profiles that I can connect to directly. Since Screen Sharing falls back to VNC, I can also use this to quickly connect to my Raspberry Pi, the wife’s laptop, or anything else I might need.
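
If you’d rather roll your own instead of grabbing the workflow above, the core is just a Script Filter. Here’s a rough Python sketch, assuming the filter passes the query as an argument; the profile directory and the .vncloc extension are guesses at where your macOS version keeps recent Screen Sharing documents, so adjust as needed:

import glob
import json
import os
import sys

# Assumed location of saved Screen Sharing documents; your Library path may differ
PROFILE_DIR = os.path.expanduser("~/Library/Application Support/com.apple.ScreenSharing")

query = sys.argv[1].lower() if len(sys.argv) > 1 else ""
items = []
for path in glob.glob(os.path.join(PROFILE_DIR, "*.vncloc")):
    name = os.path.splitext(os.path.basename(path))[0]
    if query in name.lower():
        items.append({"title": name, "subtitle": path, "arg": path})

# Alfred reads this JSON from stdout; the selected item's "arg" goes to the next action
print(json.dumps({"items": items}))

Wire the Script Filter to an Open File action, and picking a result should launch Screen Sharing with that profile.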

Making an Athena table from the SecLists repo

If you’re into web security, you have hopefully heard of SecLists. It’s an amazing repository of keywords, indicators, payloads, passwords, and more. It’s great not just for SecOps, but also for developers and QA folks who want to step up their security game.

As part of a project I’m working on, I wanted to be able to quickly compare strings in the Discovery/Web_Content files against logs I regularly sync to AWS S3 (specifically, ELB logs for my SaaS platform). I’ve already created Athena tables to find interesting data in those logs, so I just needed a new table for this content. I wrote a quick script that fetches the SecLists repo, copies it up to S3, and then generates an Athena table.

This gist shows how to make the whole repo searchable, but it’s worth noting that there are README files and other content in there you don’t want to query (including GIFs and other binaries). So it’s a good idea to restrict your queries to subfolders using the $path metavariable, or CREATE your table using that subfolder in the LOCATION path. (For example, since I’m only interested in web content, I gave that full path in my CREATE TABLE statement.)
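
The table itself is just a single string column over the S3 copy of the repo. Here’s a rough boto3 sketch of that DDL; the bucket names and prefixes are placeholders, so point LOCATION at wherever the web-content lists landed in your bucket:

import boto3

athena = boto3.client("athena")

# Placeholder bucket/prefix; match LOCATION to your own S3 copy of the repo
DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS seclists_web_content (
  entry string
)
LOCATION 's3://my-seclists-bucket/SecLists/Discovery/Web_Content/'
"""

athena.start_query_execution(
    QueryString=DDL,
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)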

What’s rad about this is that (a) it’s searchable using standard SQL, (b) I can compare strings to other data files using Athena, and (c) I only incur access/query charges when I run my queries, rather than having an always-on database instance.
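
As a (hypothetical) example of (b): assuming an existing elb_logs table with a url column, a query along these lines would flag which wordlist paths are actually being probed:

import boto3

# Hypothetical schema: "elb_logs" and its "url" column stand in for whatever
# log table you already have in Athena
QUERY = """
SELECT s.entry, count(*) AS hits
FROM elb_logs l
JOIN seclists_web_content s
  ON url_extract_path(l.url) = '/' || s.entry
GROUP BY s.entry
ORDER BY hits DESC
"""

boto3.client("athena").start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)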

Let me know on Twitter what you’re using Athena for!

Detecting public-read S3 buckets

I’m kind of surprised I couldn’t easily find something like this elsewhere. After all the recent news about unsecured (or very poorly secured) AWS S3 buckets, I wanted to find a quick and easy way of checking my own buckets. Between the several AWS accounts I manage, there are hundreds.

AWS sent out an email to account owners listing unsecured buckets a while back. Read more about it from A Cloud Guru, where they also discuss how to secure your buckets. But that doesn’t necessarily help with quick auditing. AWS provides some tools like Inspector to help find issues, but setting it up can take some time (though it’s totally worthwhile in the long run). I’m impatient, and I want to know stuff right now.

My solution was to write a quick script that scans my buckets for glaring issues. Namely, I want to know if any of my buckets have the READ permission set for “everyone” or “any AWS account”. If READ is allowed for “everyone”, anyone can list or download files in that bucket. If it’s allowed for “any AWS account”, the barrier is trivial: a user just has to have an AWS account to review your bucket contents.

So here’s my script.

It requires the AWS CLI and jq (an awesome utility you can download here). It’ll check top-level bucket ACLs for public-read settings and just alert you to the offending bucket names. From there, I’ll leave it to you to secure your buckets.
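
My script does the check with the CLI and jq, but if boto3 is more your speed, the same idea is a short loop (a rough equivalent sketch, not the script itself):

import boto3

# Grantee URIs that mean "everyone" and "any AWS account"
PUBLIC_GRANTEES = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}

s3 = boto3.client("s3")

for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    for grant in s3.get_bucket_acl(Bucket=name)["Grants"]:
        grantee = grant["Grantee"]
        # FULL_CONTROL implies READ, so flag that too
        if (
            grantee.get("Type") == "Group"
            and grantee.get("URI") in PUBLIC_GRANTEES
            and grant["Permission"] in ("READ", "FULL_CONTROL")
        ):
            print(f"{name}: {grant['Permission']} for {grantee['URI']}")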

If you just want to take the nuclear option and update your buckets to private-only, you can do that with this AWS CLI command:

aws s3api put-bucket-acl --bucket <bucket-name> --acl private

Just no one go breaking prod, please.

Randomly redistributing files with bash

This one was damn confusing, but the solution is absurdly obvious. I needed to relocate a large number of files from a single source into a collection of subfolders. (These subfolders were essentially worker queues, so I wanted a roughly even distribution every time new files appeared in the source folder.)

But I noticed that every time this executed, all my source files were ending up in the same queue folder (and not being evenly distributed). What gives?

Turns out, my call to $RANDOM was being expanded only once, at runtime, so that value was set statically and reused for every subsequent mv command. The Eureka moment was when I realized that, since this runs as a subshell command, I needed to escape my dollar signs so they’d be ignored by the parent shell and expanded by the child.

I suddenly found all my files going to the correct folders. Yet another reminder to always keep scope in mind.
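
For what it’s worth, if you’d rather sidestep the quoting question entirely, the same random fan-out is only a few lines of Python. The folder names below are made up:

import os
import random
import shutil

SOURCE = "incoming"                         # hypothetical source folder
QUEUES = ["queue-0", "queue-1", "queue-2"]  # hypothetical worker-queue folders

for name in os.listdir(SOURCE):
    src = os.path.join(SOURCE, name)
    if os.path.isfile(src):
        # random.choice runs per file, so the distribution stays roughly even
        shutil.move(src, os.path.join(random.choice(QUEUES), name))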