Randomly redistributing files with bash


This one was damn confusing, but the solution turned out to be absurdly obvious. I needed to relocate a large number of files from a single source into a collection of subfolders. (These subfolders were essentially worker queues, so I wanted a roughly even distribution every time new files appeared in the source folder.)

But I noticed that every time this executed, all my source files were ending up in the same queue folder (and not being evenly distributed). What gives?

Turns out, my reference to $RANDOM was being expanded only once, by the parent shell, so that value was fixed and reused for every subsequent mv command. The eureka moment came when I realized that, because the mv ran in a subshell command, I needed to escape my dollar signs so they'd pass through the parent shell untouched and be expanded fresh by the child.
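Here's a minimal sketch of the pattern (the folder and file names are hypothetical, not from my actual script):

```shell
#!/usr/bin/env bash
# Sketch: spread files from ./source across five queue folders.
mkdir -p source queue0 queue1 queue2 queue3 queue4
for i in $(seq 1 20); do touch "source/file$i"; done

# Broken: inside double quotes, the PARENT shell expands $((RANDOM % 5))
# once, before find ever runs, so every file lands in the same queue:
#   find source -type f -exec bash -c "mv \"\$1\" queue$((RANDOM % 5))" _ {} \;

# Fixed: single-quote (or escape) the dollar signs so the CHILD bash
# expands $RANDOM freshly for each file:
find source -type f -exec bash -c 'mv "$1" "queue$((RANDOM % 5))"' _ {} \;
```

With the quoting fixed, each invocation of the child shell rolls its own random number, so the files fan out across the queues.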

I suddenly found all my files going to the correct folders. Yet another reminder to always keep scope in mind.

Recovering from Apache “400 Bad Request” errors


You never want to encounter errors in your production environment. But what do you do if you release code that generates fatal errors outside your application?

A recent release we deployed caused an issue with the links our platform was generating: instead of nice, clean URLs, we started seeing links containing literal "%?" sequences where replacement values should have been.

The source of the issue was easy enough to track down. We use “%%” to wrap replacement variables for our email templates, and when a variable can’t be identified, we re-wrap the name with “%?”. (It’s arguable we shouldn’t do that, but that’s a topic for a later discussion.) Rolling back the change fixed the issue for future generated emails, but what to do about the old emails? They’re already in inboxes world-wide, and we can’t fix the URI for those links.

When Apache receives a request in this format, with a percent sign followed by something other than two hexadecimal digits, it balks. It assumes the %-sign begins a URL-encoded sequence, and a question mark immediately after it can't be decoded, so parsing fails and Apache throws a "400 Bad Request". You can't use mod_rewrite because the error occurs before rewrite rules are processed. But Apache does give you one opportunity to rewrite history (so to speak): ErrorDocument.
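To see the rule Apache is enforcing: per RFC 3986, a "%" in a URI must introduce exactly two hex digits. Here's a standalone check that mirrors it (the function name and sample paths are my own, purely illustrative):

```shell
# Succeed only if every "%" in the string is followed by two hex digits;
# this is the same constraint Apache's URI parser enforces.
is_valid_encoding() {
  ! printf '%s' "$1" | grep -Eq '%([^0-9A-Fa-f]|[0-9A-Fa-f][^0-9A-Fa-f]|[0-9A-Fa-f]?$)'
}

is_valid_encoding '/track?id=%41%42' && echo "valid"     # %41 and %42 decode fine
is_valid_encoding '/track?id=%?NAME%?' || echo "invalid" # "%?" cannot decode
```

The second URI is the shape our broken emails produced, and it's exactly the kind of request that triggers the 400 before any rewrite rule can touch it.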

We were able to quickly deploy a new script to production, and set the path to that script as the ErrorDocument handler for HTTP 400 responses.

# In /etc/httpd/conf.d/my-app.conf
ErrorDocument 400 /error/400.php

Apache will redirect the request to that page, with the URI intact. It’ll even still load your other modules, including (in this case) the PHP processor. So in that 400.php script, we parse the URI, look for the bad string, and simply rewrite (or remove) it.


<?php
// In /error/400.php
if (strpos($_SERVER['REQUEST_URI'], '%?') !== false) {
    // Percent-encode the stray "%" signs so the redirected URI is valid
    $uri = str_replace('%', '%25', $_SERVER['REQUEST_URI']);
    header('Location: ' . $uri);
    exit;
}

echo "This server has encountered an error.";

Et voilà. It’s not the perfect user experience, since those values weren’t replaced in transit like they should have been. But at least the email recipients clicking on the links aren’t getting standard “400 Bad Request” error pages. It was a tidy little solution to what could have been a devastating problem.

In response: Data Formats of Star Wars Suck


Sarah Jeong composed an entertaining review of data storage models in the Star Wars universe.


I get where she’s coming from, but I wonder if she has sufficient experience here from which to draw her conclusions. I know, I’m diving a bit too deep into fandom and both (a) trying to make sense of a fantasy movie franchise and (b) arguing on the internet (where there are no winners), but in defense of the artists making the movie, I think the data storage shown in Star Wars can be explained away.

Discussing a plot point in Rogue One wherein our heroes need to transmit Death Star plans to an orbiting Rebel fleet:

The information appears to be solely contained on a single data tape that, in order to be transmitted, has to be taken up to a giant antenna on the roof of the building. There are no terminals where they can just access the data.

This could be a security feature. If you’re in a secure data facility, why would you make it easy to transmit secure data directly from the archive room to an external location? I’m sure people in the data archives at the CIA can’t just log into Gmail, compose a new message, and fire off a couple of messages with PDF attachments. You need to get the data from the secure part of the building to someplace where you can transmit.

A key plot point in Rogue One is that the file size is so large that they need to commandeer a giant antenna and knock out a planetary shield in order to upload the files. But for some reason they can send regular communications just fine without doing either of those things?

What on earth is being stored on that magnetic tape cassette? Is it 5000 .bmp images loaded into slides in Powerpoint with accompanying animations? Why is DEATH_STAR_final_final__FINAL.dwg.doc.gif.pdf so big?

Ground forces couldn’t communicate with the orbital fleet once the shield was up. That was clearly pointed out in the film. And who ever said the data they need filled up the data tape?

Ask anyone who’s ever worked on a sizable project for a large building where integrated technology is in scope. I’ve only ever worked on single-floor office spaces. There’s a shit-ton of paperwork. There are requirements drafts, revisions and approvals. There are material manifests. Work schedules. New hire forms and termination records. Expense reports. A veritable cornucopia of minute details. I’d wager that material related to the Death Star probably took up a few large data drives, even assuming the compression seen in Rogue One is comparable to what existed back in Episode II.

Rogue Squadron knew exactly what they were looking for – data related to a thermal exhaust port that Galen Erso built as a structural weakness. Assuming the data was properly indexed (which, I’ll grant, is where we most need to stretch our disbelief, and this is a science fiction movie), K2SO could have found the data they needed, and highlighted the specific tape that needed retrieval. Then it’s a matter of getting said tape, and getting the specific subset of plans to the Rebel fleet.

No one said they needed everything on that giant ass drive.

Count Dooku absconds with a thumb drive or something that contains the Death Star plans.

Again, that didn’t have to be the whole damn thing. Dooku could have just had a status report of some kind for Palpatine. No one ever said that was the whole thing.

Another big question is why Tarkin blew up Scarif. Archives exist for a reason – you don’t want to lose whatever data you put there. He must have been damn sure he wouldn’t get flayed for destroying not only unknown quantities of records, but also whatever other Imperial forces were still mopping up the clearly smaller Rebel assault force at the installation. Clearly layoffs in the Galactic Empire are much more final than where I work.

Why is it that I have to carry five dongles so my Macbook can play a PowerPoint presentation but a decades-old Rebel droid needs zero to stay interoperable with an enemy’s state-of-the-art battle station?

C’mon – this doesn’t take much imagination. Our little planet Earth has what, 7 billion people? And how many have modern computers? And how long has it taken for us to even begin to get rid of the audio jack? And when it happened, we panicked. Government systems still have serial ports in many cases, for legacy peripherals. In a galaxy of trillions, it’s not unreasonable to think data ports would remain unchanged for a few decades, or that you’d put legacy ports on your “state of the art” battle station, especially given that the battle station clearly took significant time to build in the first place. I doubt you’d find a lot of USB-C ports on any modern naval craft we’re launching in the next year or two.

But hey, secret rebel archivists make sense too.