A Blog About Anime, Code, and Pr0 H4x

Building the Better Brute Force Algorithm - A Guide to Heuristic Password Cracking

July 23, 2012 at 12:00 PM

As seen in the Summer 2012 edition of 2600: The Hacker Quarterly.

Let's begin with a brief overview of standard (non-decryption based) password cracking methods. When faced with the task of cracking a password hash, and reverse engineering the encryption algorithm used to create the hash of the original password isn't a viable choice, you are left with a couple of different options.

Dictionary Attacks

A dictionary attack is carried out by iterating through a list of words, creating a hash of each one, and comparing the result to your target hash until you find a match. While dictionary based cracking methods can be used to crack a lot of common passwords, they're not all that effective when things start to become more complicated.

Often people use one or more words in their password, and those words may or may not be separated by a space or contain numbers or punctuation. Take for example, the password "falconpunch". It contains the two words "falcon" and "punch". Both of these words are found in a standard list of dictionary words, but you wouldn't be able to crack this password using a dictionary attack, because they're mashed together to form a single 'word'.

We also run into the same issue with passwords that are made up of, or include, portmanteaus that are not generally found in dictionary lists. Take for instance the portmanteau made from combining the words "char" and "lizard". Chances are pretty good that your dictionary list doesn't include the word "charizard", thus rendering a dictionary attack not very effective for that password.

Brute Force Attacks

When dictionary attacks just won't do, you can always try cracking the password using the brute force method. A brute force attack is carried out by cycling through every possible combination of a sequence of characters (aaaa, aaab, aaac, etc), and hashing each one until you find the sequence whose hash matches your target hash.

The benefit of using a brute force attack, is that it has a 100% chance of success to crack your password hash. The downside though is that to iterate through every possible combination of a sequence of characters until you find a match for your target hash could end up taking a very long time, and it only gets worse the longer and more complicated your target password is.

Take for example the password "charizard rules". That's a password that is 15 characters long, and only contains letters and spaces. Simple right? Well in order to crack this password using a brute force attack, you would need to iterate through every possible combination of the letters a-z and the space (" ") character from a length of one to at least fifteen characters. Let's do some math to see just how many combinations (in theory) we would have to try.

Characters

Total Combinations

1 (27^1)

27

2 (27^2)

729

3 (27^3)

19,683

4 (27^4)

531,441

5 (27^5)

14,348,907

6 (27^6)

387,420,489

7 (27^7)

10,460,353,203

8 (27^8)

282,429,536,481

9 (27^9)

7,625,597,484,987

10 (27^10)

205,891,132,094,649

11 (27^11)

5,559,060,566,555,520

12 (27^12)

150,094,635,296,999,000

13 (27^13)

4,052,555,153,018,980,000

14 (27^14)

109,418,989,131,512,000,000

15 (27^15)

2,950,000,000,000,000,000,000

That's a total of 3,067,940,118,341,250,379,359 combinations.
At a rate of testing 50 hashes per second, it would take about 1,945,674,859,424 years to try every possible combination. That's almost two trillion years!

Now I'm not sure about you, but waiting a couple trillion years to crack someone's password sucks. So let's try cutting down that brute forcing time, by operating under the assumption that our target password has to be at least seven characters long. That means that we only need to calculate the combinations of the letters a-z and the space (" ") from a length of seven characters to at least fifteen characters. Let's see how our math looks now.

Characters

Total Combinations

7 (27^7)

10,460,353,203

8 (27^8)

282,429,536,481

9 (27^9)

7,625,597,484,987

10 (27^10)

205,891,132,094,649

11 (27^11)

5,559,060,566,555,520

12 (27^12)

150,094,635,296,999,000

13 (27^13)

4,052,555,153,018,980,000

14 (27^14)

109,418,989,131,512,000,000

15 (27^15)

2,950,000,000,000,000,000,000

That's a total of 3,067,940,118,340,848,058,083 combinations.
At a rate of testing 50 hashes per second, it would only take about 465,101,560,021 years to try every possible combination. Hooray, that's only 465 billion years which is a lot less time compared to two trillion years!

Conclusion

From the information above we've learned that dictionary attacks are nice, but aren't much help when trying to crack a password more complicated than something your grandma might come up with. (Aka. passwords consisting of more than just a single word or that include non-standard portmanteaus.) We've also learned that using a brute force attack will (eventually) crack any target password with a 100% rate of success, but it'll probably take a few eons to crack longer, more complicated passwords.

So what's a devilishly good looking super hacker to do? Why the answer is obvious. You need to use a smarter brute forcing algorithm!

Before We Continue

You may have noticed that I've conveniently forgotten to mention anything about passwords that include numbers or funky punctuation ($, &, @, etc). It's not that I don't acknowledge the existence of passwords like these, it's just that at this point we're only going to focus on passwords that use the letters a-z, spaces (" "), hyphens and underscores (- _), and common grammatical punctuation (. , ' ? !). I'll address all those kooky complicated passwords later on.

The Psychology of Password Creation

The primary issue with using a brute force attack to crack most real world passwords, is that they spend a ton of time comparing hashes generated from phrases like,

  • aaaaaaaaa
  • aaacccaad
  • xcv hjj abu
  • hhgdfgdrfg

Now these are all spiffy secure passwords, but you'd be hard pressed to find someone who would actually use a password like one of these. Granted there are a select group of paranoid types who I'm sure use passwords like these all the time, but more often than not people tend to pick passwords that they can actually remember; passwords that actually contain words. These words may not necessarily always be separated by spaces, be spelled correctly, or even appear in a dictionary, but they are all still at least words. They are phrases that can be read aloud, and phrases that people say aloud in their minds as they type them in.

Take for instance the words "falcon" and "punch", and all of the different ways they could be arranged to make a password. A few possible combinations would be "falcon punch", "falconpunch", "falcon punch!", "falconpunch!", and "falcon, punch!". Looking at these passwords, no matter how they're put together, the resulting password always includes the words "falcon" and "punch", and when you read it out loud, it's "falcon punch".

So in order to crack a 'regular password' (i.e. passwords that aren't random sequences of nonsense), we need a more heuristic brute forcing algorithm that doesn't waste it's time on unrealistic passwords like "aaaa" and "asdxcv", but instead focuses exclusively on generating passwords made up of words that adhere to the rules of English-like words and phrases.

What is an English-like Word or Phrase?

An English-like word is a word that may not necessarily be an actual English word, but still adheres to a series of rules that our brains use to determine whether or not a given sequence of characters qualifies as a "word". Therefore, an English-like phrase is a grouping of two or more English-like words that adhere to the rules that determine whether or not a group of words qualifies as a valid phrase.

Take for instance this article you're reading right now. As you take in each word, your brain is running a series of tests to make sure that the word you're looking at is actually a word, and not just a bunch of random letters. If I were to drop the word 'kguifdgj' in the middle of a sentence, your brain would automatically flag that word as not being a valid word because it doesn't follow the 'rules' of English-like words. That is, certain rules that every word follows in order to be considered a valid word by our brains. Therefore, you would conclude that that particular sentence was not a valid sentence because it wasn't made up entirely of valid words.

So in order to create a brute forcing algorithm that doesn't waste it's time on nonsense words like "sfdre" and "86ugkie65", we need to 'teach' it how to perform at least some of those same tests that our brain does for us automatically so that it can determine whether a word or phrase is valid or just gibberish. By doing this, we create a brute force algorithm that only generates possible passwords that an everyday person would potentially use. Which in turn drastically reduces the amount of time spent generating extremely unlikely possible passwords.

The Rules of English-like Words

Apostrophes

A word cannot include more than one apostrophe. If a word includes an apostrophe, the apostrophe can only be positioned as the last character, or second to last character, in the word. Moreover, an apostrophe's last bordering letter must be an s.

Examples:

  • chuck's is a valid word
  • chucks' is a valid word
  • chuck'x is not a valid word
  • ch'uks is not a valid word


Hyphens and Underscores

A word cannot include both hyphens and underscores. If a word includes hyphens or underscores, the word should be split at that punctuation, and each word should be tested independently for whether not it is a valid English-like word.

Examples:

  • snape-kills_dumbledore is not valid
  • lightsabersareawesome should be split at the underscores, and the the individual words should be tested separately to determine whether they are valid or not. If any of the individual words are invalid, then the entire word is invalid as well.

Ending Punctuation (! ? , .)

A word cannot include more than one instance of ending punctuation, and any occurrence of such punctuation can only be positioned as the last character of a word.


Other Punctuation (&, &, @, etc)

A word cannot include any instances of other punctuation.


Suffixes

If a word ends in a known suffix (ing, ist, scope, ology, etc), the last character before the suffix, cannot be the same as the first letter of the suffix.

Examples:

  • psychology is a valid word
  • psychoology is not a valid word

Note: There are a few words that don't follow this rule, like zoology. However since words like these are the (rare) exception to the rule, it's more effective to just ignore them.


Vowels

Words must include at least one vowel.

Character Repetition Patterns

The same character can never be repeated more than twice in a row.

Examples:

  • books is a valid word
  • boooks is not a valid word

The same sequence of characters can never be repeated more than twice in a row.

Examples:

  • mahimahi is a valid word
  • mahimahimahi is not a valid word

Character Position Analysis

One of the great things about computers is that they're very good at performing simple tasks over and over really fast. Because of this trait, there are certain tests we can have the computer perform to validate words that wouldn't be efficient if you were verifying a word by hand, one such test is Character Position Analysis.

A Character Position Analysis is a test performed by iterating through each character in a word, and analyzing that character's relationship with its neighboring characters in order to determine whether or not certain characters 'fit' next to each other.

To perform a Character Position Analysis, you first need to build a database that documents how often characters appears directly next to, or one character apart from, each other. This database is broken up Into three separate tables that keep track of occurrence patterns for:

  • the first three characters of a word (starters table)
  • the last three characters of a word (enders table)
  • and the characters in a word as a whole (neighbors table).

Below is an example table documenting the overall character occurrence patterns for the word "awesome". Each cell holds two numbers. The first number represents the number of times a character appears directly next to another character, and the second number represents the number of times a character appears one character apart from another character.

A

E

M

O

S

W

A

0, 0

0, 1

0, 0

0, 0

0, 0

1, 0

E

0, 1

0, 0

1, 0

0, 1

1, 0

1, 0

M

0, 0

1, 0

0, 0

1, 0

0, 1

0, 0

O

0, 0

0, 1

0, 0

0, 0

1, 0

0, 0

S

0, 0

1, 0

0, 1

1, 0

0, 0

0, 1

W

1, 0

1, 0

0, 0

0, 0

0, 1

0, 0

From the data in the table above, we can conclude that the letters 'a', 'e', 'm', 'o', and 's' never appear directly after the letter 'a'. We can then use this data to verify whether or not other words are valid. For example the word "amber" would be considered not valid, because the letter 'm' appears directly after the letter 'a' which our occurrence patterns table tells us isn't possible.

Well obviously 'amber' is a valid word (anyone who's seen Jurassic Park knows that), but the data we have in the table above says otherwise. So in order to perform an accurate Character Position Analysis, a very large list of words must be analyzed in order to build a useful set of character position occurrence tables. Such a list of words can be found here, http://www.bsdlover.cn/study/UnixTree/V7/usr/dict/words.html

Once you have a character position occurrence database, then you can perform a Character Position Analysis. A Character Position Analysis is broken up into three separate tests, and a word is only valid if it passes all three tests.


Starting Characters Position Analysis

This test is performed by taking the first three characters of a word, and checking in the starters table whether the occurrence count (aka neighbor score) for the first and second characters, or first and third characters is equal to zero. If either neighbor score is equal to zero, then the word is not valid.


Ending Characters Position Analysis

This test is performed by taking last three characters of a word, and checking in the enders table whether the occurrence count (aka neighbor score) for the third to last and second to last characters, or third to last and last characters is equal to zero. If either neighbor score is equal to zero, then the word is not valid.


General Character Position Analysis

This test is performed by iterating through each character in a word, and checking in the neighbors table whether the occurrence count (aka neighbor score) for each character being tested and the character next to it, and the character one character apart from the character being tested is equal to zero. If any of the neighbor scores is equal to zero, then the word is not valid.


Getting More Accurate Results

One way to get more accurate results when performing a Character Position Analysis, is to raise the minimum required neighbor score from zero, to a higher threshold.

Rules of English-like Phrases

Spaces

A valid English-like phrase cannot include any occurrence of three or more space characters (" ") in a row. On instances of two spaces in a row, the phrase should be split at the double space, and each sub-phrase should be tested separately. If any of the sub-phrases are not valid, then the entire phrase is not valid.

Examples:

  • "row row fight the power" is a valid phrase
  • "it's dangerous to go alone, take this" is not a valid phrase (because there are three spaces in a row)


Word Repetition

The same word can never be repeated more than three times in a valid English-like phrase.

Examples:

  • "row row fight the power" is a valid phrase
  • "row row row row your boat" is not a valid phrase


Phrase Ending Punctuation (! ? .)

Phrase ending punctuation may only appear at the end of a phrase.

Examples:

  • "zelda is so over powered in brawl!" is a valid phrase
  • "zelda is so! over powered in brawl!" is not a valid phrase


Commas

Commas may only appear at the end of words, and never at the end of a phrase.

Examples:

  • "charizard is cool, but so is blastoise" is a valid phrase
  • "cooking is so fun," is not a valid phrase


Words

In order for an English-like phrase to be considered valid, each word in the phrase must be a valid English-like word.

So what about those passwords that include numbers, or goofy punctuation?

While heuristic brute forcing algorithms are great for generating English like words and phrases, things begin to be a lot more complicated if your target password includes numbers or funky punctuation ($, &, @, etc).

Going back to the topic of the psychology of password creation, remember that in general people pick passwords that are actually made up of words. Keeping this in mind, we can reasonably conclude that passwords that include numbers and/or funky punctuation still follow this rule, but the word(s) in the password are obfuscated by these non-standard characters. Another point to consider for passwords that include numbers only, is that often they're just appended to the end of a password.

Take for instance the password "jalapeno". It's a valid English-like word, and using it as a base, you can obfuscate it with numbers and funky punctuation.

Examples:

  • "jalapen0", "jalap3no", "j414p3n0" - letters replaced with numbers
  • "jalapeno1", "jalapeno123" - numbers appended
  • "j@l@peno" - letters replaced with punctuation

All of the passwords above would not be considered valid English-like words, but under all of the numbers and punctuation, they actually are. So in order to crack passwords that include numbers and/or funky punctuation using a heuristic brute force algorithm, you need to develop a method that can mutate strings generated by the algorithm (making alterations like in the examples above) and then test all the password variants as well as the original password string against your target hash.

Are there any heuristic brute forcing programs available?

Why yes there are! I maintain a small proof-of-concept Ruby application/gem that implements heuristic brute forcing. See, http://github.com/jamespenguin/gentle-brute for more details.

Conclusion

Heuristic brute forcing provides hackers with the ability to crack long and complicated passwords using brute force style password cracking, while not wasting eons trying unrealistic passwords.

To illustrate my point, let's pit heuristic brute forcing against standard brute forcing to crack a five character password consisting of the letters a-z.

  • Using heuristic brute forcing (via the Ruby program above): 517,839 potential phrases
  • Using standard brute forcing: 11,881,376 potential phrases

That's 96% fewer phrases to try using heuristic brute forcing, compared to standard brute forcing!

Go to Page

Copyright © Brandon Smith
ORIGINAL CONTENT DO NOT STEAL KTHX