Where do you Hide a Needle?
If you answered in a haystack good for you because you understand the fundamental principle behind passwords. The next question is why does that work?
Before resorting to the haystack let’s do a little thought experiment and start by hiding the needle in a saucer and further let’s tell our nemesis Bob where we hid it. Start your stopwatch and we’ll see how long it takes Bob to find it. Humm… looks like that took exactly no time at all. Now let’s make things a little harder and hide the needle in a house and tell Bob which house. Ready with the stop watch and go! We see Bob searching the house, sweeping the floor, dragging a magnet over the carpet, ripping through drawers and after a week or so he is thoroughly frustrated and about to give up when he notices the needle pinning a boutonnière to a seersucker suit coat. This is a little better but we realize that we need to make it even harder for Bob. This time we hide the needle in a city with thousands of houses and tell Bob which city. This time we realize that a calendar will do and leave the stopwatch home. Bob searches for months without finding the needle and we gain more confidence in our needle security measures.
Now as all good scientists we decide to generalize our findings and produce a theory. We have noticed some important security principles:
- The larger the area that a needle is hidden in the harder it is to find.
- Even if we tell Bob where to look it takes a long time to find the needle, provided we aren’t too specific.
- Given long enough Bob will find the needle.
So here’s the theory: To provide security you should hide your needle in the largest area possible, reveal as little about the location as possible and the best you can do is make it time consuming to find the needle, not impossible.
Lost in Code Space
Let’s apply our theory to passwords and start with passwords consisting of a single digit. There are ten digits so there are ten possible passwords, 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9. Bob tries to break the password and finds that it's as easy as finding a needle in a saucer. We realize this is inadequate so we decide to change to two digit passwords. Now rather than ten possible passwords there are a hundred, 00 through 99. Bob tries to break the new password and finds it a little harder but still not very hard. We are on a roll now and so decide to make all passwords out of three digits. This allows us to pick from a thousand passwords, 000 to 999. Bob takes even longer to break the password and we know we are on to something.
We recognize that each time we add a digit we multiply the number of passwords by a factor of ten. This is what it looks like mathematically:
- 10, ten passwords.
- 10 x 10, one hundred passwords.
- 10 x 10 x 10, one thousand passwords.
A short hand for this is:
- 10^1, ten.
- 10^2, one hundred.
- 10^3, one thousand.
Next we notice that things work out even better if we use letters. Let’s confine the discussion to the lower case letters, of which there are 26. From this we can make 26 one letter passwords, if we use two letters we can make 26 x 26 or 676 passwords and with three letters 17576 passwords. So what we have looks like this:
- 26^1 = 26 passwords.
- 26^2 = 676 passwords.
- 26^3 = 17576 passwords.
This demonstrates the important concept that we can express the number of passwords mathematically even when the symbols we use represent letters all that matters is the number of symbols. We can express this
concisely with the expression X^n where X is 10 for the digits only example and 26 for the lower case letters, and n is the number of symbols in the password.
So let’s try using both digits and lower case letters together. This gives us 36 symbols and we have:
- 36^1 = 36
- 36^2 = 1296
- 36^3 = 46656
This works so well why not include the upper case letters as well and increase the number of symbols to 62. This gives:
- 62^1 = 62
- 62^2 = 3844
- 62^3 = 238328
Let’s go all the way this time and use every symbol available to us including numerals, alphabetical, punctuation and a few odd symbols. As a matter of practicality this is limited to the symbols on a standard keyboard minus some that have special interpretation and can't be used because they tend to confuse the computer: ()[]{}-_=+:;”’<,>.?/\|. That gives us 74 symbols to work with. Let’s reduce that to 70 to prevent someone from flaming me for counting improperly.
Continuing the example this gives us:
- 70^1 = 70
- 70^2 = 4900
- 70^3 = 343000
To make a strong password you might be asked to use ten symbols and use at least one numeral, lower case alpha, upper case alpha and symbols such as @ and %. The number of passwords you can make in this case is 70^10 or approximately 2.83x10^18. That’s 2,830,000,000,000,000,000 let’s hope the national debt doesn’t get that high. If you were to make 100 guesses a second it would take 895,723,138 years to enter all possible values. One term for the number of combinations that can be made from a set of symbols is a Code Space. In general if you are guessing you will exhaust half the code space before you find the password so that reduces the number of years of guessing to 447,861,569.
More Rules Make Less Security
People tend to select passwords that they can remember. Things like their last name spelled backwards, their birthday, names of children, etc. Since such passwords are easy to guess they are usually discouraged and some systems enforce a set of rules like the following:
- Must have at least one each of upper case, lower case, numeral and special characters.
- Must contain at least 10 characters.
- Must not repeat the same symbol more than twice in a row.
- Must not contain dictionary words spelled forward or backwards.
- Must not be a previous password spelled backwards or with with variation in case.
- Must not contain the username.
This makes the password more secure, right? Well yes and no. It does indeed guard against passwords that are easy to guess but at the expense of shrinking the code space. In analogy there are fewer stalks of hay in the stack and fewer houses in the city. But you ask isn’t a smaller code space less secure? Ah there’s the rub, but how much less secure?
To clarify this let’s go back to the needle hidden in the city. Suppose that rather than allowing the needle to be hidden anywhere in the city we make some rules like these:
- Must be hidden in a residential structure.
- Must not be hidden on the second floor.
- Must not be hidden in a red or blue house.
- Must not be hidden in a raised ranch.
- Must not be hidden in the same house as the previous needle.
It’s clear from this that each rule makes the needle easier to find because we have told Bob where he doesn't need to look. He can disregard any locations that fit the rules and know that he hasn’t passed-by the needle.
The Mark of the Beast
Let’s take a closer look at an innocent looking rule, the one about consecutive symbols. This allows us to use the sequence 66 as in Route 66 but not 666. Taken together with the rule that passwords must be 10 characters long this means that the password 666abB%#7D is not allowed. That’s a small price to pay but what about 666%^#Gf15 or 77h666hM%#& or MMX?&ak666? All of these are illegal as well but still that’s not too many. Just for fun though, let’s see how many combinations of three symbols there are. Hmm... looks like just about 70.; sequences like 111, MMM, %%%, … etc.
In general we see that if any sequence of three identical symbols occurs that it prevents us from using the balance of the code space because that’s the ultimate effect when you throw out all ten characters based on the value of three.
Bob is enthusiastic about this because he sees that it makes his job easier since our rule has already told him where he won’t find the needle. We kind of laugh under our breath at Bob because we know that it removes only 70 passwords from the list and what remains is enough to keep him busy for a long, long time.
To help us sleep that night we start counting illegal passwords, 111 plus anything, 222 plus anything, 333 plus anything, … etc. The sandman comes along as we approach the end with ### plus anything and all is well. Suddenly we wake up terrified because we realize that 2111 plus anything and 22111 plus anything and 223111 plus anything are also illegal and in general that anything plus three in a row plus anything is also illegal. Immediately we get up and pull the plug on the computer to keep Bob from purloining the pictures of last summer’s family picnic.
Would You Like that Supersized?
The problem we have discovered is that passwords are positionally sensitive and that abc is different from cab, is different from bca, even though they contain the same symbols. To gauge the effect of this on the strength of a password let’s look at things a little differently. Up to now we have applied the rule that a password must be 10 characters long, but notice that when three identical characters appear in a row they might as well be a single symbol based on their effect on the number of passwords. For example if we have the sequence car, car, car, bus, truck and we want to know how many combinations of different types of vehicles there are that car needs to be specified only once. This leaves us with car, bus, truck. Similarly if we take the point of view that each of the 70 identical sequences of three characters is a single type of symbol then the password might as well be 8 characters long rather than not 10.
- aaa = class of type a.
- %%% = class of type %.
- %%%0123456 = type % plus 0123456 = %0123456.
Now we can calculate the effect of our rule on the code space. A quick search of Wikipedia for the term permutation locates a formula that allows us to calculate the value of seventy symbols taken eight at a time. This formula is also available on our TI-89 but we never read the manual and so don’t know how to use it. Despite this obstacle we calculate the value to be approximately 3.80 x 10^14 or 380,000,000,000,000 illegal passwords. Subtracting this from the full code space shows that one out of every 74 possible passwords is illegal. This equates to a loss of 1.35% of the code space. Bob still has to do a lot of work but we have only applied two of the rules so far.
Next let’s apply the rule that you need at least one numeral, one lower case letter, one upper case letter and one special symbol. This eliminates passwords such as Gab#%^HUmf and 18%a^$va42.
- Passwords containing no numerals = (70-10)^10 = 6.05 x 10^17.
- Passwords containing no lower case letters = (70-26)^10 = 2.72 x 10^16.
- Passwords containing no upper case letters = (70-26)^10 = 2.72 x 10^16.
- Passwords containing no special symbols = (70-(10+26+26))^10 = 1.07 x 10^9
- Total passwords made illegal by this rule = 6.60 x 10^17.
Ignoring the fact that some of the passwords have been counted twice because they were already eliminated by the first two rules this makes one in every 4.30 passwords illegal, a loss of 23.3% of the code space. Ouch!
Next let’s apply the rule about dictionary words. From a dubious source on the Internet I learned that the average length of an English word is 5.1 letters and there are about 500,000 words. The rule says that words are illegal both backward and forward so this increases the number to 1,000,000. By applying the reasoning used in the first case we are able to view each word as a type of symbol and collapse it from five characters to one. This leaves us with the equivalent of a six character password except that there are 1,000,000 symbols representing dictionary words instead of 70. This is the same as multiplying the number of passwords in a five character sequence by 1,000,000 and calculates to 10^6 times 70^5 = 1.68 x 10^16.
The cumulative effect of all of these rules leaves us with 6.77 x 10^17 illegal passwords and that amounts to one out of every 4.18 or 24% of the code space. This is not quite correct because it ignores that many passwords are made illegal by more than one rule thus causing them to be counted more than once. Notice though, that the third rule accounts for most of the damage and taken by itself there is no multiple counting.
No Inconvenience is too Great for our Customers to Bear
How does this affect the average computer user? Well it means that he is likely to have many of his attempts to change a password thwarted. The dictionary rule is particularly nasty in this respect since few of us know the dictionary forward and backward. It is also true that the user is not likely to end up with an easy to remember password. The more rules imposed on passwords the harder it becomes to compose a legal one and each rule makes the password weaker, sometimes significantly weaker.
Stay Tuned
In part two I will discuss a method for creating passwords that doesn’t require the user to be aware of the rules and a way the user can follow the rules but compose a weak password nonetheless.