Passwords are often the only line of defense for our online identity, securing our social media profiles, bank accounts, and computers. Most passwords, just like the identities they protect, are filled with personal meaning. Sometimes they’re playful, like “incorrect” or “letmein,” or sentimental, like the name of a child or a special date. These are called keepsake passwords because of their personal significance.
Unfortunately, the same factors that make keepsake passwords more memorable also make them less secure.
Keepsake passwords are vulnerable to “spidering,” which involves hackers using personal information about the victim to focus their attack.
Password strength is often measured by information entropy, a number that reflects how many possibilities an attacker would have to consider. For example, a password with 30 bits of entropy requires a brute force attacker to try up to 2^30 ≈ 1 billion possibilities. Increasing entropy by a single bit doubles the time required to guess.
An uppercase letter has log2(26) = 4.7 bits of entropy because there are 26 possibilities. A lowercase letter also has 4.7 entropy bits, while a digit has only 3.3 bits.
The password “password” has 8 lowercase letters, so, if it were generated randomly, it would have 8 × 4.7 = 37.6 bits of entropy.
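The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes every character is drawn independently and uniformly from its alphabet; the function names are ours, chosen for illustration.

```python
import math
import string


def bits_per_symbol(alphabet_size: int) -> float:
    """Entropy, in bits, of one symbol drawn uniformly from an alphabet."""
    return math.log2(alphabet_size)


def random_password_entropy(length: int, alphabet_size: int) -> float:
    """Entropy of a password whose characters are independent and uniform."""
    return length * bits_per_symbol(alphabet_size)


lowercase_bits = bits_per_symbol(len(string.ascii_lowercase))  # about 4.7
digit_bits = bits_per_symbol(len(string.digits))               # about 3.3
eight_lowercase_bits = random_password_entropy(8, 26)          # about 37.6
guesses_for_30_bits = 2 ** 30                                  # about 1 billion

print(f"{lowercase_bits:.1f} {digit_bits:.1f} {eight_lowercase_bits:.1f}")
print(f"{guesses_for_30_bits:,}")
```

These figures only apply to truly random passwords; a password a person picked has far fewer realistic possibilities.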
However, the entropy of a password is usually less than the sum of its parts. For example, each of the characters in “password” has 4.7 bits of entropy, but the total isn’t 8 * 4.7 = 37.6 bits. The password was chosen from a space of “dictionary words,” not random characters, so the number of possibilities is far smaller. Since “password” is often the first attempt by password crackers, its entropy is actually only 1 bit!
The entropy of a password is the sum of the entropies of its characters only if the password was chosen completely randomly, with no semantic meaning or correlation between characters. Unfortunately, truly random passwords like 9Yiuvj%1xG are much harder to remember than a password based on your favorite football team, like @rsenal86.
Data breaches have exposed over 600 million real-world passwords, and most of them are insecure. By examining patterns in leaked passwords, we can find out how bad passwords are chosen and how passwords are hacked. The dataset we’ll use comes from Have I Been Pwned.
Let’s walk through the thought process of Alice, a new employee trying to create a strong and memorable password for her work account.
Alice decides on waratah, an Australian Aboriginal word referring to one of her favorite plants, and she’s reminded of her recent vacation every time she types it in. Alice’s rationale for choosing it is that it’s such an obscure and hard-to-spell word that no one is likely to guess it.
Dictionary attacks are a type of brute force password hacking technique where the attacker tries millions of words from a dictionary. The dictionary doesn’t include only English words; it also contains common passwords from past leaks.
This attack is often successful because people choose passwords that are words in their native language or a combination of words.
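To make the idea concrete, here is a minimal sketch of a dictionary attack against a single leaked hash. The tiny wordlist is hypothetical, and unsalted SHA-256 is used only to keep the example short; it is an assumption for illustration, not how any particular site stores passwords.

```python
import hashlib
from typing import Iterable, Optional, Tuple


def sha256_hex(text: str) -> str:
    """Hash a candidate the same way the (assumed) leaked hash was made."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def dictionary_attack(
    target_hash: str, wordlist: Iterable[str]
) -> Optional[Tuple[str, int]]:
    """Return (password, guesses_used) if a wordlist entry matches, else None."""
    for guesses, word in enumerate(wordlist, start=1):
        if sha256_hex(word) == target_hash:
            return word, guesses
    return None


# Hypothetical wordlist: common leaked passwords first, then plant names.
wordlist = [
    "password", "123456", "letmein",
    "daisies", "wildflower", "rhododendron", "melaleuca", "waratah",
]

leaked_hash = sha256_hex("waratah")
result = dictionary_attack(leaked_hash, wordlist)
print(result)
```

The loop never looks at how hard a word is to spell; every entry costs exactly one guess.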
This t-SNE visualization shows semantically similar passwords clustered together. The label sizes are proportional to the log of the password’s total occurrences.
Imagine a hacker wants to hack Alice’s bank account, so he decides to check social media for any password clues. On Alice’s public Instagram profile, the hacker finds she follows several landscaping accounts. On Twitter, she’s recently retweeted some fantasy authors.
With this information, a smart hacker can launch a dictionary attack with gardening and fantasy related words, which are more likely to match Alice’s password.
Alice’s assumption that a difficult-to-spell word would be more secure is wrong, because dictionary attacks don’t consider spelling complexity. All words in the dictionary will be tested. They are just as likely to guess rhododendron or melaleuca as wildflower or daisies.
This clustering shows some other interesting patterns.
Passwords starting with “ilove” are much more often followed by a common male name than a female name. This also occurs with passwords starting with “teamo” (“te amo” means “I love you” in Spanish).
Most people think their password choice is more unique than it actually is. Try exploring these clusters:
A new security policy at work forces Alice to update her password to meet some requirements: at least one uppercase letter, one number, and one symbol.
Alice wants to keep using her favorite plant, so she changes some letters into numbers to make it more secure. She decides on W@r@7@h1 because the @ signs look like As and the 7 reminds her of a T. She capitalizes the word and appends a digit to meet the eight-character recommendation.
Leetspeak is an encoding system that replaces some alphabetic characters with numbers or symbols that look similar. This is often used to satisfy a password strength meter, while still keeping it easy to remember.
Original | Leet |
---|---|
A a | 4 @ |
B b | 8 |
C c | [ ( < |
E e | 3 |
G g | 6 9 |
I i | 1 ! \| |
L l | 1 7 \| |
O o | 0 |
S s | 5 $ |
T t | 7 + |
X x | % |
Z z | 2 |
Most substitutions are extremely predictable, like replacing a with @ or e with 3. Password crackers are on to this trick and try these substitutions when testing combinations. Leetspeak substitutions do improve complexity, but likely less than you think.
The table above shows the most common 13375p3@k substitutions.
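A rough way to see why these substitutions add so little is to enumerate them. The sketch below builds every spelling of a word in which each letter is either kept or swapped for one of the substitutes from the table; the `LEET` mapping and function name are ours, and real cracking tools use more elaborate rule sets.

```python
from itertools import product

# Substitutes taken from the leetspeak table in the text.
LEET = {
    "a": "4@", "b": "8", "c": "[(<", "e": "3", "g": "69", "i": "1!|",
    "l": "17|", "o": "0", "s": "5$", "t": "7+", "x": "%", "z": "2",
}


def leet_variants(word: str):
    """Yield each spelling where every letter is kept or replaced."""
    # For each position: the original character plus its substitutes, if any.
    choices = [ch + LEET.get(ch.lower(), "") for ch in word]
    for combo in product(*choices):
        yield "".join(combo)


variants = set(leet_variants("Waratah"))
print(len(variants))
print("W@r@7@h" in variants)
```

“Waratah” has four letters with two substitutes each, so there are only 3^4 = 81 spellings to try, and appending any single digit raises that to just 810 candidates.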
Try entering in a leetspeak encoded password like W@r@7@h1 to see how much capitalization and leetspeak help in increasing complexity:
These cracking time estimates assume an “offline” attack attempting 10,000 passwords a second.
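As a minimal sketch of how such an estimate can be produced, the code below converts a number of candidate passwords into a worst-case cracking time at the 10,000 guesses per second stated above. The helper names and the choice of units are ours, and this is not the article’s actual estimator.

```python
GUESSES_PER_SECOND = 10_000  # rate stated in the text


def seconds_to_exhaust(candidates: int, rate: int = GUESSES_PER_SECOND) -> float:
    """Worst-case time to try every candidate once."""
    return candidates / rate


def humanize(seconds: float) -> str:
    """Express a duration in the largest unit that fits."""
    units = (("years", 31_536_000), ("days", 86_400), ("hours", 3_600), ("minutes", 60))
    for unit, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


thirty_bits = humanize(seconds_to_exhaust(2 ** 30))       # 30 bits of entropy
eight_lowercase = humanize(seconds_to_exhaust(26 ** 8))   # 8 random lowercase letters
print(thirty_bits)
print(eight_lowercase)
```

On average an attacker succeeds after trying about half of the candidates, so typical times are roughly half of these worst-case figures.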
Alice signs up for a new social media site for pet owners. The site’s security seems fishy, so she wants to use a unique password just in case it gets hacked. She wants a password she’ll remember every time she visits the site. She knows enough to avoid using common words like the site’s name, so she ends up choosing her dog’s name combined with its birthday.
This visualization shows date-like patterns contained in leaked passwords. Click on a date on the calendar heatmap to see passwords containing that date. The brighter the calendar square, the more often that exact date appears in leaked passwords.
When a password has an ambiguous number pattern, its count is split, so 122001 would contribute 0.5 to both January 2, 2001 and February 1, 2001. Try changing the year filter below.
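The count splitting described above could be implemented along these lines. This is a sketch under a simplifying assumption: the digit string ends in a four-digit year, and the digits before it are read as both month/day and day/month. The function names are hypothetical.

```python
from collections import Counter
from datetime import date


def date_readings(digits: str) -> set:
    """Return every valid calendar date the digit string could represent."""
    year, rest = int(digits[-4:]), digits[:-4]
    readings = set()
    for cut in range(1, len(rest)):
        a, b = int(rest[:cut]), int(rest[cut:])
        for month, day in ((a, b), (b, a)):
            try:
                readings.add(date(year, month, day))
            except ValueError:
                pass  # not a real calendar date, e.g. month 25
    return readings


def add_split_count(heatmap: Counter, digits: str, occurrences: float = 1.0) -> None:
    """Share one password's count equally among its possible dates."""
    readings = date_readings(digits)
    for reading in readings:
        heatmap[reading] += occurrences / len(readings)


heatmap = Counter()
add_split_count(heatmap, "122001")
print(heatmap[date(2001, 1, 2)], heatmap[date(2001, 2, 1)])
```

An unambiguous string such as 12252001 has a single valid reading (December 25, 2001) and keeps its full count.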
Notice that years in the 1980s-2000s are more common in passwords. This likely represents birthdays, because the password list comes from a 2009 leak from RockYou, a site with mostly millennial users.
Some obvious numeric patterns like 12345 were removed, but some noise remains. For example, 14344 actually represents “I love you very much” (the number of letters in each word: 1, 4, 3, 4, 4), not March 14, 1944.
For every year, Valentine’s Day and Christmas are overrepresented, but other holidays like Independence Day are not. Repeated patterns like 080808 are also common. The lack of other consistent patterns implies that people choose dates that are personally meaningful, but not significant to others, like birthdays.
Still, including any date in your password decreases its entropy. A quick perusal of your Facebook timeline would reveal to an attacker the birthdays of your family and important anniversaries. Good password crackers usually check all date-like patterns before trying random numeric sequences, so even Alice’s dog’s birthday isn’t very safe.
Alice hears that she should avoid dictionary words and dates when making passwords, so she decides to change her password to something that won’t be found in any dictionary. After staring at her keyboard for a minute, she decides on bGt%6YhNmJu&.
Her company’s password strength meter says it’s perfect; it has a mix of symbols and is 12 characters long. The estimated entropy is nearly 80 bits! Alice feels pretty confident it won’t ever be hacked.
Passwords with a “keyboard walking” pattern start at an arbitrary key, then move in a direction (usually right or down) while continuing to hit keys. Sometimes this is combined with holding down the SHIFT key, so that some characters are uppercase or symbols to improve complexity.
While the generated password may seem to be random and unhackable, password crackers check for these keyboard patterns and guess them early on.
Many passwords in the leaked passwords dataset have a spatial pattern. Other than the numeric passwords like 123456, common keyboard walking offenders include qwerty and 1qaz@wsx.
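One simple way to flag keyboard walking is to measure how many consecutive keystrokes land on neighboring keys. The sketch below assumes a US QWERTY layout and uses a coarse grid (neighbors are keys at most one row and one column apart), which only approximates the physical stagger of a real keyboard; the scoring function is ours.

```python
ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]
# Symbols typed with SHIFT on the number row map back to their base key.
SHIFTED = dict(zip("!@#$%^&*()", "1234567890"))
POSITION = {key: (r, c) for r, row in enumerate(ROWS) for c, key in enumerate(row)}


def base_key(ch: str) -> str:
    """Undo SHIFT so 'G' and '%' become 'g' and '5'."""
    return SHIFTED.get(ch, ch.lower())


def walk_score(password: str) -> float:
    """Fraction of consecutive key pairs that are neighbors on the grid."""
    keys = [POSITION.get(base_key(ch)) for ch in password]
    steps = list(zip(keys, keys[1:]))
    if not steps:
        return 0.0
    adjacent = sum(
        1
        for a, b in steps
        if a is not None and b is not None and a != b
        and abs(a[0] - b[0]) <= 1 and abs(a[1] - b[1]) <= 1
    )
    return adjacent / len(steps)


for candidate in ("qwerty", "1qaz@wsx", "bGt%6YhNmJu&", "9Yiuvj%1xG"):
    print(candidate, round(walk_score(candidate), 2))
```

A score near 1 means almost every keystroke moved to a neighboring key, which is the signature a cracker’s spatial rules look for; a random password scores close to 0.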
Try entering in Alice’s password bGt%6YhNmJu&. The up-down zig-zag pattern is easy for Alice to remember. It also doesn’t appear in any password lists, so it would take a while to crack.
However, brute force guessing isn’t the only risk...
Some hackers watch as their target types their password, while memorizing the keys pressed. This is called shoulder surfing and is more of a problem in corporations, where a hacker can be disguised as an innocent employee.
This is why phone pattern locks are insecure. After seeing a phone unlocked once, even regular people are 64% accurate in replicating the pattern lock.
For the same reason, spatial passwords are also vulnerable to this attack. Regardless of SHIFT-ing, keyboard walking is easy to track and memorize.
Mask attacks prune possible passwords by adding known constraints. For example, when an attacker knows a password is eight characters long and ends with “123”, its entropy is reduced to under 30 bits, so brute force guessing can be accomplished faster. If the attacker sees someone keyboard walking, this also reduces entropy, since most spatial patterns are predictable.
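The effect of a mask on entropy can be worked out directly. In this sketch a mask is a list with the number of possible characters at each position, where a known character counts as 1. The alphabets for the unknown positions (lowercase only, or letters and digits) are our assumptions, since the text doesn’t specify them.

```python
import math


def mask_bits(alphabet_sizes) -> float:
    """Entropy, in bits, of all passwords that fit a mask."""
    return math.log2(math.prod(alphabet_sizes))


LOWERCASE = 26
ALPHANUMERIC = 62  # 26 lowercase + 26 uppercase + 10 digits

# Eight random lowercase characters, nothing known.
unmasked_bits = mask_bits([LOWERCASE] * 8)
# Eight characters, but the last three are known to be "123".
masked_lower_bits = mask_bits([LOWERCASE] * 5 + [1, 1, 1])
masked_alnum_bits = mask_bits([ALPHANUMERIC] * 5 + [1, 1, 1])

print(f"{unmasked_bits:.1f} {masked_lower_bits:.1f} {masked_alnum_bits:.1f}")
```

Under either assumption the five unknown characters leave less than 30 bits of entropy, down from 37.6 bits for eight unknown lowercase letters.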
In 2016, the Pew Research Center surveyed Americans on their cybersecurity habits, including how they create and manage passwords. The graphs on the right show the proportion of survey respondents’ answers to the researchers’ questions (paraphrased).
The green bars on the left side represent good password hygiene practices according to cybersecurity experts. The red bars on the right are unadvised practices.
Try using these filters to see how different subgroups of Americans manage their personal cybersecurity. The most variance occurs with age and education.
This chart shows how respondents manage their passwords:
Writing down passwords or storing them in a document is highly discouraged by cybersecurity experts. Saving them in an internet browser is sometimes okay, but a password manager is best.
The tradeoff between memorability and security is difficult to balance. Also, a password that is difficult to remember encourages writing it down, which is a bad practice. But how else can you keep track of passwords?
What if you don’t use only a password?
Two-factor authentication (2FA) is an additional security method that requires users to enter another token to verify their identity. For example, it could require you to enter a code you receive as a text message. Websites increasingly support multifactor authentication, but it’s usually disabled by default.
A Microsoft study found that multifactor authentication blocks 99.9% of automated attacks and Google has shown its 2-step verification to be similarly effective. A hacker would have to get access to both your password and your phone.
What if you don’t have to remember your passwords?
The idea behind password managers is that the only secure password is the one you can’t remember.
Password managers are applications that store your passwords for all of your websites and accounts. They also generate extremely strong and unique passwords, so you don’t have to rely on trying to create or remember good passwords. You only need to remember a single master password to access the manager—so make sure it’s strong.
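To illustrate what “extremely strong and unique” means in practice, here is a minimal sketch of random password generation using Python’s standard `secrets` module. The 16-character length and the 94-symbol alphabet are our assumptions; actual password managers have their own generators and settings.

```python
import math
import secrets
import string

# 52 letters + 10 digits + 32 punctuation marks = 94 symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 16) -> str:
    """Pick each character independently with a cryptographic RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int) -> float:
    """Entropy of a password produced by generate_password."""
    return length * math.log2(len(ALPHABET))


password = generate_password()
print(password)
print(f"{entropy_bits(16):.0f} bits")
```

Because every character is chosen independently and uniformly, the entropy really is the sum of its parts here: about 105 bits for 16 characters.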
Some popular password managers are LastPass and KeePass (both free) and 1Password.
Password managers also help reduce password reuse. Most users reuse their password completely or with only small modifications. This is dangerous because if one of your services gets hacked, your password could get added to a public leaked password dataset. Then every password cracker will know about it and your other accounts are now at risk.
Try entering in a password to see if it’s ever been leaked (according to Have I Been Pwned):
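A lookup like this can be done without sending the password itself anywhere. The sketch below follows our understanding of the Pwned Passwords range endpoint from Have I Been Pwned: only the first five characters of the password’s SHA-1 hash are sent, and the response lists matching hash suffixes with their counts. The helper names and the User-Agent string are hypothetical, and this is not necessarily how the article’s own widget works.

```python
import hashlib
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password: str):
    """Return the 5-character prefix and 35-character suffix of the SHA-1 hash."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_response(suffix: str, response_text: str) -> int:
    """Find our suffix among 'SUFFIX:COUNT' lines; 0 means not found."""
    for line in response_text.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0


def pwned_count(password: str) -> int:
    """Network call: only the hash prefix leaves your machine."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "password-article-sketch"},  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return count_in_response(suffix, response.read().decode("utf-8"))


prefix, suffix = split_hash("password")
print(prefix)
```

Calling `pwned_count("password")` would perform the actual lookup; a result above zero means the password has appeared in a known leak and should not be used.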
What if you don’t even have a password?
Single sign-on (SSO) services let you log in to a website using another one of your accounts, like Google, Facebook, or Apple, instead of entering your email and a new password. This is convenient and saves you from remembering another password. Just make sure the password to your main account is secure, or all your connected accounts will be compromised as well.
However, there is still a small risk that Google, Facebook, or Apple gets hacked, so security experts still prefer password managers. Also, you may want to avoid SSO because of the privacy implications; connected accounts let Big Tech track you across the web more easily.
Good luck keeping your passwords secure!
This article was created with Idyll and D3. The source code is available on GitHub.