Patterns in passwords, like dates, names, and keyboard walking, make it easier for a smart brute-force cracking algorithm to guess your credentials. This interactive article explores some interesting patterns in leaked passwords and why you shouldn't use a memorable password.

By Vishal Devireddy
June 9, 2021

Passwords are often the only line of defense for our online identity, securing our social media profiles, bank accounts, and computers. Most passwords, just like the identities they protect, are filled with personal meaning. Sometimes they’re playful, like “incorrect” or “letmein,” or sentimental, like the name of a child or a special date. These are called keepsake passwords because of their personal significance.

Unfortunately, the same factors that make keepsake passwords more memorable also make them less secure.

Keepsake passwords are vulnerable to “spidering,” which involves hackers using personal information about the victim to focus their attack.

Entropy

Password strength is often measured by information entropy, the base-2 logarithm of the number of possibilities an attacker would have to consider. For example, a password with 30 bits of entropy requires a brute-force attacker to try up to 2³⁰ ≈ 1 billion possibilities. Each additional bit of entropy doubles the number of guesses required.

An uppercase letter has log₂(26) ≈ 4.7 bits of entropy because there are 26 possibilities. A lowercase letter also has 4.7 bits, while a digit has only log₂(10) ≈ 3.3 bits.

The password “password” has 8 lowercase letters, so if it were generated randomly, it would have 8 × 4.7 = 37.6 bits of entropy.
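The per-character arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `random_entropy_bits` is made up for this example, and the result holds only under the assumption that every character is picked independently and uniformly at random.

```python
import math

def random_entropy_bits(length: int, pool_size: int) -> float:
    """Entropy in bits of `length` characters, each drawn uniformly and
    independently from a pool of `pool_size` possible characters."""
    return length * math.log2(pool_size)

print(round(random_entropy_bits(1, 26), 1))  # one letter: 4.7
print(round(random_entropy_bits(1, 10), 1))  # one digit: 3.3
print(round(random_entropy_bits(8, 26), 1))  # eight random lowercase letters: 37.6
```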

XKCD 936: Password strength. The comic explains why the password 'Tr0ub4dor' has less entropy and is harder to remember than the password 'correcthorsebatterystaple'.

However, the entropy of a password is usually less than the sum of its parts. For example, each of the characters in “password” has 4.7 bits of entropy when considered alone, but the total isn’t 8 × 4.7 = 37.6 bits. The password was chosen from a space of “dictionary words,” not random characters, so the number of possibilities is far smaller. Since “password” is often the first attempt by password crackers, its entropy is actually only 1 bit!
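One way to see the gap is to count choices instead of characters. The sketch below assumes, purely for illustration, an attacker's dictionary of 500,000 words (a made-up round number, not a figure from the dataset) and compares it with the random-character estimate.

```python
import math

# Hypothetical dictionary size, chosen only for illustration.
DICTIONARY_SIZE = 500_000

# A password picked uniformly from the dictionary has log2(N) bits,
# no matter how many characters the chosen word contains.
dictionary_bits = math.log2(DICTIONARY_SIZE)

# Eight characters picked uniformly from the 26 lowercase letters.
random_bits = 8 * math.log2(26)

print(f"dictionary word: {dictionary_bits:.1f} bits")
print(f"random letters:  {random_bits:.1f} bits")
```

Words near the top of an attacker's list are guessed sooner still, which is why a password like “password” scores so poorly.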

The entropy of a password is the sum of the entropies of its characters only if the password was chosen completely randomly, with no semantic meaning or correlation between characters. Unfortunately, truly random passwords like 9Yiuvj%1xG are much harder to remember than a password based on your favorite football team, like @rsenal86.

A close-up photograph of a pink waratah flower.

Data breaches have exposed over 600 million real world passwords—and most of them are insecure. By examining patterns in leaked passwords, we can find out how bad passwords are chosen and how passwords are hacked. The dataset we’ll use comes from Have I Been Pwned.

Let’s walk through the thought process of Alice, a new employee trying to create a strong and memorable password for her work account.

Alice decides on waratah, an Australian Aboriginal word for one of her favorite plants. It reminds her of her recent vacation every time she types it. Her rationale is that the word is so obscure and hard to spell that no one is likely to guess it.

Dictionary Attacks

Dictionary attacks are a type of brute force password hacking technique where the attacker tries millions of words from a dictionary. The dictionary doesn’t include only English words; it also contains common passwords from past leaks.

This attack is often successful because people choose passwords that are words in their native language or a combination of words.
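To make the idea concrete, here is a minimal sketch of a dictionary attack against a single stolen hash. It assumes, for simplicity only, that the site stored an unsalted SHA-256 hash; the tiny word list and the helper names are made up for this example.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Hex digest of an unsalted SHA-256 hash (an assumption made for simplicity)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def dictionary_attack(target_hash: str, wordlist):
    """Hash each candidate in order. Returns (password, guesses_used),
    or (None, guesses_used) if no word in the list matches."""
    guesses = 0
    for word in wordlist:
        guesses += 1
        if sha256_hex(word) == target_hash:
            return word, guesses
    return None, guesses

# Toy word list: common leaked passwords first, then plant names.
wordlist = ["password", "letmein", "daisies", "wildflower",
            "rhododendron", "melaleuca", "waratah"]

stolen_hash = sha256_hex("waratah")
print(dictionary_attack(stolen_hash, wordlist))  # ('waratah', 7)
```

A real word list holds millions of entries, but the loop is the same: the attacker never needs to spell or understand the word, only to have it in the list.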

This t-SNE visualization shows semantically similar passwords clustered together. Label sizes are proportional to the logarithm of each password’s total occurrences.

Filter by minimum occurrences: (≥ 1,000)

Scroll in to see the word labels and click on a label or search for a password:

There are 512 passwords (1 distinct) containing the phrase, including word reversals and “leetspeak” encoding.

waratah (512)

Imagine a hacker wants to hack Alice’s bank account, so he decides to check social media for any password clues. On Alice’s public Instagram profile, the hacker finds she follows several landscaping accounts. On Twitter, she’s recently retweeted some fantasy authors.

With this information, a smart hacker can launch a dictionary attack with gardening and fantasy related words, which are more likely to match Alice’s password.

Alice’s assumption that a difficult-to-spell word would be more secure is wrong, because dictionary attacks don’t consider spelling complexity. Every word in the dictionary will be tested, so an attack is just as likely to guess rhododendron or melaleuca as wildflower or daisies.

This clustering shows some other interesting patterns.

Passwords starting with “ilove” are much more often followed by a common male name than a female name. The same is true of passwords starting with “teamo” (“te amo” is Spanish for “I love you”).

Most people think their password choice is more unique than it actually is. Try exploring these clusters:

A new security policy at work forces Alice to update her password to meet some requirements: at least one uppercase letter, one number, and one symbol.

Alice wants to keep using her favorite plant, so she changes some letters into numbers to make it more secure. She decides on W@r@7@h1 because the @ signs look like As and the 7 reminds her of a T. She capitalizes the word and appends a digit to meet the eight-character recommendation.

Leetspeak

Leetspeak is an encoding system that replaces some alphabetic characters with numbers or symbols that look similar. It is often used to satisfy a password strength meter while keeping the password easy to remember.

Original	Leet
A a	4 @
B b	8
C c	[ ( <
E e	3
G g	6 9
I i	1 ! |
L l	1 7 |
O o	0
S s	5 $
T t	7 +
X x	%
Z z	2

Most substitutions are extremely predictable, like replacing a with @ or e with 3. Password crackers are on to this trick and try these substitutions when testing combinations. Leetspeak substitutions do improve complexity, but likely less than you think.

The table above shows the most common 13375p3@k substitutions.
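A cracker can undo these substitutions before consulting its dictionary. Here is a minimal sketch, assuming a reverse lookup built from part of the substitution table. The names `LEET_TO_LETTER`, `deleet`, and `matches_dictionary` are made up for this example, and ambiguous symbols such as 1 and | (which can stand for I or L) are left out to keep it short.

```python
# Reverse lookup built from a subset of the substitution table.
# "1" and "|" are omitted because they are ambiguous (I or L);
# "7" is resolved to T here even though it can also stand for L.
LEET_TO_LETTER = {
    "4": "a", "@": "a", "8": "b", "(": "c", "<": "c", "3": "e",
    "6": "g", "9": "g", "!": "i", "0": "o", "5": "s", "$": "s",
    "7": "t", "+": "t", "%": "x", "2": "z",
}

def deleet(password: str) -> str:
    """Lowercase the password and map leetspeak symbols back to letters."""
    return "".join(LEET_TO_LETTER.get(ch, ch) for ch in password.lower())

def matches_dictionary(password: str, dictionary: set) -> bool:
    """True if the decoded password, minus any trailing digits, is a dictionary word."""
    return deleet(password).rstrip("0123456789") in dictionary

print(deleet("W@r@7@h1"))                           # waratah1
print(matches_dictionary("W@r@7@h1", {"waratah"}))  # True
```

Because the decoding costs the attacker almost nothing, the substitutions add far less protection than their appearance suggests.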

Try entering a leetspeak-encoded password like W@r@7@h1 to see how much capitalization and leetspeak increase complexity:

These cracking time estimates assume an “offline” attack attempting 10,000 passwords a second.
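For a rough sense of scale, the guess rate can be turned into a worst-case time for a purely random password. This sketch is not how the estimates in this article are produced (those come from a strength-estimation library that accounts for patterns); it only divides the number of possibilities by the 10,000 guesses per second quoted above, and the helper names are made up.

```python
GUESSES_PER_SECOND = 10_000  # offline rate quoted in the text

def worst_case_seconds(entropy_bits: float, rate: float = GUESSES_PER_SECOND) -> float:
    """Seconds needed to try all 2**entropy_bits possibilities at the given rate."""
    return 2 ** entropy_bits / rate

def humanize(seconds: float) -> str:
    """Format a duration using the largest unit that fits."""
    units = (("years", 31_536_000), ("days", 86_400), ("hours", 3_600), ("minutes", 60))
    for unit, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"

print(humanize(worst_case_seconds(30)))    # 1.2 days
print(humanize(worst_case_seconds(37.6)))  # eight random lowercase letters
```

Under these assumptions, eight truly random lowercase letters hold out for roughly 240 days, while a password the cracker recognizes as a pattern falls much sooner.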

Alice signs up for a new social media site for pet owners. The site’s security seems fishy, so she wants to use a unique password just in case it gets hacked. She wants a password she’ll remember every time she visits the site. She knows enough to avoid using common words like the site’s name, so she ends up choosing her dog’s name combined with its birthday.

This visualization shows date-like patterns contained in leaked passwords. Click on a date on the calendar heatmap to see passwords containing that date. The brighter the calendar square, the more often that exact date appears in leaked passwords.

When a password has an ambiguous number pattern, its count is split, so 122001 would contribute 0.5 to both January 2, 2001 and February 1, 2001. Try changing the year filter below.
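The count-splitting rule can be sketched as follows. The article does not spell out its exact parsing rules, so this is a guess that handles only one layout: one or two digits for each of the first two fields, followed by a four-digit year. The function names are hypothetical.

```python
from collections import Counter
from datetime import date

def date_readings(digits: str) -> set:
    """Return the distinct calendar dates a digit string could mean when its
    last four digits are the year and the rest is read in both month-day and
    day-month order. Unsupported layouts return an empty set."""
    year, rest = int(digits[-4:]), digits[:-4]
    if len(rest) == 2:
        first, second = int(rest[0]), int(rest[1])
    elif len(rest) == 4:
        first, second = int(rest[:2]), int(rest[2:])
    else:
        return set()
    readings = set()
    for month, day in ((first, second), (second, first)):
        try:
            readings.add(date(year, month, day))
        except ValueError:
            pass  # this reading is not a real calendar date
    return readings

def add_to_heatmap(heatmap: Counter, digits: str) -> None:
    """Split one occurrence evenly across every valid reading."""
    readings = date_readings(digits)
    for reading in readings:
        heatmap[reading] += 1 / len(readings)

heatmap = Counter()
add_to_heatmap(heatmap, "122001")    # January 2 or February 1, 2001
add_to_heatmap(heatmap, "25122001")  # only December 25, 2001 is valid
print(heatmap)
```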

(2000-2003)

Try clicking a date on the calendar.

Notice that years in the 1980s-2000s are more common in passwords. This likely represents birthdays, because the password list comes from a 2009 leak from RockYou, a site with mostly millennial users.

Some obvious numeric patterns like 12345 were removed, but some noise remains. For example, 14344 actually represents “i₁ love₄ you₃ very₄ much₄”, not March 14, 1944.

For every year, Valentine’s Day and Christmas are overrepresented, but other holidays like Independence Day are not. Repeated patterns like 080808 are also common. The lack of other consistent patterns suggests that people choose dates that are personally meaningful but not significant to others, such as birthdays.

Still, including any date in your password decreases its entropy. A quick perusal of your Facebook timeline would reveal to an attacker the birthdays of your family and important anniversaries. Good password crackers usually check all date-like patterns before trying random numeric sequences, so even Alice’s dog’s birthday isn’t very safe.

Alice hears that she should avoid dictionary words and dates when making passwords, so she decides to change her password to something that won’t be found in any dictionary. After staring at her keyboard for a minute, she decides on bGt%6YhNmJu&.

Her company’s password strength meter says it’s perfect: it has a mix of symbols and is 12 characters long. The estimated entropy is nearly 80 bits! Alice feels pretty confident it won’t ever be hacked.

Keyboard Walking

Passwords with a “keyboard walking” pattern start at an arbitrary key, then move in a direction (usually right or down) while continuing to hit keys. Sometimes this is combined with holding down the SHIFT key, so that some characters are uppercase or symbols to improve complexity.

While the generated password may seem to be random and unhackable, password crackers check for these keyboard patterns and guess them early on.

Many passwords in the leaked passwords dataset have a spatial pattern. Other than the numeric passwords like 123456, common keyboard walking offenders include qwerty and 1qaz@wsx.
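A simple way to spot a spatial pattern is to measure how many consecutive characters sit on neighboring keys. The sketch below assumes a US QWERTY layout and approximates the staggered rows with horizontal offsets; the offsets, the adjacency threshold, and the function names are all choices made for this example, not the method used by any particular cracker.

```python
# Key rows with an approximate horizontal offset for the keyboard's stagger.
ROWS = [("1234567890", 0.0), ("qwertyuiop", 0.5), ("asdfghjkl", 0.75), ("zxcvbnm", 1.25)]

# SHIFT + digit symbols on a US layout, mapped back to the physical key.
SHIFTED = dict(zip("!@#$%^&*()", "1234567890"))

POSITION = {
    key: (row, col + offset)
    for row, (keys, offset) in enumerate(ROWS)
    for col, key in enumerate(keys)
}

def base_key(ch: str) -> str:
    """Physical key for a character, ignoring SHIFT."""
    ch = ch.lower()
    return SHIFTED.get(ch, ch)

def adjacent(a: str, b: str) -> bool:
    """True if two different keys are within one row and one key width of each other."""
    if a == b or a not in POSITION or b not in POSITION:
        return False
    (row_a, x_a), (row_b, x_b) = POSITION[a], POSITION[b]
    return abs(row_a - row_b) <= 1 and abs(x_a - x_b) <= 1

def walk_fraction(password: str) -> float:
    """Fraction of consecutive character pairs that are neighboring keys."""
    keys = [base_key(ch) for ch in password]
    steps = list(zip(keys, keys[1:]))
    if not steps:
        return 0.0
    return sum(adjacent(a, b) for a, b in steps) / len(steps)

for candidate in ("qwerty", "1qaz@wsx", "bGt%6YhNmJu&", "9Yiuvj%1xG"):
    print(candidate, round(walk_fraction(candidate), 2))
```

Under this measure, every step in qwerty and bGt%6YhNmJu& lands on a neighboring key, while the random password 9Yiuvj%1xG scores close to zero.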

Try entering Alice’s password, bGt%6YhNmJu&. The up-down zigzag pattern is easy for Alice to remember. It also doesn’t appear in any password lists, so it would take a while to crack.

Brute-force guessing isn’t the only risk, though...

Shoulder Surfing

Some hackers watch their target type a password and memorize the keys pressed. This is called shoulder surfing, and it is more of a problem in corporations, where a hacker can pose as an innocent employee.

This is why phone pattern locks are insecure. After seeing a phone unlocked once, even regular people are 64% accurate in replicating the pattern lock.

For the same reason, spatial passwords are also vulnerable to this attack. Regardless of SHIFT-ing, keyboard walking is easy to track and memorize.

Mask Attacks

Mask attacks prune possible passwords by adding known constraints. For example, when an attacker knows a password is eight characters long and ends with “123”, its entropy is reduced to under 30 bits, so brute force guessing can be accomplished faster. If the attacker sees someone keyboard walking, this also reduces entropy, since most spatial patterns are predictable.
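The eight-character example can be worked through in code. This sketch assumes the five unknown characters are lowercase letters, which the text does not specify; a larger character set would raise the count. The `mask` list and `candidates` generator are names invented for this example.

```python
import math
import string
from itertools import islice, product

LOWER = string.ascii_lowercase

# One character set per position: five unknown lowercase letters
# (an assumption), followed by the known suffix "123".
mask = [LOWER] * 5 + ["1", "2", "3"]

keyspace = math.prod(len(charset) for charset in mask)
print(keyspace)                       # 11881376 candidates
print(round(math.log2(keyspace), 1))  # 23.5 bits

def candidates(mask):
    """Lazily generate every password that fits the mask."""
    for combo in product(*mask):
        yield "".join(combo)

print(list(islice(candidates(mask), 3)))  # ['aaaaa123', 'aaaab123', 'aaaac123']
```

Compared with the 37.6 bits of eight unconstrained lowercase letters, knowing the suffix removes about 14 bits, which cuts the number of guesses by a factor of more than 17,000.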

In 2016, the Pew Research Center surveyed Americans on their cybersecurity habits, including how they create and manage passwords. The graphs on the right show the proportion of survey respondents’ answers to the researchers’ questions (paraphrased).

The green bars on the left side represent good password hygiene practices according to cybersecurity experts. The red bars on the right represent inadvisable practices.

Note: in order for the results to be representative of the American population, some survey respondents’ answers have been weighted more than others. See the survey methodology for more information.

Note: the yes and no responses in the graphs on the right don’t add up to 100%. This is because some respondents chose not to answer the question. Those refusals are represented by the horizontal blank space between the bars.

Try using these filters to see how different subgroups of Americans manage their personal cybersecurity. The greatest variation occurs across age and education.

This chart shows how respondents manage their passwords:

Writing down passwords or storing them in a document is strongly discouraged by cybersecurity experts. Saving them in an internet browser is sometimes okay, but a password manager is best.

The tradeoff between memorizability and security is difficult to balance. Also, a password that is difficult to remember encourages writing it down, which is a bad practice. But how else can you keep track of passwords?

What if you don’t use only a password?

Two-factor authentication (2FA) is an additional security method that requires users to enter another token to verify their identity. For example, it could require you to enter a code you receive as a text message. Websites are increasingly supporting multifactor authentication, but it’s usually disabled by default.
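Text-message codes are generated by the website, but authenticator apps commonly compute the second token on your device instead, using a time-based one-time password (TOTP, RFC 6238). That is a different mechanism from the text-message example above; the sketch below shows it only to illustrate what “another token” can look like. The function name `totp` and its parameters are choices made for this example.

```python
import hashlib
import hmac
import struct
import time

def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238) using HMAC-SHA1.
    `secret` is shared between the site and your device when 2FA is set up."""
    now = time.time() if at is None else at
    counter = int(now // step)  # number of 30-second steps since the Unix epoch
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# Test vector from RFC 6238 (ASCII secret, 59 seconds after the epoch).
print(totp(b"12345678901234567890", at=59, digits=8))  # 94287082
```

Because the code changes every 30 seconds and depends on a secret that never leaves your device, a leaked password alone is not enough to log in.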

A Microsoft study found that multifactor authentication blocks 99.9% of automated attacks, and Google has shown its 2-step verification to be similarly effective. To break in, a hacker would need access to both your password and your phone.

What if you don’t have to remember your passwords?

The idea behind password managers is that the only secure password is the one you can’t remember.

Password managers are applications that store your passwords for all of your websites and accounts. They also generate extremely strong and unique passwords, so you don’t have to rely on trying to create or remember good passwords. You only need to remember a single master password to access the manager—so make sure it’s strong.
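Generating such a password takes only a few lines. This is a minimal sketch using Python's `secrets` module, not the algorithm of any particular password manager; the 16-character default and the 94-character alphabet are assumptions made for this example.

```python
import math
import secrets
import string

# Letters, digits, and punctuation: 94 printable ASCII characters.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def generate_password(length: int = 16) -> str:
    """Pick each character independently with a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

password = generate_password()
bits = len(password) * math.log2(len(ALPHABET))
print(password, f"({bits:.0f} bits)")
```

Because every character really is chosen at random, this is one of the few cases where summing the per-character entropy gives the right answer.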

Some popular password managers are LastPass, KeePass (both free), and 1Password.

Password managers also help reduce password reuse. Most users reuse their password completely or with only small modifications. This is dangerous because if one of your services gets hacked, your password could get added to a public leaked password dataset. Then every password cracker will know about it and your other accounts are now at risk.
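You can check whether a password appears in a leak without ever sending it anywhere. Have I Been Pwned offers a range lookup: you send only the first five characters of the password's SHA-1 hash and compare the returned suffixes locally. The sketch below is one way to call it; the function names and the User-Agent string are made up for this example.

```python
import hashlib
import urllib.request

def sha1_prefix_suffix(password: str):
    """Split the uppercase SHA-1 hex digest into a 5-character prefix and the rest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

def count_in_response(suffix: str, body: str) -> int:
    """Find our suffix among the 'SUFFIX:COUNT' lines returned for a prefix."""
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0

def pwned_count(password: str) -> int:
    """Query the range endpoint. Only the 5-character prefix leaves your machine."""
    prefix, suffix = sha1_prefix_suffix(password)
    request = urllib.request.Request(
        f"https://api.pwnedpasswords.com/range/{prefix}",
        headers={"User-Agent": "password-article-example"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return count_in_response(suffix, response.read().decode("utf-8"))

# Example (requires network access):
# print(pwned_count("waratah"))
```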

Try entering a password to see if it’s ever been leaked (according to Have I Been Pwned).

What if you don’t even have a password?

Single sign-on (SSO) services let you log in to a website using another one of your accounts, like Google, Facebook, or Apple, instead of entering your email and a new password. This is convenient and saves you from remembering another password. Just make sure the password to your main account is secure, or all your connected accounts will be compromised as well.

However, there is still a small risk that Google, Facebook, or Apple gets hacked, so security experts still prefer password managers. You may also want to avoid SSO because of the privacy implications: connected accounts let Big Tech track you across the web more easily.

Good luck keeping your passwords secure!

Data:

robinske/password-data for aggregation of HaveIBeenPwned dataset
dwyl/english-words for list of English words
explosion/spaCy for vector word embeddings
Visualizing semantics in passwords: the role of dates for parsed passwords containing dates
Pew Research Center survey on cybersecurity

Libraries:

dropbox/zxcvbn for password strength estimation library
scikit-learn/scikit-learn for t-SNE code
react-search-autocomplete for search box used in t-SNE section

This article was created with Idyll and D3. The source code is available on GitHub.
