July 2nd, 2024

How random are TOTP codes?

The blog post examines TOTP code randomness using HMAC with SHA-1. It analyzes digit frequency in generated codes, showing diminishing bias over generations. Readers discuss and suggest additional analysis methods.

The blog post discusses the randomness of Time-Based One-Time Password (TOTP) codes generated by the TOTP algorithm using HMAC with SHA-1. The author explores if there is a bias towards certain digits in TOTP codes by sampling and analyzing the frequency of digits in generated codes. The post includes code snippets for generating and analyzing TOTP codes, showing how the distribution of digits changes with the number of generations. The author concludes that over a large number of generations, any initial bias towards a specific digit diminishes, indicating the randomness of TOTP codes. The post also mentions the possibility of time-travelers using TOTP codes and provides links to related articles on TOTP code generation and analysis. Additionally, there are comments from readers discussing the randomness of TOTP codes and suggesting further analysis methods.

Related

The Magic of Participatory Randomness

Randomness is vital in cryptography, gaming, and civic processes. Techniques like "Finger Dice" enable fair outcomes through participatory randomness, ensuring transparency and trust in provably fair games.

ID verification service for TikTok, Uber, X exposed driver licenses

A cybersecurity researcher found AU10TIX's admin credentials exposed online, risking data breach for TikTok, Uber users. Concerns rise over ID verification services' vulnerability to cyberattacks, emphasizing the need for enhanced security measures.

Exploring Randomness in JavaScript

This article compares Math.random() and Crypto.getRandomValues() in JavaScript for generating random values. Despite Crypto being more secure, Math.random() suffices for creating color palettes due to speed and perceived randomness.

How MFA is falling short

Multi-factor authentication (MFA) faces challenges from cyber attackers exploiting weaknesses. Breaches despite VPN, SSO, and Google Authenticator usage show risks like phishing, vishing, and Man-In-The-Middle attacks. Recent developments include "Tycoon 2FA" targeting Microsoft 365 and Gmail accounts, emphasizing the need for stronger authentication methods.

Eight versions of UUID and when to use them

The article covers eight versions of UUIDs, detailing their characteristics and best use cases. Recommendations include v4 for random IDs, v7 for sortable IDs, and v5 or v8 for custom data. Some versions have been replaced. The author shares insights and hints at a secretive project.

22 comments
By @nayuki - 5 months
There is a small inherent bias in TOTP (and HOTP) codes. The algorithm extracts 31 consecutive bits from a SHA-1 hash ( https://en.wikipedia.org/wiki/HMAC-based_one-time_password#A... ). Let's assume that SHA-1 produces uniformly random bits.

The 31-bit number is modulo'd by 10^6 to generate a 6-digit base-10 number. But 2^31 isn't a multiple of 10^6, so some remainders will be slightly more likely than others. Namely:

• 000000 to 483647: 2148/2147483648 ≈ 1.000240e-6 chance.

• 483648 to 999999: 2147/2147483648 ≈ 0.999775e-6 chance.

This kind of bias always arises when random numbers are mapped onto a different range, the number of possible outcomes is not a divisor of the number of inputs, and rejection sampling isn't used.

This is why, for example, java.util.Random.nextInt(int n) (which generates an integer in the range [0, n)) carefully uses rejection sampling in its algorithm: https://docs.oracle.com/javase/8/docs/api/java/util/Random.h...
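To make the arithmetic above concrete, here is a minimal sketch in Python. It counts how many 31-bit values map to a given 6-digit code, and shows what rejection sampling looks like for a plain random-number generator. The function names are made up for illustration, and the rejection-sampling part is not something the TOTP/HOTP algorithm does; it only illustrates the technique mentioned above.

```python
import secrets

RANGE = 2**31   # number of possible 31-bit values
MOD = 10**6     # number of possible 6-digit codes


def count_preimages(code: int) -> int:
    """Number of 31-bit values v such that v % MOD == code."""
    full, rest = divmod(RANGE, MOD)  # full = 2147, rest = 483648
    return full + (1 if code < rest else 0)


def unbiased_code() -> int:
    """Draw a uniform 6-digit code by rejecting values past the last full block."""
    limit = RANGE - RANGE % MOD  # largest multiple of MOD not exceeding RANGE
    while True:
        v = secrets.randbits(31)
        if v < limit:
            return v % MOD


if __name__ == "__main__":
    for code in (0, 483647, 483648, 999999):
        n = count_preimages(code)
        print(f"{code:06d}: {n}/{RANGE} = {n / RANGE:.6e}")
```

The rejected region is only 483648 out of 2^31 values, so the loop almost never runs more than once.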

By @SoftTalker - 5 months
We like to see patterns. I notice the same thing with TOTP codes: I'm always noticing when they have repeated digits, or only 2 different digits, stuff like that, but that's just the nature of the human brain looking at random numbers.
By @landgenoot - 5 months
I have been thinking about a useless TOTP app that works the other way around. Instead of giving you the current code, it gives you the timestamps when the code is e.g. 000000, 123456 or 777777.

With a window of 30 seconds and 1e6 possibilities, the expected time it takes to get to a particular number is 347 days. Should be easy to brute force.
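A rough sketch of that idea in Python, assuming the standard HOTP truncation from RFC 4226 with SHA-1 and 6 digits. The helper names (`expected_wait_days`, `code_for_step`, `find_steps`) are invented for this example. The expected-wait figure treats each 30-second window as an independent 1-in-1e6 draw.

```python
import hashlib
import hmac
import struct

STEP_SECONDS = 30
CODE_SPACE = 10**6


def expected_wait_days(step_seconds: int = STEP_SECONDS, code_space: int = CODE_SPACE) -> float:
    """Mean wait for one specific code, assuming independent uniform windows."""
    return step_seconds * code_space / 86400


def code_for_step(secret: bytes, step: int) -> str:
    """6-digit HOTP value (RFC 4226 truncation) for a time-step counter."""
    mac = hmac.new(secret, struct.pack(">Q", step), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return f"{value % CODE_SPACE:06d}"


def find_steps(secret: bytes, target: str, start_step: int, max_steps: int) -> list[int]:
    """Time steps in [start_step, start_step + max_steps) whose code equals target."""
    return [s for s in range(start_step, start_step + max_steps)
            if code_for_step(secret, s) == target]


if __name__ == "__main__":
    print(f"expected wait: {expected_wait_days():.1f} days")
```

Multiplying a returned step by 30 gives the Unix timestamp at which that window starts. Scanning a few million steps is one HMAC per step, so the brute force is indeed cheap.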

By @dx034 - 5 months
My employer uses alphanumeric 2 factor codes and I'm so certain that they have a bias towards some letters (mostly y and z). I know I'm probably wrong and it's probably because they appear so rarely in real words, but I can't shake the feeling they aren't random.

Only problem is that I don't have the algorithm. I started writing down all codes I got but since I only get 5 a week, it's a long process. I'll probably switch jobs before I have valid results.

Not that it would change anything, but I'd be really curious how biases in those codes could appear.

By @selcuka - 5 months
> How random are TOTP codes?

Nitpicking: They are not supposed to be random as that would defeat the purpose. We should be able to deterministically generate the same number on both the client and server side from the same 2 seeds (secret key and the timestamp).

They should be ideally uniformly distributed, though.
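The determinism is easy to see in a small sketch. The following Python follows the HOTP/TOTP construction from RFC 4226 and RFC 6238 (HMAC-SHA-1, dynamic truncation to 31 bits, then modulo a power of ten). The function names and defaults are chosen for this example only.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP per RFC 4226: HMAC-SHA-1, dynamic truncation, modulo 10**digits."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low nibble of the last byte picks the 4-byte window
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF  # keep 31 bits
    return str(value % 10**digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """TOTP per RFC 6238: HOTP with the counter derived from the clock."""
    return hotp(secret, int(unix_time) // step, digits)


if __name__ == "__main__":
    shared_secret = b"12345678901234567890"  # test secret from the RFCs
    # Client and server compute the same code from the same secret and time window.
    print(totp(shared_secret, 59), totp(shared_secret, 31))
```

Both calls land in the same 30-second window, so they print the same code. Nothing here draws randomness at generation time; the only unpredictability comes from the secret.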

By @_flux - 5 months
I wondered about a related problem: how many of the codes are "easy"? Easy as in they are composed of "simple" patterns, such as going to adjacent digits in a number pad. Often it seems that if you are given such a sequence, there's a pattern you can use to recall it. Seems like it would make the randomness suspect, perhaps? But maybe all the sequences have easy rules, the rules just differ?

So during a short hackfest I created this to check it out: https://github.com/eras/reco . Sorry, no binaries and the font size is hardcoded for presentation, and actually the whole UI stuff is just for that reason there.. By default it scans the whole 6-digit sequence space, but you can also give it a sequence and it will show the rules it finds.

Given the rules it uses, it turns out 50% of 6-digit sequences are "easy". Because it is based on the rules I just thought would apply there are probably other "easy" rules that could cover a lot of the remaining 50%.. It also cheats in a way by trying to apply the codes to all* shapes and sizes of numpads (1x10, 2x5, 3x3+1): match in any numpad is accepted for a sequence to be "easy".

It may also be some of the rules for the sequences it finds are not "easy" after all :).

By @petterroea - 4 months
The title scared me. I was ready for a TOTP RNG generator attack, still remembering Samy's PHP RNG talk from the 2000s
By @usr1106 - 5 months
I often have the feeling that my TOTP codes are somehow simple. Simple in the sense of containing repeated digits, some rhythm (e.g. 663183) or symmetry instead of being "purely random" (e.g. 581329).

I guess the reason is the human brain can really recognize many kinds of patterns. Nothing weird about the entropy.

By @dfox - 5 months
Since I started using any kind of SMS 2FA, I keep noticing that for some systems you get “nice to remember” numbers somewhat often. I didn't care enough to actually write them down and confirm whether it is a fact or just my internal bias of finding patterns where there are not any.

On the other hand, various decimal “random” numbers around payment cards (default PIN, authorization codes and what not) are clearly biased, because they are usually generated by taking hexadecimal representation modulo 10.
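A small sketch of why that approach is biased, assuming "hexadecimal modulo 10" means each uniformly random hex digit (0 to 15) is reduced mod 10 to produce one decimal digit. That reading is an assumption; actual card-issuer schemes may differ in detail.

```python
from collections import Counter
from fractions import Fraction


def decimalized_digit_distribution() -> dict[int, Fraction]:
    """Probability of each decimal digit when a uniform hex digit is taken mod 10."""
    counts = Counter(h % 10 for h in range(16))
    return {d: Fraction(counts[d], 16) for d in range(10)}


if __name__ == "__main__":
    for digit, prob in decimalized_digit_distribution().items():
        print(digit, prob)
```

Digits 0 to 5 each have two hex preimages (for example 0 and A both map to 0), while 6 to 9 have only one, so the low digits come up twice as often.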

By @nmstoker - 5 months
Am reassured others see this sort of thing (even if it ends up being chance).

About six months ago our MFA system, which uses codes between 1 and 100, persistently started to give me codes that were odd numbers in the top half of the range (ie 51 or above). This went on for well over a month (several codes per day) before I saw the pattern cease. The rational side of me felt it was just chance but I had a nagging unease all the same!

By @kelseyfrog - 5 months
Can't they run a chi-squared test (n=10) and see that the result is not significant?
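A minimal sketch of such a test using only the Python standard library, reading "n=10" as the ten digit categories (so 9 degrees of freedom). The sample codes here come from `secrets` as a stand-in; real TOTP output would be plugged in instead. The threshold 16.919 is the usual 5% critical value for 9 degrees of freedom.

```python
import secrets
from collections import Counter

CRITICAL_9DF_5PCT = 16.919  # chi-squared critical value, 9 degrees of freedom, alpha = 0.05


def chi_squared_digits(codes: list[str]) -> float:
    """Chi-squared statistic of digit counts against a uniform distribution over 0-9."""
    counts = Counter("".join(codes))
    total = sum(counts.values())
    expected = total / 10
    return sum((counts.get(str(d), 0) - expected) ** 2 / expected for d in range(10))


if __name__ == "__main__":
    # Stand-in sample: uniformly random 6-digit codes, not real TOTP output.
    sample = [f"{secrets.randbelow(10**6):06d}" for _ in range(10_000)]
    stat = chi_squared_digits(sample)
    verdict = "significant" if stat > CRITICAL_9DF_5PCT else "not significant"
    print(f"chi-squared = {stat:.2f} ({verdict} at the 5% level)")
```

Pooling all digit positions is the simplest version; a per-position test would be a natural follow-up.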
By @gizmo - 5 months
A related issue is that TOTP security guidelines suggest using a 160-bit key. Some organizations use 20 chars with alphabet A-Za-z0-9. Easy mistake to make, a byte is 8 bits after all. However, 62 ^ 20 is only about 119 bits. Way less than the recommended minimum. Does anybody know how insecure this is in practice?
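The entropy estimate can be checked with a few lines of Python. This is just the arithmetic, assuming each character is drawn uniformly and independently from the alphabet; the helper names are made up for the example.

```python
import math


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a uniformly random string over the given alphabet."""
    return length * math.log2(alphabet_size)


def chars_needed(alphabet_size: int, target_bits: int) -> int:
    """Smallest string length that reaches at least target_bits of entropy."""
    return math.ceil(target_bits / math.log2(alphabet_size))


if __name__ == "__main__":
    print(f"20 chars of A-Za-z0-9: {entropy_bits(62, 20):.2f} bits")
    print(f"20 random bytes:       {entropy_bits(256, 20):.2f} bits")
    print(f"chars of A-Za-z0-9 needed for 160 bits: {chars_needed(62, 160)}")
```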
By @saagarjha - 5 months
Having implemented TOTP codes once I know that they're basically unbiased because of the cryptography involved. That said, I would bet money that Apple's two-factor implementation is something custom because it just seems far too likely to generate combinations that look non-random. A bet not because I have evidence I'm right, but because I want someone to explain to me how these work, if only just "oh it's literally the same algorithm everyone else uses" :)
By @MattJ100 - 5 months
A related observation, where in many real-world data digit frequency can be predicted, is described by Benford's Law: https://en.wikipedia.org/wiki/Benfords_law

However this law obviously does not apply to TOTP codes (unless someone did something very wrong).

By @hi-v-rocknroll - 5 months
TOTP is slowly on its way out compared to passkeys and FIDO2. It's still useful as another 2FA choice.
By @est - 5 months
offtopic, is it secure to design a login that only requires username+TOTP?

This eliminates passwords altogether, but are there any pitfalls?

By @rustcleaner - 5 months
I wish they were used more, and had adjustable settings.

Can we please have customizable diceware TOTP? I'd like 8-12 words 60-90 seconds. I also wish this could be used everywhere.

By @slau - 5 months
Depending on the algorithm, one or two of the digits in the TOTP are counters to help the server figure out clock drift on the client device.

This was especially relevant when talking about hardware tokens that had relatively inaccurate clocks. In the RSA algorithm I seem to recall it was the second or third digit. Each clock tick was 2.5 seconds or something, so providing the last digit of the clock counter massively reduced the number of calculations the server had to do in case of a mismatch.