Humans are Insecure Password Generators

Published on May 21, 2025 5:58 AM GMT

And They're Currently Being Cracked

(A crosspost of Humans are Insecure Password Generators)

 

Any password longer than 12 characters or so is invulnerable to brute force attacks. So why do so many longer ones keep getting breached?

Because humans don't pick passwords randomly; we tend to pick ones that are easy to think of and easy to remember. This tendency can be exploited; attackers can identify passwords that are more likely than others and try those first.

The best way to think about this is that humans are picking passwords from some underlying distribution; a distribution that is not uniform across all possible strings of characters.[1]

Any time you're trying to find a specific value from a non-uniform distribution, the optimal strategy is to try the most likely values first. If you're trying to guess the first digit of a number, you should guess "1" first, not "9".[2] Passwords are no different; a traditional brute force attack that tries each string in alphabetical order is horribly inefficient, because it treats the space as uniform when it is not. The ideal attack checks the most likely passwords first.

Today's hackers can already do something similar. One of the best password cracking techniques today is a dictionary attack: you take known passwords that have been released in previous security breaches and test them against other people's accounts. This is, effectively, taking thousands or millions of samples from the underlying "humans picking passwords" distribution. But this is still only a tiny fraction of the passwords that humans actually use; vast numbers will be inaccessible with this technique.
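To make the mechanics concrete, here is a minimal sketch of a dictionary attack against a list of leaked, unsalted SHA-256 hashes. (The file names and the unsalted-hash assumption are mine, purely for illustration; real tools like Hashcat are far more sophisticated.)

```python
import hashlib

# Hypothetical inputs: a wordlist of previously breached passwords, and a set
# of password hashes leaked from some other service (unsalted SHA-256 here
# only to keep the sketch short).
with open("breached_passwords.txt", encoding="utf-8", errors="ignore") as f:
    wordlist = [line.strip() for line in f if line.strip()]

with open("leaked_hashes.txt") as f:
    targets = {line.strip().lower() for line in f if line.strip()}

cracked = {}
for candidate in wordlist:  # each entry is a sample from the "humans picking passwords" distribution
    digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
    if digest in targets:
        cracked[digest] = candidate

print(f"Recovered {len(cracked)} of {len(targets)} hashes")
```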

Clever attackers can do a bit better than this by identifying common password-mangling patterns like "replace S with $" and writing a program to try those variations. This lets them generalize beyond the specific examples they've seen and estimate other, unknown areas of the distribution. But this is still highly imperfect and will miss many things.
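A rule-based mangler of this kind takes only a few lines; the substitution table below is illustrative, not the rule set of any particular tool.

```python
from itertools import product

# Common "leetspeak" substitutions. Each base word is expanded into every
# combination of replaced / unreplaced characters, plus a few common suffixes.
SUBS = {
    "a": ["a", "@", "4"],
    "e": ["e", "3"],
    "i": ["i", "1", "!"],
    "o": ["o", "0"],
    "s": ["s", "$", "5"],
}

def mangle(word: str):
    choices = [SUBS.get(ch.lower(), [ch]) for ch in word]
    for combo in product(*choices):
        variant = "".join(combo)
        yield variant
        yield variant.capitalize()
        yield variant + "1"   # trailing digit, another common habit
        yield variant + "!"

print(sorted(set(mangle("password")))[:10])
```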


 

Enter neural networks. A neural network takes in millions or billions of examples, distills them down into a probability distribution, and then outputs new things from that distribution. This is exactly what is needed to crack passwords. Neural networks allow an attacker to brute-force the efficient way: by going over the entire probability distribution, from most likely to least likely.
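A real neural model is beyond the scope of this post, but even a character-bigram Markov chain (a crude stand-in I'm using here, not PassGAN itself) shows the basic move: fit a distribution to leaked passwords, then generate fresh guesses weighted by how likely a human is to have picked them.

```python
import random
from collections import Counter, defaultdict

def fit(passwords):
    """Count character-bigram transitions, with start (^) and end ($) markers."""
    counts = defaultdict(Counter)
    for pw in passwords:
        chars = ["^"] + list(pw) + ["$"]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts

def sample(counts, max_len=20):
    """Draw one candidate password from the fitted distribution."""
    out, prev = [], "^"
    while len(out) < max_len:
        chars, weights = zip(*counts[prev].items())
        nxt = random.choices(chars, weights=weights)[0]
        if nxt == "$":
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)

# Toy training set standing in for the millions of breached passwords a real
# attacker would use.
model = fit(["password1", "letmein", "dragon99", "iloveyou", "sunshine"])
print([sample(model) for _ in range(5)])
```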

This idea isn't new. In 2017, researchers released PassGAN, a deep learning system that does exactly this.[3] It had a... less than stellar reception from the security community. Ars Technica called it "mostly hype" and "mediocre performance dressed up as something to worry about", pointing out that it wasn't significantly better than other existing password crackers.

But this completely misses the point; the makers of PassGAN weren't trying to break into people's accounts. Quite the opposite: they were security researchers demonstrating a new attack vector.[4] Proofs of concept exist to prove that a concept is viable, not to actually put it into practice. Work on the subject has progressed slowly but surely since then; a 2021 paper improved on PassGAN's results, and a 2022 attempt did even better, doubling the 2021 paper's accuracy.

The real advance will only arrive once a bad actor brings such a method into the age of big data. We've observed robust scaling laws for how LLM performance improves with larger amounts of compute, and there's no reason to expect that this won't apply to password generation just as it has to everything else. PassGAN was trained on a single GPU for a few hours with a dataset of 23,679,744 known passwords. Compare this to the tens of thousands of GPUs used over the course of several months to train frontier AI chatbots, and the hundreds of millions of passwords available in other data breaches, and there are clearly large improvements on the table for anyone willing to commit enough resources to the task.

Of course, pure password cracking of this form is of limited utility. It can be applied to any hash that was released in a data breach, increasing the risk that the source password becomes known. In the worst case, an advanced model would effectively be able to invert most password hashes, defeating the protection they provide. This would be quite bad (though note that existing crackers can already reverse more than 50% of all hashed passwords), but it still requires that the parent organization's security be breached, so most users of most services would still be safe. The jackpot would be guessing accounts with unknown hashes, but attacking "clean" accounts almost always requires going through an online API with rate limits. Even a program drawing from the theoretically optimal probability distribution wouldn't be able to guess the median human's password in fewer than 10 attempts.[5]


 

The scarier risk arises from personalized guesses. Most of us have quite a lot of information about ourselves online: our name, hobbies, friends, location, workplace, and much more. Feeding all of this into an AI would allow it to build a distribution for that particular person rather than for all people, which can be many orders of magnitude more accurate. (Consider: existing password cracking algorithms are smart enough to check for birthdays in passwords, but there are around 20,000 different birthdays people can have, and the cracker has to try them all. A personalized algorithm would only need to check one.) This sort of thing is already being done by password recovery services with the help of the account owner, and the large-scale data gathering done by many companies shows that the subject's consent is not really needed for stuff like this.
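As a toy illustration (the facts and combination rules below are mine, not drawn from any real cracking tool), personalization just means seeding candidate generation with what is publicly known about the target:

```python
from itertools import permutations

# Hypothetical publicly scraped facts about one person.
facts = {"name": "alice", "pet": "whiskers", "team": "arsenal", "birth_year": "1987"}

def personalized_candidates(facts):
    tokens = list(facts.values())
    for r in (1, 2):
        for combo in permutations(tokens, r):
            base = "".join(combo)
            yield base
            yield base.capitalize()
            yield base + facts["birth_year"][-2:]   # "alice87"-style suffix
            yield base.capitalize() + "!"

# Deduplicate while preserving the (rough) likelihood ordering.
candidates = list(dict.fromkeys(personalized_candidates(facts)))
print(len(candidates), candidates[:8])
```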

The fact is, when a human thinks up a password for themselves, this is a fundamentally insecure process. We have no cryptographic guarantees about the human brain like we do about a carefully designed computer algorithm, and mounting evidence shows that humans are in fact quite predictable. One 2019 study found that, using a neural network, researchers were able to guess almost half of people's passwords in under 1,000 attempts just by knowing a single one of that person's other passwords. It's only going to get worse from here.


 

But I think this is actually a good thing.

Traditional password cracking tools could be created by one smart programmer, meaning there was no inherent advantage for the defenders; the bad guys can come up with new cracking methods just as easily as the good guys can warn people about them. But it's pretty hard for a criminal organization to rent 10,000 GPUs in a datacenter without being noticed. And the current largest store of real life password data is probably the white-hat website HaveIBeenPwned, with over 5.5 billion breached passwords in its (private) database.

This presents an opportunity for prosocial organizations to gain the upper hand. They could build private models that warn users with potentially guessable passwords before criminals have an opportunity to gain the same capabilities. (And the criminals will gain these capabilities soon enough.) This would allow us to move away from the current model of urgently notifying users that a hash of their password has been released on the dark web, forcing password resets, and getting a lot of hacked accounts anyway. Instead, we'd be able to warn users of the weakness of their passwords before they're ever created. (And do this in a reliable, consistent way, not by using easily-defeated kludges like "if it doesn't contain a special character it's unsafe".) In a world where everyone is using secure passwords, it won't even matter if the hashes are stolen![6]
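Model-based strength scoring of that kind doesn't exist off the shelf, but the breach-lookup half of the picture already does. For instance, HaveIBeenPwned's k-anonymity range API lets a signup form check a candidate password against the breach corpus without ever sending the password itself; the sketch below shows one way that check might look (error handling and rate-limit etiquette omitted).

```python
import hashlib
import urllib.request

def breach_count(password: str) -> int:
    """Return how many times a password appears in HaveIBeenPwned's breach
    corpus, via the k-anonymity range API: only the first 5 hex characters of
    the SHA-1 hash ever leave the machine."""
    sha1 = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = sha1[:5], sha1[5:]
    req = urllib.request.Request(
        f"https://api.pwnedpasswords.com/range/{prefix}",
        headers={"User-Agent": "password-strength-check-example"},
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp.read().decode().splitlines():
            candidate, _, count = line.partition(":")
            if candidate == suffix:
                return int(count)
    return 0

print(breach_count("password123"))  # a very large number: never use this one
```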

The end state of this arms race is simple: people need to use random passwords. It is fundamentally unsafe to entrust your digital security to the hope that your thought process is sufficiently inscrutable that no possible technology could figure it out; a hope that is consistently being dashed.

As technology inexorably improves, the "let people pick their own passwords" model becomes less and less viable. We need to encourage a transition away from that model as soon as we can, and to minimize harm in the meantime. The status quo taunts users with the possibility of having an easily-remembered-and-secure password, but the majority of them will fail at this. Better to go all the way and get rid of the idea entirely. Choosing one's own passwords should be regarded as foolish for anyone but experts, akin to representing yourself in court or installing a high-voltage electrical appliance.

The safe solution is already known: get a password manager and let it pick your passwords for you. Use a fully random passphrase generator for the master password, or if you absolutely can't remember a passphrase, write it down on a piece of paper and keep that paper in a secure place. Any method that involves your brain is simply not going to remain secure for much longer.
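For the master passphrase itself, Python's standard library is enough to do this properly. (The wordlist path below is a placeholder; any diceware-style list works, such as the EFF's freely available large wordlist.)

```python
import secrets

# Placeholder path: any diceware-style wordlist with one word per line works,
# e.g. the EFF large wordlist (7,776 words, about 12.9 bits of entropy each).
with open("wordlist.txt", encoding="utf-8") as f:
    words = [w.strip() for w in f if w.strip()]

# Six words drawn with a cryptographically secure RNG from a 7,776-word list
# gives roughly 77 bits of entropy, far beyond any guessing attack.
passphrase = " ".join(secrets.choice(words) for _ in range(6))
print(passphrase)
```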

  1. ^

    Of course the exact distribution will vary from person to person; someone with a cat is more likely to choose a cat-related password. But without knowing anything about the person, there's also a global distribution across all humans that's equally valid.

  2. ^

    Another neat example of this principle: when searching an ordered set of objects (so binary search applies), in many practical cases you do not want to pick the midpoint at each iteration of the search, as many computer science 101 courses will tell you to. You actually want to first estimate a probability distribution over where your target element is, and then divide the probability mass in two with each iteration. This can significantly speed up the search whenever the target's location is not uniformly distributed.

  3. ^

    It actually wasn't the first; this paper from 2016 beat it to the punch. But they didn't give the AI a catchy name, and it didn't get much attention. (See also probabilistic context-free grammars from 2009, a neural-network-powered password strength checker from 2006, etc.)

  4. ^

    Ok, they were also doing a publicity stunt for their company.

  5. ^

    I can guarantee this is true without needing to know anything about the ceiling of theoretical AI capabilities simply because there are enough passwords in use that the most common 10 of them together don't account for 50% of all accounts.

  6. ^

    As long as they're salted, not using a broken hashing algorithm, etc.


