少点错误 2024年08月14日
An anti-inductive sequence
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

文章探讨了“反归纳”序列的概念,即无法被压缩或预测的比特序列。作者解释了为什么随机序列无法压缩,以及如何通过引入额外信息来解决压缩问题。文章还探讨了如何生成最随机的序列,并用一个预测算法来生成一个反归纳序列,该算法通过预测下一个比特的可能性,输出相反的比特。

🤔 **反归纳序列的定义和本质** 一个“反归纳”序列指的是无法被压缩或预测的比特序列。文章从压缩和预测的概念出发,解释了随机序列无法压缩的原因,因为任何能够预测序列的算法,都可以用来压缩序列。文章还提到了一个悖论:虽然随机序列通常无法压缩,但偶尔会生成可以压缩的序列,例如一个全为0的序列。为了解决这个悖论,作者提出了一个解决方案:如果我们选择压缩部分序列,需要额外添加一个比特来指示该序列是否被压缩。这表明,虽然偶尔可以压缩某些序列,但大多数序列无法压缩,因此添加额外比特的成本超过了压缩节省的空间。

🤯 **生成最随机序列的挑战和方法** 文章探讨了如何生成最随机的序列,并提出了一个预测算法来生成一个反归纳序列。该算法通过预测下一个比特的可能性,输出相反的比特。作者解释了算法的工作原理,并给出了前几个比特的例子。由于算法的目标是挑战预测,因此推理过程会随着序列的增长而变得越来越复杂。

🤖 **Solomonoff先验和不可计算性** 文章提到了一个标准的预测算法解决方案:Solomonoff先验。该算法通过考虑所有可能生成比特序列的算法,并为每个算法分配一个初始概率,然后根据观察到的比特数据删除被证伪的算法,最后计算预测下一个比特为0或1的算法的总概率。然而,这个解决方案存在两个问题:一是它不可计算,二是它依赖于用于编写算法的语言细节。尽管如此,从数学的角度来看,一旦指定了算法的语言,Solomonoff先验就是一个对最不可预测序列的正式定义。

Published on August 14, 2024 12:28 PM GMT

I was thinking about what would it mean for a sequence of bits to be "anti-inductive". It probably is a concept that is already known (as a rule of thumb, if I can think about it, someone probably already wrote a paper on it 50 years ago), but I haven't heard about it.

Some sequences are predictable and can be compressed. These two concepts are deeply related, because if you can successfully predict the next part of the sequence, you don't need to actually write it down; hence compression. A completely random sequence of bits cannot be compressed or predicted.

There is a simple mathematical proof that some sequences cannot be compressed, although it doesn't say which ones. For any natural number N, there are more sequences of size exactly N, than sequences of size smaller than N. Therefore no program can generate a unique sequence shorter than N for any input sequence of size N.

Things get more complicated if we consider the caveat that although random sequences in general cannot be compressed, true randomness means that sometimes we accidentally get a sequence that can be compressed -- for example, with probability 1/2ᴺ we get a sequence of N zeroes, and it would sound silly to argue that we can't compress that!

The solution to this paradox is that if we decide to compress only some selected sequences, then we need to add an extra bit of information specifying whether this sequence was compressed or not. Otherwise, if we see a sequence of bits saying (in binary) "a sequence of thousand zeroes", we wouldn't know whether the intended value is this very sequence of bits taken literally, or the sequence of thousand zeroes. One bit doesn't seem like much, but actually most sequences cannot be compressed, so the cost of adding one extra bit to each of them outweighs the occasional space we would save by compressing the ones we can.

But still, if I needed a random sequence of bits to use e.g. as a password for something important... and by a miracle I generated a sequence of N zeroes... I would probably feel quite uncomfortable to use it, and would simply generate a new one. Is this a good security practice, or not? Because from certain perspective, by removing the "insufficiently random" sequences from the pool of possible choices, I am reducing the size of the pool, which... kinda makes it easier to guess the password?

Something similar actually happened to me once. I used a mobile application that asked me to create a 4-digit PIN. So I typed 4 random digits, and the application said: "nope, you cannot use the same digit multiple times in the PIN, that is not secure". So I typed 4 random digits again, and the application said: "nope, you cannot use an ascending sequence of digits, that is not secure". So I typed 4 random digits yet again, and this time the application was finally satisfied. But it felt funny that the more "security checks" the application uses, the more limited is the choice of possible PINs. There are only 10000 possible combinations of 4 digits to start with; I wonder how far an overzealous security department could reduce that number. In a hypothetical extreme case, we would be left with only one possible choice of PIN -- certainly the most secure one that no one could possibly guess! The holy grail of information security.

*

Okay, back to the sequences of bits. Imagine that we are trying to create not just any random sequence, but the single most random, most unpredictable sequence ever! Suppose the length of the sequence is not specified in advance, so we just keep generating bits one by one, for as long as necessary.

What we could do is create a predictor -- an algorithm that looks at the previously generated bits, tries to find all possible patterns in them and predict the most likely following bit -- and then actually output the opposite. Keep doing this for every bit.

(To make the output fully deterministic, we add an extra rule that if the predictor assigns exactly 50% probabilities of the next bit being 0 or 1, we output 0.)

What would the generated sequence look like?

We need to make a more formal description of the algorithm, because the reasoning is going to get more complicated at each step, since we are trying to defy expectations.

The standard solution for the predictor would be to use the Solomonoff prior (take all possible algorithms that generate sequences of bits, and assign to each of them an initial probability 1/2^ the length of the algorithm), after each observed bit remove the algorithms that were falsified by the observed data, and then calculate the total probability of the remaining algorithms that predict that 0 would be next, versus the total probability of the remaining algorithms that predict that 1 would be next.

This solution is (1) uncomputable, and (2) depends on the exact details of the language used to write the algorithms. But, from the mathematical perspective (once we specify the language for writing the algorithms) it is a formal definition of the most unpredictable sequence.

Yes, it sounds ironic that we have a formal definition of something "unpredictable", but the definition is uncomputable, which means that no algorithm can actually predict the sequence, we just have its mathematical definition.

So I guess we won't find the following bits of the sequence. At least, not arbitrarily many. Perhaps we could still find two or three more, but each of them would require more complicated reasoning.



Discuss

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

反归纳序列 随机性 压缩 预测 Solomonoff先验
相关文章