Zeroth Principles of AI, December 7, 2024
SuperHuman Answer Generators

Evolutionary Computation is under-appreciated. Objections include "Evolution takes millennia" and "I don't get the point".

It is (or should be) very important to people in the AI community because it is a primitive precursor to intelligence. Understanding Evolutionary Computation (EC) will make understanding LLMs easier.

In order not to have to define "AI" again, I will now define a new term, SHAG, which stands for "SuperHuman Answer Generator".

A SHAG is a computer-based system which can provide answers to questions that the client or programmer using the system cannot (or cannot be bothered to) compute themselves. Or, shorter: a SHAG generates answers that humans cannot generate.

Unsurprisingly, all LLMs are SHAGs. They can generate answers that their programmers could not, such as understanding and replying in Finnish.

To many, it is more surprising that all SHAGs (and hence all LLMs) are Holistic. Since SHAGs are Holistic, we can identify several problems shared by all Holistic systems (discussed at length in my Red Pill):

- The answers provided may not be correct
- The answers provided are not known to be optimal, complete, repeatable, parsimonious, explainable, or transparent.

As evidence, we recognize these problems as endemic to current LLMs.

But are there SHAGs that are not LLMs? Besides current LLMs (including my own Deep-Discrete-Neuron-Network LLMs), Genetic Algorithms (GA), Genetic Programming (GP), and simulated annealing (SA) are SHAGs. I will mainly discuss GAs in what follows.

My favorite way of using GAs:

- Define an individual that encodes a solution but has to compete for survival; initialize each one randomly for diversity.
- Create a population of those and store them all in an array, say 1000 individuals.
- Define a goal function that returns a number indicating how good (fit) an individual is.
- Define a crossover function that breeds two successful individuals together, hoping to create an even better offspring.
- Define a mutation function that reintroduces more diversity into some individuals.

Then loop until the system stops improving, which is detectable by noticing that the elite is not changing; this might take as little as 10 cycles for well-behaved problems (a minimal sketch in Python follows this list):

- Compute fit for each individual using the goal function
- Sort the array by fitness of the individuals
- Start with the worst individual and move towards better ones:
- Replace the individual with the crossover offspring of two superior individuals
- Stop replacements a bit below the top (to preserve the "elite")
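
Here is a minimal Python sketch of that recipe. The four problem-specific functions are passed in as parameters; the population size, elite size, and mutation rate follow the numbers used in this article, and the convergence test (a run of generations with an unchanged best fitness) is just one simple stand-in for "the elite is not changing".

```python
import random

POP_SIZE = 1000       # population size, as suggested above
ELITE = 10            # top individuals that are never replaced
MUTATION_RATE = 0.02  # chance of mutating an offspring (a couple percent)

def evolve(random_individual, fitness, crossover, mutate, patience=10):
    """Run the GA loop described above until the best fitness has not
    changed for 'patience' generations, then return the best individual."""
    population = [random_individual() for _ in range(POP_SIZE)]
    best, stable = None, 0
    while stable < patience:
        population.sort(key=fitness, reverse=True)  # best first
        top = fitness(population[0])
        stable = stable + 1 if top == best else 0
        best = top
        # Walk from the worst individual towards the better ones,
        # replacing each one with the offspring of two superior parents.
        for i in range(POP_SIZE - 1, ELITE - 1, -1):
            mother, father = random.sample(population[:i], 2)
            child = crossover(mother, father)
            if random.random() < MUTATION_RATE:
                child = mutate(child)
            population[i] = child
    return population[0]
```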

Design Overview

GAs all use individuals containing a genome of some sort. Part of the design challenge for the crossover and goal functions is that the practitioner needs to understand the problem well enough to determine not only which individual (viewed as a solution) is better, but also which parameters comprise all possible solutions.

There really is no phenotype in simple GAs; we evaluate the genotype directly using the goal function. This is a pretty radical shortcut, but one that is actually starting to be used in wet-lab genomics: labs breed grains with better yields without the bother of growing the seeds for a year, because they know what a higher-yield DNA genome looks like.

Suppose you wanted to use a GA to design a square-cornered box big enough for 200 lbs of grain (of known average density) that ships as cheaply as possible. The genome would contain the X, Y, and Z sizes of the box, and those would initially be totally randomized in each individual. The goal function returns zero fitness for every box with insufficient volume for the required amount of grain, and otherwise returns the length and circumference (girth) so that we can compute the shipping cost the traditional way.

The crossover function might take two parents and use X from one and Y and Z from the other, and sometimes X and Y from one and Z from the other. All parents have better fitness than the individual they replace; we are hoping recombination produces an even better offspring by benefiting from partial solutions in the parents.

Very rapidly we will notice that the best boxes in each generation become smaller and smaller and cheaper to ship. When the elite is stable, its members all have the same X, Y, and Z: the optimal solution.
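
A sketch of the box problem plugged into the `evolve()` loop above. The required volume and the exact cost rule (length plus girth) are illustrative assumptions; since `evolve()` maximizes fitness, the sketch scores cheaper boxes higher by returning the reciprocal of the cost.

```python
import random

REQUIRED_VOLUME = 8.0  # cubic feet needed for the grain (hypothetical)

def random_box():
    return tuple(random.uniform(0.1, 10.0) for _ in range(3))  # X, Y, Z

def shipping_cost(box):
    length = max(box)
    girth = 2 * (sum(box) - length)   # perimeter around the two short sides
    return length + girth             # assumed cost rule: length plus girth

def box_fitness(box):
    x, y, z = box
    if x * y * z < REQUIRED_VOLUME:
        return 0.0                    # too small for the grain: zero fitness
    return 1.0 / shipping_cost(box)   # cheaper to ship = more fit

def box_crossover(a, b):
    # Take each of X, Y, and Z from one parent or the other.
    return tuple(random.choice(pair) for pair in zip(a, b))

def box_mutate(box):
    jittered = list(box)
    jittered[random.randrange(3)] *= random.uniform(0.9, 1.1)  # small resize
    return tuple(jittered)

best_box = evolve(random_box, box_fitness, box_crossover, box_mutate)
```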

Now consider a larger problem with 500 numerical parameters and a goal function that uses every single one of them. This may be expensive to evolve, but if it is the only way forward, we'll take it. Well-behaved problems will converge rapidly.

A typical individual would keep those 500 values in an array (much like genes in a chromosome), and crossover would brutally create the new individual from "DNA" segments taken alternately from each parent, switching at one or more randomly chosen cut points.
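
A sketch of that cut-and-alternate crossover for array genomes, with the number of cut points as a parameter:

```python
import random

def k_point_crossover(mother, father, k=2):
    """Copy alternating segments from each parent, switching at k
    randomly chosen cut points, so neighboring genes travel together."""
    n = len(mother)
    cuts = sorted(random.sample(range(1, n), k)) + [n]
    child, prev = [], 0
    for j, cut in enumerate(cuts):
        parent = mother if j % 2 == 0 else father
        child.extend(parent[prev:cut])
        prev = cut
    return child
```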

Some Design Details

- Mutation is MUCH LESS important than crossover, to the point of being optional. Beginners get this backwards, and even textbooks fail to emphasize it enough. If you are not using crossover, then you are just doing random search and discarding the entire point and power of a GA.
- In a population of 1000 I might use an elite of 10 and will therefore replace the 990 worst individuals with potentially superior offspring each turn through the loop. I generally apply mutation to at most a couple percent of all individuals.
- There has to be some chance for the offspring to inherit the feature(s) that made the parents successful. I've seen beginners write crossover functions that mistakenly discard all history from the parents, and the system then degrades to random search. This matters for things like choosing cut points (when using array representations), where some properties will tend to travel together for synergy reasons.
- We can turn this mistake into a measurement. To test that your GA works, compare the convergence of your full GA with that of a version where the crossover function is replaced by the creation of a new random (starting-point) individual with no history from the parents. That is a random search system. If it isn't significantly slower than your fully functioning GA, then you need to go back to the drawing board (a one-line version of this swap is sketched below). It will also show you how much a GA can speed things up over linear or random search (the two are equivalent).
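
With the sketches above, that baseline test is a one-line swap, reusing `evolve()` and the box helpers from earlier:

```python
# A "crossover" that ignores both parents turns evolve() into random
# search; if this converges about as fast as the real GA does, the real
# crossover is not preserving parental history.
def random_search_crossover(_mother, _father):
    return random_box()   # fresh random individual, no inherited history

ga_best = evolve(random_box, box_fitness, box_crossover, box_mutate)
rs_best = evolve(random_box, box_fitness, random_search_crossover, box_mutate)
```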

GAs Are Not Slow

We can now deal with the objection that "Evolution takes millennia". It does, in nature, especially if you only get one chance to create offspring per year. Computers do it faster.

A modern CPU runs at 3 GHz: 3E9, or 3,000,000,000, clock cycles per second.

Suppose we have a problem where computing the goal function value takes 1000 clock cycles per individual; this is often generous. A GA with a population of 1000 then needs 1000 × 1000 = 1,000,000 cycles per generation, so at 3E9 cycles per second it can run 3,000 generations per second... per thread. If we are in a hurry, we can use multithreading, and there are special versions of GA frameworks that can run on a cloud.

So speed is not the issue.

What’s the point?

As to the other objection: the point of SHAGs is that they can provide answers to problems humans can't solve, including problems without reliable and complete input data, and NP-Hard and NP-Complete problems such as knapsack problems. These are discussed in The Red Pill.
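
To make the knapsack case concrete, here is one common GA encoding: a bit per item, with zero fitness for overweight packings. The instance is made up for illustration and reuses `evolve()` and `k_point_crossover()` from the earlier sketches.

```python
import random

WEIGHTS = [random.randint(1, 30) for _ in range(40)]   # hypothetical items
VALUES = [random.randint(1, 100) for _ in range(40)]
CAPACITY = 300

def random_packing():
    return [random.randint(0, 1) for _ in WEIGHTS]      # 1 = item packed

def packing_fitness(bits):
    weight = sum(w for w, b in zip(WEIGHTS, bits) if b)
    if weight > CAPACITY:
        return 0                                        # overweight: unfit
    return sum(v for v, b in zip(VALUES, bits) if b)    # value carried

def packing_mutate(bits):
    flipped = bits[:]
    flipped[random.randrange(len(bits))] ^= 1           # flip one bit
    return flipped

best_packing = evolve(random_packing, packing_fitness,
                      k_point_crossover, packing_mutate)
```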

GAs shine in situations where many parameters influence the outcome in complicated (even complex) ways, and where nobody knows how to find an optimal answer, but where we can rather cheaply determine how good the answer represented by any individual’s genome is.

If you are in this situation, you might do well to explore Holistic Methods, and you need to know that a GA may sometimes be an option. For tasks where both would work, a GA may be a million times cheaper than an LLM, which becomes economically important for frequently recurring problems.

So SHAGs in computers are LLMs and other (future?) AIs, GA, GP, and SA. But the biggest SHAG of all is Darwinian Evolution of Species in Nature. Humans certainly could not make a platypus from scratch, but Evolution did. More about this in the next article.

Bottom line

In order to understand how LLMs are solving problems we humans cannot solve ourselves (e.g., Protein Folding), we should first study the simpler case of Genetic Algorithms.
