Import AI, May 11, 2024
What does 10^25 versus 10^26 mean?

A brief look at what FLOPs-based regulation nets out to 

Recent AI regulations have defined the trigger points for oversight in terms of the amount of floating point operations dumped into training an AI system. If you’re in America and you’ve trained a model with 10^26 FLOPs, you’re going to spend a lot of time dealing with government agencies. If you’re in Europe and you’ve trained a model with 10^25 FLOPs, you’re going to spend a lot of time dealing with government agencies.

More details:

In the United States, the recent Biden Executive Order on AI says that general-purpose systems trained with 10^26 FLOPs (or ones predominantly trained on biological sequence data and using a quantity of computing power greater than 10^23) fall under a new reporting requirement that means companies will let the US government know about these systems and also show work on testing these systems.

In Europe, the recent EU AI Act says that general-purpose systems trained with 10^25 FLOPs have the potential for “systemic risk” and that people who develop these models “are therefore mandated to assess and mitigate risks, report serious incidents, conduct state-of-the-art tests and model evaluations, ensure cybersecurity and provide information on the energy consumption of their models.”

Given how difficult the task of assessing AI systems is, these thresholds matter – governments will need to staff up people who can interpret the results about models which pass these thresholds.

What is the difference between 10^25 and 10^26 FLOPs in terms of money?

Let’s say you wanted to train an AI system – how much money would you spend on the compute for training the system before you hit one of these thresholds? We can work this out:

NVIDIA H100 – NVIDIA’s latest GPU.

Assumptions:
Using FP8 precision – various frontier labs (e.g., Inflection) have trained using FP8
40% efficiency – assuming you’ve worked hard to make your training process efficient. E.g., Google claims ~46% for PaLM 540B
$2 per chip hour – assuming bulk discounts from economies-of-scale.
Training a standard Transformer-based, large generative model.

10^26
FLOPs per chip-second = 2000e12* × 0.4 = 8e14
FLOPs per chip-hour = FLOPs per chip-second × 60 (seconds per minute) × 60 (minutes per hour) = 2.88e18
Chip-hours = 1e26 / FLOPs per chip-hour = 34.72M
Chip-hours × $2 = $69.44M

*3958 TFLOPS (for FP8 with sparsity) on the H100 SXM, divided by 2 (because the 2x sparsity support generally isn’t relevant for training), so the right number is 1979e12. But the datasheet doesn’t have enough information to tell you that; you just have to know!

10^25
FLOPs per chip-second = 2000e12 × 0.4 = 8e14
FLOPs per chip-hour = FLOPs per chip-second × 60 (seconds per minute) × 60 (minutes per hour) = 2.88e18
Chip-hours = 1e25 / FLOPs per chip-hour = 3.47M
Chip-hours × $2 = $6.94M
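The arithmetic above can be sketched as a small function. The peak-FLOPS, efficiency, and price figures are the assumptions stated in the text, not measured values:

```python
# Chip-hours and dollar cost to accumulate a given number of training FLOPs.
# Assumptions from the text: H100 at ~2000e12 FP8 FLOPS peak, 40% utilization,
# $2 per chip-hour with bulk discounts.

def training_cost(flop_threshold, peak_flops, efficiency, dollars_per_chip_hour):
    flops_per_chip_hour = peak_flops * efficiency * 3600  # 3600 seconds per hour
    chip_hours = flop_threshold / flops_per_chip_hour
    return chip_hours, chip_hours * dollars_per_chip_hour

for threshold in (1e25, 1e26):
    hours, cost = training_cost(threshold, 2000e12, 0.4, 2.0)
    print(f"{threshold:.0e}: {hours / 1e6:.2f}M chip-hours, ${cost / 1e6:.2f}M")
```

Swapping in the footnote's more accurate 1979e12 peak figure raises the costs by about 1%, so the round number is fine for back-of-envelope purposes.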

NVIDIA A100 – NVIDIA’s prior generation GPU, which lots of labs have lots of.

Assumptions:
Using BF16 precision (A100s don’t have FP8 support, so you’d probably use BF16)
60% efficiency (Anecdata)
$0.80 per chip hour

A100-hours = 1e26 / (312e12 × 0.6 × 3600) ≈ 1.48e8
Cost = A100-hours × $0.80 ≈ $119M
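The same formula with the A100 numbers (a sketch; the 60% efficiency figure is the anecdotal one from the text):

```python
# A100 (BF16) version of the cost arithmetic: 312e12 FLOPS dense peak,
# 60% utilization (anecdotal), $0.80 per chip-hour.
peak_flops = 312e12
efficiency = 0.6
price_per_chip_hour = 0.80

flops_per_chip_hour = peak_flops * efficiency * 3600
chip_hours = 1e26 / flops_per_chip_hour
cost = chip_hours * price_per_chip_hour
print(f"{chip_hours:.2e} A100-hours, ${cost / 1e6:.0f}M")  # ≈ 1.48e8 A100-hours, ≈ $119M
```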

What this means in practice:

Anyone who works in AI knows that a training run rarely goes perfectly, so we should multiply these numbers by 1.5 to factor in some bugs, cluster problems, general screwups, and so on. This means we can arrive at these numbers:

10^25: $6.94M × 1.5 ≈ $10.4M
10^26: $69.44M × 1.5 ≈ $104.2M
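As a quick check, applying that overhead factor to the base H100 estimates:

```python
# Apply the 1.5x factor for bugs, restarts, and cluster problems
# to the base H100 cost estimates from the text.
overhead = 1.5
base_costs = {"10^25": 6.94e6, "10^26": 69.444e6}
for label, cost in base_costs.items():
    print(f"{label}: ${cost * overhead / 1e6:.1f}M")
```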

Some thoughts on thresholds and the difficulty of regulatory scope and testing:

Both the US and EU regulatory regimes are oriented around the notion that systems which fall above their respective compute thresholds need to go through some intensive testing. In the US, there are likely very few companies that have spent $100m on a single big training run, though there will probably be some. By comparison, there are many companies that have spent more than $10m on a training run – including European ones like Mistral, whose recent Mistral-Large model (I’m guessing) likely came in above this threshold.

Therefore, 10^25 as a threshold seems like it probably hits more companies than regulators anticipate – my prediction is that the EU will end up needing to regulate far more companies/AI systems than it anticipated it’d need to when it drafted the law.
