Part 0: Hardware Overview
I didn't have a suitable computer for a long time; my MacBook simply couldn't run this. Seeing that everyone else was already up and running, I hurried home, dug out my old desktop PC, and started installing. Below is a system screenshot taken after installing Ubuntu. As shown, the machine has an NVIDIA RTX 3060 GPU and 64 GB of RAM, which should be just about enough.
Part 1: Environment Setup
1. System preparation
Without hesitation, I wiped Windows 11 and installed the latest Ubuntu release, 25.04, from a USB drive. No fuss, just a straight install. Kernel info:
(p3) livingbody@gaint:~$ uname -a
Linux gaint 6.14.0-23-generic #23-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun 13 23:02:20 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
2. Conda environment preparation
Download the Miniconda installer: open the Tsinghua mirror at mirrors.tuna.tsinghua.edu.cn/anaconda/mi… and pick the latest installer.
Install Miniconda: give the installer execute permission, then run it. The commands are as follows:
chmod +x Downloads/Miniconda3-py39_4.9.2-Linux-x86_64.sh
./Downloads/Miniconda3-py39_4.9.2-Linux-x86_64.sh
- Configure the Tsinghua mirrors: download the oh-my-tuna.py script and follow its instructions. If GitHub is hard to reach, you can use the GitCode mirror: gitcode.com/gh_mirrors/…
wget https://tuna.moe/oh-my-tuna/oh-my-tuna.py
python oh-my-tuna.py
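If you prefer not to run the script, the conda part of what it does can be reproduced by hand. The sketch below is a typical `~/.condarc` pointing at the Tsinghua Anaconda mirror (channel URLs as documented on the mirror site; oh-my-tuna may configure additional tools such as pip as well, so treat this as a partial equivalent, not a replacement):

```yaml
# ~/.condarc — route conda channels through the Tsinghua mirror
channels:
  - defaults
show_channel_urls: true
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
```

After editing the file, `conda clean -i` clears the index cache so the new channels take effect.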
3. Creating the inference environment
- Create the Python environment
conda create -n p3 python=3.12
conda activate p3
- Install the GPU build of PaddlePaddle
python -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/
python -c "import paddle;paddle.utils.run_check()"
This approach skips the tedious manual download and installation of CUDA and cuDNN, which I highly recommend. With a fast enough connection, the install finishes in about a minute.
- Install FastDeploy
python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-86_89
If pip reports that no suitable package is available, you can open www.paddlepaddle.org.cn/packages/st… , download the wheel directly, and force-install it. Tested and confirmed working.
Part 2: Model Download, Loading, and Testing
1. Model download and loading
Run the following command in a terminal to download and load the model:
python -m fastdeploy.entrypoints.openai.api_server \
    --model baidu/ERNIE-4.5-0.3B-Paddle \
    --port 8180 \
    --metrics-port 8181 \
    --engine-worker-queue-port 8182 \
    --max-model-len 32768 \
    --max-num-seqs 32
The downloaded model is saved under the PaddlePaddle/ERNIE-4.5-0.3B-Paddle directory:
(base) livingbody@gaint:~$ ls PaddlePaddle/ERNIE-4.5-0.3B-Paddle/ -la
total 706276
drwxrwxr-x 3 livingbody livingbody      4096 Jul  6 16:04 .
drwxrwxr-x 3 livingbody livingbody      4096 Jul  6 16:39 ..
-rw-rw-r-- 1 livingbody livingbody     23133 Jul  6 16:04 added_tokens.json
-rw-rw-r-- 1 livingbody livingbody       556 Jul  6 16:04 config.json
-rw-rw-r-- 1 livingbody livingbody       125 Jul  6 16:04 generation_config.json
-rw-rw-r-- 1 livingbody livingbody     11366 Jul  6 16:04 LICENSE
-rw-rw-r-- 1 livingbody livingbody 721508576 Jul  6 16:04 model.safetensors
-rw------- 1 livingbody livingbody       658 Jul  6 16:04 .msc
-rw-rw-r-- 1 livingbody livingbody        67 Jul  6 16:18 .mv
-rw-rw-r-- 1 livingbody livingbody      7690 Jul  6 16:04 README.md
-rw-rw-r-- 1 livingbody livingbody     15404 Jul  6 16:04 special_tokens_map.json
drwxrwxr-x 2 livingbody livingbody      4096 Jul  6 16:04 ._tmp
-rw-rw-r-- 1 livingbody livingbody      1248 Jul  6 16:04 tokenizer_config.json
-rw-rw-r-- 1 livingbody livingbody   1614363 Jul  6 16:04 tokenizer.model
2. Model invocation
Once started, the server logs the endpoints below, which can be called directly.
INFO 2025-07-06 16:05:14,001 11789 engine.py[line:276] Worker processes are launched with 15.871807098388672 seconds.
INFO 2025-07-06 16:05:14,001 11789 api_server.py[line:91] Launching metrics service at http://0.0.0.0:8181/metrics
INFO 2025-07-06 16:05:14,002 11789 api_server.py[line:94] Launching chat completion service at http://0.0.0.0:8180/v1/chat/completions
INFO 2025-07-06 16:05:14,002 11789 api_server.py[line:97] Launching completion service at http://0.0.0.0:8180/v1/completions
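Since the chat completions endpoint speaks the standard OpenAI wire format, it can also be called with nothing but the Python standard library. Below is a minimal non-streaming sketch; `build_chat_request` and `chat` are local helper names, not part of FastDeploy, and the server started above is assumed to be listening on port 8180:

```python
import json
import urllib.request


def build_chat_request(user_msg, system_msg="You are a helpful assistant."):
    """Build an OpenAI-style chat completions payload.

    FastDeploy's server does not validate the model name or API key,
    so placeholder values are fine here.
    """
    return {
        "model": "null",
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
        "stream": False,
    }


def chat(host="0.0.0.0", port=8180, prompt="Please talk about the SUN"):
    """POST a chat request and return the assistant's reply text."""
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat())` prints the model's full reply in one piece, which is handy for quick smoke tests before wiring up the streaming client below.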
Call it via the URL; no API key is required.
import openai

host = "0.0.0.0"
port = "8180"
client = openai.Client(base_url=f"http://{host}:{port}/v1", api_key="null")

response = client.chat.completions.create(
    model="null",
    messages=[
        {"role": "system", "content": "You are a very useful assistant."},
        {"role": "user", "content": "Please talk about the SUN"},
    ],
    stream=True,
)
for chunk in response:
    if chunk.choices[0].delta:
        print(chunk.choices[0].delta.content, end='')
print('\n')
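Under the hood, `stream=True` makes the server send server-sent events: each `data: ` line carries one JSON chunk with a `delta`, terminated by `data: [DONE]`. Assuming FastDeploy follows this standard OpenAI framing (which the `openai` client above relies on), the raw stream can be parsed with a small pure function; `extract_stream_text` is an illustrative helper, not a library API:

```python
import json


def extract_stream_text(sse_lines):
    """Concatenate delta.content fields from OpenAI-style SSE chat chunks."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)
```

This is essentially what the `for chunk in response` loop above does internally, minus the HTTP plumbing, and it is useful when debugging the endpoint with `curl -N` instead of the client library.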