Part 0: Hardware Overview
I didn't have a suitable computer for a long time; my MacBook simply couldn't run this. Seeing that everyone else was already up and running, I hurried home, dug out my old desktop PC, and started installing. Below is a system screenshot taken after installing Ubuntu. As shown, the machine has an NVIDIA RTX 3060 GPU and 64 GB of RAM, which should be just about enough.
Part 1: Environment Setup
1. System preparation
Without hesitation, I wiped Windows 11 and installed the latest Ubuntu release, 25.04, from a USB drive. No fuss, just a straight install. Kernel info:
(p3) livingbody@gaint:~$ uname -a
Linux gaint 6.14.0-23-generic #23-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun 13 23:02:20 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
2. Conda environment preparation
Download the Miniconda installer: open the Tsinghua mirror at mirrors.tuna.tsinghua.edu.cn/anaconda/mi… and pick the latest installer.
Install Miniconda: give the installer execute permission, then run it. The commands are as follows:
chmod +x Downloads/Miniconda3-py39_4.9.2-Linux-x86_64.sh
./Downloads/Miniconda3-py39_4.9.2-Linux-x86_64.sh
- Configure the Tsinghua mirrors: download the oh-my-tuna.py script and follow its instructions. If GitHub is hard to reach, you can use the GitCode mirror: gitcode.com/gh_mirrors/…
wget https://tuna.moe/oh-my-tuna/oh-my-tuna.py
python oh-my-tuna.py
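If you prefer not to run the script, the conda part of what it does can be reproduced by hand. The sketch below is a typical `~/.condarc` pointing at the Tsinghua Anaconda mirror (channel URLs as documented on the mirror site; oh-my-tuna may configure additional tools such as pip as well, so treat this as a partial equivalent, not a replacement):

```yaml
# ~/.condarc — route conda channels through the Tsinghua mirror
channels:
  - defaults
show_channel_urls: true
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
```

After editing the file, `conda clean -i` clears the index cache so the new channels take effect.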
3. Creating the inference environment
- Create the Python environment
conda create -n p3 python=3.12
conda activate p3
- Install the GPU build of PaddlePaddle
python -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/
python -c "import paddle;paddle.utils.run_check()"
This approach skips the tedious manual download and installation of CUDA and cuDNN, which I highly recommend. With a fast enough connection, the install finishes in about a minute.
- Install FastDeploy
python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-86_89
If pip reports that no suitable package is available, you can open www.paddlepaddle.org.cn/packages/st… , download the wheel directly, and force-install it. Tested and confirmed working.
Part 2: Model Download, Loading, and Testing
1. Model download and loading
Run the following command in a terminal to download and load the model:
python -m fastdeploy.entrypoints.openai.api_server \
    --model baidu/ERNIE-4.5-0.3B-Paddle \
    --port 8180 \
    --metrics-port 8181 \
    --engine-worker-queue-port 8182 \
    --max-model-len 32768 \
    --max-num-seqs 32
The downloaded model is saved under the PaddlePaddle/ERNIE-4.5-0.3B-Paddle directory:
(base) livingbody@gaint:~$ ls PaddlePaddle/ERNIE-4.5-0.3B-Paddle/ -la
total 706276
drwxrwxr-x 3 livingbody livingbody      4096 Jul  6 16:04 .
drwxrwxr-x 3 livingbody livingbody      4096 Jul  6 16:39 ..
-rw-rw-r-- 1 livingbody livingbody     23133 Jul  6 16:04 added_tokens.json
-rw-rw-r-- 1 livingbody livingbody       556 Jul  6 16:04 config.json
-rw-rw-r-- 1 livingbody livingbody       125 Jul  6 16:04 generation_config.json
-rw-rw-r-- 1 livingbody livingbody     11366 Jul  6 16:04 LICENSE
-rw-rw-r-- 1 livingbody livingbody 721508576 Jul  6 16:04 model.safetensors
-rw------- 1 livingbody livingbody       658 Jul  6 16:04 .msc
-rw-rw-r-- 1 livingbody livingbody        67 Jul  6 16:18 .mv
-rw-rw-r-- 1 livingbody livingbody      7690 Jul  6 16:04 README.md
-rw-rw-r-- 1 livingbody livingbody     15404 Jul  6 16:04 special_tokens_map.json
drwxrwxr-x 2 livingbody livingbody      4096 Jul  6 16:04 ._tmp
-rw-rw-r-- 1 livingbody livingbody      1248 Jul  6 16:04 tokenizer_config.json
-rw-rw-r-- 1 livingbody livingbody   1614363 Jul  6 16:04 tokenizer.model
2. Model invocation
Once started, the server logs the endpoints below, which can be called directly.
INFO 2025-07-06 16:05:14,001 11789 engine.py[line:276] Worker processes are launched with 15.871807098388672 seconds.
INFO 2025-07-06 16:05:14,001 11789 api_server.py[line:91] Launching metrics service at http://0.0.0.0:8181/metrics
INFO 2025-07-06 16:05:14,002 11789 api_server.py[line:94] Launching chat completion service at http://0.0.0.0:8180/v1/chat/completions
INFO 2025-07-06 16:05:14,002 11789 api_server.py[line:97] Launching completion service at http://0.0.0.0:8180/v1/completions
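Since the chat completions endpoint speaks the standard OpenAI wire format, it can also be called with nothing but the Python standard library. Below is a minimal non-streaming sketch; `build_chat_request` and `chat` are local helper names, not part of FastDeploy, and the server started above is assumed to be listening on port 8180:

```python
import json
import urllib.request


def build_chat_request(user_msg, system_msg="You are a helpful assistant."):
    """Build an OpenAI-style chat completions payload.

    FastDeploy's server does not validate the model name or API key,
    so placeholder values are fine here.
    """
    return {
        "model": "null",
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
        "stream": False,
    }


def chat(host="0.0.0.0", port=8180, prompt="Please talk about the SUN"):
    """POST a chat request and return the assistant's reply text."""
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat())` prints the model's full reply in one piece, which is handy for quick smoke tests before wiring up the streaming client below.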
Call it via the URL; no API key is required.
import openai

host = "0.0.0.0"
port = "8180"
client = openai.Client(base_url=f"http://{host}:{port}/v1", api_key="null")

response = client.chat.completions.create(
    model="null",
    messages=[
        {"role": "system", "content": "You are a very useful assistant."},
        {"role": "user", "content": "Please talk about the SUN"},
    ],
    stream=True,
)
for chunk in response:
    if chunk.choices[0].delta:
        print(chunk.choices[0].delta.content, end='')
print('\n')
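Under the hood, `stream=True` makes the server send server-sent events: each `data: ` line carries one JSON chunk with a `delta`, terminated by `data: [DONE]`. Assuming FastDeploy follows this standard OpenAI framing (which the `openai` client above relies on), the raw stream can be parsed with a small pure function; `extract_stream_text` is an illustrative helper, not a library API:

```python
import json


def extract_stream_text(sse_lines):
    """Concatenate delta.content fields from OpenAI-style SSE chat chunks."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)
```

This is essentially what the `for chunk in response` loop above does internally, minus the HTTP plumbing, and it is useful when debugging the endpoint with `curl -N` instead of the client library.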