Transformers Pipeline: Text Sentiment Classification

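Hi everyone. Anyone who uses Hugging Face regularly knows that pipeline downloads its model automatically, and that the model comes from the Hugging Face website. That site is currently blocked and cannot be reached directly from here, so the workaround is to download models through a mirror site. This post walks through the setup, using text sentiment classification as the example.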

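Transformers Pipeline Text Classification Example

This project shows how to use the pipeline feature of the Hugging Face Transformers library for simple text sentiment classification. The example uses a pretrained DistilBERT model to quickly classify text as positive or negative.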

Requirements

Python 3.7+
transformers
torch (PyTorch)

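Installation

Create and activate a virtual environment (recommended):

# Create the virtual environment
python -m venv venv

# Activate the virtual environment
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate

Install the dependencies: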

pip install transformers torch

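Hugging Face Mirror Setup Tips

To speed up model and tokenizer downloads, you can point the library at a mirror hosted in mainland China using any of the following methods:

Method 1: Use an environment variable (recommended)

# Linux/macOS
export HF_ENDPOINT=https://hf-mirror.com
# Windows (CMD)
set HF_ENDPOINT=https://hf-mirror.com
# Windows (PowerShell)
$env:HF_ENDPOINT = "https://hf-mirror.com"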

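Method 2: Set it in code

# Set HF_ENDPOINT before importing transformers or huggingface_hub,
# because the endpoint is read when huggingface_hub is first imported.
import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"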

Method 3: Persist the environment variable (permanent)

# Append the export to your shell profile so every new shell picks it up (bash shown)
echo 'export HF_ENDPOINT=https://hf-mirror.com' >> ~/.bashrc
source ~/.bashrc

Commonly used mirror sites:

hf-mirror.com

huggingface.modelscope.cn
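To make the effect of HF_ENDPOINT concrete, here is a minimal sketch of how a file URL is assembled from the endpoint. It assumes the Hub's standard /{repo_id}/resolve/{revision}/{filename} layout and that the mirror serves the same paths. file_url is a hypothetical helper written for illustration, not a library function, and the snippet only builds a string without touching the network.

```python
import os

DEFAULT_ENDPOINT = "https://huggingface.co"

def file_url(repo_id, filename, revision="main", endpoint=None):
    """Build the download URL for one file in a model repo (illustrative helper)."""
    # Use the explicit endpoint if given, else HF_ENDPOINT, else the official site.
    base = endpoint or os.environ.get("HF_ENDPOINT", DEFAULT_ENDPOINT)
    return f"{base.rstrip('/')}/{repo_id}/resolve/{revision}/{filename}"

url = file_url(
    "distilbert-base-uncased-finetuned-sst-2-english",
    "config.json",
    endpoint="https://hf-mirror.com",
)
print(url)
```

Opening the printed URL in a browser is a quick way to check that a mirror is reachable before running the full example.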

Code Overview

The text-classification.py file contains a simple text classification example:

from transformers import pipeline

# Initialize the text classification pipeline
classifier = pipeline("text-classification",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

# Text to classify
text = "I love using Hugging Face Transformers! It's amazing!"

# Run the prediction
results = classifier(text)

# Print the result
print(results)

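Code Breakdown

Model: distilbert-base-uncased-finetuned-sst-2-english, a DistilBERT checkpoint fine-tuned on the SST-2 dataset for English positive/negative sentiment classification.

Pipeline function: pipeline() loads the model and its tokenizer and wraps tokenization, inference, and post-processing in a single call.

Output format: a list with one dictionary per input text. label is the predicted class (POSITIVE or NEGATIVE for this model) and score is the model's confidence in that label, between 0 and 1.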
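As a small illustration of the output format, the sketch below reads the label and score fields from pipeline-style output and formats them for display. describe is a hypothetical helper, and the sample data is hand-written in the shape shown in this post rather than taken from a real model run.

```python
def describe(results):
    """Turn pipeline-style output into readable lines."""
    lines = []
    for item in results:
        # Each item is a dict with a string label and a float score.
        lines.append(f"{item['label']}: {item['score']:.2%}")
    return lines

# Hand-written sample in the pipeline's output shape, not a real prediction.
sample = [{"label": "POSITIVE", "score": 0.9998}]
for line in describe(sample):
    print(line)
```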

Usage

Run the code:

python text-classification.py

Expected output (the exact score may differ slightly):

[{'label': 'POSITIVE', 'score': 0.9998}]

Custom text:

# Change the text variable to whatever you want to analyze
text = "Your text here"

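Troubleshooting

Slow model downloads

Use the mirror settings described above
Make sure your network connection is stable
Download the model files manually and place them in the cache directory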
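For the manual-download route it helps to know where the cache directory is. The sketch below resolves it under a simplifying assumption: it honours the HF_HUB_CACHE and HF_HOME environment variables and otherwise falls back to ~/.cache/huggingface/hub, ignoring other overrides such as XDG_CACHE_HOME. Both function names are hypothetical helpers, not library calls.

```python
import os
from pathlib import Path

def hub_cache_dir():
    """Resolve the Hub cache directory from the environment (simplified)."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    if os.environ.get("HF_HOME"):
        return Path(os.environ["HF_HOME"]).expanduser() / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"

def model_cache_folder(repo_id):
    """Folder for one cached model: models--<repo id with '/' replaced by '--'>."""
    return hub_cache_dir() / ("models--" + repo_id.replace("/", "--"))

print(model_cache_folder("distilbert-base-uncased-finetuned-sst-2-english"))
```

The cache uses its own internal layout (blobs, refs, snapshots), so a plain folder of downloaded files will not be recognised if it is simply dropped there. It is often simpler to keep the files in an ordinary directory and pass that directory path as the model argument to pipeline().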

CUDA-related errors

Make sure you installed the correct PyTorch build
Check CUDA version compatibility

# Check whether PyTorch can use CUDA
python -c "import torch; print(torch.cuda.is_available())"
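The same check can drive device selection in code. This is a minimal sketch, assuming you want the first GPU when CUDA works and the CPU otherwise. pick_device is a hypothetical helper; it returns the integer convention accepted by the pipeline's device argument (-1 for CPU, 0 for the first GPU).

```python
def pick_device():
    """Return 0 (first GPU) when CUDA is usable, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed: report CPU and let the caller decide what to do.
        return -1
    return 0 if torch.cuda.is_available() else -1

device = pick_device()
print("device index:", device)
```

The value can then be passed along when building the classifier, for example pipeline("text-classification", model=..., device=device).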

Out of memory

Use a smaller model (such as the DistilBERT model used here)
Reduce batch_size
Use the CPU build of PyTorch
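One way to keep memory use bounded is to feed the classifier small slices of the input instead of one large list. The sketch below shows the idea with a hypothetical classify_in_chunks helper and a stand-in classifier, so it runs without downloading a model; a chunk size of 8 is an arbitrary assumption to tune for your hardware.

```python
def classify_in_chunks(classifier, texts, chunk_size=8):
    """Call classifier on small slices of texts and concatenate the results."""
    results = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        results.extend(classifier(chunk))
    return results

# Stand-in for a real pipeline so the sketch runs offline.
def fake_classifier(batch):
    return [{"label": "POSITIVE", "score": 1.0} for _ in batch]

out = classify_in_chunks(fake_classifier, ["a", "b", "c", "d", "e"], chunk_size=2)
print(len(out))
```

A real pipeline object can be passed in place of fake_classifier. Pipelines also accept a batch_size argument at call time, which controls how many inputs go through the model per forward pass.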

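Model loading errors

Check your network connection
Clear the cache and retry:

# Delete every cached revision in the Hugging Face cache so the next run downloads again
from huggingface_hub import scan_cache_dir
cache = scan_cache_dir()
hashes = [rev.commit_hash for repo in cache.repos for rev in repo.revisions]
cache.delete_revisions(*hashes).execute()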

Advanced Usage

Batch processing

texts = [
    "I love this!",
    "This is terrible.",
    "Not bad at all."
]
results = classifier(texts)
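Batch results come back as one dictionary per input, in the same order as the inputs, so they can be paired up with zip. The sketch below pairs each text with its result and tallies the labels; the results list is placeholder data in the pipeline's output shape with made-up scores, not real predictions.

```python
from collections import Counter

texts = ["I love this!", "This is terrible.", "Not bad at all."]

# Placeholder results in the pipeline's output shape; the values are made up.
results = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.98},
    {"label": "POSITIVE", "score": 0.95},
]

# Pair every input text with its own prediction.
paired = [(t, r["label"], r["score"]) for t, r in zip(texts, results)]
for text, label, score in paired:
    print(f"{label:<8} {score:.2f}  {text}")

# Count how many texts fell into each class.
label_counts = Counter(label for _, label, _ in paired)
print(dict(label_counts))
```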

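Applying a threshold

# Keep only results with confidence above 0.9
# (the pipeline has no threshold argument, so filter its output instead)
results = [r for r in classifier(text) if r["score"] > 0.9]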

Using other pretrained models

# Use a different sentiment analysis model
classifier = pipeline("text-classification",
                      model="nlptown/bert-base-multilingual-uncased-sentiment")
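Switching models can change the label set, so downstream code that expects POSITIVE/NEGATIVE may need adapting. The sketch below assumes this model's labels are strings of the form "1 star" through "5 stars" and maps them to a coarse sentiment. stars_to_sentiment is a hypothetical helper, and the cut-offs (1 to 2 negative, 3 neutral, 4 to 5 positive) are an assumption for illustration, not something the model defines.

```python
def stars_to_sentiment(label):
    """Map a label such as '4 stars' to (stars, coarse sentiment)."""
    # The first token of the label is assumed to be the star count.
    stars = int(label.split()[0])
    if stars <= 2:
        sentiment = "negative"
    elif stars == 3:
        sentiment = "neutral"
    else:
        sentiment = "positive"
    return stars, sentiment

for label in ["1 star", "3 stars", "5 stars"]:
    print(label, "->", stars_to_sentiment(label))
```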

References

Hugging Face Transformers documentation

DistilBERT model card

Pipeline API documentation