Is Qwen3-0.6B Really Better than BERT at Text Classification?

 

This article compares Qwen3-0.6B and BERT on text classification over the AG_News dataset. The experiments show that Qwen3-0.6B's results vary by configuration: classification through a linear head works best, beating both BERT and SFT-based classification. The article also measures training time and inference speed for each model, and examines how Think mode affects Qwen3-0.6B. The results offer a reference point for using small models in text classification.

🚀 The experiment pits BERT against Qwen3-0.6B on AG_News text classification, probing the potential of small models for this task.

📊 Qwen3-0.6B with a linear classification head reaches a test-set F1 of 0.949, ahead of BERT's 0.945 and SFT classification's 0.941, showing it is competitive at text classification.

⏱️ Training and inference times are compared across models: Qwen3-0.6B SFT classification takes the longest to train, while BERT infers fastest, a useful reference for real-time workloads.

🧠 Qwen3-0.6B is analyzed in both Think and No Think modes: Think mode is slightly more accurate but markedly slower at inference.


Preface

I recently came across an interesting question on Zhihu: what practical value does a small model like Qwen3-0.6B actually have? Reading through the answers, some pointed to the edge-device advantages of small models (low latency); some argued that small models mainly exist so other researchers can verify scaling laws (the rich spread of model sizes in the Qwen2.5 series gave the open-source community a basis for validating methods); others noted that few-shot performance at 4B or 7B is already strong, and that directly calling a larger LLM also solves the problem well. What caught my attention was one answer claiming that small models punch above their weight in vector search, named entity recognition (NER), and text classification, with BERT as the usual point of comparison.

For Chinese text classification, when TextCNN or FastText falls short, the next stop is usually the BERT family and its variants (RoBERTa, etc.). Yet Encoder-Only models trained primarily on Chinese corpora remain scarce (even the recently released ModernBERT is trained mostly on English and code), so Chinese text classification still largely means fine-tuning from bert-base-chinese, six years after BERT's release. Can a Decoder-Only LLM beat a much smaller BERT at text classification? I set up an experiment to find out.

If you'd rather skip the experimental details, jump straight to the Conclusions and Limitations sections at the end.

Experimental Setup

| Model | Parameters | Training approach |
| --- | --- | --- |
| google-bert/bert-base-cased | 0.1B | Add a linear classification head whose output dimension equals the number of classes |
| Qwen/Qwen3-0.6B | 0.6B | Construct a prompt and run SFT |

A sample record from AG_News (label 3 corresponds to Science/Technology):

```json
{
  "text": "New iPad released Just like every other September, this one is no different. Apple is planning to release a bigger, heavier, fatter iPad that...",
  "label": 3
}
```
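
For reference, AG_News can be pulled straight from the Hugging Face Hub. A minimal loading sketch (assuming the public `ag_news` dataset id; verify names against your `datasets` version):

```python
# Minimal sketch: load AG_News from the Hugging Face Hub.
from datasets import load_dataset

dataset = load_dataset("ag_news")  # splits: train (120,000) and test (7,600)
print(dataset["train"][0])         # {'text': '...', 'label': 2}
print(dataset["train"].features["label"].names)
# ['World', 'Sports', 'Business', 'Sci/Tech']
```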

BERT Training Details

| Parameter | Value |
| --- | --- |
| lr_scheduler_type (LR decay schedule) | cosine |
| learning_rate | 1.0e-5 |
| per_device_train_batch_size | 64 |
| gradient_accumulation_steps | 1 |
| per_device_eval_batch_size | 256 |
| num_train_epochs | 3 |
| weight_decay | 1e-6 |
| eval_steps (evaluation frequency) | 0.05 |
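
For reference, a minimal sketch of this setup with the Hugging Face `Trainer` (the output directory is a placeholder and metric computation is omitted; `dataset` comes from the loading sketch above):

```python
# Minimal sketch of the BERT fine-tuning configuration in the table above.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "google-bert/bert-base-cased", num_labels=4)  # randomly initialized linear head

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_ds = dataset["train"].map(tokenize, batched=True)
eval_ds = dataset["test"].map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="bert-agnews",   # hypothetical output path
    lr_scheduler_type="cosine",
    learning_rate=1e-5,
    per_device_train_batch_size=64,
    gradient_accumulation_steps=1,
    per_device_eval_batch_size=256,
    num_train_epochs=3,
    weight_decay=1e-6,
    eval_strategy="steps",      # `evaluation_strategy` on older transformers
    eval_steps=0.05,            # a float < 1 is a fraction of total training steps
)

trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds, eval_dataset=eval_ds,
                  processing_class=tokenizer)  # `tokenizer=` on older versions
trainer.train()
```

The evaluation trace during training:
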
| Step | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
| --- | --- | --- | --- | --- | --- | --- |
| 282 | 0.274700 | 0.263394 | 0.909737 | 0.910311 | 0.909737 | 0.909676 |
| 564 | 0.207800 | 0.222230 | 0.922237 | 0.922701 | 0.922237 | 0.922246 |
| 846 | 0.199600 | 0.204222 | 0.931579 | 0.932552 | 0.931579 | 0.931510 |
| 1128 | 0.215600 | 0.191824 | 0.934605 | 0.935274 | 0.934605 | 0.934737 |
| 1410 | 0.190500 | 0.192846 | 0.932763 | 0.934421 | 0.932763 | 0.932937 |
| 1692 | 0.193300 | 0.180665 | 0.937895 | 0.938941 | 0.937895 | 0.937849 |
| 1974 | 0.143000 | 0.180497 | 0.940526 | 0.940945 | 0.940526 | 0.940636 |
| 2256 | 0.141500 | 0.177630 | 0.941711 | 0.941988 | 0.941711 | 0.941644 |
| 2538 | 0.147100 | 0.173602 | 0.943947 | 0.944022 | 0.943947 | 0.943908 |
| 2820 | 0.131600 | 0.176895 | 0.940658 | 0.941790 | 0.940658 | 0.940683 |
| 3102 | 0.152800 | 0.170928 | 0.945000 | 0.945140 | 0.945000 | 0.944925 |
| 3384 | 0.140000 | 0.169215 | 0.944474 | 0.944766 | 0.944474 | 0.944399 |
| 3666 | 0.149900 | 0.168865 | 0.944474 | 0.944538 | 0.944474 | 0.944483 |
| 3948 | 0.112000 | 0.172459 | 0.946184 | 0.946142 | 0.946184 | 0.946159 |
| 4230 | 0.124000 | 0.172826 | 0.945000 | 0.945254 | 0.945000 | 0.944924 |
| 4512 | 0.122300 | 0.171583 | 0.944737 | 0.944925 | 0.944737 | 0.944708 |
| 4794 | 0.104400 | 0.171969 | 0.944868 | 0.945059 | 0.944868 | 0.944854 |
| 5076 | 0.117500 | 0.171504 | 0.945395 | 0.945502 | 0.945395 | 0.945363 |
| 5358 | 0.099800 | 0.171761 | 0.945263 | 0.945510 | 0.945263 | 0.945232 |

Qwen3 Training Details

Linear-Head Classification

| Parameter | Value |
| --- | --- |
| lr_scheduler_type (LR decay schedule) | cosine |
| learning_rate | 1.0e-5 |
| per_device_train_batch_size | 8 |
| gradient_accumulation_steps | 8 |
| per_device_eval_batch_size | 16 |
| num_train_epochs | 1 |
| weight_decay | 1.0e-6 |
| eval_steps (evaluation frequency) | 0.05 |
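
The linear head can be attached the same way as for BERT; a sketch, assuming your transformers version provides sequence-classification support for Qwen3 (for decoder-only models the head reads the hidden state of the last non-padding token, so the pad token must be set):

```python
# Minimal sketch: Qwen3-0.6B with a randomly initialized classification head.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen3-0.6B", num_labels=4)
model.config.pad_token_id = tokenizer.pad_token_id  # required for batched inputs
# Training then follows the same Trainer recipe as the BERT sketch,
# with the hyperparameters from the table above.
```

Evaluation during training:
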
| Step | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
| --- | --- | --- | --- | --- | --- | --- |
| 94 | 0.281800 | 0.243619 | 0.918158 | 0.918180 | 0.918158 | 0.917893 |
| 188 | 0.224100 | 0.220015 | 0.924211 | 0.925216 | 0.924211 | 0.924289 |
| 282 | 0.197700 | 0.236405 | 0.919211 | 0.920127 | 0.919211 | 0.919257 |
| 376 | 0.182800 | 0.243235 | 0.920132 | 0.925368 | 0.920132 | 0.919136 |
| 470 | 0.191500 | 0.207864 | 0.928289 | 0.929563 | 0.928289 | 0.928304 |
| 564 | 0.208400 | 0.192414 | 0.935658 | 0.935668 | 0.935658 | 0.935647 |
| 658 | 0.201900 | 0.191506 | 0.938553 | 0.938695 | 0.938553 | 0.938607 |
| 752 | 0.191900 | 0.179849 | 0.937500 | 0.937417 | 0.937500 | 0.937378 |
| 846 | 0.156100 | 0.177319 | 0.938684 | 0.938983 | 0.938684 | 0.938653 |
| 940 | 0.159900 | 0.177048 | 0.938289 | 0.939433 | 0.938289 | 0.938175 |
| 1034 | 0.159100 | 0.172280 | 0.943553 | 0.943725 | 0.943553 | 0.943455 |
| 1128 | 0.117000 | 0.168742 | 0.943026 | 0.942911 | 0.943026 | 0.942949 |
| 1222 | 0.151500 | 0.164628 | 0.943421 | 0.944371 | 0.943421 | 0.943503 |
| 1316 | 0.143600 | 0.158676 | 0.945921 | 0.946856 | 0.945921 | 0.945965 |
| 1410 | 0.183200 | 0.154356 | 0.946184 | 0.946708 | 0.946184 | 0.946221 |
| 1504 | 0.159400 | 0.153549 | 0.947763 | 0.947847 | 0.947763 | 0.947771 |
| 1598 | 0.147100 | 0.152530 | 0.948553 | 0.948609 | 0.948553 | 0.948539 |
| 1692 | 0.161400 | 0.151299 | 0.949079 | 0.949216 | 0.949079 | 0.949029 |
| 1786 | 0.150500 | 0.151270 | 0.948421 | 0.948572 | 0.948421 | 0.948363 |

SFT Classification

The prompt template and training target:

```python
prompt = """Please read the following news article and determine its category from the options below.

Article:
{news_article}

Question: What is the most appropriate category for this news article?
A. World
B. Sports
C. Business
D. Science/Technology

Answer:/no_think"""

answer = "<think>\n\n</think>\n\n{answer_text}"
```

| Model | Accuracy (Think) | Accuracy (No Think) |
| --- | --- | --- |
| Qwen3-0.6B | 0.7997 | 0.7898 |
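
The `/no_think` marker at the end of the prompt is Qwen3's soft switch for disabling thinking; the same toggle is available when building inputs through the chat template. A sketch, assuming Qwen3's documented `enable_thinking` argument (here `news_text` is a placeholder article string):

```python
# Sketch: toggling Qwen3's Think / No Think mode at template time.
messages = [{"role": "user", "content": prompt.format(news_article=news_text)}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # False => No Think; True => Think
)
```
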
A converted training sample:

```json
{
  "instruction": "Please read the following news article and determine its category from the options below.\n\nArticle:\nWall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, Wall Street's dwindling\\band of ultra-cynics, are seeing green again.\n\nQuestion: What is the most appropriate category for this news article?\nA. World\nB. Sports\nC. Business\nD. Science/Technology\n\nAnswer:/no_think",
  "output": "<think>\n\n</think>\n\nC"
}
```
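
A sketch of producing these records from AG_News (the output file name `agnews_train.json` is hypothetical, matching the `agnews_train` dataset entry registered in the config below; `dataset` and `prompt` come from the earlier sketches):

```python
# Sketch: convert AG_News rows into instruction/output records for LLaMA-Factory.
import json

LETTERS = ["A", "B", "C", "D"]  # labels 0-3: World, Sports, Business, Sci/Tech

records = [
    {
        "instruction": prompt.format(news_article=row["text"]),
        "output": "<think>\n\n</think>\n\n" + LETTERS[row["label"]],
    }
    for row in dataset["train"]
]

with open("agnews_train.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```
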
The LLaMA-Factory training config:

```yaml
### model
model_name_or_path: model/Qwen3-0.6B

### method
stage: sft
do_train: true
finetuning_type: full

### dataset
dataset: agnews_train
template: qwen3
cutoff_len: 512
overwrite_cache: true
preprocessing_num_workers: 8

### output
output_dir: Qwen3-0.6B-Agnews
save_strategy: steps
logging_strategy: steps
logging_steps: 0.01
save_steps: 0.2
plot_loss: true
report_to: tensorboard
overwrite_output_dir: true

### train
per_device_train_batch_size: 12
gradient_accumulation_steps: 8
learning_rate: 1.2e-5
warmup_ratio: 0.01
num_train_epochs: 1
lr_scheduler_type: cosine
bf16: true
```
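
With LLaMA-Factory installed, a config like this is typically launched with `llamafactory-cli train <config>.yaml`; the exact invocation may differ across versions.
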
Metrics by training step:

| Step | Training Loss | Accuracy | Precision | Recall | F1 |
| --- | --- | --- | --- | --- | --- |
| 250 | 0.026 | 0.912 | 0.917 | 0.912 | 0.912 |
| 500 | 0.027 | 0.924 | 0.924 | 0.924 | 0.924 |
| 750 | 0.022 | 0.937 | 0.937 | 0.937 | 0.937 |
| 1000 | 0.022 | 0.941 | 0.941 | 0.941 | 0.941 |
| 1250 | 0.023 | 0.940 | 0.940 | 0.940 | 0.940 |
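
Unlike the linear-head models, which score logits directly, the SFT metrics require generating text and parsing out the option letter. A sketch of the scoring, assuming `generated_texts` holds the decoded test-set outputs and answers follow the training format above (recall matching accuracy in the tables is consistent with weighted averaging):

```python
# Sketch: score SFT generations against the AG_News test labels.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

LETTER_TO_LABEL = {"A": 0, "B": 1, "C": 2, "D": 3}

def parse_prediction(text: str) -> int:
    answer = text.split("</think>")[-1].strip()
    return LETTER_TO_LABEL.get(answer[:1], -1)  # -1 marks unparseable output

y_true = [row["label"] for row in dataset["test"]]
y_pred = [parse_prediction(t) for t in generated_texts]  # generated_texts: assumed

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted")
```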

BERT vs. Qwen3-0.6B: Training Time

| Model | Epochs | Training time | Inference time | Total |
| --- | --- | --- | --- | --- |
| BERT | 3 | 35 min | - | 0.58 h |
| Qwen3-0.6B (linear head) | 1 | 52 min | - | 0.86 h |
| Qwen3-0.6B (SFT) | 1 | 62 min | 30 min | 1.5 h |

BERT vs. Qwen3-0.6B: RPS Test

| Model | Inference engine | Max output tokens | RPS |
| --- | --- | --- | --- |
| BERT | HF | - | 60.3 |
| Qwen3-0.6B (SFT) | HF | 8 | 13.2 |
| Qwen3-0.6B (SFT) | vLLM | 8 | 27.1 |
| Qwen3-0.6B (linear head) | HF | - | 38.1 |
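
For the vLLM row, throughput can be measured with offline batched generation; a minimal sketch (the fine-tuned model path is a placeholder and `prompts` is the list of formatted classification prompts):

```python
# Sketch: batched inference with vLLM, capping output at 8 tokens as above.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen3-0.6B-Agnews")  # hypothetical local checkpoint path
params = SamplingParams(temperature=0.0, max_tokens=8)

outputs = llm.generate(prompts, params)
texts = [o.outputs[0].text for o in outputs]
```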

Conclusions

On AG_News, Qwen3-0.6B with a linear classification head reached the best test F1 (0.949), slightly ahead of BERT (0.945) and Qwen3-0.6B SFT classification (0.941). The margin is small and comes at a cost: BERT finishes in 0.58 h total versus 0.86 h (linear head) and 1.5 h (SFT), and serves 60.3 RPS versus 38.1 (linear head, HF) and 27.1 (SFT, vLLM), so BERT remains the pragmatic choice for latency-sensitive, high-throughput scenarios. For Qwen3-0.6B itself, Think mode buys a small accuracy gain over No Think (0.7997 vs. 0.7898 before fine-tuning) at a significant increase in inference time.

Limitations

