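Original. 张粲宇 (Zhang Canyu), 高超 (Gao Chao). 2025-04-08 18:28, Shanghai
How do you find the best balance between accuracy and performance?
Introduction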
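Over the past year or so, putting large models into production has become the dominant theme in the industry, and vector databases have become standard equipment in scenarios such as autonomous driving, enterprise knowledge bases, recommendation systems, and product recognition.
In practice, however, there is no one-size-fits-all approach.
In vector search, for example, recall and latency correspond to a system's accuracy and performance, and together they define the limits of its search capability. In real deployments, it is hard to get both.
Take semantic search on a vector database: e-commerce recommendation usually has to handle high concurrency but does not demand high recall, whereas face recognition demands higher accuracy and can tolerate some latency.
So how do you find the best balance between accuracy and performance?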
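Zilliz Cloud recently released two features that help with this problem: the level parameter, which tunes search accuracy, and enable_recall_calculation, which returns an estimate of recall.
The rest of this article uses concrete scenarios and sample code to show how to use these tools to balance recall against performance.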
01
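Different scenarios have very different recall and latency/QPS requirements
As AI and large models develop, vector search is being used in more and more scenarios, yet those scenarios differ widely in what they need from recall and from latency/QPS. Two typical use cases illustrate the point: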
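1.1 Recommendation systems: modest recall requirement, high QPS and low latency
A recommendation system aims to give users content that is both relevant and diverse. Recall requirements are usually modest here, because a recommender does not work like a traditional classification task. Its core job is to analyze a user's past behavior and interests and quickly pick out, from a huge candidate pool, items the user might like. Some recommendations may not match the user's interests exactly, but that diversity is part of the value: it offers users new choices and discoveries, and the final click is the user's own decision.
A recommendation system can therefore tolerate a certain share of "irrelevant" results. On the other hand, it has to serve a large number of user requests in real time, so its queries-per-second (QPS) requirement is very high and it must answer many concurrent requests with low latency. This emphasis on speed means the design focuses on performance and real-time behavior, with more tolerance for lower recall.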
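1.2 Face recognition: high recall requirement, low QPS and relaxed latency
Unlike recommendation systems, face recognition systems have very high recall requirements, especially in security verification. Their main task is to verify a user's identity, which carries serious security implications in many settings (public security, access control, payment verification, and so on). The system must correctly identify every authorized user, so that no legitimate user is denied access because of a recognition error or mistaken for an unauthorized one.
In this scenario recall directly affects the security and accuracy of the system, so false matches and misses must be kept to a minimum. In other words, recall has to be high enough that every legitimate user can pass verification. Although precision matters just as much, the QPS requirement is usually low, because user verification is a low-frequency operation and the total request volume is limited. In addition, video capture accounts for a large share of the end-to-end time, so the latency requirement for retrieval and matching is relatively relaxed.
These examples show how much recall and QPS requirements differ between business scenarios. Other vector search workloads, such as RAG (retrieval-augmented generation), audio and video search, and chemical molecule or protein exploration, each have their own accuracy and performance needs. Choosing parameters that fit the specific business requirement is therefore essential.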
02
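Zilliz Cloud supports recall tuning and recall estimation
In the latest release of Zilliz Cloud, the SDK exposes two parameters that make it easier to evaluate and adjust accuracy settings, so that users can find the best balance between recall and performance.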
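2.1 The level parameter: tuning recall
The latest version of Zilliz Cloud improves its support for search accuracy: the level parameter lets users adjust recall flexibly to suit different business scenarios. In detail:
Lower level (level=1): suitable for general scenarios (recall is usually above 90%). Query performance is excellent and resource consumption is low, which fits workloads that need fast responses.
Higher level (level=10): for scenarios with very demanding recall requirements. Search accuracy improves, but more compute and time are consumed, which fits tasks with strict accuracy requirements.
Users can adjust level dynamically to balance recall against performance for their scenario. Note that higher accuracy is not always better. When a lower setting (for example level=3 or level=5) already meets the business requirement, raising level further only increases resource consumption and query latency. Choosing level sensibly is the key to balancing accuracy, performance, and resource cost.
Earlier versions of Zilliz Cloud already supported adjusting recall through level. The latest version raises the upper bound of level from 5 to 10, giving users access to higher-accuracy search. This helps complex scenarios with very high recall requirements, such as risk control, security, or high-precision matching.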
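As a small illustration of the parameter shape described above, the sketch below builds the search_params dictionary and rejects values outside the 1 to 10 range. The helper name build_search_params is hypothetical and not part of any SDK; only the "level" and "enable_recall_calculation" keys come from this article.

```python
def build_search_params(level: int = 1, with_recall: bool = False) -> dict:
    """Build a search_params dict in the shape used in this article.

    Hypothetical helper: only the two parameter keys are taken from the article.
    """
    # The article gives 1 as the default and 10 as the current upper bound.
    if not 1 <= level <= 10:
        raise ValueError(f"level must be between 1 and 10, got {level}")
    params = {"level": level}
    if with_recall:
        # Ask the service to return a recall estimate along with the results.
        params["enable_recall_calculation"] = True
    return {"params": params}


print(build_search_params(level=7, with_recall=True))
```

Validating on the client side is a design choice made for this sketch; the service may apply its own checks.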
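2.2 enable_recall_calculation: evaluating recall
Zilliz Cloud provides the enable_recall_calculation option for one-off checks and evaluation of recall:
Evaluate recall: with the option enabled, Zilliz Cloud computes and returns the recall of the search, so users can see how accurate the search is under the given conditions.
Combined evaluation: by analyzing the recall estimate together with other key metrics (such as QPS, latency, or resource cost), users can judge whether the system as a whole is operating at its best.
Together, these features give users flexible tools for making accuracy, performance, and cost trade-offs based on several metrics at once.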
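For readers who want the metric itself pinned down: recall@K in approximate nearest-neighbor search is commonly defined as the fraction of the exact top-K neighbors that the approximate search also returns. The sketch below computes that definition on toy IDs. It illustrates the metric only; how Zilliz Cloud produces its estimate internally is not described in this article.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k neighbours that the approximate search returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact_top = set(exact_ids[:k])
    if not exact_top:
        raise ValueError("exact_ids must not be empty")
    approx_top = set(approx_ids[:k])
    return len(exact_top & approx_top) / len(exact_top)


exact = [11, 42, 7, 93, 5]    # toy ground truth from a brute-force search
approx = [11, 7, 42, 18, 5]   # toy result from an approximate index
print(recall_at_k(approx, exact, k=5))  # 4 of 5 true neighbours found -> 0.8
```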
03
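Hands-on tuning: a step-by-step walkthrough
3.1 Test environment
An instance created on the latest Zilliz Cloud
pymilvus 2.5.4 client
3.2 Tuning steps
(1) Define the business goals and import test data
First, analyze the pattern and parameters of the production queries, including the recall the business requires and the TopK to return. The key business parameters assumed here are: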
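Query pattern: pure vector search
TopK: 16000
Recall: 99.5%
Latency: within 5 ms
Once the requirements are clear, import a matching test dataset. It can be the full dataset or a sampled subset, but its distribution must be close to production data so that the performance estimates are accurate. In this example we use VDBBench to import the cohere 1M dataset into Zilliz Cloud as test data.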
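If a sampled subset is used, one simple way to keep its distribution close to production is uniform random sampling without replacement. The sketch below shows this on a stand-in range of IDs. The function name and the ID range are assumptions for illustration, and stratified sampling may be a better fit when some categories are rare.

```python
import random


def sample_subset(ids, fraction, seed=42):
    """Uniformly sample a fraction of record IDs without replacement."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    pool = list(ids)
    n = max(1, round(len(pool) * fraction))
    return rng.sample(pool, n)


production_ids = range(1_000_000)  # stand-in for production primary keys
subset = sample_subset(production_ids, fraction=0.01)
print(len(subset))  # 10000
```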
(2) Estimate recall at different levels
The following code estimates recall at the default setting, level=1.
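from pymilvus import MilvusClient

# Replace the placeholders with your own cluster endpoint and API key
client = MilvusClient(uri="YOUR_CLUSTER_ENDPOINT", token="YOUR_API_KEY")

# level=1 is the default; enable_recall_calculation returns a recall estimate
search_params = {
    "params": {
        "level": 1,
        "enable_recall_calculation": True
    }
}
res = client.search(
    collection_name = "ZillizCloudVectorDBBench",
    # test data
    data = [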
        [0.22252834,0.26758388,0.3414864,0.31775144,0.25819996,-0.06176423,0.60313016,-0.31930527,-0.05070293,0.80085576,-0.7066278,-0.14704825,0.07324219,-0.051405758,0.24823247,0.20365287,-0.005265507,0.24754052,0.06302843,-0.24397966,-0.2805658,0.543768,0.018544307,0.14154078,0.03093845,-0.25058296,-0.61569184,0.08389459,-0.27519965,0.2121497,0.26527727,-0.03291658,-0.2631627,0.026973603,-0.22165383,0.38862047,0.012616088,-0.066382475,-0.013436819,-0.59001106,-0.08682751,0.13704056,0.08583454,0.0802483,0.01096984,0.20214474,0.11156094,0.5482859,-0.0807617,-0.16539982,-0.29261217,0.08269717,0.03385099,-0.48874223,-0.013168857,0.01616468,-0.6270225,0.13169415,1.0166928,0.6573267,0.40487188,0.2235163,-0.68331105,-0.24911362,-0.1763628,0.34692895,0.077760294,0.96388775,-0.10841275,0.3977706,0.08965021,0.29019687,0.106024966,0.11854912,-0.070255764,-0.24960922,-0.5354312,-0.70186704,0.25364435,0.43369204,0.42516047,0.15078346,-0.35151976,0.4886603,0.37026608,0.39485633,-0.046821307,0.18807457,0.13850202,-0.15630293,0.1469321,0.13219471,-0.31201053,0.22278261,-0.23063508,0.42379102,0.66762435,-0.11903275,0.22101034,-0.10102379,0.21675844,-0.2571628,-0.26546052,0.6185626,0.241754,-0.15792729,-0.37045696,0.23996626,-0.27011713,-0.11512929,0.31577003,0.11118326,-0.76019096,-0.1682175,0.5329204,0.3614348,-0.2348311,0.108711846,0.068240434,-1.3124024,0.20385003,0.6049856,-0.2924265,-0.07293733,0.5181119,0.15742214,0.7537572,-0.656809,0.57201636,0.09775318,0.5414663,-0.53804034,-0.0802571,0.62367904,0.023681821,0.6950041,0.3207407,0.36089638,0.53875273,-0.7288189,-0.12956187,0.15076943,0.057313107,0.41065657,-0.00928412,0.090415776,0.18091775,-0.010793066,-0.010142676,-0.20986095,-0.3740831,0.25086942,-0.31494206,0.17761512,0.04850758,-0.06098805,0.7605751,0.3038707,0.5178377,-0.5769507,0.80365956,0.22879237,-0.34868854,-0.2688102,-0.20910782,0.3392469,0.2990533,0.5502763,0.6561665,0.04177933,-0.45408615,-0.055697974,0.05596695,-0.22720425,0.63778013,-0.1921504,-0.16227654,
        -0.053658817,0.04536426,-0.28570235,0.30350783,0.5217574,0.0025516534,0.10135456,0.5973671,0.09276529,0.7803261,0.45648357,-0.21722879,-0.3496141,0.18574907,0.1729008,0.6754883,-0.5101994,0.16308193,-0.32053986,-0.0013728795,0.0371755,-0.114131995,0.19870742,0.3973309,0.17016156,0.016581284,0.3074155,0.32889378,-0.6682561,0.36933577,0.45571226,-0.19315217,-0.5065343,0.15996625,0.026897952,0.046443015,0.2667398,-0.18946062,-0.4283052,-0.44281873,-0.24062063,0.41703427,-0.30064407,0.35975343,0.31060407,0.18125875,0.14912511,0.10962614,-0.06901708,-0.2846222,-0.027887726,0.037055127,0.031954445,0.56672156,-0.0863331,0.1497875,-0.1635759,-0.25121027,0.6303942,0.17385906,0.4313834,0.15800661,-0.6267578,-0.03539913,0.32520285,0.42759246,0.24401832,0.2115575,-0.8652025,0.13317755,-0.5719402,0.17294376,-0.12595764,0.34818307,0.24802469,0.05904272,0.1538172,-0.57994705,0.2582915,0.45511153,-0.44164076,-0.074042775,0.04943926,-0.1648779,0.3280813,0.5601293,-0.0018850226,-0.140464,-0.07845455,0.44026145,0.56197315,0.1462102,-0.18595229,0.014953136,0.46956787,-0.14819877,0.1859354,0.019512085,-0.01712815,0.5366789,0.7835224,-0.7423546,0.6503855,0.44647282,0.3631722,-0.66614413,0.10151727,-0.1348695,0.32992417,0.10387001,-0.26746857,-0.33413792,-0.5662058,0.36110422,0.7741211,-0.039930806,-0.15249825,0.09454683,0.4891987,0.0062028184,0.06745152,0.55465925,-0.06739082,0.5588079,-0.43696547,0.555966,0.56702,0.056295626,-0.62005293,-0.3722073,0.21030217,-0.017268468,0.95288086,0.51696795,-0.25066343,-0.3169728,0.42543235,0.31396082,0.17551036,0.3922707,0.07407632,0.91187936,0.38888615,-0.12070266,0.011815081,-0.45720986,0.04727247,0.62094855,-0.45443076,0.16062841,-0.40287957,-0.55417335,-0.3830013,0.055438586,0.1718703,-0.6422826,0.22917171,-0.5290951,0.1585279,0.07934802,0.50577295,-0.035466444,0.088082545,0.5693788,0.11773129,0.1821725,0.41347143,-0.2278498,0.50422746,0.29794943,-0.9369089,0.47065943,0.28594327,-0.6866015,-0.63375616,-0.15243755,-0.46409172,-0.4630497,-0.25025153,
        0.6375835,0.54886156,0.19831929,-0.03725618,0.20592122,0.36338213,0.31409082,-0.05410012,0.14887711,0.09740482,0.05067692,0.14775206,0.28025475,-0.34377113,-0.27423778,-0.354568,-0.20043556,0.3899774,-0.19812085,-0.36292082,-0.18255037,0.07038237,0.642794,-0.060884897,0.2948623,0.68766963,0.6928454,-0.3849391,0.17996079,0.12549743,0.10299729,-0.25861496,-0.09246836,-0.32353002,-0.01378604,-0.095313616,-0.04558251,0.20014873,-0.4066689,-0.08052519,-0.4618455,0.37693843,0.45283204,-0.114583425,0.050728872,0.13196129,-0.1941961,-0.11727777,3.9586966,0.05150596,0.11701303,0.5739518,0.07567582,0.48247826,0.25156844,0.38180268,0.12796494,0.009531708,-0.04081167,-0.30954623,-0.035167653,0.43064785,0.24091315,-0.11113215,0.027972942,0.3501582,0.54151994,0.14281327,-0.6420307,0.48611403,0.5221564,0.47878447,0.8510151,0.5528693,0.27463847,0.7548287,0.760392,0.4057206,0.5247366,0.6815815,0.46189928,-0.0665814,0.29575244,-0.13240056,0.44400057,-0.26975283,-0.15510945,0.15475176,-0.46221858,0.054507546,0.14640503,0.66453534,0.19300742,-0.3626597,-0.16279799,0.3795997,0.122737944,-0.20419496,0.18285695,0.027228406,-0.22584598,-0.16478994,0.28747237,0.53937024,0.44095138,0.6340223,-0.41380823,0.38367343,0.39497304,-0.043954037,0.38885015,-0.33315817,-0.4766579,0.17371525,-0.23392603,0.7948543,0.3054392,-0.72041094,0.2532946,0.415873,0.80443436,-0.34634262,-0.4886025,0.30351955,-0.049782824,-0.47253707,-0.11401102,-0.096243046,0.19083612,-0.34427363,-0.24545296,0.5773733,0.16357873,0.38620606,0.39995435,-0.65907687,0.6957725,0.24120355,0.34054404,-0.039899644,0.80393964,0.06337182,0.14144897,0.117613785,-0.019442292,-3.7490542,-0.38971332,0.14894387,-0.61240107,0.19039957,0.23817067,0.022639165,0.015894404,-0.6198486,0.14320132,0.041371442,-0.30882874,-0.30676636,1.0463533,-0.034157425,0.31748047,0.4891939,0.5333419,-0.3289819,0.14962271,0.2807266,0.35519713,0.4001028,-0.18559772,-0.7066097,0.14664957,-0.565848,0.013109448,-0.18452193,-0.07372118,0.28156808,-0.36035228,0.8867393,
        -0.16306667,-0.04191513,0.4594507,0.43135175,-0.091903865,-0.042651527,-0.32555583,-0.19054003,-0.06525034,0.16911364,0.04686202,-0.038171988,0.1336097,0.3761719,-0.050084345,-0.2679286,0.64759475,0.7107872,0.2074471,-0.27312976,0.40090975,0.5491712,-0.10747743,0.74496686,0.18130445,-0.09431538,0.19524746,-0.21418755,-0.12488151,0.15227054,-0.3852693,-0.7784234,-0.14571632,0.041122317,-0.16407914,0.03949264,-0.1925929,0.32901394,0.12069722,0.23391949,-0.16763128,-0.12962814,0.5088096,0.21486548,-0.20993523,0.603585,0.24633685,-0.14029086,-0.27401388,-0.49189645,-0.10249644,2.3196032,-0.12417316,2.1448603,0.058190174,-0.6551869,0.6827868,0.6356786,0.7710372,0.3722568,0.8363127,0.3799041,0.26085538,-0.20764771,0.512162,0.08349497,-0.15835808,0.5738307,-0.66654295,0.18993358,0.32188657,0.0764867,0.64592606,-0.2310478,0.18350935,-0.3915338,0.028645294,-0.101273224,0.8696747,-0.50792813,-0.39119712,-0.30162883,0.7319297,0.71813834,0.39383802,-0.012138247,0.3298783,0.23386809,4.5470805,-0.049004212,0.107414484,0.052308656,0.2678271,-0.15366946,0.5438965,-0.47809094,-0.14649442,0.022792917,0.1358324,-0.4503206,0.57014287,-0.13368002,0.23805767,-0.22125027,-0.08700341,0.045676652,0.16678812,-0.27974084,0.45245427,-0.2107062,0.6667994,0.036875203,0.54632777,0.20104687,0.5349449,0.06913179,-0.086024776,0.76876926,0.16203642,5.155099,0.2797164,0.21450946,-0.17529553,-0.038863413,0.5156995,0.08603405,-0.516439,-0.35604522,0.10131945,0.008194211,0.084706515,-0.34049395,0.21572115,-0.83385843,-0.046860088,-0.48247585,0.023293016,0.22008015,-0.5305121,0.5061096,0.0183293,0.1326365,0.22057603,-0.43027383,-0.3885953,0.1500542,-0.1449458,-0.38747045,0.2789606,0.27069542,-0.37978157,-0.58541,0.5139468,-0.60586643,-0.5236463,0.22003366,0.15764758,0.3512009,0.13694952,0.7772281,0.28431293,0.113065295,0.14233269,-0.047996823,0.0024461043,0.06218189,-0.28065726,-0.2061346,-0.36278206,0.24291486,-0.0869041,0.7448049,0.36513415,0.61559093,0.42820337,0.41123256,-0.32082868,-0.10876272,-0.028618973,
        0.6750199,-0.048880983,-0.12521495,0.1926665,0.6695621,0.21937566,0.46856737,0.30544627,0.2650348,-0.11578811,-0.15696093,-0.047148716,0.19283816,0.12149068,-0.03274016,0.021503512,0.008024155,0.19709297,0.15727529,0.14134975,-0.16997191,-0.063695885,-0.39591065,-0.11891319,-0.04673462,0.16978487,-0.09345571,0.11924938,0.13301763,-0.2266567,0.4164705,0.3571622,0.09038913,0.18044233,0.09119875,-0.23754075,0.45051736,0.35435763,0.20957275,0.5704436,-0.36682,0.26963162,0.15532929,-0.24306794,0.17486432,0.39116114,0.12234816,0.21448524,-0.019066956,-0.09756305,0.4465544,0.3394048,-0.7088385,-0.5032021,0.03529406]
    ],
    limit = 16000,
    search_params = search_params
)
print(f"recall: {res.recalls}")
The output is:
recall: [0.9886875152587891]
The recall here is 98.9%, which does not meet the 99.5% business requirement, so level has to be raised to get better recall.
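(3) Measure run time at different levels (optional)
Besides accuracy, the system's performance metrics, QPS and latency, also matter. The simplest approach is to time the query under each accuracy setting: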
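import time

limit = 16000
# data is assumed to hold the same list of query vectors that was
# passed inline to the earlier search call

search_params = {
    "params": {
        "level": 1,
        # "enable_recall_calculation": True
    }
}
# Time a single search call with a monotonic, high-resolution clock
start_time = time.perf_counter()
client.search(
    collection_name = "ZillizCloudVectorDBBench",
    data = data,
    limit = limit,
    search_params = search_params
)
end_time = time.perf_counter()
elapsed_time = end_time - start_time
print(f"latency at level=1: {elapsed_time:.2f} seconds")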
The output is:
latency at level=1: 0.03 seconds
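A single timed call is sensitive to network jitter and cold caches. As a slightly sturdier variant of the timing approach above, the sketch below repeats the call and reports the median and worst case. The function name, the repeat counts, and the stand-in workload are assumptions; in practice run_query would wrap the client.search call shown earlier.

```python
import statistics
import time


def measure_latency(run_query, repeats=20, warmup=3):
    """Return (median, max) wall-clock latency in seconds over `repeats` calls."""
    for _ in range(warmup):
        run_query()  # warm caches and connections; these calls are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), max(samples)


# Stand-in workload so the sketch runs without a cluster.
median_s, worst_s = measure_latency(lambda: sum(range(10_000)))
print(f"median: {median_s * 1000:.3f} ms, worst: {worst_s * 1000:.3f} ms")
```

Client-side timings like these include network round trips, so they will generally be higher than server-side latency figures.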
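(4) Increase level and evaluate recall and run time
Because level=1 does not meet the accuracy requirement, we raise level step by step from 1 to the maximum of 10 and record how recall and run time change.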
recall at level=1: [0.9886875152587891]
latency at level=1: 0.03 seconds
...
recall at level=6: [0.9947500228881836]
latency at level=6: 0.04 seconds
recall at level=7: [0.9961249828338623]
latency at level=7: 0.04 seconds
...
recall at level=10: [1.0]
latency at level=10: 0.06 seconds
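The loop behind a sweep like this is not shown in the article. One possible shape is sketched below, with the recall query and the timed query passed in as separate callables so that recall calculation does not distort the timing. Both callables and the function name are assumptions; here they are replaced by fakes that return the four recall values reported above, so the sketch runs without a cluster.

```python
import time


def sweep_levels(recall_fn, query_fn, levels=range(1, 11)):
    """For each level, record recall_fn(level) and the wall time of query_fn(level)."""
    results = []
    for level in levels:
        recall = recall_fn(level)  # e.g. a search with enable_recall_calculation
        start = time.perf_counter()
        query_fn(level)            # e.g. the same search without recall calculation
        elapsed = time.perf_counter() - start
        results.append({"level": level, "recall": recall, "latency_s": elapsed})
    return results


# Only levels 1, 6, 7 and 10 are reported in the article; the rest are elided there.
reported = {1: 0.9886875152587891, 6: 0.9947500228881836,
            7: 0.9961249828338623, 10: 1.0}
rows = sweep_levels(lambda lv: reported[lv], lambda lv: None, levels=sorted(reported))
for row in rows:
    print(f"recall at level={row['level']}: {row['recall']:.4f}")
```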
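The results show that at level=7 the search recall reaches 99.6%, which meets the business requirement. Raising level further improves recall somewhat, but latency also rises noticeably, which hurts the performance of a production system. We therefore recommend choosing the smallest level that meets the recall requirement (level=7 here), or one step above it, to get the best balance between accuracy and performance.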
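The selection rule above, take the smallest level whose recall meets the target, can be written down directly. The sketch below applies it to the four recall values reported in this step; the function name is hypothetical.

```python
def smallest_level_meeting(recall_by_level, target_recall):
    """Return the lowest level whose recall is >= target_recall, or None."""
    for level in sorted(recall_by_level):
        if recall_by_level[level] >= target_recall:
            return level
    return None


measured = {1: 0.9886875152587891, 6: 0.9947500228881836,
            7: 0.9961249828338623, 10: 1.0}
print(smallest_level_meeting(measured, target_recall=0.995))  # 7
```

Level 6 falls just short (0.99475 against a 0.995 target), which is why the rule lands on 7.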
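(5) Keep watching performance metrics or QPS changes (optional)
The cloud platform also exposes a set of observable system metrics. While adjusting level, and after a new level takes effect, you can monitor metrics such as QPS and latency in real time to assess the effect of the change on overall system performance and make a better-informed decision.
(6) Use VDBBench for a more accurate view of recall, latency, and QPS (optional)
VDBBench is an open-source benchmarking tool from Zilliz. It runs benchmarks on fixed datasets and reports recall, average latency, and QPS under different configurations, which makes the results more accurate and reliable. Below are the more complete and precise results we obtained by running 1000 searches with VDBBench: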
level | Avg latency(ms) | QPS | Recall from VDBBench | Recall from Zilliz Cloud |
---|---|---|---|---|
1 | 3.1 | 1266 | 0.9519 | 0.953 |
2 | 3.2 | 1080 | 0.9644 | 0.9669 |
3 | 3.4 | 972 | 0.9728 | 0.9755 |
4 | 3.6 | 814 | 0.9816 | 0.9846 |
5 | 3.9 | 704 | 0.9871 | 0.99 |
6 | 4.4 | 549 | 0.9916 | 0.995 |
7 | 5 | 448 | 0.9936 | 0.9971 |
8 | 5.4 | 375 | 0.9945 | 0.9983 |
9 | 5.9 | 340 | 0.9952 | 0.9991 |
10 | 6.3 | 296 | 0.9958 | 0.9995 |
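This test compares recall, latency, and QPS on a dedicated test set. The results show that the estimate from enable_recall_calculation is fairly accurate and reflects actual behavior well. Note that higher recall comes with higher latency and lower QPS. If recall already meets the requirement but QPS is too low, users can also scale up CUs or add replicas to increase throughput.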
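To put numbers on "fairly accurate", the sketch below takes the rows of the table above and computes the largest gap between the two recall columns, plus the QPS cost of moving from level 1 to level 7. All values are copied from the table; nothing here is newly measured.

```python
# (level, avg latency ms, QPS, recall from VDBBench, recall from Zilliz Cloud)
rows = [
    (1, 3.1, 1266, 0.9519, 0.953),
    (2, 3.2, 1080, 0.9644, 0.9669),
    (3, 3.4, 972, 0.9728, 0.9755),
    (4, 3.6, 814, 0.9816, 0.9846),
    (5, 3.9, 704, 0.9871, 0.99),
    (6, 4.4, 549, 0.9916, 0.995),
    (7, 5.0, 448, 0.9936, 0.9971),
    (8, 5.4, 375, 0.9945, 0.9983),
    (9, 5.9, 340, 0.9952, 0.9991),
    (10, 6.3, 296, 0.9958, 0.9995),
]

# Absolute difference between the service estimate and the benchmark recall
gaps = {level: abs(cloud - bench) for level, _, _, bench, cloud in rows}
worst_level = max(gaps, key=gaps.get)
print(f"largest gap: {gaps[worst_level]:.4f} at level {worst_level}")

# Throughput retained when moving from level 1 to level 7
qps = {level: q for level, _, q, _, _ in rows}
print(f"QPS at level 7 is {qps[7] / qps[1]:.0%} of level 1")
```

On this table the two recall columns never differ by more than about 0.4 percentage points, while level 7 retains roughly a third of the level 1 throughput.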
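(7) Fix the final parameters and finish tuning
Based on the results of the steps above, settle on level=7 as the accuracy setting and complete the tuning, balancing recall, QPS, and cost.
This walkthrough shows how to use Zilliz Cloud for targeted tuning: define the business goals, import test data, raise level step by step while evaluating recall and latency, and arrive at the final configuration. In this scenario, level=7 reaches 99.6% recall while keeping latency within 5 ms, which balances accuracy and performance.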
04
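Summary
With the level parameter and the enable_recall_calculation option, Zilliz Cloud gives developers practical tools for balancing recall against performance (latency/QPS). Whether the workload is a recommendation system that needs high throughput and low latency or a face recognition system that needs high accuracy and few false matches, flexible accuracy tuning and on-demand recall estimation can cover the differing needs.
As AI and large-model technology continue to develop, vector search will be used in even more scenarios. From recommendation systems and face recognition to newer areas such as RAG and multimodal search, accuracy and performance requirements will become more varied. Zilliz Cloud will keep improving these features to give developers more efficient and flexible options for getting the best performance in complex business scenarios.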