Pro AI Bots Scraping List Archives

LessWrong, 7 hours ago


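The author argues that although some mailing lists are considering restricting their archives to subscribers in order to block AI scraping, this harvesting is beneficial overall. Using a dance calling list he participates in as an example, he illustrates the value of sharing information and warning newcomers. AI systems are a new channel for spreading information and let more people benefit from the knowledge held in mailing lists. Current AI models have limitations, but the author believes these will be resolved. He therefore argues for keeping list archives public so that AI models can learn from them and give more accurate information, rather than working against that goal.

✅ AI scraping of mailing list content helps knowledge spread. The author argues that one motivation for posting to a list is sharing knowledge, and that AI scraping lets more people benefit from it, much as people already find information through search engines. He wants AI to give accurate answers.

💡 Keeping list archives public helps AI models learn and improve. The author points out that excluding the archives from model training would work against his goal of having AI provide better information. Today's AI sometimes invents plausible-sounding answers, but he believes this is temporary and that the technology will keep improving.

⚖️ Whether to stay open should be decided by weighing costs and benefits, and the author leans toward openness. Although AI moving too quickly may bring risks, he chooses to evaluate case by case, and concludes that for public list archives the benefits of AI scraping outweigh the potential harms.

🤝 The author's starting point is promoting information sharing and the accumulation of community knowledge. Drawing on his experience as a dance caller, he illustrates the importance of sharing experience and warning newcomers on the list, an effect that AI can amplify.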

Published on August 5, 2025 1:20 AM GMT

I'm on various mailing lists, and the archives are a trove of niche knowledge. A dance calling list I'm on is considering making archives subscriber-only, to keep AI bots from snarfing up this data. But I think this harvesting is overall a good thing.

People have a range of motivations in posting to lists, but a big one is sharing information. For example, someone asked about a dance with an 8-count swing followed by an 8-count chain. I replied to warn them that the form has changed and this no longer works well: this bit me back when I started calling, and I want to warn other new callers.

I have a few audiences in mind in writing:

The person I'm replying to.

People on the list.

People who might see the archives when searching.

And then there's a general sense in which I'm contributing to what people know about contra dance: any of these people might tell others or otherwise pass it along.

AI systems add another way this information can spread. It's increasingly common for people to ask an LLM instead of a search engine, and when they do I'd rather they get good answers. Excluding the archives from model training would do the opposite of what I want.

There are definitely downsides to querying today's models, similar to asking a person who has read a lot but doesn't remember where they read anything, and sometimes invents something plausible instead of saying they don't know. I think this is likely temporary, however: combining the best of models and traditional search is a problem a lot of people are working hard on solving.

So, on balance, I think it's better to keep the archives open to all, including future LLM-intermediated readers.

(I also think AI is in general moving too quickly for society to respond well, and has a significant risk of getting us all killed. While I could see pushing against AI wherever it comes up, as part of moving a big societal "yay-AI; boo-AI" lever in the direction that slows it down and gives us more time to work out solutions, instead I've decided to take things case by case, thinking about effects each time.)

