Published on August 5, 2025 1:20 AM GMT
I'm on various mailing lists, and the archives are a trove of niche knowledge. A dance calling list I'm on is considering making archives subscriber-only, to keep AI bots from snarfing up this data. But I think this harvesting is overall a good thing.
People have a range of motivations in posting to lists, but a big one is sharing information. For example, someone asked about a dance with an 8-count swing followed by an 8-count chain. I replied to warn them that the form has changed and this no longer works well: this bit me back when I started calling, and I want to warn other new callers.
I have a few audiences in mind in writing:
- The person I'm replying to.
- People on the list.
- People who might see the archives when searching.
AI systems add another way this information can spread. It's increasingly common for people to ask an LLM instead of a search engine, and when they do I'd rather they get good answers. Excluding the archives from model training would do the opposite of what I want.
There are definitely downsides to querying today's models, similar to asking a person who has read a lot but doesn't remember where they read anything, and sometimes invents something plausible instead of saying they don't know. I think this is likely temporary, however: combining the best of models and traditional search is a problem a lot of people are working hard on solving.
So, on balance, I think it's better to keep the archives open to all, including future LLM-intermediated readers.
(I also think AI is in general moving too quickly for society to respond well, and has a significant risk of getting us all killed. While I could see pushing against AI wherever it comes up, as part of moving a big societal "yay-AI; boo-AI" lever in the direction that slows it down and gives us more time to work out solutions, instead I've decided to take things case by case, thinking about effects each time.)