• interdimensionalmeme@lemmy.ml
    7 hours ago

    This wouldn’t be a problem if one bot scraped once and the result were then mirrored to everyone on Big Tech’s dime (Cloudflare, Tailscale). But since they’re all competing now, each thinks its edge will be its own, better scraper setup, and they won’t share.

    Maybe there should just be a web-to-torrent bridge, so the data is pushed out once by the server and the swarm does the heavy lifting as a cache. A sketch of what I mean is below.
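
    Something like this minimal sketch, using the libtorrent Python bindings; the dump directory and tracker URL are placeholders, not a real deployment:

        # Hypothetical bridge: the server writes its public data to dump/,
        # builds a torrent for it, and seeds it; the swarm is the cache.
        import libtorrent as lt

        fs = lt.file_storage()
        lt.add_files(fs, "dump")  # placeholder directory holding the site dump
        t = lt.create_torrent(fs)
        t.add_tracker("udp://tracker.example.org:6969/announce")  # placeholder tracker
        lt.set_piece_hashes(t, ".")  # hash pieces, paths relative to cwd
        with open("dump.torrent", "wb") as f:
            f.write(lt.bencode(t.generate()))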

    • deadcade@lemmy.deadca.de
      6 hours ago

      No, it’d still be a problem: every diff between commits is expensive to render as a web page, even if “only one company” scrapes it “only one time”. Many of these applications are designed for humans, not scrapers.
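
      To make the cost concrete, here’s a rough sketch (hypothetical repo path, stand-in LRU cache). A cache only amortizes repeated requests, and a crawler enumerating every commit pair misses every time:

          import subprocess
          from functools import lru_cache

          @lru_cache(maxsize=4096)
          def diff_page(repo: str, a: str, b: str) -> str:
              # Each cache miss forks git and walks both commit trees.
              # A crawler that requests every (a, b) pair in the history
              # never hits the cache, so "scraping once" still costs one
              # full diff per page served.
              out = subprocess.run(
                  ["git", "-C", repo, "diff", f"{a}..{b}"],
                  capture_output=True, text=True, check=True,
              )
              return out.stdout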

      • interdimensionalmeme@lemmy.ml
        3 hours ago

        If rendering data for scrapers were really the problem, then the solution is simple: just publish downloadable dumps of the publicly available information. That would be extremely efficient, would cost fractions of a penny in monthly bandwidth, and the data would be far more usable for whatever they’re using it for.
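
        A dump job along those lines could be as small as this sketch; the dumps/ directory and line-delimited JSON format are assumptions for illustration:

            # Hypothetical nightly job: write the public data once as a
            # compressed JSONL file and let a static file server hand it out.
            import datetime, gzip, json, pathlib

            def write_dump(rows, out_dir="dumps"):
                out = pathlib.Path(out_dir)
                out.mkdir(exist_ok=True)
                stamp = datetime.date.today().isoformat()
                target = out / f"public-data-{stamp}.jsonl.gz"
                with gzip.open(target, "wt", encoding="utf-8") as f:
                    for row in rows:
                        f.write(json.dumps(row) + "\n")
                return target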

        The real problem is trying to have freely available data while the host keeps the ability to leverage that data later.

        I don’t think we can have both of these.