• interdimensionalmeme@lemmy.ml
    7 hours ago

    This wouldn’t be a problem if one bot scraped once and the result were then mirrored to everyone on Big Tech’s dime (Cloudflare, Tailscale). But since they’re all competing now, each thinks its edge will be its own, better scraper setup, and they won’t share.

    Maybe there should just be a web-to-torrent bridge, so the data is pushed out once by the server and the swarm does the heavy lifting as a cache. A sketch of what I mean is below.
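
    Something like this minimal sketch, using the libtorrent Python bindings; the dump directory and tracker URL are placeholders, not a real deployment:

        # Hypothetical bridge: the server writes its public data to dump/,
        # builds a torrent for it, and seeds it; the swarm is the cache.
        import libtorrent as lt

        fs = lt.file_storage()
        lt.add_files(fs, "dump")  # placeholder directory holding the site dump
        t = lt.create_torrent(fs)
        t.add_tracker("udp://tracker.example.org:6969/announce")  # placeholder tracker
        lt.set_piece_hashes(t, ".")  # hash pieces, paths relative to cwd
        with open("dump.torrent", "wb") as f:
            f.write(lt.bencode(t.generate()))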

    • deadcade@lemmy.deadca.de
      6 hours ago

      No, it’d still be a problem: every diff between commits is expensive to render as a web page, even if “only one company” scrapes it “only one time”. Many of these applications are designed for humans, not scrapers.
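
      To make the cost concrete, here’s a rough sketch (hypothetical repo path, stand-in LRU cache). A cache only amortizes repeated requests, and a crawler enumerating every commit pair misses every time:

          import subprocess
          from functools import lru_cache

          @lru_cache(maxsize=4096)
          def diff_page(repo: str, a: str, b: str) -> str:
              # Each cache miss forks git and walks both commit trees.
              # A crawler that requests every (a, b) pair in the history
              # never hits the cache, so "scraping once" still costs one
              # full diff per page served.
              out = subprocess.run(
                  ["git", "-C", repo, "diff", f"{a}..{b}"],
                  capture_output=True, text=True, check=True,
              )
              return out.stdout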

      • interdimensionalmeme@lemmy.ml
        3 hours ago

        If rendering data for scrapers were really the problem, then the solution is simple: just publish downloadable dumps of the publicly available information. That would be extremely efficient, would cost fractions of a penny in monthly bandwidth, and the data would be far more usable for whatever they’re using it for.
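
        A dump job along those lines could be as small as this sketch; the dumps/ directory and line-delimited JSON format are assumptions for illustration:

            # Hypothetical nightly job: write the public data once as a
            # compressed JSONL file and let a static file server hand it out.
            import datetime, gzip, json, pathlib

            def write_dump(rows, out_dir="dumps"):
                out = pathlib.Path(out_dir)
                out.mkdir(exist_ok=True)
                stamp = datetime.date.today().isoformat()
                target = out / f"public-data-{stamp}.jsonl.gz"
                with gzip.open(target, "wt", encoding="utf-8") as f:
                    for row in rows:
                        f.write(json.dumps(row) + "\n")
                return target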

        The real problem is trying to have freely available data while the host keeps the ability to leverage that data later.

        I don’t think we can have both of these.