• snooggums@midwest.social · 4 months ago

    How would a site make itself accessible to the internet in general while also not allowing itself to be scraped by technical means?

    robots.txt does rely on being respected, just like no-trespassing signs. The lack of enforcement is the problem; with real enforcement behind it, keeping robots.txt as the record of permissions would make it effective again.
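
    For illustration, robots.txt already lets a site declare per-crawler permissions; the crawler name below is made up, and the file only does anything if crawlers honor it:

    ```
    # Hypothetical robots.txt: deny one scraper, allow everyone else
    User-agent: BadScraperBot
    Disallow: /

    User-agent: *
    Disallow:
    ```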

    I am agreeing, just with a slightly different take.

    • Album@lemmy.ca · 4 months ago

      User-agent matching is rather effective. You can serve different responses based on the UA.

      So generally people will use robots.txt to handle the bots that play nice, and then use user agents to manage the abusers.
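
      A minimal sketch of UA-based filtering, using Python's built-in http.server and made-up bot names, might look like this:

      ```python
      from http.server import BaseHTTPRequestHandler, HTTPServer

      # Hypothetical deny list of scraper user-agent substrings
      BLOCKED_AGENTS = ("BadScraperBot", "AnotherScraper")

      class UAFilterHandler(BaseHTTPRequestHandler):
          def do_GET(self):
              ua = self.headers.get("User-Agent", "")
              if any(bot in ua for bot in BLOCKED_AGENTS):
                  # Known abusers get an error instead of the page
                  self.send_response(403)
                  self.end_headers()
                  self.wfile.write(b"Forbidden")
              else:
                  # Everyone else gets the real content
                  self.send_response(200)
                  self.send_header("Content-Type", "text/html")
                  self.end_headers()
                  self.wfile.write(b"<html><body>Real content</body></html>")

      if __name__ == "__main__":
          HTTPServer(("", 8000), UAFilterHandler).serve_forever()
      ```

      Of course, this only catches scrapers that identify themselves honestly; anyone spoofing a browser UA slips through, which is why it complements rather than replaces robots.txt.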