Every time I check my nginx logs it's more scrapers than I can count, and I could not find any good open source solutions.

  • daniskarma@lemmy.dbzer0.com · 8 hours ago

    How do you know it’s “AI” scrapers?

    I’ve had my server up since before AI was a thing.

    It’s totally normal to get thousands of bot hits and to get scraped.

    I use crowdsec to mitigate it. But you will always get bot hits.

    • Drunk & Root@sh.itjust.worksOP · 2 hours ago

      Bot hits I don’t care about; my issue is when I see the same IP querying every file on 3 resource-intensive sites millions of times.
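      As a quick sanity check, tallying requests per client IP straight from the access log makes it obvious when one address dominates. A minimal sketch, assuming the default nginx “combined” log format (client IP is the first whitespace-separated field); `top_ips` is a hypothetical helper name:

      ```shell
      # top_ips LOGFILE [N]: print the top N client IPs by request count.
      # Assumes nginx "combined" format: the IP is the first field on each line.
      top_ips() {
        local log="$1" n="${2:-10}"
        awk '{ print $1 }' "$log" | sort | uniq -c | sort -rn | head -n "$n"
      }

      # usage: top_ips /var/log/nginx/access.log 10
      ```

      If one IP accounts for most of the log, that is the address to feed into whatever ban tooling you use.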

      • daniskarma@lemmy.dbzer0.com · 57 minutes ago

        Do you have a proper robots.txt file?
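        For reference, a minimal robots.txt along those lines might disallow the commonly published AI-crawler user agents while leaving the site open to everything else (the agent names below are the ones these bots publicly document; well-behaved crawlers honor this, abusive ones won’t):

        ```
        # Block common AI crawlers by their published user-agent strings
        User-agent: GPTBot
        Disallow: /

        User-agent: CCBot
        Disallow: /

        User-agent: Bytespider
        Disallow: /

        # Everyone else may crawl normally
        User-agent: *
        Disallow:
        ```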

        Do they do weird things like invalid URLs or invalid POST attempts? Weird user agents?

        Millions of hits from the same IP sounds much more like vulnerability probing than a crawler.

        If that’s the case, use Fail2ban or CrowdSec. It should be easy to set up a rule that bans an inhuman number of hits per second on certain resources.
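        A sketch of what such a rule could look like with Fail2ban: a filter that matches every request line, so the jail’s `maxretry`/`findtime` pair becomes a rate limit. The filter name `nginx-flood` and the thresholds are placeholders to tune for your traffic:

        ```
        # /etc/fail2ban/filter.d/nginx-flood.conf (hypothetical filter name)
        [Definition]
        # Match every request line; maxretry/findtime in the jail
        # turn this into a hits-per-interval limit.
        failregex = ^<HOST> -.*"(GET|POST|HEAD)
        ignoreregex =

        # /etc/fail2ban/jail.local
        [nginx-flood]
        enabled  = true
        port     = http,https
        filter   = nginx-flood
        logpath  = /var/log/nginx/access.log
        maxretry = 120
        findtime = 10
        bantime  = 3600
        ```

        With these example numbers, any IP making more than 120 requests in 10 seconds gets banned for an hour; no human browses that fast.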