Show HN: The user agents crawling HN today (ai.realhackers.org)

5 points | by Bender 15 hours ago

2 comments

  • Bender 15 hours ago
    About 8 hours ago I submitted a page on how to confuse SSH bots.

    Just for fun, I also set up a cron job that updates a text file listing all the user agents that are apparently crawling HN nonstop and, as a result, landing on the pages I submitted. The page auto-refreshes every 60 seconds. Perhaps I am the only person who finds this interesting, but I figured I would share it anyway.

    It seems drakma is the bot that HN uses to read the submitted site. There is now quite a collection of AI agents that hit the site. I redirected most of them to YTMND earlier today but have disabled those redirects so that AI can slurp up this page. I want to see if it really puts a load on the VM. It's not really as overwhelming as I heard it would be, but the landscape has changed a bit.

    The leftmost column shows how many times each user-agent has shown up today. After that is whatever the user-agent identifies itself as. The text file will auto-refresh every 60 seconds.
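
    For readers who want to compare against their own logs, a listing in this shape (count on the left, user-agent string after it) could be produced roughly as follows. This is only a sketch, not necessarily how the page is generated: it assumes the common "combined" access-log format, where the user agent is the last double-quoted field on each line, and the names count_user_agents and render are hypothetical.

```python
import re
from collections import Counter

# Assumption: "combined" log format, where the user agent is the last
# double-quoted field on the line. User agents that contain escaped
# quotes are not handled by this simple pattern.
UA_RE = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Tally how often each user-agent string appears in the log lines."""
    counts = Counter()
    for line in lines:
        match = UA_RE.search(line)
        if match:
            # An empty user-agent field is recorded as "-".
            counts[match.group(1) or "-"] += 1
    return counts


def render(counts):
    """Format as 'count user-agent', most frequent first."""
    return "\n".join(f"{n:7d} {ua}" for ua, n in counts.most_common())
```

    Run from cron against the current day's log, the output of render() could simply be written to the text file that the page serves.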

    Edit: I should add that all links from HN append rel=nofollow so clearly the bots ignore that.

    Current load to static pages:

        load average: 0.00, 0.00, 0.00
    
    Peak network throughput: 193kb/s out of a 2.4gb/s cap

    Protocol counts thus far:

        HTTP/2.0: 550
        HTTP/1.1: 819
    
    Most real people use HTTP/2.0 and most (but not all) bots use HTTP/1.1. I doubt bots outnumber humans; rather, bots crawl everything and humans click only on things that interest them.
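
    The protocol counts above can be tallied from the same kind of log. The sketch below assumes the request line is logged as a quoted "METHOD path HTTP/x.y" field; the names count_protocols and protocol_shares are hypothetical, and the protocol version is only a rough hint, since the post notes that not all bots use HTTP/1.1.

```python
import re
from collections import Counter

# Assumption: the request line appears as "METHOD path HTTP/x.y" in quotes.
REQUEST_RE = re.compile(r'"[A-Z]+ \S+ (HTTP/[\d.]+)"')


def count_protocols(lines):
    """Tally requests per HTTP protocol version."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def protocol_shares(counts):
    """Return each protocol's share of all requests, in percent."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {proto: round(100 * n / total, 1) for proto, n in counts.items()}


# Using the counts quoted in the post:
shares = protocol_shares({"HTTP/2.0": 550, "HTTP/1.1": 819})
```

    With the counts quoted above, HTTP/1.1 accounts for roughly 60% of requests and HTTP/2.0 for roughly 40%.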

    Only 3 connections are using HTTP KeepAlive. There are a lot of DNS requests for the HTTPS resource type.

  • usernametaken29 5 hours ago
    Not to dunk on you, but maybe write up your format in a human-comprehensible way? A blog post? This tells me, well, nothing really.
    • Bender 3 hours ago
      Dunk away! I considered making a little write-up, but this is really just an ephemeral throwaway site. I always throw them away. It's just a listing of the user-agents that hit the site and their counts, so people can see what things claim to be and how many hits they make. If this means nothing to you, that is also fine in my book. It may mean something to the handful of people running their own sites, who can compare it against their own observations.

      At some point I will re-enable the redirects that send the bots to old memes, re-enable log rotation, and re-enable the blackhole routes that drop most data centers.