Should I block the Yandex Bot?

  • I have a web application whose back-end the Yandex spider has tried to access a few times. After these crawls, a few Russian IP addresses also tried to access the back-end, and they failed.

    Should I block Yandex or take another action?

    Update:

    The Yandex spider visits a back-end URL about once every 2-3 days. We have not published any back-end URL on the front-end.

    By "back-end" I mean the web application's interface that only lets our administrators manage the application.

    You should look up the IP addresses to see whether they are real Yandex IP addresses. For instance, looking at my own access logs, the most common IP address by far identifying itself as Yandex is 100.43.81.141, which turns out to be legitimate. By contrast, 104.238.95.146 is not.
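One way to do the lookup suggested above is a reverse DNS query followed by a forward confirmation. The sketch below is only an illustration: the function names are hypothetical, and the list of domain suffixes is an assumption based on Yandex's published guidance for verifying its crawlers, so check it against the current documentation before relying on it.

```python
import socket

# Assumption: genuine Yandex crawlers reverse-resolve to hostnames
# under one of these domains.
YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")


def is_yandex_hostname(hostname: str) -> bool:
    """Return True if the hostname sits under a Yandex-owned domain."""
    return hostname.lower().rstrip(".").endswith(YANDEX_SUFFIXES)


def is_real_yandex_ip(ip: str) -> bool:
    """Reverse-resolve the IP, then confirm the hostname resolves back to it."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:
        # No PTR record or lookup failure: treat as not verified.
        return False
    if not is_yandex_hostname(hostname):
        return False
    try:
        forward = {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    except OSError:
        return False
    # The forward lookup must include the original IP; otherwise the
    # PTR record could have been forged by whoever controls the IP block.
    return ip in forward
```

Calling `is_real_yandex_ip("100.43.81.141")` or `is_real_yandex_ip("104.238.95.146")` performs live DNS queries, so the result depends on the DNS records at the time of the check. A user-agent string alone proves nothing, since anyone can send it.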

    The IP addresses have been recognized as belonging to the Yandex spider.

    What's the point? Scans won't stop when you block Russian search engines. It's only a matter of time before Chinese, Nigerian and Moroccan hackers pick you up.

    What is the "back-end" that you are talking about? Normally your "back-end" (middleware, databases and such) should not be even reachable from the internet.

    I would have thought 'back end' in this case could mean, for example, a REST API for a mobile application.

    Do you have a proper robots.txt to tell the various spiders which parts of your backend should or should not be accessed? In the absence of that, expect any linked URLs to your backend to be crawled by various well-meaning bots.

    Actually, just let it try; those spiders cannot get anything from me. I have access control on my back-end. To programmers in Hong Kong, the "back-end" of a web application means the part that only the application's administrators can access.

    No, Yandex feeds DuckDuckGo.

  • deviantfan (correct answer)

    5 years ago

    Should I block Yandex

    Why?
    First, if the bot is a legitimate search engine bot (and nothing else), it won't hack you. If it is not, blocking a user agent won't help; the attacker will just use another one.
    If your password is good, fail2ban is configured, the software is up to date, etc., just let them try. If not, you need to fix that regardless of any Yandex bots.

    To make sure the problem is actually Yandex, try disallowing it in robots.txt and see if the requests stop.
    If they don't, it is not Yandex.
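Before deploying such a rule, it can help to confirm that it blocks what you intend. The sketch below uses Python's standard `urllib.robotparser` to evaluate a hypothetical robots.txt. The `Yandex` user-agent token, the `example.com` URLs, and the `allowed` helper are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows Yandex everywhere and leaves
# other crawlers unrestricted. "Yandex" is assumed to be the token that
# Yandex's crawlers match.
ROBOTS_TXT = """\
User-agent: Yandex
Disallow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules in ROBOTS_TXT permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("YandexBot", "https://example.com/admin/"))
    print(allowed("Googlebot", "https://example.com/admin/"))
```

This only shows how a compliant crawler would interpret the file. robots.txt is advisory, so requests that keep arriving under a Yandex user agent after the rule is in place are most likely coming from something else.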

    (I set up a new web server a few weeks ago. One hour after it went online, before it even had a domain, a "Googlebot" started trying SQL injections against a non-existent WordPress install. It was fun to watch, as there were no other HTTP requests. But I did not block Google because of that.)

    Reminds me of when one of those Windows worms went around. Our router at the time was a Linux box, and my brother set up a script to make the PC speaker beep whenever our IP was scanned (and also to log information about the infected machine). My dad said that for a while it sounded like someone was playing a video game, because the attacks came in so frequently. There's also a variety of maps like http://map.norsecorp.com/#/

    Nevertheless, it is disturbing that the crawler or its affiliates effectively harvested the backend middleware URL and were crawling it directly. I'm seeing the same pattern myself, and it is unsettling to see a crawler target unpublished URLs.

    @BradHein For commonly used internet software, these URLs are well known; that's no reason to believe someone hacked you.

    @deviantfan That is true of commonly used internet software. Did OP state that his software is common? However, I commonly see Yandex target custom-designed software by its unpublished URLs, so it is using unscrupulous means to gather these otherwise hidden URLs.

Licensed under CC-BY-SA with attribution


Content dated before 7/24/2021 11:53 AM