How to use User Agent strings to prevent blocking while web scraping?

What is a user agent string?

The User-Agent string of a web browser helps servers (websites) identify the browser (Chrome, Edge, Firefox, IE etc.), its version, and the operating system (Windows, Mac, Android, iOS etc.) on which it is running. Websites mainly use this to serve different pages for different platforms and browser types.
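To make this concrete, here is a minimal Python sketch of how a server might read the browser and operating system out of a User-Agent string. The function name `describe_user_agent`, the token lists and the example string are illustrative assumptions, not taken from any particular website or library. Real detection logic is usually far more thorough.

```python
def describe_user_agent(ua: str) -> tuple[str, str]:
    """Return a rough (browser, operating system) guess for a User-Agent string."""
    # Order matters: Edge's string also contains "Chrome" and "Safari",
    # and Chrome's string also contains "Safari".
    browser_tokens = [
        ("Edg/", "Edge"),
        ("Firefox/", "Firefox"),
        ("Chrome/", "Chrome"),
        ("Safari/", "Safari"),
    ]
    # Order matters here too: Android strings contain "Linux",
    # and iOS strings contain "like Mac OS X".
    os_tokens = [
        ("Android", "Android"),
        ("iPhone", "iOS"),
        ("iPad", "iOS"),
        ("Windows NT", "Windows"),
        ("Mac OS X", "Mac"),
    ]
    browser = next((name for token, name in browser_tokens if token in ua), "unknown")
    os_name = next((name for token, name in os_tokens if token in ua), "unknown")
    return browser, os_name


# Example value only: a typical Chrome-on-Windows string (assumed, not from the article).
EXAMPLE_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)

print(describe_user_agent(EXAMPLE_UA))  # ('Chrome', 'Windows')
```

The string names several browsers at once for historical compatibility reasons, which is why the sketch checks the most specific token first.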

To see the user agent string of your own browser, visit https://www.whatismybrowser.com/detect/what-is-my-user-agent

User Agent strings for web scraping

Websites can use the same information to block non-standard web browsers and bots. To prevent this, we can configure web scrapers to send the user agent string of a standard browser.
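For a scraper written in code, the same idea amounts to setting the User-Agent request header. The sketch below uses Python's standard `urllib.request` module. The helper names `build_request` and `fetch`, the `BROWSER_USER_AGENT` value and the example URL are assumptions made for illustration.

```python
import urllib.request

# Example value only: a typical Chrome-on-Windows string (assumed, not from the article).
# In practice, copy the string that your own browser reports.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)


def build_request(url: str, user_agent: str = BROWSER_USER_AGENT) -> urllib.request.Request:
    """Create a request that sends a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download a page using the browser-like User-Agent (performs a network call)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()


# Building the request does not contact the server, so the header can be inspected offline.
# urllib stores header names in capitalized form, hence "User-agent".
request = build_request("https://example.com/")
print(request.get_header("User-agent"))
```

Calling `fetch("https://example.com/")` would then download the page with that header. Without it, `urllib` sends a default value that identifies Python, which is easy for a website to recognize as non-browser traffic.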

WebHarvy, our generic visual web scraper, allows you to set any user agent string for its mining browser, so that websites treat the scraper as a normal browser and do not block access. To configure this, open WebHarvy Settings and go to the Browser settings tab. There, enable the custom user agent string option and paste the user agent string of a standard browser like Chrome or Edge.

This option can be used to make WebHarvy’s browser appear like any specific standard web browser (e.g. Microsoft Edge, Mozilla Firefox, Google Chrome or Apple Safari) to websites from which you are trying to extract data.

How to get the user agent strings of various browsers?

You may find user agent strings of various browsers at http://useragentstring.com/pages/useragentstring.php
