Anonymously scrape data from websites
Web scraping is the technique of automatically extracting data from websites using software or scripts. Our software, WebHarvy, can be used to easily extract data from any website without any coding or scripting knowledge.
One of the main problems you might face when extracting data from websites with automated techniques is that the web server blocks your computer's IP address, which prevents you from loading its pages. Many websites have mechanisms in place to detect automated scraping and to block the IP addresses of the computers it runs from. Also, while scraping data, you may not want to reveal your identity (network details) to remote web servers.
The best way to avoid being blocked and to protect your privacy is to use proxy servers or a VPN while scraping data. Both help you remain anonymous and avoid getting blocked, and both can be easily set up with WebHarvy.
You may use the 'Inject pauses during mining' feature to avoid sending continuous page requests to web servers over long periods. Although this minimizes the chances of being detected and blocked by web servers, it may not always be effective, and your identity is still not hidden from the web server.
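The idea behind injecting pauses can be illustrated with a short, hypothetical Python sketch. It is not WebHarvy's implementation; the function name `fetch_with_pauses`, the caller-supplied `fetch` function and the 2 to 6 second pause range are assumptions chosen for illustration. It simply waits a random interval between consecutive page requests so they do not arrive at a fixed, machine-like rate.

```python
import random
import time
from typing import Callable, Iterable, List


def fetch_with_pauses(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_pause: float = 2.0,
    max_pause: float = 6.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Hypothetical helper: fetch each URL in turn, pausing randomly between requests.

    `fetch` is any caller-supplied function that downloads one page (assumption).
    The pause range is an illustrative default, not a WebHarvy setting.
    """
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # Random pause before every request except the first, so the
            # server does not see a steady, evenly spaced stream of requests.
            sleep(random.uniform(min_pause, max_pause))
        pages.append(fetch(url))
    return pages
```

The `sleep` parameter defaults to `time.sleep` but can be replaced (for example, to record the chosen delays in a test). As the text above notes, pauses only reduce the request rate; the server still sees your real IP address.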
The 'Scrape via Proxy Server' feature allows you to access and scrape websites through proxy servers, thereby maintaining anonymity while scraping data.
To configure this feature, click the 'Settings' option from the Edit menu and select the 'Proxy Settings' tab. You may provide a single proxy address or a list of proxy addresses.
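If you maintain a longer list of proxies, it can help to check the entries before entering them in the 'Proxy Settings' tab. The following hypothetical Python sketch assumes one `host:port` entry per line (matching the "proxy IP address / port number" pairs that proxy providers typically issue); the exact format WebHarvy accepts is not specified here, so treat this only as a pre-check under that assumption.

```python
from typing import List, Tuple


def parse_proxy_list(text: str) -> List[Tuple[str, int]]:
    """Hypothetical helper: parse one 'host:port' entry per line.

    Blank lines and lines starting with '#' are skipped. The 'host:port'
    format is an assumption, not a documented WebHarvy requirement.
    """
    proxies = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        # Split on the last ':' so the host part may itself contain dots.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdecimal():
            raise ValueError(f"line {line_no}: expected host:port, got {entry!r}")
        port_num = int(port)
        if not 1 <= port_num <= 65535:
            raise ValueError(f"line {line_no}: port out of range: {port_num}")
        proxies.append((host, port_num))
    return proxies
```

For example, a list containing `203.0.113.5:8080` and `proxy.example.com:3128` (placeholder addresses) parses into two `(host, port)` pairs, while a malformed line raises an error naming its line number.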
Either a single proxy server or a list of proxy servers can be used for web scraping. If you select the 'Rotate proxies' option, WebHarvy will automatically rotate through the proxy servers in the list, periodically switching to the next one. Otherwise, only the first proxy in the list will be used.
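The two modes described above (rotate through the list, or always use the first entry) can be sketched in Python as follows. This is a hypothetical illustration, not WebHarvy's internal logic: the class name `ProxyRotator` and the `requests_per_proxy` switching interval are assumptions, since the text only says the rotation happens periodically. The `opener` method uses the standard library's `urllib.request.ProxyHandler` to route requests through the selected proxy.

```python
import urllib.request
from typing import List


class ProxyRotator:
    """Hypothetical helper mimicking 'Rotate proxies' on/off behaviour.

    `requests_per_proxy` (how many requests before switching) is an
    assumed parameter; WebHarvy's actual rotation interval is not stated.
    """

    def __init__(self, proxies: List[str], rotate: bool = True, requests_per_proxy: int = 10):
        if not proxies:
            raise ValueError("at least one proxy is required")
        if requests_per_proxy < 1:
            raise ValueError("requests_per_proxy must be >= 1")
        self.proxies = list(proxies)
        self.rotate = rotate
        self.requests_per_proxy = requests_per_proxy
        self._count = 0  # number of proxies handed out so far

    def next_proxy(self) -> str:
        if not self.rotate:
            # Rotation disabled: always use the first proxy in the list.
            return self.proxies[0]
        # Stay on each proxy for `requests_per_proxy` requests, then move on,
        # wrapping around to the start of the list.
        index = (self._count // self.requests_per_proxy) % len(self.proxies)
        self._count += 1
        return self.proxies[index]

    def opener(self) -> urllib.request.OpenerDirector:
        # Build a urllib opener that sends HTTP and HTTPS traffic via the next proxy.
        proxy = self.next_proxy()
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return urllib.request.build_opener(handler)
```

With three proxies and `requests_per_proxy=2`, seven calls to `next_proxy()` return the first, first, second, second, third, third and then first proxy again. Switching after a fixed number of requests is only one possible design; rotating after a fixed time interval would work equally well for spreading requests across IP addresses.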
How to obtain proxy server addresses?
There are both free and paid proxy servers available on the internet, and you can find them with a Google search. Free proxies are often slow and unreliable, and may cause the mining process to terminate early. For this reason, we do not recommend using free proxies with WebHarvy.
However, if you want to try free proxies with WebHarvy, we recommend the following list provided by HMA:
We have tested Trusted Proxies with WebHarvy and recommend their Proxy Server Cloud service. You can sign up for a free trial account with Trusted Proxies at the following link, which will let you try their service for FREE with WebHarvy.
Once you create a free Proxy Server Cloud trial account, you will receive your proxy IP address and port number as well as your account username and password. Add the proxy server IP address and port in WebHarvy as explained in Scrape via Proxy Servers. Then, before mining, authenticate with Trusted Proxies at the web link provided in your account settings, using your username and password.
Please contact our support in case you need assistance or have any questions.