Is Web Scraping Legal?

Is it legal to scrape data from websites? This article discusses the legality and ethics of web scraping, which is the process of automatically extracting large amounts of data from websites using software tools, often without the permission of the website owners.

The answer is not a simple yes or no: it depends on what data you scrape and how you use it.

To keep your web scraping activity legal

Make sure that the following conditions are met.

  1. The scraped data is public. If the target website displays the data publicly, without technical barriers such as a login or a paywall, scraping is often allowed, especially if the data is for personal use or research.

  2. The website's terms of service do not prohibit web scraping. Check the website's terms of service and make sure that web scraping, i.e. extracting data from the website using software tools, is not prohibited.

  3. You are not scraping copyright-protected or personal data. Articles, images and datasets may be protected by copyright law, and scraping them and later using them as your own property can result in copyright infringement and other legal violations. Similarly, collecting personal data can breach privacy laws such as the EU's General Data Protection Regulation (GDPR).

How you plan to use the data which you have extracted from a website is also important.

Because the data displayed by most websites is meant for public consumption, copying it to a file on your computer is generally legal. What you need to be careful about is how you use that data. Downloading it for your own personal use and analysis is generally considered ethical. But republishing it as your own, for example on your website, without attributing the original owner and in a way that works against the owner's interests, is unethical and may well be illegal.

To ethically scrape data from a website

Follow the points given below.

  1. Respect the robots.txt file provided by the website. Do not crawl URLs that it explicitly disallows for crawlers.
  2. Respect the website's terms of service and copyright.
  3. Treat CAPTCHA forms as a clear sign that the website discourages web scraping.
  4. Limit your crawl rate. Do not overload the website's server by sending many parallel page requests. If your scraping tool loads pages sequentially, like a human visitor, you are also less likely to be blocked by the website.
  5. When using the scraped data, provide proper attribution (cite your sources), and do not use it to build products or services that compete directly with the target website.
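Points 1 and 4 can be automated in your own scripts. The following is a minimal sketch, assuming a Python scraper built only on the standard library's `urllib.robotparser` and `urllib.request`; it is not how WebHarvy works internally. The helper names `load_robots`, `fetch_page` and `polite_crawl`, the user-agent string `ExampleResearchBot/0.1` and the five-second default delay are all hypothetical choices for illustration. The sketch skips any URL that robots.txt disallows, fetches pages strictly one at a time, and never goes faster than a `Crawl-delay` the site declares. It does not check terms of service, copyright or CAPTCHAs; those still need human judgment.

```python
import time
import urllib.request
import urllib.robotparser
from urllib.parse import urljoin


def load_robots(site_root: str) -> urllib.robotparser.RobotFileParser:
    """Download and parse robots.txt for a site such as "https://example.com/"."""
    robots = urllib.robotparser.RobotFileParser()
    robots.set_url(urljoin(site_root, "/robots.txt"))
    robots.read()  # performs the HTTP request and parses the rules
    return robots


def fetch_page(url: str, user_agent: str) -> str:
    """Fetch one page, identifying the scraper via its User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def polite_crawl(urls, robots, user_agent="ExampleResearchBot/0.1",
                 delay_seconds=5.0, fetch=fetch_page):
    """Fetch the allowed URLs one at a time, pausing between requests.

    URLs that robots.txt disallows for this user agent are skipped.
    Returns a dict mapping each fetched URL to its page body.
    """
    # If the site declares a Crawl-delay, never go faster than it asks.
    site_delay = robots.crawl_delay(user_agent)
    if site_delay is not None:
        delay_seconds = max(delay_seconds, float(site_delay))

    pages = {}
    for url in urls:
        if not robots.can_fetch(user_agent, url):
            continue  # explicitly disallowed: do not request it at all
        if pages:
            time.sleep(delay_seconds)  # pause before every request but the first
        pages[url] = fetch(url, user_agent)  # sequential, never in parallel
    return pages
```

For example, `polite_crawl(urls, load_robots("https://example.com/"))` would fetch the permitted pages from a list of `example.com` URLs with at least five seconds between requests. The `fetch` parameter is injectable so the crawling rules can be tested without touching a real server.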

Scrape Data Anonymously

WebHarvy is an easy-to-use visual web scraper that lets you scrape data from websites anonymously, protecting your privacy. You can easily use proxy servers or VPNs with WebHarvy so that you are not connected directly to the web server during data extraction. To minimize the load on web servers and to avoid detection, WebHarvy also offers options to automatically insert pauses and emulate a human user during scraping.