How to scrape emails from any web page using WebHarvy?

WebHarvy can scrape email addresses and phone numbers by parsing the HTML source code of web pages. You can download the free trial version of WebHarvy from our website and install it on your computer.

Steps to follow to scrape email addresses

  • Open WebHarvy and load the page from which you need to scrape email addresses
  • Start Configuration (skip this if you are already in configuration mode and reached the page containing the email addresses by following links from the starting page)
  • Click anywhere on the page to bring up the Capture window
  • Double click the Capture HTML toolbar icon of the Capture window. This selects the entire HTML content of the page and displays it in the preview area of the Capture window.
  • Select More Options > Apply Regular Expression (or click on the Apply RegEx toolbar icon)
  • In the resulting window, expand the drop-down list and select the regular expression for capturing email addresses.
  • Select the ‘Match Multiple Times’ option if you wish to scrape multiple (all) email addresses from the page
  • Click the Apply button
  • If the page contains email addresses, they will be selected and displayed in the preview area of the Capture window.
  • Click on the main Capture HTML button to scrape the selected email addresses
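
For readers who want to see what the regular expression step does outside the application, the idea can be sketched in Python. This is only an illustration: the pattern EMAIL_RE, the function extract_emails and the sample HTML are all hypothetical, and the expression built into WebHarvy may differ. The match_multiple flag mimics the 'Match Multiple Times' option; removing duplicates is a choice made in this sketch, not a documented WebHarvy behavior.

```python
import re

# Hypothetical, simplified email pattern (an assumption for this sketch).
# The expression offered in WebHarvy's drop-down may be different.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html, match_multiple=True):
    """Return email addresses found in raw page HTML.

    match_multiple=True collects every match (like 'Match Multiple Times');
    match_multiple=False returns only the first match, if any.
    """
    if match_multiple:
        found = EMAIL_RE.findall(html)
        # Drop duplicates while keeping first-seen order (sketch's own choice).
        return list(dict.fromkeys(found))
    first = EMAIL_RE.search(html)
    return [first.group(0)] if first else []


# Made-up page fragment used only for demonstration.
sample_html = (
    '<p>Contact <a href="mailto:sales@example.com">sales@example.com</a> '
    "or support@example.org.</p>"
)

if __name__ == "__main__":
    print(extract_emails(sample_html))
    print(extract_emails(sample_html, match_multiple=False))
```

Because the expression runs over the full page HTML, it also picks up addresses that appear only in attributes such as mailto: links, not just those visible in the page text.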

To scrape phone numbers, follow the same steps, but in the sixth step (where the regular expression is selected) use either of the following RegEx strings.

(\d{3}[\d\-\s]+)

(\d{3}[\s\-]+\d{2}[\s\-]+\d{2}[\s\-]+\d{3})
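
To get a feel for how the two phone number expressions above behave, they can be tried on sample text in Python. This is a rough sketch: the names LOOSE_PHONE_RE, GROUPED_PHONE_RE and extract_phones and the sample string are hypothetical, and Python's re module is used here only as a stand-in for the regular expression engine inside WebHarvy.

```python
import re

# The two expressions quoted above, copied unchanged.
LOOSE_PHONE_RE = re.compile(r"(\d{3}[\d\-\s]+)")
GROUPED_PHONE_RE = re.compile(r"(\d{3}[\s\-]+\d{2}[\s\-]+\d{2}[\s\-]+\d{3})")


def extract_phones(text, pattern):
    """Return every match of the pattern's capture group, whitespace-trimmed."""
    # The first pattern allows trailing whitespace inside the match,
    # so each result is stripped before it is returned.
    return [match.strip() for match in pattern.findall(text)]


# Made-up text used only for demonstration.
sample = "Office: 555-123-4567, mobile: 040 12 34 567."

if __name__ == "__main__":
    print(extract_phones(sample, LOOSE_PHONE_RE))
    print(extract_phones(sample, GROUPED_PHONE_RE))
```

The first expression is permissive: it accepts any run that starts with three digits and continues with digits, hyphens or whitespace, so it can also match dates or ID numbers. The second only matches numbers grouped as 3-2-2-3 digits separated by spaces or hyphens.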

Related Links

  1. WebHarvy getting started guide
  2. Capture HTML
  3. Apply Regular Expression
  4. Regular Expression Tutorial
  5. Scrape emails from any website or search query using GrabContacts