support@webharvy.com | sales@webharvy.com | YouTube Channel | KB Articles


Selecting Data to Scrape


1. Capture Text / URLs / Email / Images
2. Capture portion of text (sub text)
3. Capture Text following a Heading
4. Capture HTML
5. Capture hidden fields ('click to display' fields)
6. Apply Regular Expressions
7. Capture More Content
8. Capture Text as File

Interacting with Page


1. Input Text
2. Run JavaScript on page
3. Select dropdown option
4. Open Popup and scrape data
5. Scroll page down to load contents

Scrape hidden fields ('Click to display' fields)


On many web pages, you must click an item to reveal the text behind it. For example, on the following yellow pages web page, each phone number is displayed only after you click its 'Show number' button.

Scrape hidden fields

Before capturing such data in Config mode, you therefore need to click and reveal the phone numbers of all listings, and the same clicks must be repeated later while mining data. To set this up, click the first hidden field, then in the Capture window that appears click the 'More Options' button and select the 'Click' option, as shown below.

Scrape hidden fields

Wait a few seconds and you will see that all hidden fields are automatically clicked and revealed. You can then click and extract the phone numbers as if they were normal text fields on the page.

Watch video: Capture hidden 'click to display' fields

Scrape using Regular Expressions


WebHarvy allows you to apply Regular Expressions to the selected text or HTML before scraping it.

WebHarvy RegEx Tutorial

Regular expressions can be applied by clicking the 'More Options' button and then selecting the 'Apply Regular Expression' option as shown below.

Scrape using RegEx

You may then specify the RegEx string. WebHarvy will extract only those portions of the main text which match the group(s) specified in the RegEx string.

Scrape using RegEx

Click Apply. The text that results from applying the Regular Expression is displayed in the Capture window text box. Click the main 'Capture Text' button to capture it. The portion matched by the RegEx string is extracted, as shown below.
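As a rough illustration of what "matching the group(s)" means, the following sketch uses Python's standard `re` module. It is not WebHarvy code: the sample text, the pattern, and the `extract_groups` helper are made up for illustration, and Python's RegEx flavor is only assumed to behave like WebHarvy's for a simple pattern such as this one.

```python
import re

# Hypothetical captured text, as it might appear in the Capture window preview.
captured_text = "Price: $1,299.00 (incl. tax) | SKU: AB-4471"

# The parentheses form a group; only the grouped part of each match is kept.
pattern = r"Price:\s*\$([\d,.]+)"


def extract_groups(pattern, text):
    """Return the text matched by each group, for every match found in `text`."""
    results = []
    for match in re.finditer(pattern, text):
        results.extend(g for g in match.groups() if g is not None)
    return results


print(extract_groups(pattern, captured_text))  # ['1,299.00']
```

The surrounding words ("Price:", the dollar sign) are needed to locate the value, but because they sit outside the parentheses they are not part of the extracted result.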

Scrape using RegEx

Watch video: Selecting required portions of text using Regular Expressions (RegEx)

Scrape More Content


To scrape more content than is currently displayed in the Capture window preview area, click the 'More Options' button in the Capture window and apply the 'Capture More Content' option. Each time you apply this option, WebHarvy captures the parent element of the currently selected element. You may apply it repeatedly until the preview area displays the required content.

Scrape more content

This option comes in handy while capturing articles or blog posts. During Config, click on the first paragraph of the article (or blog) and when the Capture window is displayed, click the 'Capture More Content' option until the whole article text is displayed in the preview area. Then click the 'Capture Text' button to capture it.
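The effect of moving to the parent element can be sketched in Python with the standard library. This is only a conceptual model, not WebHarvy's implementation: the sample markup and the `text_after_climbing` helper are hypothetical, and the sample is deliberately well-formed so that the XML parser can read it (real HTML often is not).

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed article markup.
SAMPLE = """
<article>
  <div>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
  <footer>Posted by admin</footer>
</article>
"""


def text_after_climbing(root, start, levels):
    """Move `levels` steps up from `start` and return that element's text."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = start
    for _ in range(levels):
        node = parents.get(node, node)  # stay at the root once it is reached
    pieces = (t.strip() for t in node.itertext())
    return " ".join(p for p in pieces if p)


root = ET.fromstring(SAMPLE)
first_p = root.find(".//p")  # the element "clicked" during configuration

for levels in range(3):
    print(levels, "->", text_after_climbing(root, first_p, levels))
```

Each step up widens the selection: the first paragraph alone, then both paragraphs, then the whole article including the footer. This also shows why it is worth stopping as soon as the preview shows the required content, since one step too many pulls in unrelated text.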

Scrape Text as File


The 'Capture Text as File' option under 'More Options' in the Capture window lets you scrape the selected text (the text displayed in the Capture window preview area) as a file. While mining, the text is downloaded as a file to the specified folder. Like the 'Capture More Content' option, this feature is helpful when extracting articles or blog posts.
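Conceptually, the result is one text file per captured item in the chosen folder. The Python sketch below shows that idea only; the `save_texts_as_files` helper, the sample texts, and the `article_001.txt` naming scheme are assumptions for illustration and do not describe how WebHarvy names or writes its files.

```python
import tempfile
from pathlib import Path


def save_texts_as_files(texts, folder):
    """Write each captured text to its own numbered .txt file; return the paths."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, text in enumerate(texts, start=1):
        path = folder / f"article_{index:03d}.txt"  # hypothetical naming scheme
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


# Hypothetical captured article texts.
articles = ["Full text of the first article.", "Full text of the second article."]

with tempfile.TemporaryDirectory() as tmp:
    saved = save_texts_as_files(articles, tmp)
    names = [p.name for p in saved]
    contents = [p.read_text(encoding="utf-8") for p in saved]

print(names)
```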

Scrape text as file

Input Text


The 'Input Text' option under 'More Options' in the Capture window allows you to enter text in input fields on web pages. During configuration, click the input field/text box where you want to enter text, then select 'More Options' > 'Input Text' in the resulting Capture window. Type the string you need to input and click OK; the specified string is placed inside the text box. The same action is repeated automatically during the mining stage.

Input Text to field

Run JavaScript on page


The 'Run Script' option under 'More Options' in the Capture window allows you to run JavaScript code on the currently loaded page. To use it, click anywhere on the page and select More Options > Run Script from the Capture window. In the resulting window, enter the JavaScript code you need to run and click OK.

Run Java Script Code on Page, Scraping


The code is run immediately so that you can see the result, and it is also run automatically during the mining phase.

Select dropdown/listbox/combobox option


During configuration, you can select any value from a list/dropdown box by clicking on it and selecting 'More Options' > 'Select Dropdown Option'.

Select dropdown option, combo box, listbox, Scraping

As shown below, in the resulting window you can select the required list option and it will be selected automatically during mining.

Select dropdown option, combo box, listbox, Scraping

Open Popup and scrape data


On some web pages, you have to click each listing/link to open a popup, or to populate a view within the same page, with the corresponding details. Data related to each listing must be extracted after clicking its title link/button. This differs from 'Following a link', where a new page is loaded to display the required data. Here, a popup window or a view within the same page is updated with the data. In such cases, the 'Open Popup' option under 'More Options' in the Capture window can be used, as shown in the following example.

Open Popup and scrape data

Click the title/link of the first listing and select 'More Options' > 'Open Popup'. This opens the popup window, or updates an area in the same page, with the required data. You can then click and select the displayed data in the normal fashion. Note that the Preview is updated with the details of the first listing only. During mining, WebHarvy clicks each listing link one by one and extracts the resulting data.

Scroll page down slowly so that content is loaded


Sometimes a web page loads content further down the page (such as lazy-loaded images) only when the page is scrolled down. In such cases, the 'Scroll Page' option under 'More Options' in the Capture window can be used. Click anywhere on the page during configuration and select More Options > Scroll Page.
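Why scrolling slowly matters can be illustrated with a toy model in Python. It assumes that the page loads an item only when the item's top edge is inside the visible area at one of the scroll stops; all positions and sizes are hypothetical, and the sketch says nothing about how WebHarvy implements 'Scroll Page'.

```python
def loaded_items(item_tops, viewport_height, scroll_stops):
    """Return indexes of items whose top edge was visible at some scroll stop."""
    seen = set()
    for top_of_view in scroll_stops:
        bottom_of_view = top_of_view + viewport_height
        for index, item_top in enumerate(item_tops):
            if top_of_view <= item_top < bottom_of_view:
                seen.add(index)
    return sorted(seen)


# Hypothetical page: six lazy-loaded images, 500 px apart, on a 3000 px page.
item_tops = [0, 500, 1000, 1500, 2000, 2500]
viewport_height = 600
page_height = 3000
last_stop = page_height - viewport_height  # 2400

jump = [0, last_stop]                          # jump straight to the bottom
gradual = list(range(0, last_stop + 1, 400))   # 0, 400, ..., 2400

print(loaded_items(item_tops, viewport_height, jump))     # [0, 1, 5]
print(loaded_items(item_tops, viewport_height, gradual))  # [0, 1, 2, 3, 4, 5]
```

Under this assumption, jumping directly to the bottom skips the items in the middle of the page, while stepping down in increments smaller than the viewport height brings every item into view at least once.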
