WebHarvy can be used to scrape data from the Chrome Web Store. The Chrome Web Store lists Chrome extensions under various categories. In this article we will see how WebHarvy can be used to scrape details of the extensions listed under a specific category.
Using WebHarvy to Scrape Chrome Web Store
The Chrome Web Store uses infinite scroll for pagination: more extensions are loaded into the same page as we scroll down. Each newly loaded batch of extensions is placed under a separate HTML element. For this reason, we need to run JavaScript code that moves all extensions under a single HTML element, so that all of them are selected during mining.
The following video shows the steps involved in detail. The code snippets used can be found in the video description.
As shown in the above video, scraping extension details from the Chrome Web Store is performed in 2 stages:
- In stage 1, we get the URLs of all extension details pages.
- In stage 2, we scrape data from all these URLs using a single configuration.
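The JavaScript code used to collate all rows of data under a single element is given below.
// Each batch loaded by infinite scroll sits under its own role="grid" element.
var groups = document.querySelectorAll('[role="grid"]');
// The first grid becomes the single parent of all rows.
var parent = groups[0];
for (var i = groups.length - 1; i >= 1; i--) {
    var group = groups[i];
    // Iterate backwards: appendChild removes the node from the live children collection.
    for (var j = group.children.length - 1; j >= 0; j--) {
        parent.appendChild(group.children[j]);
    }
}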
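Because the snippet above walks both the groups and their children backwards, the moved rows end up in reverse order under the first grid. That does not affect which rows get selected, but if the original listing order matters, a variant along the following lines may help. This is only a sketch: the function name `mergeGridRows` is hypothetical, and it assumes that every batch of extensions still sits under an element with `role="grid"`.

```javascript
// Hypothetical helper: moves the rows of every later role="grid" element
// into the first one, keeping the rows in their original page order.
function mergeGridRows(root) {
  const groups = Array.from(root.querySelectorAll('[role="grid"]'));
  if (groups.length === 0) {
    return 0; // nothing to merge
  }
  const parent = groups[0];
  let moved = 0;
  for (const group of groups.slice(1)) {
    // appendChild detaches the node from its old parent, so the loop
    // ends once the group has no element children left.
    while (group.firstElementChild) {
      parent.appendChild(group.firstElementChild);
      moved++;
    }
  }
  return moved;
}

// Run against the live page only when a DOM is available.
if (typeof document !== 'undefined') {
  mergeGridRows(document);
}
```

The function returns the number of rows it moved, which can be handy for checking that all scrolled-in batches were actually found.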
The regular expression string used to get the URL of each extension details page is given below.
href="([^"]*)
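To see what this expression captures, it can be tried outside WebHarvy on a small piece of HTML. The sketch below is for illustration only: the function name `extractDetailUrls` and the sample markup are made up, and it assumes the expression is applied to the HTML of the selected items. The capture group returns everything after `href="` up to, but not including, the next double quote.

```javascript
// Hypothetical helper: returns the value of every href attribute in the
// given HTML string, using the same expression as above.
function extractDetailUrls(html) {
  const pattern = /href="([^"]*)/g;
  // matchAll yields one match per href; index 1 is the captured URL.
  return Array.from(html.matchAll(pattern), (match) => match[1]);
}

// Made-up sample markup, not copied from the Chrome Web Store.
const sampleHtml =
  '<a href="/detail/example-one/abc">One</a>' +
  '<a href="/detail/example-two/def">Two</a>';

console.log(extractDetailUrls(sampleHtml));
```

Note that the expression matches any `href` attribute, so it relies on being applied only to the HTML of the extension items rather than to the whole page.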
Try WebHarvy
We highly recommend that you download and try the free evaluation version of WebHarvy, available on our website. To get started, please follow the link given below.