Web scraping is the process of automatically extracting data from websites using software known as web scrapers. This article shows how web scraping can be used to extract YouTube video details.
Scraping YouTube
YouTube video data can be scraped from:
- Video recommendations (home) page of an account
- YouTube video search results page
- Videos page of a YouTube channel
The following data related to a video can be scraped:
- Video title
- Video URL
- Thumbnail
- Views
- Likes
- Channel that posted the video
- Number of subscribers the channel has
- etc.
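As a rough illustration of the fields listed above, one scraped video could be represented as a simple record and written out as a CSV line. This is only a sketch: the field names, the `FIELDS` list, the `csvEscape` and `toCsvRow` helpers and the sample values are all hypothetical and are not produced by or required by WebHarvy.

```javascript
// Field names are hypothetical; they simply mirror the list above.
const FIELDS = ['title', 'url', 'thumbnail', 'views', 'likes', 'channel', 'subscribers'];

// Quote a value for CSV: wrap it in double quotes and double any embedded quotes.
function csvEscape(value) {
  const text = value === undefined || value === null ? '' : String(value);
  return '"' + text.replace(/"/g, '""') + '"';
}

// Turn one record into a CSV line, using FIELDS for the column order.
// Missing fields become empty cells.
function toCsvRow(record) {
  return FIELDS.map((name) => csvEscape(record[name])).join(',');
}

// Placeholder values only, not real scraped data.
const example = {
  title: 'Sample "demo" video',
  url: 'https://www.youtube.com/watch?v=VIDEO_ID',
  thumbnail: 'https://example.com/thumb.jpg',
  views: '1.2K views',
  likes: '85',
  channel: 'Example Channel',
  subscribers: '3.4M subscribers',
};

console.log(FIELDS.join(','));
console.log(toCsvRow(example));
```

Keeping the column order in a single list means the header line and every data row stay aligned if a field is added or removed later.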
Software to use
WebHarvy is visual web scraping software that can be used to scrape data from any website, including YouTube. WebHarvy can also scrape data from other Google services such as Maps, Shopping, Trends and Jobs.
Scraping YouTube Search Results
The video displayed below shows how video data can be scraped from YouTube search results (video listings).
The code used in the video can be found here.
Scraping any YouTube channel’s video data
The video shown below demonstrates how video details can be scraped from any YouTube channel. The code used for pagination can be found here.
Steps to follow to scrape YouTube Video Data
- The first step is to download and install WebHarvy on your computer.
- WebHarvy has a browser-like user interface within which you can load and navigate web pages.
- Open the YouTube website within WebHarvy and navigate to the video listings page from which you need to scrape data.
- Click the Start button under the Configuration pane in the Home menu (Start Configuration).
- Now we can start selecting the data which we need to scrape from the page. To select any data item displayed on the page, click on it. WebHarvy will display a Capture window with various options.
- Before selecting data from videos listed on the page, we need to configure pagination
- YouTube loads more data (videos) on the same page as we scroll down. This is called infinite scroll or scroll-to-load pagination. To configure WebHarvy to perform this action automatically, go to the Configuration menu and click the Set JavaScript button under the Pagination pane. In the resulting window, paste the following JavaScript code. (Pagination using JavaScript)
// Scroll the last loaded video into view so that YouTube loads the next batch.
var els = document.getElementsByTagName('ytd-video-renderer');
if (els.length > 0) els[els.length - 1].scrollIntoView();
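- Now click anywhere on the page and select More Options > Run Script from the resulting Capture window. Paste and apply the following code, which moves all video entries under a single parent element. (Running JavaScript code on page)
// Copy the live collection into an array first: moving elements reorders a live HTMLCollection.
var els = Array.prototype.slice.call(document.getElementsByTagName('ytd-video-renderer'));
if (els.length > 0) {
  // Use the first video's parent as the common container.
  var container = els[0].parentElement;
  for (var i = 0; i < els.length; i++) {
    if (els[i].parentElement !== container) {
      container.appendChild(els[i]);
    }
  }
}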
- Now we can start selecting data.
- Click on the title of the first video. Select the Capture Text option from the resulting Capture window to select it for scraping.
- To select the video URL, use the Capture target URL option.
- To follow the video link and select more video details, click the video link/title and use the Follow this link option. (Following links)
- Once the video details page is loaded, click anywhere on the page and select the More Options > Page > Scroll down option. This ensures that all data on the page is loaded before we start selecting it for extraction. (Scrolling down page)
- Video details like views, likes, channel name, number of subscribers etc. can be selected from this page.
- Once you have finished selecting all required data, click the Stop button under the Configuration pane in the Home menu.
- You may now optionally save the configuration
- Click the Start Mine button to start mining data using the configuration which we just created.
- The miner window’s data table will start to display data extracted from the page as mining progresses
- Once mining is finished you can save the mined data to a file or to a database
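Counts selected in the steps above (views, likes, subscribers) are captured as display text rather than as numbers. As a minimal sketch, assuming English-locale text that uses the suffixes K, M and B, the saved values could be normalized afterwards as shown below. The function name `parseCount` and the sample strings are illustrative only and are not part of WebHarvy.

```javascript
// Multipliers for the suffixes assumed to appear in English-locale count text.
const SUFFIXES = { K: 1e3, M: 1e6, B: 1e9 };

// Convert text such as "1.2M views" or "12,345 views" to a number.
// Returns null when the text contains no number.
function parseCount(text) {
  // Digits with optional separators, then an optional K/M/B directly after the number.
  const match = /([\d.,]+)([KMB]\b)?/i.exec(String(text));
  if (!match) return null;
  // Treat commas as thousands separators (an assumption about the locale).
  const value = parseFloat(match[1].replace(/,/g, ''));
  if (Number.isNaN(value)) return null;
  const suffix = match[2] ? match[2].toUpperCase() : '';
  return Math.round(value * (SUFFIXES[suffix] || 1));
}

console.log(parseCount('1.2M views'));       // 1200000
console.log(parseCount('12,345 views'));     // 12345
console.log(parseCount('3.4K subscribers')); // 3400
console.log(parseCount('No views'));         // null
```

Abbreviated counts are rounded by YouTube before they are displayed, so the parsed values are approximations of the true totals.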
Try WebHarvy
You may download and try a free 15-day evaluation version of WebHarvy. To get started, please follow this link.
Have Questions?
If you have any questions, please do not hesitate to contact our technical support team.