Web Scraping football data helps in creating a comprehensive dataset containing statistics of teams, players and matches, which can be used for analysis or to build your own dashboards displaying various stats and tables.
Web Scraping is the process of automatically extracting data displayed by websites and saving it to a spreadsheet file on your computer or to a table in a database. Software that performs web scraping is called a web scraper. Web Scraping is used in marketing, academic research, real estate, eCommerce, machine learning, sports analysis and many other fields.
In this article we will see how a web scraper can be used to build a football dataset by scraping team, player and match data from various football stat websites.
Web Scraping Football Data
The first step in collecting data with web scraping is to identify the sources, that is, the websites which display the data you need.
Once the sources are identified, you can use a web scraper to extract the required data. You can build your own web scraper (if you are a developer or have the budget to hire one), or you can use web scraping software like WebHarvy, which can be used to easily scrape data from any website, including football statistics websites.
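If you take the build-your-own route, the core of a scraper is turning a page's HTML into rows of data. The following is only a minimal sketch using Python's standard library. The SAMPLE_HTML markup, the team names and the parse_table helper are assumptions made up for illustration; a real standings page will have different markup and would first have to be downloaded.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; real standings pages use different HTML.
SAMPLE_HTML = """
<table>
  <tr><th>Pos</th><th>Team</th><th>Played</th><th>Points</th></tr>
  <tr><td>1</td><td>Team A</td><td>38</td><td>90</td></tr>
  <tr><td>2</td><td>Team B</td><td>38</td><td>85</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


standings = parse_table(SAMPLE_HTML)
for row in standings:
    print(row["Pos"], row["Team"], row["Points"])
```

A visual tool like WebHarvy replaces this hand-written parsing with point-and-click selection, which is why no code is needed in the steps that follow.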
Using WebHarvy to Scrape Football Data
You can download and install the free evaluation version of WebHarvy on your computer. When you open WebHarvy, you will see a browser-like interface in which you can load and navigate web pages.
WebHarvy is a visual web scraper, which means you can click and select the data you need to scrape from any website. The websites from which WebHarvy can scrape football stats and tables include, but are not limited to:
- FootyStats.org
- WhoScored.com
- SoccerStats.com
- FBref.com
- etc.
WebHarvy can also scrape sports betting odds from websites such as:
- Oddsportal.com
- FlashScore.com
- BetExplorer.com
- etc.
Steps to follow to scrape football stats and tables
As a simple first example, let us scrape the Premier League standings table displayed at https://www.premierleague.com/tables.
Scraping Football League Standings Table
- The first step is to download and install WebHarvy on your computer.
- Open WebHarvy and load the page (https://www.premierleague.com/tables) within the configuration browser.
- Start Configuration by clicking the Start button in the Home menu.
- Now you can click and select any data item displayed on the page for extraction.
- Clicking any item on the page will bring up a Capture window with various options.
- To select the text of the clicked item, select the Capture Text option.
- Once you have selected all required data from the page, click the Stop button to stop configuration.
- Click the Start Mine button to start mining data using the configuration created.
- Once mining is finished, you can save the mined data to a spreadsheet file or database by clicking the Export button.
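To make the final export step concrete for readers who want to post-process the data themselves, here is a minimal sketch of saving scraped rows to CSV text and to a database table. It does not show how WebHarvy exports data; the rows, column names and the standings table name are hypothetical, and an in-memory SQLite database stands in for a real one.

```python
import csv
import io
import sqlite3

# Hypothetical rows standing in for mined standings data.
rows = [
    {"position": 1, "team": "Team A", "points": 90},
    {"position": 2, "team": "Team B", "points": 85},
]


def to_csv(rows):
    """Return the rows as CSV text; write it to a .csv file to open it in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_sqlite(rows, connection):
    """Insert the rows into a 'standings' table, creating it if needed."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS standings "
        "(position INTEGER, team TEXT, points INTEGER)"
    )
    connection.executemany(
        "INSERT INTO standings VALUES (:position, :team, :points)", rows
    )
    connection.commit()


csv_text = to_csv(rows)

connection = sqlite3.connect(":memory:")  # use a file path for a persistent database
to_sqlite(rows, connection)
top = connection.execute(
    "SELECT team FROM standings ORDER BY points DESC LIMIT 1"
).fetchone()
print(csv_text)
print(top)
```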
Scraping football match results and stats
Let us now see how match results and stats of all matches of any league/season can be scraped using WebHarvy. Match details like date, location and score and stats like possession, shots on target, fouls, corners, tackles, passes etc. can be scraped.
In this example we will configure WebHarvy to scrape match details from the 2021-22 Premier League season.
Steps to follow
- Open WebHarvy and load the page which displays the 2021-22 match listings
- Start Configuration
- Select the names of home and away teams, final score and location from the starting page using the Capture Text option
- Since the page loads more matches as we scroll down, select the ‘Scroll to load next page’ option as explained at https://www.webharvy.com/tour3.html#ScrollToLoad
- To open each match details page and scrape more details, click on any empty space in the first match row, select the Capture More Content option once and then click the Follow this link button
- This will load the match details page. Wait for it to finish loading
- Once loaded, click on the Stats tab and select More Options > Click from the resulting Capture window
- This will load the stats page from where you can click and select match stats like possession, shots on target, shots, tackles, passes, accuracy etc.
- Once you have selected all required data, click the Stop Configuration button.
- You may now optionally save the configuration
- Click the Start Mine button to mine data using the configuration
- Once mining is finished, the mined data can be saved to a file or database
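The configuration above follows a common two-level pattern: read each match from a listing page, follow its link to a details page, and merge the stats found there into the same record. The sketch below illustrates only that pattern. The PAGES dictionary, the URLs, the markup, the team names and the fetch helper are all hypothetical stand-ins; a real scraper would download pages over HTTP, and regular expressions this simple would be too brittle for real-world HTML.

```python
import re

# Hypothetical in-memory "site"; a real scraper would download pages instead.
PAGES = {
    "/matches": (
        '<a class="match" href="/match/1">Team A 2-1 Team B</a>'
        '<a class="match" href="/match/2">Team C 0-0 Team D</a>'
    ),
    "/match/1": (
        '<span data-stat="possession">58</span>'
        '<span data-stat="corners">7</span>'
    ),
    "/match/2": (
        '<span data-stat="possession">49</span>'
        '<span data-stat="corners">3</span>'
    ),
}

# Patterns matching the hypothetical markup above.
MATCH_LINK = re.compile(
    r'<a class="match" href="([^"]+)">(.+?) (\d+)-(\d+) (.+?)</a>'
)
STAT = re.compile(r'<span data-stat="([^"]+)">(\d+)</span>')


def fetch(url):
    """Stand-in for an HTTP request: return the HTML stored for this URL."""
    return PAGES[url]


def scrape_matches(listing_url):
    """Build one record per match from the listing page and its details page."""
    records = []
    listing_html = fetch(listing_url)
    for href, home, home_goals, away_goals, away in MATCH_LINK.findall(listing_html):
        record = {"home": home, "away": away, "score": f"{home_goals}-{away_goals}"}
        # Follow the link to the details page and merge its stats into the record.
        for name, value in STAT.findall(fetch(href)):
            record[name] = int(value)
        records.append(record)
    return records


matches = scrape_matches("/matches")
for match in matches:
    print(match)
```

Each resulting record combines the listing-page fields (teams and score) with the details-page stats, which mirrors the one-row-per-match layout you would want in the exported file.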
Scraping Football data from FootyStats.org
The following video shows how WebHarvy can be used to scrape match details from the FootyStats.org website.
Scraping Football Match Logs from FBref.com
The video below shows how WebHarvy can scrape match logs from the FBref.com website.
SoccerSTATS.com scraping
WebHarvy can scrape match stats from the SoccerSTATS.com website.
Try WebHarvy
We recommend that you download and try the free evaluation version of WebHarvy available on our website. To get started, please follow this link.
Have Questions?
If you have any questions, please do not hesitate to reach out to our customer support team.