If you have ever tried to use sports data for your analysis, you may have encountered how difficult it is to find data. If it is free that is because it might be quite old and if is up to date it will probably come at a cost not everyone can afford. But there is always a way to find up to date data for free and that is exactly what today’s blog is about. Once data is available, Joris will share with all of us a great example of how that could be visualized into a tool like Tableau.
What is API-Football?
API-Football is a Restful API that covers +915 leagues & cups with livescore, standings, events, line-ups, players, pre-matches odds, statistics and many more. Even though it has an easy integration and works with code libraries in popular languaes, we will cover today how it does work with Alteryx.
For those of you have been worked with RapidAPI before, it also works with it and it will allow you to manage from one dashboard, view endpoints, test from the browser, connect using code snippets, code libraries in popular languages, call volume and billing, check errors and latency, take a look at logs for your API calls and secure payment.
API-Football works with 4 different plans:
- Free for 0$ Month at 100 requests/day
- Pro for 19$ Month at 7500 requests/day
- Ultra for 29$ Month at 75000 requests/day
- Mega for 39$ Month for 150000 requests/day
However, we will be usign a free plan for this specific blog’s content.
*API-Football might require you to introduce your CC details while signing up. However, while using a free subscription in the specified requests of 100/day no charge is to be made. It is specified in the terms and conditions that can be found here.
Step 1: Create an API-Football account
First of all, we need to set up our own API-Football account. To do so, we need to access the API-Football pricing section and choose SUBSCRIBE to the Free plan.
This will redirect us to the RapidAPI sign up section where we will need to specify a username, an email and a password in order to create our own account. However, it is also possible to sign up and create our account from the API Sports website as you can see in the following image. Either way works and will give us the
Right after your account has been created, your X-RapidAPI-Key or your API-Key (this will depend on which subscribing method you choose) will be created and that will be used to call the API to ger our data.
Step 2: Check API-Football documentation
After we have just created our account and log into the API services to get our key, it is time to check which kind of calls we can make and how to make those. That information will be stored in the documentation web-page of the API-Football service which can be accessed here.
It will contains information about which information is required and where to get it as well as the architecture and sample scripts for different types of code.
Furtheremore, there will every piece of possible endpoint of data that can be collected from an API call. Some of them may require the use of some parameters.
- Odds (In-Play)
- Odds (Pre-Match)
In order to get the information required for the parameters of any specific API query, it will be really important to check the IDs documentation of the API-Football service. It is possible to get to the IDs from the API Sports dashboard where it will be possible to get the League’s & Team’s IDs. For example, 39 is league id for the Premier League while 33 would be the id for Manchester United. However, after being logged in it could be access from this two URLs:
Building an Alteryx Workflow to get football data
Once our API account has been set up and we have collected our Key, it is time to move to Alteryx to build a workflow that allow us to make queries to the API.
Step 1: Assigning credentials
Our first input into Alteryx must contain the URL of the query and our Key. It would be a Text input tool the one we can use in order to set one row and two columns (one that contains a URL query such as «https://v3.football.api-sports.io/standings?league=39&season2021» and another one for the key).
Step 2: Downloading data from the specified URL
After these credentials have been brought into Alteryx, then it is required to make the API call. There is one specific tool to make it which is the Download Tool and it will require us to specify which is the URL of the query. It will also require us to select which field contains the key as that will be selected in the Headers tab of the Download Tool configuration.
Steps 1 & 2 would then look like this:
Step 3: Performing data cleansing on the data
This, if successful with a code such as HTTP/1.1 200 OK, will download data in a JSON format and that will require the use of the JSON Parse tool set to parse a JSON Field called DownloadData.
Once it has been parsed, data is structured in rows containing data which would need to cross tab into columns in order to get data ready to be analyzed later in any other tool. Before we do so, it is important to understand that there is a need to perform data cleansing that will involve a Sample Tool, a Tile Tool, a Filter Tool and a RegEx Tool:
- Sample Tool: First 13 rows (always!) contain data related to the league or competition as well as the season even though that is information that we are already aware of as we needed to include in the URL used to make the query. That is why there is a need to use a Sample Tool to skip the first 13 rows.
- Tile Tool: As our query contains data related to the standings for a specific league and season, then it will contain as many records as information it returns for every team of that specific league and season. For example, if we are downloading the data of the Premier League 2021/2022, we must consider that we will have 20 teams so we should use a Tile Method of Equal Records for a total of 20 tiles (which it really means teams for this use case). However, we must consider the number of teams in Tile Tool based on which competition we are querying its data.
- Filter Tool (optional): If there are any records coming out of the Tile Tool that we believe are not interesting to our main purpose, a Filter Tool can be used to filter out based on the Tile_SequenceNum because that way it is possible to remove it from every team in that specific league and season.
- RegEx Tool: Using RegEx here can be useful in order to rename those values that will serve as column names (or headers) for our final dataset. Once our download data is parsed using a JSON Parse Tool, the name for each will value will be under a colunm called JSON Name. That will contain some characters such as response.0.league.standings.0.0.team.id but we are interested only in the two final words of that name. That way we can use a RegEx to parse just that part of the name:
Finally, we could then move to use a Cross Tab Tool to pivot the table so based on the identification of each team (Tile_Num coming from the Tile Tool).
After all these steps, data cleansing would be completed.
Step 4: Sorting and renaming columns to build our final table
It is now almost over!
In order to just have data that we can finally output to any kind of format (XLSX, CSV, HYPER, Google Sheets etc.), a Select Tool can be used to sort and rename the data so we can end up with a table that it looks like this. It is highly important to take into consideration the Google Sheets output as this one will allow users to update automatically their data later on Tableau Public by following Joris’ blog! However, it is really important to understand that this Tool is not automatically installed once Alteryx Designer has been installed but you will need to install it from the Alteryx Community once download from here and install the YXI right after. You can follow this instructions from the Alteryx Help page to make sure you can authenticate and write something from Alteryx in a Google Sheets.
After a query just like this one, we would be able to get information related to the competition, a team name, id & logo, results due to its ranking (UCL,UEL, Relegation), status compared to previous week, games played (home & away), games won (home & away), games lost (home & away), games drawn (home & away), goals scored (home & away), goals against (home & away) and the latest date for a game.
Once it is ready, we can just bring in an Output Data Tool to get our data and start our analysis by following Joris’ steps in Tableau.
PS. It may require a few more steps through the data cleansing for different types of queries but the data obtained always comes with the same structure.