Visualizing Premier League fixture difficulty


R ggplot2 plotly shiny


This post is part of a series:

Fantasy Premier League

  1. Building a dataset for Fantasy Premier League analysis
  2. Visualizing Premier League fixture difficulty


When playing Fantasy Premier League, people are very preoccupied with what opposition the players on their team will be facing in the short term. A commonly held belief is that players tend to get more fantasy points against weaker opposition. I haven’t done any actual analysis to support this belief, but it sounds perfectly reasonable, and I think it’s wise to keep upcoming fixtures in mind when planning your transfers.

With that in mind, wouldn’t it be neat if it was super easy to get a quick overview of every single team’s schedule? If you don’t subscribe to any premium FPL analysis services, you’re probably evaluating matchups by clicking through the FPL user interface. The purpose of this post is to demonstrate that there’s a better way. You’ll also find a neat data visualization that you can use for the remainder of the 2019-20 Premier League season.

Getting data on upcoming fixtures

Let’s get some data from the Premier League API. If you’re not familiar with this data source, check out the first post in this series (there is a link at the top of this page).

library(tidyverse)
library(jsonlite)

# Download the `bootstrap-static` dataset into your R session
url <- "https://fantasy.premierleague.com/api/"
bootstrap_static <- fromJSON(paste0(url, "bootstrap-static"))

The bootstrap_static dataset from the FPL API contains all the information we need for superior fixture planning that will shock and baffle your competition.

We’ll start by creating an overview of teams that are participating in the current Premier League season.

teams <- bootstrap_static$teams %>% 
  as_tibble() %>% 
  select(name, id, code, starts_with("strength"))

teams
## # A tibble: 20 x 10
##    name         id  code strength strength_overall_… strength_overall_… strength_attack_h… strength_attack_a… strength_defence… strength_defence…
##    <chr>     <int> <int>    <int>              <int>              <int>              <int>              <int>             <int>             <int>
##  1 Arsenal       1     3        4               1250               1330               1240               1270              1290              1330
##  2 Aston Vi…     2     7        2                990               1050               1050               1050              1030              1070
##  3 Bournemo…     3    91        3               1050               1140               1040               1100              1120              1170
##  4 Brighton      4    36        2               1000               1070               1040               1140              1050              1070
##  5 Burnley       5    90        3               1070               1090                990               1030              1060              1070
##  6 Chelsea       6     8        4               1280               1310               1270               1340              1280              1330
##  7 Crystal …     7    31        3               1130               1080               1030               1180              1100              1070
##  8 Everton       8    11        3               1100               1220               1070               1120              1120              1210
##  9 Leicester     9    13        3               1110               1120               1130               1180              1110              1110
## 10 Liverpool    10    14        5               1340               1350               1330               1360              1330              1360
## 11 Man City     11    43        5               1340               1360               1320               1330              1340              1370
## 12 Man Utd      12     1        4               1310               1330               1250               1260              1310              1340
## 13 Newcastle    13     4        3               1080               1110               1070               1110              1070              1090
## 14 Norwich      14    45        2                990               1060               1050               1050              1060              1080
## 15 Sheffiel…    15    49        2                990               1060               1060               1080              1030              1050
## 16 Southamp…    16    20        3               1060               1090               1040               1070              1100              1110
## 17 Spurs        17     6        4               1320               1310               1270               1340              1320              1330
## 18 Watford      18    57        3               1110               1110               1100               1100              1110              1120
## 19 West Ham     19    21        3               1100               1160               1090               1100              1080              1200
## 20 Wolves       20    39        3               1110               1200               1180               1200              1080              1150

Lots of juicy information here. Not only does this dataset allow us to evaluate players based on the opponent’s overall strength; we can also compare offensive versus defensive strength, and home versus away strength. We don’t even have to calculate anything. All that needs to be done is to present pre-existing information in a user-friendly manner.
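
As a quick illustration of the comparisons these columns allow, here is a minimal sketch on a toy subset of the table. The values are copied from the output above; the `teams_toy` name and the two derived columns are my own and are not part of the API.

```r
library(dplyr)

# Toy subset of the `teams` table, with values copied from the output above
teams_toy <- tibble(
  name = c("Arsenal", "Liverpool", "Norwich"),
  strength_attack_home = c(1240L, 1330L, 1050L),
  strength_attack_away = c(1270L, 1360L, 1050L),
  strength_defence_home = c(1290L, 1330L, 1060L),
  strength_defence_away = c(1330L, 1360L, 1080L)
)

strength_gaps <- teams_toy %>%
  mutate(
    # Attack rating relative to defence rating, in home games
    attack_vs_defence_home = strength_attack_home - strength_defence_home,
    # Away attack rating relative to home attack rating
    attack_away_vs_home = strength_attack_away - strength_attack_home
  ) %>%
  select(name, attack_vs_defence_home, attack_away_vs_home)

strength_gaps
```

How these ratings are produced is not documented in the dataset itself, so differences like these are best read as rough relative indicators.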

The next piece of the puzzle is an overview of the Premier League schedule. This data is also available from the FPL API, but not from the bootstrap-static endpoint. To get hold of it, we need to access the extended player data available in the element-summary branch of the API. We only need one of these summaries per team, so the next step is to pick a player from each team and download his element-summary dataset.

# Get the `id` number of one player from each team
team_fixtures <- bootstrap_static$elements %>% 
  as_tibble() %>% 
  group_by(team_code) %>% 
  summarize(player_id = first(id)) %>% 
  ungroup() %>% 
  # Add team names
  left_join(
    teams %>% rename(team = name, team_code = code), by = "team_code"
  ) %>% 
  # Download an `element-summary` for each `player_id`
  mutate(
    element_summary = map(player_id, function(x) {
      paste0(
        "https://fantasy.premierleague.com/api/element-summary/", x, "/"
      ) %>% 
        fromJSON(.)
    }),
    # Extract the fixture overview from each `element_summary`
    fixtures = map(element_summary, function(x) x$fixtures)
  ) %>% 
  # Drop columns that we don't need anymore
  select(-team_code, -id, -player_id, -element_summary)

team_fixtures
## # A tibble: 20 x 9
##    team     strength strength_overall_… strength_overall_… strength_attack_h… strength_attack_a… strength_defence… strength_defence… fixtures    
##    <chr>       <int>              <int>              <int>              <int>              <int>             <int>             <int> <list>      
##  1 Man Utd         4               1310               1330               1250               1260              1310              1340 <data.frame…
##  2 Arsenal         4               1250               1330               1240               1270              1290              1330 <data.frame…
##  3 Newcast…        3               1080               1110               1070               1110              1070              1090 <data.frame…
##  4 Spurs           4               1320               1310               1270               1340              1320              1330 <data.frame…
##  5 Aston V…        2                990               1050               1050               1050              1030              1070 <data.frame…
##  6 Chelsea         4               1280               1310               1270               1340              1280              1330 <data.frame…
##  7 Everton         3               1100               1220               1070               1120              1120              1210 <data.frame…
##  8 Leicest…        3               1110               1120               1130               1180              1110              1110 <data.frame…
##  9 Liverpo…        5               1340               1350               1330               1360              1330              1360 <data.frame…
## 10 Southam…        3               1060               1090               1040               1070              1100              1110 <data.frame…
## 11 West Ham        3               1100               1160               1090               1100              1080              1200 <data.frame…
## 12 Crystal…        3               1130               1080               1030               1180              1100              1070 <data.frame…
## 13 Brighton        2               1000               1070               1040               1140              1050              1070 <data.frame…
## 14 Wolves          3               1110               1200               1180               1200              1080              1150 <data.frame…
## 15 Man City        5               1340               1360               1320               1330              1340              1370 <data.frame…
## 16 Norwich         2                990               1060               1050               1050              1060              1080 <data.frame…
## 17 Sheffie…        2                990               1060               1060               1080              1030              1050 <data.frame…
## 18 Watford         3               1110               1110               1100               1100              1110              1120 <data.frame…
## 19 Burnley         3               1070               1090                990               1030              1060              1070 <data.frame…
## 20 Bournem…        3               1050               1140               1040               1100              1120              1170 <data.frame…
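
The download step above makes one request per team, and a single failed request would stop the whole pipeline. If that is a concern, one option is to wrap the download with `possibly()` from purrr. This is only a sketch: `element_summary_url()` and `get_element_summary()` are hypothetical helper names of my own, and the `fetch` argument exists only so the helper can be tried without network access.

```r
library(purrr)

# Hypothetical helper: build the element-summary URL for one player id
element_summary_url <- function(player_id) {
  paste0(
    "https://fantasy.premierleague.com/api/element-summary/", player_id, "/"
  )
}

# Hypothetical helper: download one summary, returning NULL instead of
# raising an error if the request fails
get_element_summary <- function(player_id, fetch = jsonlite::fromJSON) {
  safe_fetch <- possibly(fetch, otherwise = NULL)
  safe_fetch(element_summary_url(player_id))
}

# Offline demonstration with a stand-in `fetch` function that always fails
get_element_summary(1, fetch = function(url) stop("no connection"))
```

In the pipeline above, `map(player_id, get_element_summary)` could then replace the anonymous function, and teams with a `NULL` summary could be retried or dropped.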

At the moment, each team’s fixtures are stored within nested datasets under the fixtures column. Note that fixtures does not include completed games, only games that have yet to be played.

Our next step will be to clean up the nested fixtures datasets under team_fixtures:

# Display one of our 20 `fixtures` datasets
team_fixtures$fixtures[[1]] %>% as_tibble()
## # A tibble: 37 x 13
##       code team_h team_h_score team_a team_a_score event finished minutes provisional_start_ti… kickoff_time        event_name is_home difficulty
##      <int>  <int> <lgl>         <int> <lgl>        <int> <lgl>      <int> <lgl>                 <chr>               <chr>      <lgl>        <int>
##  1 1059721     20 NA               12 NA               2 FALSE          0 FALSE                 2019-08-19T19:00:0… Gameweek 2 FALSE            3
##  2 1059726     12 NA                7 NA               3 FALSE          0 FALSE                 2019-08-24T14:00:0… Gameweek 3 TRUE             3
##  3 1059740     16 NA               12 NA               4 FALSE          0 FALSE                 2019-08-31T11:30:0… Gameweek 4 FALSE            2
##  4 1059746     12 NA                9 NA               5 FALSE          0 FALSE                 2019-09-14T14:00:0… Gameweek 5 TRUE             3
##  5 1059761     19 NA               12 NA               6 FALSE          0 FALSE                 2019-09-22T13:00:0… Gameweek 6 FALSE            3
##  6 1059768     12 NA                1 NA               7 FALSE          0 FALSE                 2019-09-30T19:00:0… Gameweek 7 TRUE             4
##  7 1059777     13 NA               12 NA               8 FALSE          0 FALSE                 2019-10-06T15:30:0… Gameweek 8 FALSE            3
##  8 1059788     12 NA               10 NA               9 FALSE          0 FALSE                 2019-10-19T16:30:0… Gameweek 9 TRUE             4
##  9 1059798     14 NA               12 NA              10 FALSE          0 FALSE                 2019-10-27T16:30:0… Gameweek … FALSE            2
## 10 1059802      3 NA               12 NA              11 FALSE          0 FALSE                 2019-11-02T15:00:0… Gameweek … FALSE            3
## # … with 27 more rows

The nested datasets contain a lot of information that we’re not interested in right now, and the data’s format is not optimal for creating nice visualizations. A little bit of data wrangling is required before we proceed:

plot_data <- team_fixtures %>% 
  as_tibble() %>% 
  mutate(
    # Transform the nested `fixtures` datasets
    fixtures = map(fixtures, function(x) {
      x %>%
        # Get opponent's id number
        mutate(opponent_team_id = ifelse(is_home, team_a, team_h)) %>% 
        # Only keep the variables we need
        select(
          gameweek = event, opponent_team_id, is_home, difficulty
        ) %>% 
        # Merge with the `teams` dataset for team names and strength variables
        left_join(
          teams %>% 
            # Add data about the opposition
            select(
              opponent_team_id = id, 
              opponent_team = name,
              opponent_strength = strength,
              opponent_strength_overall_home = strength_overall_home,
              opponent_strength_overall_away = strength_overall_away,
              opponent_strength_attack_home = strength_attack_home,
              opponent_strength_attack_away = strength_attack_away,
              opponent_strength_defence_home = strength_defence_home,
              opponent_strength_defence_away = strength_defence_away
            ),
          by = "opponent_team_id"
        ) %>% 
        select(-opponent_team_id)  # Don't need this anymore
    })
  ) %>% 
  # Convert from nested to long format, leaving us with one row per team-fixture
  unnest(fixtures) %>% 
  # Impose more practical column ordering
  select(
    gameweek, team, opponent_team, is_home, difficulty, everything()
  )

plot_data
## # A tibble: 722 x 19
##    gameweek team  opponent_team is_home difficulty strength strength_overal… strength_overal… strength_attack… strength_attack… strength_defenc…
##       <int> <chr> <chr>         <lgl>        <int>    <int>            <int>            <int>            <int>            <int>            <int>
##  1        2 Man … Wolves        FALSE            3        4             1310             1330             1250             1260             1310
##  2        3 Man … Crystal Pala… TRUE             3        4             1310             1330             1250             1260             1310
##  3        4 Man … Southampton   FALSE            2        4             1310             1330             1250             1260             1310
##  4        5 Man … Leicester     TRUE             3        4             1310             1330             1250             1260             1310
##  5        6 Man … West Ham      FALSE            3        4             1310             1330             1250             1260             1310
##  6        7 Man … Arsenal       TRUE             4        4             1310             1330             1250             1260             1310
##  7        8 Man … Newcastle     FALSE            3        4             1310             1330             1250             1260             1310
##  8        9 Man … Liverpool     TRUE             4        4             1310             1330             1250             1260             1310
##  9       10 Man … Norwich       FALSE            2        4             1310             1330             1250             1260             1310
## 10       11 Man … Bournemouth   FALSE            3        4             1310             1330             1250             1260             1310
## # … with 712 more rows, and 8 more variables: strength_defence_away <int>, opponent_strength <int>, opponent_strength_overall_home <int>,
## #   opponent_strength_overall_away <int>, opponent_strength_attack_home <int>, opponent_strength_attack_away <int>,
## #   opponent_strength_defence_home <int>, opponent_strength_defence_away <int>

That’s more like it. Now we have a dataset, plot_data, that’s easily digestible by our plotting packages.
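
The opponent lookup is the step most likely to go wrong, so here it is in isolation on a toy example. The two fixtures are Man Utd’s first two rows from the fixtures output above (Man Utd has id 12); `fixtures_toy` and `teams_toy` are names I made up for this sketch, and I use `if_else()` from dplyr in place of `ifelse()`.

```r
library(dplyr)

# Two of Man Utd's fixtures, copied from the `fixtures` output above
fixtures_toy <- tibble(
  event = c(2L, 3L),
  team_h = c(20L, 12L),
  team_a = c(12L, 7L),
  is_home = c(FALSE, TRUE)
)

# The three teams involved, with ids taken from the `teams` table
teams_toy <- tibble(
  id = c(7L, 12L, 20L),
  name = c("Crystal Palace", "Man Utd", "Wolves")
)

opponents <- fixtures_toy %>%
  # At home, the opponent is the away team; away, it is the home team
  mutate(opponent_team_id = if_else(is_home, team_a, team_h)) %>%
  left_join(teams_toy, by = c("opponent_team_id" = "id")) %>%
  select(gameweek = event, is_home, opponent_team = name)

opponents
```

The result lists Wolves (away) for gameweek 2 and Crystal Palace (home) for gameweek 3, which matches the first two rows of plot_data.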

Visualizing the data

My preferred way of visualizing this information is through a heatmap, also known as a tile plot. I figured it would be useful to demonstrate three techniques for producing this type of plot with R. We’ll start with the simplest approach, and finish with the most complex one.

Easy heatmap with ggplot2

When I just want a static plot with no bells and whistles, ggplot2 is my preferred visualization package. The package has already been loaded through the library(tidyverse) command earlier on. I’ll be using the RColorBrewer package for a nice-looking and intuitive color scheme, and the ggthemes package to make the plot look nicer without having to write a lot of code.

library(RColorBrewer)
library(ggthemes)

plot <- plot_data %>% 
  # Modify the `team` variable so teams are displayed in alphabetical order
  mutate(
    team = factor(team) %>%
      factor(., levels = rev(levels(.)))
  ) %>% 
  # Create the plot based on our input data
  ggplot(aes(x = gameweek, y = team, fill = difficulty)) + 
  geom_raster() +
  # Specify the x-axis layout
  scale_x_continuous(
    limits = c(0, 39), 
    breaks = seq(2, 38, 2), 
    expand = c(0, 0)
  ) +
  # Set the color scheme
  scale_fill_distiller(palette = "YlOrRd", direction = 1) +
  # Set the theme
  theme_tufte(base_family = "sans") +
  # Set the labels
  labs(
    title = "Fixture difficulty, Premier League 2019-20",
    x = "Gameweek", 
    y = "", 
    fill = "Difficulty"
  )

plot

Not too shabby, but we can do better.
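
One detail in the plotting code that is easy to miss is the factor reversal. ggplot2 draws the first level of a discrete y-axis at the bottom, so the levels have to be reversed for the teams to read alphabetically from top to bottom. A small base R sketch with made-up input:

```r
teams_chr <- c("Wolves", "Arsenal", "Man Utd", "Arsenal")

# Default factor levels are sorted alphabetically
alphabetical <- factor(teams_chr)
levels(alphabetical)

# Reversing the levels puts "Arsenal" last, which ggplot2 draws at the top
# of a discrete y-axis
reversed <- factor(alphabetical, levels = rev(levels(alphabetical)))
levels(reversed)
```

If you have the forcats package loaded (it ships with the tidyverse), `fct_rev()` should give the same result in a single call.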

Interactive heatmap with plotly

I want to incorporate even more information in the plot without making it too noisy. We could, for example, make it so that hovering your cursor over one of the colored tiles provides information about that fixture. This is quite easy to do with the excellent plotly package for R.

library(plotly)

plot <- plot_data %>% 
  # Modify the `team` variable so teams are displayed in alphabetical order
  mutate(
    team = factor(team) %>%
      factor(., levels = rev(levels(.)))
  ) %>% 
  # Create the plot based on our input data
  ggplot(aes(x = gameweek, y = team, fill = difficulty)) + 
  geom_raster(
    # Define the tooltip contents with HTML
    aes(text = paste0(
      "<b>", team, " (GW", gameweek, 
      ifelse(is_home, " Home", " Away"), ")</b><br>",
      "Attack strength<br>",
      "&nbsp;&nbsp;Home: ", strength_attack_home, "<br>",
      "&nbsp;&nbsp;Away: ", strength_attack_away, "<br>",     
      "Defense strength<br>",
      "&nbsp;&nbsp;Home: ", strength_defence_home, "<br>",
      "&nbsp;&nbsp;Away: ", strength_defence_away, "<br><br>",     
      "<b>vs. ", opponent_team, "</b><br>",
      "Attack strength<br>",
      "&nbsp;&nbsp;Home: ", opponent_strength_attack_home, "<br>",
      "&nbsp;&nbsp;Away: ", opponent_strength_attack_away, "<br>",     
      "Defense strength<br>",
      "&nbsp;&nbsp;Home: ", opponent_strength_defence_home, "<br>",
      "&nbsp;&nbsp;Away: ", opponent_strength_defence_away
    ))
  ) +
  # Specify the x-axis layout
  scale_x_continuous(
    limits = c(0, 39), 
    breaks = seq(2, 38, 2), 
    expand = c(0, 0)
  ) +
  # Set the color scheme
  scale_fill_distiller(palette = "YlOrRd", direction = 1) +
  # Set the theme
  theme_tufte(base_family = "sans") +
  # Set the labels
  labs(
    title = "Fixture difficulty, Premier League 2019-20",
    x = "Gameweek", 
    y = "", 
    fill = "Difficulty"
  )

# Render the `ggplot2` visualization with `plotly`
ggplotly(plot, tooltip = "text") %>% 
  # Disable unnecessary plotly functionality
  layout(
    xaxis = list(fixedrange = TRUE),  
    yaxis = list(fixedrange = TRUE)
  ) %>% 
  config(displayModeBar = FALSE)
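
The tooltip works because ggplotly() picks up the text aesthetic when called with tooltip = "text". Since text is not an official ggplot2 aesthetic for geom_raster(), ggplot2 may warn about an unknown aesthetic, which is harmless here. To see what the tooltip strings look like, here is a shortened version of the paste0() call wrapped in a hypothetical helper, `fixture_tooltip()`, which is my own name and not part of plotly:

```r
# Hypothetical helper that builds a shortened tooltip for each fixture
fixture_tooltip <- function(team, gameweek, is_home, opponent_team) {
  paste0(
    "<b>", team, " (GW", gameweek,
    ifelse(is_home, " Home", " Away"), ")</b><br>",
    "<b>vs. ", opponent_team, "</b>"
  )
}

tooltips <- fixture_tooltip(
  team = "Man Utd",
  gameweek = c(2, 3),
  is_home = c(FALSE, TRUE),
  opponent_team = c("Wolves", "Crystal Palace")
)

tooltips
# [1] "<b>Man Utd (GW2 Away)</b><br><b>vs. Wolves</b>"
# [2] "<b>Man Utd (GW3 Home)</b><br><b>vs. Crystal Palace</b>"
```

Because paste0() and ifelse() are vectorized, the same call produces one tooltip per row when used inside aes().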

Useful, but we can do even better. When we’re looking at upcoming fixtures, trying to figure out what players we should add to our fantasy teams, we might only be interested in one team’s attacking strength versus another team’s defensive strength. The plot above allows you to compare these variables, but it requires too many cognitive calories for my liking.
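
To make that comparison concrete, here is one way it could be computed directly from the columns in plot_data. This is a sketch of the idea rather than a validated metric: the `attack_edge` column and the `plot_data_toy` name are my own, and the numbers are Man Utd’s and their opponents’ ratings copied from the teams table above.

```r
library(dplyr)

# Three Man Utd fixtures (GW2, GW3, GW7) with ratings from the `teams` table
plot_data_toy <- tibble(
  team = "Man Utd",
  opponent_team = c("Wolves", "Crystal Palace", "Arsenal"),
  is_home = c(FALSE, TRUE, TRUE),
  strength_attack_home = 1250L,
  strength_attack_away = 1260L,
  opponent_strength_defence_home = c(1080L, 1100L, 1290L),
  opponent_strength_defence_away = c(1150L, 1070L, 1330L)
)

matchups <- plot_data_toy %>%
  mutate(
    # Use the team's home attack rating at home, away rating otherwise
    own_attack = if_else(is_home, strength_attack_home, strength_attack_away),
    # The opponent defends away when the team is at home, and vice versa
    opponent_defence = if_else(
      is_home, opponent_strength_defence_away, opponent_strength_defence_home
    ),
    attack_edge = own_attack - opponent_defence
  ) %>%
  select(team, opponent_team, is_home, attack_edge)

matchups
```

A single column like this could be mapped to fill instead of difficulty, which would put the comparison straight into the color of each tile.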

Even more interactive heatmap with plotly + shiny

We can take this plot to the next level by allowing users to choose what information they would like to see. The go-to library for such matters in the world of R is called shiny. Making a shiny application requires a fair bit of code; too much for this post. Instead, I’ll let you play around with the end result below, and refer you to my GitHub repository if you’d like to see how it was made. You’ll find an R notebook (.Rmd) file that you can copy-paste into RStudio and run to create an up-to-date version of the plot below. It should work for the remainder of the 2019-20 season (assuming you’ve installed all the required packages). If they change the API again next year, it might need some adjustments.
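
For a rough idea of the moving parts, here is a bare-bones skeleton of a shiny app with a dropdown that controls which variable colors the tiles. It is not the app from the repository, just a minimal sketch under my own assumptions: the `toy_data` frame holds three Man Utd fixtures with values from the tables above, and the input and output ids (`fill_var`, `heatmap`) are names I picked.

```r
library(shiny)
library(ggplot2)
library(plotly)

# Toy stand-in for `plot_data`: Man Utd's fixtures in gameweeks 2 to 4
toy_data <- data.frame(
  gameweek = c(2L, 3L, 4L),
  team = "Man Utd",
  difficulty = c(3L, 3L, 2L),
  opponent_strength_attack_home = c(1180L, 1030L, 1040L)
)

ui <- fluidPage(
  # Dropdown for choosing the variable that is mapped to tile color
  selectInput(
    inputId = "fill_var",
    label = "Color tiles by",
    choices = c("difficulty", "opponent_strength_attack_home")
  ),
  plotlyOutput("heatmap")
)

server <- function(input, output, session) {
  output$heatmap <- renderPlotly({
    # `.data[[...]]` looks up the column named by the dropdown selection
    p <- ggplot(
      toy_data,
      aes(x = gameweek, y = team, fill = .data[[input$fill_var]])
    ) +
      geom_raster() +
      labs(x = "Gameweek", y = "", fill = input$fill_var)
    ggplotly(p)
  })
}

app <- shinyApp(ui, server)

# Launch the app only when running interactively
if (interactive()) runApp(app)
```

A real version would swap `toy_data` for plot_data and add the tooltip, color scheme, and theme from the earlier plots.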

