Programming Presentation Flashcards

1
Q

Fantasy Premier League (FPL) has seen huge growth in the past few years, now with over 10 million users signed up

A

Fantasy Premier League (FPL) is a popular game that now has >10 million participants. Users can set up private leagues with friends and family, giving the game an additional competitive flavour.

2
Q

Players receive points on FPL based on their contributions in a game; points are awarded for goals, assists, clean sheets, etc.

A

The game is based on actual Premier League matches, where players are awarded points according to their individual contributions and team results.

3
Q

The use of data is becoming more common among FPL players trying to gain an edge over their private league rivals; this was the basis for my project

A

The use of data and AI is becoming more common as FPL players try to gain an edge over their opponents. The aim of my project was to analyse data through a series of plots that would aid with the team selection process and help determine which stats are the most important to consider when selecting an FPL team.

4
Q

I used 3 datasets in my analysis, one of which was used to produce the majority of the visualisations while the other two were used to construct one piece of analysis each

A

I carried out the bulk of my analysis on a dataset that included relevant FPL stats about every player in the Premier League. I also made use of two other datasets: one included expected goals information about every Premier League player who had made an appearance in each of the last 5 seasons, and the other consisted of fixture difficulty ratings for each team’s next 10 games.

5
Q

Everyone starts with a 100m budget and must select 15 players - priced based on their performances in the previous season

A

FPL players start with a fixed budget with which they must purchase players for their team, priced according to their previous season’s form. Thus, the first metric I thought would be useful to consider is the value of players in terms of points per million.

6
Q

Maximising return on investment leads to maximum points (when full budget is used): Points per Million metric displays best value players

A

There is a clear advantage to selecting players that give the best return on investment, i.e. the best points:cost ratio. By plotting the top players in terms of this metric, while also excluding players below a certain threshold of total points, the players worth considering for selection are displayed clearly in a simple bar chart.
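
A minimal sketch of how such a ranking could be computed with pandas. The column names (`web_name`, `total_points`, `now_cost`), the sample values and the `MIN_POINTS` threshold are all assumptions made for illustration, not the project's actual data or settings.

```python
import pandas as pd

# Made-up sample data; column names are assumed, not taken from the project.
players = pd.DataFrame({
    "web_name": ["A", "B", "C", "D"],
    "total_points": [120, 95, 30, 110],
    "now_cost": [13.0, 5.0, 4.0, 7.5],  # price in millions
})

MIN_POINTS = 50  # assumed cut-off for "too few points to consider"

# Return on investment: points scored per million spent.
players["points_per_million"] = players["total_points"] / players["now_cost"]

# Drop low scorers first, then rank the rest by value.
shortlist = (
    players[players["total_points"] >= MIN_POINTS]
    .sort_values("points_per_million", ascending=False)
    .reset_index(drop=True)
)
print(shortlist[["web_name", "points_per_million"]])
```

The resulting `shortlist` could then be passed to a bar chart, for example with `shortlist.plot.bar(x="web_name", y="points_per_million")`.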

7
Q

Players with cheap price tags are favoured; some of the best performers score poorly by this metric because they are so expensive

A

From the plot, it is clear that players who have a cheap price tag are favoured. While this is useful for indicating value for money, some of the highest point scorers during this season so far, such as Salah and Haaland, score poorly by this metric, because, as historically the best performing players, they are the most expensive.

8
Q

This metric can’t be used in isolation; total points, among other metrics, need to be considered as well

A

So, while maximising points per million is important, the metric can’t be used in isolation, and total points has to be considered as well.

9
Q

Total points alone discriminates against players who are returning from injury; points per 90 is more informative of how a player scores when they actually play

A

The problem with total points as a metric is that it doesn’t consider players whose seasons have been affected by injury. This led me to question whether points per 90 (minutes) could be a more informative criterion for selection as it would highlight players who returned well while on the pitch.

10
Q

The problem with points per 90 is that it can favour players who aren’t consistent starters, so it won’t translate to total points in these cases

A

A possible problem with this approach is that it could also highlight players who have very limited game time not because of injury, but because of their inability to hold down a starting position for their club. To determine whether points per 90 is an appropriate parameter by which to select a team, I plotted a scatter graph to see if it reflected total points appropriately. From this initial plot, it was clear that players with only limited game time were favoured.

11
Q

The use of another stat, starts per 90, allows for exclusion of players who don’t start even when fit (edges in green)

A

To get around this, I made use of another stat, starts per 90, that was included in my primary dataset. This indicated how often players started the matches in which they played. Using a minimum starts per 90 condition, I was able to identify players who are unlikely to play many minutes even when fit and highlight the edges of their points in green. Thus, the plot helps point out players who play most of the time and who deliver the most points during that game time.
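
A rough sketch of the filtering idea, assuming starts per 90 is defined as starts divided by the number of 90-minute blocks played. In my dataset the stat was already provided, so the derivation, the column names, the sample values and the `MIN_STARTS_PER_90` threshold here are all illustrative assumptions.

```python
import pandas as pd

# Made-up sample data; column names are assumed.
players = pd.DataFrame({
    "web_name": ["Starter", "Sub", "Injured"],
    "total_points": [150, 40, 60],
    "minutes": [2700, 450, 900],
    "starts": [30, 1, 10],
})

nineties = players["minutes"] / 90  # number of full-match equivalents played
players["points_per_90"] = players["total_points"] / nineties
players["starts_per_90"] = players["starts"] / nineties

MIN_STARTS_PER_90 = 0.7  # assumed threshold

# Flag players who rarely start even when they do get minutes.
players["rarely_starts"] = players["starts_per_90"] < MIN_STARTS_PER_90
print(players[["web_name", "points_per_90", "starts_per_90", "rarely_starts"]])
```

In a scatter plot, the `rarely_starts` flag could drive the marker edge colour, for example by passing a list of colours built from it to the `edgecolors` argument of `ax.scatter`.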

12
Q

Fixtures can influence players’ performance massively: players who normally wouldn’t be worth considering for selection can be great picks when they have an ‘easy’ run of fixtures

A

Although points per 90 and points per million are very important, fixtures (i.e. which team the selected player is playing against in any given game week) can influence the scoring of points dramatically. Thus, even the players ranked highest by the above metrics often perform worse when facing tougher opponents.

13
Q

Fixtures are ranked from 1-5 using a model, with 1 being the easiest and 5 being the hardest

A

Fixtures are ranked from 1-5, with 1 being the easiest and 5 being the hardest. A colour coordinated layered bar chart nicely displays fixture difficulties over the next 3, 6 and 10 game weeks, and helps to make informed choices about the best teams to select players from (at any point during the season).
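
One way the three windows could be summarised is by averaging each team's difficulty ratings over its next 3, 6 and 10 fixtures. The sketch below assumes that approach; the team names and ratings are invented, and the actual chart may have aggregated the ratings differently.

```python
import pandas as pd

# Invented difficulty ratings (1 = easiest, 5 = hardest) for the next 10 fixtures.
fdr = pd.DataFrame.from_dict(
    {
        "TeamA": [2, 2, 2, 3, 2, 3, 4, 3, 5, 4],
        "TeamB": [5, 4, 4, 3, 3, 2, 2, 2, 3, 2],
    },
    orient="index",  # one row per team, one column per upcoming fixture
)

# Mean difficulty over the next 3, 6 and 10 game weeks.
summary = pd.DataFrame(
    {window: fdr.iloc[:, :window].mean(axis=1) for window in (3, 6, 10)}
)
print(summary)
```

A lower mean in the 3-game column than in the 10-game column suggests a team whose easy run is concentrated in the short term.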

14
Q

The graph shows that Brighton (BHA) have a good run of fixtures over the short and medium term

A

For example, from the graph it can be seen that Brighton have a run of easier games approaching and so Brighton players should be prioritised for selection in the short term.

15
Q

Expected goals (xG) is a metric that measures the probability that a specific chance will be converted, with 0 being no chance of scoring and 1 being a guaranteed goal

A

Amongst the emerging metrics from football-related data is expected goals (or xG), which is now widely used. It effectively measures the probability that a given chance will be scored, returning a decimal between 0 and 1. xG is used cumulatively over the course of the season and thus reflects the quality of chances a player is getting.

16
Q

Expected assists (xA) measures the quality of chances created using a similar model to xG that considers different variables

A

Expected goal involvements (xGI) is the sum of xG and expected assists (xA), which is calculated using a similar model to xG but considers different variables. In essence, xG reflects a player’s ability to get into goal scoring positions while xA reflects their ability to create chances. I don’t have time to fully explain xGI now, but I’m happy to answer any questions you may have about it. To determine how relevant this metric is when selecting players for FPL, I plotted total points scored against xGI for a selection of players.

17
Q

The graph shows a strong correlation between xGI and total points (which is representative of actual goal involvements)

A

From this it’s evident that the highest point scorers have a very high xGI. Some of the highest scoring players, however, will consistently out-perform their xG season on season, reflecting exceptional finishing ability (I will discuss this further later on). On the other hand, some players are poorer finishers, which is reflected by a consistently lower number of goals than xG. Overall, a player’s xGI is very closely related to their actual goal involvements, which can be seen from the line of best fit and its high correlation coefficient. Therefore, it is an important metric to take into account for team selection.
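
A small sketch of how xGI, the line of best fit and the correlation coefficient could be computed with NumPy. The numbers are invented for illustration and the variable names are assumptions; the project's own fit may have been produced differently.

```python
import numpy as np

# Invented per-player season totals.
xg = np.array([1.5, 3.0, 4.0, 5.0, 7.0])
xa = np.array([0.5, 1.0, 2.0, 3.0, 3.0])
total_points = np.array([40, 62, 85, 100, 125])

xgi = xg + xa  # expected goal involvements = xG + xA

# Degree-1 polynomial fit gives the line of best fit: points ~ slope * xGI + intercept.
slope, intercept = np.polyfit(xgi, total_points, 1)

# Pearson correlation coefficient between xGI and total points.
r = np.corrcoef(xgi, total_points)[0, 1]

print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.3f}")
```

Players whose points sit well above `slope * xgi + intercept` are the ones over-performing relative to their xGI.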

18
Q

Because xGI and total points are strongly correlated, players who are overperforming their xGI should, in some cases, be avoided

A

Selecting players who have high points per 90, are good value and have good fixtures will help FPL players maximise their team’s total points. However, as discussed before, xGI and total points are strongly correlated. This indicates that we should ensure that we don’t select players who are significantly over-performing in relation to their xGI, because those that do commonly experience performance decline during the season, and thus are not good longer term investments.

19
Q

By this logic Son Heung-Min should be avoided, as he is far above the line of best fit in the previous graph

A

As an example, this analysis suggests that it might be prudent to avoid selecting Heung-Min Son, as his total points are much higher than expected from his xGI.

20
Q

Some players have outstanding finishing ability and consistently overperform their xG every year; the graph shows the top xG overperformers over the last 5 years

A

However, some players have outstanding finishing ability, proven over several seasons, meaning they consistently score from difficult (low xG) chances. To investigate this further, a plot of the top xG overperformers over the last 5 years was constructed to identify those players whose performance is much less likely to decline during the season.

21
Q

These players should still be considered for selection even if they are overperforming their xGI (due to xG) this year, as their performance is unlikely to tail off as they’ve proved to be consistent in previous years

A

From the graph we can see that Son’s actual goals scored were higher than his xG in four of the past five seasons. Thus, Son’s performance is unlikely to tail off later in the season and he should be considered strongly for selection when available.
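
A sketch of how season-by-season over-performance could be summarised, assuming a long-format table with one row per player per season. The player labels, column names and figures are invented and do not reproduce the graph's data.

```python
import pandas as pd

# Invented figures: one row per player per season.
seasons = pd.DataFrame({
    "Player": ["P1"] * 5 + ["P2"] * 5,
    "season": [1, 2, 3, 4, 5] * 2,
    "goals": [12, 14, 11, 17, 23, 10, 8, 9, 7, 6],
    "xG": [9.0, 10.5, 11.5, 11.0, 16.0, 11.0, 9.5, 8.5, 8.0, 7.5],
})

# Positive values mean the player scored more than the chances suggested.
seasons["diff"] = seasons["goals"] - seasons["xG"]

summary = seasons.groupby("Player").agg(
    total_overperformance=("diff", "sum"),
    seasons_over=("diff", lambda d: int((d > 0).sum())),
)
print(summary.sort_values("total_overperformance", ascending=False))
```

A player with a large `total_overperformance` spread across most seasons, rather than one outlier year, fits the "consistently good finisher" description.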

22
Q

Moving individual annotations required one list to be indexed by one number and the other list to be indexed by another number, so a loop within a loop was required

A

When plotting total points against xGI, some of the annotations had to be moved individually after they had been added to the graph. I used a command called set_position to do this, which required information about the specific annotation. Thus, I appended the information for every annotation to a list. When trying to edit the position of an annotation, a problem arose with indexing the list of annotations, as the original coordinates couldn’t be indexed by the same number because not all points were annotated. To solve this, I wrote a loop within a loop that, for each annotation, looped over every player. When it got to the row including the web name of the annotation being moved, it used the xGI and total points in that row as the origin for the annotation. This was followed by + x and + y, which allowed the annotation to be moved by the desired amount.
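
The sketch below illustrates the same idea with Matplotlib, but it swaps the inner loop for a lookup by web name, which avoids the index mismatch in a similar way. It is not the code from the project: the data, the points threshold for annotating and the `offsets` dictionary are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Invented data; only some points get annotated.
players = pd.DataFrame({
    "web_name": ["A", "B", "C"],
    "xGI": [3.0, 8.0, 10.0],
    "total_points": [60, 120, 150],
})

fig, ax = plt.subplots()
ax.scatter(players["xGI"], players["total_points"])

# Keep every annotation object so it can be moved later.
annotations = []
for _, row in players[players["total_points"] > 100].iterrows():
    annotations.append(
        ax.annotate(row["web_name"], (row["xGI"], row["total_points"]))
    )

# Hypothetical per-label nudges: web name -> (x shift, y shift).
offsets = {"C": (0.5, -5.0)}

for ann in annotations:
    name = ann.get_text()
    if name in offsets:
        # Find the player's own row by name, not by list position.
        row = players.loc[players["web_name"] == name].iloc[0]
        dx, dy = offsets[name]
        ann.set_position((row["xGI"] + dx, row["total_points"] + dy))
```

Matching on the label text means the annotation list and the data frame never need to share an index.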

23
Q

When merging data frames for several years into one, duplicate rows of the same player were arising in the final data frame due to players who had transferred halfway through a season. This was fixed by grouping rows by player name such that players with two rows in a single year were added together to create a combined xG and goal tally for that year

A

When creating the xG over-performance graph, I had to merge several datasets for individual years into one dataset that I could then use for plotting. A problem with combining the datasets arose because players who had transferred to another team in the Premier League midway through a season had two rows in that particular dataset, one for each team they played for. As I was merging on the ‘Player’ column, the rows for the same player (at different teams) both merged with every other row in the other data frames that included that player’s name. This meant that multiple rows for each of these transferred players were created. I solved this issue by grouping by the Player column and summing their xG and goals in each of the individual data frames. This meant that if a player had two rows in one of the years, they were combined into one, and hence there were no duplicate rows in the final data frame.
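
A minimal pandas sketch of the problem and the fix. The column names other than `Player` (`Team`, `Gls`, `xG`), the suffixes and the figures are assumptions for illustration.

```python
import pandas as pd

# Season A: player X transferred mid-season, so has one row per club.
season_a = pd.DataFrame({
    "Player": ["X", "X", "Y"],
    "Team": ["Club1", "Club2", "Club3"],
    "Gls": [3, 4, 10],
    "xG": [2.5, 3.5, 9.0],
})
# Season B: one row per player.
season_b = pd.DataFrame({
    "Player": ["X", "Y"],
    "Team": ["Club2", "Club3"],
    "Gls": [8, 12],
    "xG": [7.0, 10.0],
})

# Merging directly duplicates X: each season A row pairs with the season B row.
naive = season_a.merge(season_b, on="Player", suffixes=("_a", "_b"))

def collapse(df, suffix):
    """Sum goals and xG per player so each player has exactly one row."""
    out = df.groupby("Player", as_index=False)[["Gls", "xG"]].sum()
    return out.rename(columns={"Gls": f"Gls_{suffix}", "xG": f"xG_{suffix}"})

merged = collapse(season_a, "a").merge(collapse(season_b, "b"), on="Player")
print(len(naive), len(merged))
```

Grouping before merging drops the `Team` column, which is acceptable here because only the combined goal and xG tallies are needed for the plot.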