L1 Chance & Data - Statistics Flashcards
Probability
A measure of the likelihood that an event will occur
Number of favourable outcomes/total number of outcomes
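As a minimal sketch of the formula above, assuming every outcome is equally likely (the fair die and the function name `probability` are illustrative choices, not part of the original cards):

```python
from fractions import Fraction


def probability(favourable, total):
    """Favourable outcomes divided by total outcomes (equally likely outcomes assumed)."""
    if total <= 0 or not 0 <= favourable <= total:
        raise ValueError("need 0 <= favourable <= total and total > 0")
    return Fraction(favourable, total)


# Rolling an even number on a fair six-sided die: 3 favourable outcomes out of 6.
p_even = probability(3, 6)
print(p_even)         # 1/2
print(float(p_even))  # 0.5
```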
Mean
box and whisker
The sum of all the data values divided by the number of values
Median
box and whisker
The middle value when the data set is put in order
Remember: the median is often the better measure of centre when there are outliers, because extreme values affect it much less than they affect the mean
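A small sketch of how one extreme value moves the mean much more than the median. The scores are made-up illustration data, and Python's standard `statistics` module is used for the calculations:

```python
from statistics import mean, median

scores = [4, 5, 5, 6, 7]        # made-up data with no outlier
with_outlier = scores + [40]    # the same data plus one extreme value

print(mean(scores), median(scores))  # 5.4 5
print(round(mean(with_outlier), 2), median(with_outlier))  # 11.17 5.5
```

The mean roughly doubles when the outlier is added, while the median only moves from 5 to 5.5.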
Mode
box and whisker
The value that occurs most often in the data set
Range
box and whisker
The difference between the maximum and minimum values
Range = Max-Min
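The mode and range definitions can be checked with a few lines of Python. This is only a sketch on made-up data:

```python
from statistics import mode

data = [3, 6, 6, 2, 10, 6, 4]  # made-up data

most_common = mode(data)            # the value that occurs most often
data_range = max(data) - min(data)  # Range = Max - Min

print(most_common)  # 6
print(data_range)   # 8
```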
Scatter Plot
Look for the trend line and outliers
Advantages & Disadvantages of Scatter Plot
Advantages
- A scatter plot is good for comparing two different variables (e.g. height and weight) and seeing if there is a relationship between them.
- Show the relationship between two variables
- patterns are easy to observe
Disadvantages
- Does not show a relationship for more than two variables
- unable to give the exact extent of correlation
Bar Graph
Bar charts are usually used to show how many individuals in a sample fall into each category of some variable.
Advantages & Disadvantages of Bar graph
Advantages
- easy to understand
- summarise a large amount of data in a visual, easily interpretable form
Disadvantages
- bar charts often fail to mark key assumptions, patterns, and causes
- not a lot of data can be added
Box & whisker plot
Look for the median, the interquartile range, the range, and the spread
Advantages & Disadvantages of Box & Whisker plot
Advantages
- A box and whisker plot is good for seeing the overall features of data (e.g. the median and the quartiles)
- Very effective and easy to read
Time series
Look for trends over time
Advantages & Disadvantages of Time Series
Advantages
- A time-series graph is good for seeing how data changes over time.
Dot Plot
A dot plot shows each data value as a dot above a number line, with repeated values stacked on top of each other. Remember to look for the median, spread, and outliers
Advantages & Disadvantages of Dot Plot
Advantages
- A dot plot is good for analyzing individual data points, as well as seeing how data is distributed.
- They clearly display cluster/gaps of data and outliers
Disadvantages
- It can be time-consuming when it comes to a large amount of data
Pie Chart
Pie charts can be used to show percentages of a whole, and represent percentages at a set point in time
Advantages & Disadvantages of Pie Chart
Advantages
- Summarize a large data set in visual form.
- Are visually simpler than other types of graphs.
- Display relative proportions of multiple classes of data.
Disadvantages
- do not easily reveal exact values.
- fail to reveal key assumptions, causes, effects, or patterns.
Trend
box and whisker
A trend is an underlying pattern that the majority of the data in an investigation follows. It shows what the data tends to do and the direction in which the data changes. A trend can be either positive or negative, and either strong or weak:
- Strong/positive
- Weak/positive
- Strong/negative
- Weak/negative
Regular patterns
Box and whisker
Unusual patterns
box and whisker
Spread
box and whisker
Symmetry/skew
box and whisker and dot plot
Data that is not symmetrical
- If most of the data is on the right (with a longer tail to the left), the data is skewed left
- If most of the data is on the left (with a longer tail to the right), the data is skewed right
Shift
Box and whisker
How far the box plot of one group sits from the box plot of another, and how much the two overlap. When analyzing shift and overlap, have a look at…
- The difference between the medians
- Each group’s spread, based on the range and interquartile range.
- The overall spread of the graph.
- Whether the medians for each group overlap with the box for the other group
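Two of the checks above, the difference between the medians and whether each median falls inside the other group's box, can be sketched as follows. The class `BoxSummary`, the helper `median_inside_box`, and the numbers are all hypothetical, chosen only for illustration:

```python
from typing import NamedTuple


class BoxSummary(NamedTuple):
    """Hypothetical container for the three values that define the box."""
    lower_quartile: float
    median: float
    upper_quartile: float


def median_inside_box(median, box):
    """True if a median lies between the other group's quartiles."""
    return box.lower_quartile <= median <= box.upper_quartile


group_a = BoxSummary(lower_quartile=12, median=15, upper_quartile=19)
group_b = BoxSummary(lower_quartile=16, median=20, upper_quartile=24)

difference_between_medians = abs(group_a.median - group_b.median)
a_in_b = median_inside_box(group_a.median, group_b)
b_in_a = median_inside_box(group_b.median, group_a)

print(difference_between_medians)  # 5
print(a_in_b, b_in_a)              # False False
```

Here neither median sits inside the other group's box, which suggests a clear shift between the two groups.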
Spread
Box and whisker
Range of values
Centre
box and whisker
Upper Quartile (UQ)
Box and whisker
25% of data values are ABOVE this point on the graph
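One way to find the quartiles by hand is to take the median of each half of the ordered data. The sketch below assumes that "median of the halves" method, leaving the middle value out of both halves when the number of values is odd. Other methods exist and can give slightly different answers. The function name `quartiles` and the data are illustrative:

```python
from statistics import median


def quartiles(values):
    """Return (lower quartile, median, upper quartile) using the median-of-halves method."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]   # values below the middle
    upper_half = ordered[-half:]  # values above the middle
    return median(lower_half), median(ordered), median(upper_half)


lq, med, uq = quartiles([1, 3, 4, 6, 7, 8, 10, 12, 15])
print(lq, med, uq)  # 3.5 7 11.0
print(uq - lq)      # 7.5 (the interquartile range)
```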
Outlier
Box and whisker
Outliers are values that are detached from a prominent pattern followed by the large majority of the rest of the data.
- Outliers are values that are higher or lower than normal. Basically, they’re outside the trend.
- Outliers AFFECT the mean
Minimum
box and whisker
The lowest value (this will most likely be given to you)
Maximum
Box and whisker
The highest value (this will most likely be given to you)
Overall visible spread (OVS)
Box and whisker
Distance between the highest upper quartile and the lowest lower quartile
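As a rough sketch of this definition, the overall visible spread for two or more box plots can be computed from each group's quartiles. The function name and the quartile values are hypothetical:

```python
def overall_visible_spread(boxes):
    """boxes is a list of (lower quartile, upper quartile) pairs, one per group."""
    lowest_lower_quartile = min(lq for lq, _ in boxes)
    highest_upper_quartile = max(uq for _, uq in boxes)
    return highest_upper_quartile - lowest_lower_quartile


# Two groups with made-up quartiles: (12, 19) and (16, 24).
print(overall_visible_spread([(12, 19), (16, 24)]))  # 12
```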
Cluster
A cluster is a group of data points that sit close together. It tells us where we would find most of the data.
Cluster is most commonly seen on scatter plots
Positive trend
A positive trend means that as the independent variable increases, the dependent variable also increases (and as one decreases, so does the other). The steeper the line, the faster the dependent variable changes compared to the independent variable
Negative trend
A negative trend means that as the independent variable increases, the dependent variable decreases, and vice versa
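The direction of a trend is normally judged by eye from the trend line. As a rough numerical stand-in (an assumption, not a method from these cards), the sketch below checks whether the two variables tend to move together or in opposite directions by summing the products of their deviations from the mean. The function name `trend_direction` and the data are made up:

```python
def trend_direction(xs, ys):
    """Classify the trend between paired values as positive, negative, or neither."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Positive when x and y tend to be above or below their means together.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if co_movement > 0:
        return "positive"
    if co_movement < 0:
        return "negative"
    return "no linear trend"


hours_studied = [1, 2, 3, 4, 5]      # independent variable (x-axis)
test_score = [52, 55, 61, 64, 70]    # dependent variable (y-axis)
hours_of_tv = [70, 64, 61, 55, 52]   # second made-up dependent variable

print(trend_direction(hours_studied, test_score))   # positive
print(trend_direction(hours_studied, hours_of_tv))  # negative
```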
Dependent variable
A dependent variable is usually on the y-axis. It is the thing that changes depending on some other variable, which is why it is called dependent.
Independent variable
An independent variable is something that doesn’t change according to anything else. It is independent and doesn’t need any other variables. It is usually on the x-axis
Seasonal trend
A pattern that repeats at regular intervals, such as every year or every month, in the same way that the seasons repeat.
Long-term trend
The overall direction of the data across the whole time period, ignoring short-term ups and downs