U2 + U3 - Processing and representing data Flashcards
What can continuous data be represented by?
- Histograms
- Cumulative frequency curves
- Line graphs
What can discrete data be represented by?
- Bar charts
- Cumulative frequency step polygons
What can categorical data be represented by?
- Frequency tables (standard frequency tables, relative frequency tables, cumulative frequency tables)
- Pie charts
- Bar charts
What can ordinal data be represented by?
- Bar charts
- Pie charts
- Tables
What do you need to ensure when drawing a pictogram? (4)
- Each picture is the same size
- The picture can be divided easily to show different frequencies
- The spacing between the pictures is the same in each row
- You write a key to show what each symbol represents
2 properties of a bar chart
- Bars are equal width with equal spaces between them
- The height of the bar represents the frequency
Differences (2) between histograms and bar charts
- The bars of a bar chart don’t touch, but in a histogram, the bars touch because the data is continuous
- A frequency polygon can be drawn on a histogram by joining the midpoints of the tops of the bars (at the midpoint of each class); this is not done on a bar chart
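A minimal Python sketch of the frequency polygon idea for grouped continuous data, assuming classes are given as (lower, upper) pairs. The function names and the data are illustrative only. It also computes frequency density (frequency ÷ class width), the usual height of a histogram bar when class widths differ; that detail goes slightly beyond this card.

```python
def class_midpoints(classes):
    """x-coordinate of each frequency polygon point: the midpoint of the class."""
    return [(lower + upper) / 2 for lower, upper in classes]


def frequency_densities(classes, frequencies):
    """Histogram bar height for each class: frequency divided by class width."""
    return [f / (upper - lower) for (lower, upper), f in zip(classes, frequencies)]


def frequency_polygon_points(classes, frequencies):
    """Points to join with straight lines: (class midpoint, frequency density)."""
    return list(zip(class_midpoints(classes), frequency_densities(classes, frequencies)))


# Hypothetical grouped data: heights in cm
classes = [(150, 160), (160, 170), (170, 190)]
frequencies = [5, 12, 8]
print(frequency_polygon_points(classes, frequencies))
# [(155.0, 0.5), (165.0, 1.2), (180.0, 0.4)]
```

With equal class widths the densities are just the frequencies scaled by one constant, so the polygon has the same shape as one drawn from the frequencies directly.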
Why are composite bar charts good?
The total frequencies and the frequencies of each component can be easily compared
Compare a stem-and-leaf diagram with a bar chart
A stem-and-leaf diagram shows the shape of the data distribution in the same way as a bar chart, but you can still see the original data values
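To make the comparison concrete, here is a small Python sketch that builds a stem-and-leaf diagram, assuming non-negative whole numbers with the tens digit as the stem and the units digit as the leaf. The function name and data are hypothetical. Each row's length shows the shape like a bar, while the leaves keep the original values.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf diagram for non-negative whole numbers.

    Stem = tens digit(s), leaf = units digit; leaves are listed in order.
    """
    leaves_by_stem = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves_by_stem[stem].append(leaf)
    rows = []
    # Include empty stems so the shape of the distribution is not distorted
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows


data = [23, 17, 31, 25, 12, 28, 19, 52]
for row in stem_and_leaf(data):
    print(row)
print("Key: 1 | 2 means 12")
```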
How do you compare
i) total frequencies
ii) proportions
in comparative pie charts?
i) Compare the areas of the whole pie charts
ii) Compare the angles of the corresponding sectors
Why would you use comparative pie charts over normal pie charts?
When two sets of data have different total frequencies, drawing two pie charts the same size to represent them would be misleading.
In a cumulative frequency graph, when would you use a curve and when would you use a step polygon?
Curve - continuous data
Step polygon (straight horizontal and vertical lines) - discrete data
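A short Python sketch of where the points of a cumulative frequency graph come from, assuming grouped continuous data with each running total plotted at the upper class boundary and the graph starting at zero on the lower boundary of the first class. The function name and data are hypothetical. For continuous data these points are joined with a smooth curve; for discrete data the same running totals would instead be drawn as a step polygon.

```python
from itertools import accumulate


def cumulative_frequency_points(first_lower_bound, upper_bounds, frequencies):
    """Points for a cumulative frequency graph of grouped continuous data.

    Each running total is plotted at the upper class boundary; the graph
    starts at (lower boundary of the first class, 0).
    """
    points = [(first_lower_bound, 0)]
    points += zip(upper_bounds, accumulate(frequencies))
    return points


# Hypothetical data: times in minutes, classes 0-10, 10-20, 20-30, 30-40
print(cumulative_frequency_points(0, [10, 20, 30, 40], [4, 9, 12, 5]))
# [(0, 0), (10, 4), (20, 13), (30, 25), (40, 30)]
```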
What is a choropleth map?
A map of a geographical area in which regions are classified by the value of a variable. Regions are shaded with deeper colours for higher values, and a key shows what each shade represents.
What is positive skew? What is negative skew?
POS - Most of the data values are at the lower end
NEG - Most of the data values are at the upper end
Misleading things about graphs (7)
- Scales that do not start at zero give a misleading impression of the heights of bars
- Scales that do not increase uniformly distort the shape of anything plotted on them
- Lines drawn too thick make it difficult to read information from a graph
- Axes without labels prevent you from knowing what the data represents
- Graphs and charts without keys may be impossible to interpret
- Colours may make some parts of a graph or chart stand out more than others
- 3D diagrams make comparisons difficult because data proportions appear distorted
Advantages of bar charts and line graphs (2)
- They show trends and patterns in data
- You can read values from the scale as long as it is not too small
Advantage and disadvantage of pie charts
ADV:
- They show proportions
DISADV:
- They don’t show the actual data values
Advantage and disadvantage of tables
ADV:
- They give exact data values for different categories
DISADV:
- They don’t show trends and patterns as clearly
In pie charts, what does
i) the area of each sector
ii) the area of the whole pie chart
represent?
i) the area of each sector is proportional to the frequency it represents
ii) the area of the whole pie chart is proportional to the total frequency
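A minimal Python sketch of the arithmetic behind these two facts, with hypothetical function names and values. Sector angles are frequency ÷ total × 360°. For comparative pie charts, if the whole-chart area is proportional to the total frequency, then πr₂² / πr₁² = total₂ / total₁, so the second radius is r₁ × √(total₂ / total₁).

```python
import math


def sector_angles(frequencies):
    """Angle of each sector in degrees: frequency / total frequency x 360."""
    total = sum(frequencies)
    return [f * 360 / total for f in frequencies]


def comparative_radius(r1, total1, total2):
    """Radius of a second pie chart so that its area is proportional to its total.

    Area ratio pi*r2**2 / (pi*r1**2) = total2 / total1, so r2 = r1 * sqrt(total2 / total1).
    """
    return r1 * math.sqrt(total2 / total1)


print(sector_angles([10, 20, 30]))      # [60.0, 120.0, 180.0]
print(comparative_radius(4, 100, 225))  # 6.0
```

Usage: if a chart for 100 people has radius 4 cm, a chart for 225 people needs radius 6 cm, not 9 cm, because it is the area (not the radius) that scales with the total.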
Position of the median in discrete data
Position of the median in continuous data
Discrete: the (n+1)/2 th value
Continuous: the n/2 th value
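A minimal Python sketch of the two position rules, where n is the number of data values. The function names are hypothetical. For discrete data it finds the (n+1)/2 th ordered value, averaging the two neighbours when the position ends in .5; for continuous data it only returns the n/2 position, since in practice the median is then read off a cumulative frequency graph.

```python
import math


def median_position(n, continuous=False):
    """Position of the median in ordered data: (n+1)/2 for discrete data, n/2 for continuous data."""
    return n / 2 if continuous else (n + 1) / 2


def median_discrete(values):
    """Median of a list of discrete values, using the (n+1)/2 th position."""
    ordered = sorted(values)
    position = median_position(len(ordered))  # 1-based, may end in .5
    # When the position ends in .5, take the value halfway between the two neighbours
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    return (lower + upper) / 2


print(median_discrete([7, 3, 9, 1, 5]))      # 5.0 (3rd of 5 values)
print(median_discrete([4, 1, 3, 2]))         # 2.5 (between 2nd and 3rd values)
print(median_position(40, continuous=True))  # 20.0: read off the cf graph at cf = 20
```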
What is standard deviation?
A measure of how much all the values deviate from the mean value (how spread out they are)
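A minimal Python sketch of the calculation, assuming the divide-by-n form σ = √(Σ(x − x̄)² / n). Some courses and calculators use a divide-by-(n − 1) "sample" form instead, which gives a slightly larger value. The function name and data are illustrative.

```python
import math


def standard_deviation(values):
    """Square root of the mean of the squared deviations from the mean (divide-by-n form)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)


print(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0 (mean 5, variance 4)
```

Python's standard library offers `statistics.pstdev` for this divide-by-n form and `statistics.stdev` for the divide-by-(n − 1) form.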
Which order of the mean, median and mode indicates
i) positive skew
ii) negative skew
i) mean > median > mode
ii) mode > median > mean
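A short Python sketch that applies these two orderings, using the standard library's `statistics` module. The function name and data sets are hypothetical. Note that `statistics.mode` returns the first mode it meets when there is a tie (Python 3.8+), so for multimodal data the result should be treated with care.

```python
import statistics


def skew_direction(values):
    """Classify skew by the order of the mean, median and mode."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)  # first mode found if there is more than one (Python 3.8+)
    if mean > median > mode:
        return "positive skew"
    if mode > median > mean:
        return "negative skew"
    return "no clear skew"


print(skew_direction([1, 2, 2, 2, 3, 3, 4, 5, 9]))  # positive skew
print(skew_direction([1, 5, 6, 7, 7, 8, 8, 8, 9]))  # negative skew
```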
Advantages of mode (4)
Disadvantages of mode (2)
ADV:
- Easy to find
- Can be used with any type of data
- Unaffected by open-ended or extreme values
- Mode is always a data value
DISADV:
- There may be no mode, or more than one
- Cannot be used to calculate a measure of spread
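A minimal Python sketch showing two of these points: the mode works for categorical data, and there may be no mode or several. The function name is hypothetical, and it assumes the convention that data in which every distinct value occurs equally often has no mode. The standard library's `statistics.multimode` does not apply that rule: it returns every tied value, so it would report `[4, 7, 9]` for the last example.

```python
from collections import Counter


def modes(values):
    """Return a list of the mode(s).

    Assumption: if every distinct value occurs equally often (and there is more
    than one distinct value), the data is treated as having no mode.
    """
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # no mode
    return [value for value, count in counts.items() if count == highest]


print(modes(["red", "blue", "red", "green"]))  # ['red']  (works for categorical data)
print(modes([1, 1, 2, 2, 3]))                  # [1, 2]  (more than one mode)
print(modes([4, 7, 9]))                        # []      (no mode)
```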
Advantages of median (4)
Disadvantage of median (1)
ADV:
- Easy to calculate
- Unaffected by extreme values
- Best to use when data is skewed
- Can be used to help calculate quartiles, interquartile range and skew
DISADV:
- May not be a data value
Advantages of mean (2)
Disadvantages of mean (2)
ADV:
- Uses all the data
- Can be used to calculate standard deviation and skew
DISADV:
- Always affected by extreme values
- Can be distorted by open-ended classes
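A quick Python illustration of why the mean is affected by extreme values while the median is not, using a hypothetical data set before and after one extreme value is added.

```python
import statistics

# Hypothetical data set and the same data with one extreme value added
values = [21, 23, 24, 25, 27]
with_extreme = values + [150]

for data in (values, with_extreme):
    print(statistics.mean(data), statistics.median(data))
# 24 24
# 45 24.5
```

The single extreme value almost doubles the mean (24 to 45) but moves the median only from 24 to 24.5.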
Which measure of spread can you use with the mode?
The range (for quantitative data)
Which measures of spread can you use with the median?
The range or the interquartile range
Which measures of spread can you use with the mean?
The range or the standard deviation
In a distribution:
- 50% of the data is less than the median, and 50% is greater
- 25% of the data is less than the lower quartile
- 25% of the data is greater than the upper quartile
- 50% of the data is between the lower and upper quartiles
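A minimal Python sketch for finding the quartiles and interquartile range, assuming the (n+1)/4 th and 3(n+1)/4 th position rules. Python's `statistics.quantiles` with its default `'exclusive'` method uses these positions and interpolates between neighbouring values when a position is not whole; other conventions exist, so textbook answers can differ slightly for some data sets. The function name and data are illustrative.

```python
import statistics


def quartile_summary(values):
    """Lower quartile, median, upper quartile and interquartile range.

    statistics.quantiles with its default 'exclusive' method places the
    quartiles at the (n+1)/4 th and 3(n+1)/4 th ordered positions,
    interpolating between neighbouring values when a position is not whole.
    """
    lower_q, median, upper_q = statistics.quantiles(values, n=4)
    return {"LQ": lower_q, "median": median, "UQ": upper_q, "IQR": upper_q - lower_q}


print(quartile_summary([1, 3, 5, 7, 9, 11, 13]))
# {'LQ': 3.0, 'median': 7.0, 'UQ': 11.0, 'IQR': 8.0}
```

Here n = 7, so the lower quartile is the 2nd value, the median the 4th and the upper quartile the 6th, and the middle 50% of the data lies between 3 and 11.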