Everything Flashcards
What is raw data?
Unprocessed data that has just been collected and needs to be ordered, grouped, rounded, or cleaned.
Define qualitative data.
Non-numerical, descriptive data such as eye/hair colour or gender.
What type of data is easier to analyze, qualitative or quantitative?
Quantitative data.
Give an example of quantitative data.
Heights, weights, or marks in an exam.
What is discrete data?
Data that only takes particular values, such as shoe size or number of people.
What is continuous data?
Data that can take any value, such as height or weight.
Define categorical data.
Data that can be sorted into non-overlapping categories, such as gender.
What is ordinal data?
Data that can be placed in order or ranked on a rating scale, such as satisfaction ratings.
What does bivariate data involve?
Measuring two variables, which can be qualitative or quantitative.
What is multivariate data?
Data made up of more than two variables.
What is the purpose of grouping data?
To make it easier to spot patterns and see how the data is distributed.
True or False: Discrete data can be grouped into overlapping classes.
False.
What is a primary data source?
Data that you have collected yourself or someone has collected on your behalf.
Define secondary data.
Data that has already been collected.
What is a population in statistics?
Everyone or everything that could be involved in the investigation.
What is a census?
A survey of the entire population.
Fill in the blank: A _______ is a smaller number from the population that you actually survey.
Sample.
What is a sampling frame?
A list of all the members of the population.
What is a biased sample?
A sample that does not represent the population fairly.
Define random sampling.
Every item/person in the population has an equal chance of being selected.
What is stratified sampling?
Sampling where the size of each group in the sample is in proportion to the sizes of those groups in the population.
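A minimal sketch of how the stratum sizes could be worked out. The function name and the school year groups are hypothetical, made up for illustration. Rounded sizes may not always add up to the intended sample size, so check the total.

```python
def stratified_sample_sizes(strata_sizes, sample_size):
    """Return how many to sample from each stratum, in proportion to its size."""
    total = sum(strata_sizes.values())
    # Each stratum gets (stratum size / population size) x sample size, rounded.
    return {
        name: round(size * sample_size / total)
        for name, size in strata_sizes.items()
    }


# Hypothetical school: 120 pupils in Year 10, 80 in Year 11, sample of 50.
print(stratified_sample_sizes({"Year 10": 120, "Year 11": 80}, 50))
# {'Year 10': 30, 'Year 11': 20}
```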
What is systematic sampling?
Choosing items in the population at regular intervals.
Define cluster sampling.
The population is divided into natural groups, and groups are chosen at random with every member sampled.
What is quota sampling?
Population is grouped by characteristics and a fixed amount is sampled from every group.
Fill in the blank: Opportunity sampling uses the people/items that are _______.
Available at the time.
What is judgment sampling?
When the researcher uses their own judgment to select a sample they think will represent the population.
What does the Petersen Capture-Recapture method estimate?
The size of large or moving populations.
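The card does not state the formula. The usual Petersen estimate is N = (M × n) / m, where M is the number marked in the first capture, n is the size of the second capture, and m is the number in the second capture that are marked. A short sketch, with hypothetical function and parameter names:

```python
def petersen_estimate(first_marked, second_caught, second_marked):
    """Estimate population size N = (M x n) / m."""
    if second_marked == 0:
        raise ValueError("No marked items recaptured; the estimate is undefined.")
    return first_marked * second_caught / second_marked


# Hypothetical pond: 40 fish marked, then 50 caught, of which 10 are marked.
print(petersen_estimate(40, 50, 10))  # 200.0
```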
What is an explanatory variable?
The variable that is changed in an experiment.
Define response variable.
The variable that is measured in an experiment.
What is a sample size?
The number of items or people in the sample; it should be large enough to represent the population fairly.
What is an experiment?
A study in which a researcher examines how changes in one variable affect another.
Define Explanatory (Independent) Variable.
The variable that is changed.
Define Response (Dependent) Variable.
The variable that is measured.
What are Extraneous Variables?
Variables not of interest but that could affect the result of your experiment.
What characterizes Laboratory Experiments?
Researcher has full control over variables; conducted in a lab or similar environment.
Give an example of a Laboratory Experiment.
Measuring reaction times of people of different ages.
What is the Explanatory variable in the laboratory example?
Age.
What is the Response variable in the laboratory example?
Reaction time.
List some Extraneous variables in laboratory experiments.
- Gender
- Health condition
- Fitness level.
What are advantages of Laboratory Experiments?
- Easy to replicate
- Extraneous variables can be controlled.
What is a disadvantage of Laboratory Experiments?
People may behave differently under test conditions than in real life.
What are Field Experiments?
Carried out in the everyday environment with some control over variables.
Give an example of a Field Experiment.
Testing new methods of revision.
What is the Explanatory variable in the field experiment example?
Method of revision.
What is the Response variable in the field experiment example?
Results in exam.
List some Extraneous variables in field experiments.
- Amount of revision pupils do
- Ability of pupils.
What are the advantages of Field Experiments?
More accurate; reflects real life behaviour.
What is a disadvantage of Field Experiments?
Cannot control extraneous variables.
What are Natural Experiments?
Carried out in everyday environments with little control over variables.
Give an example of a Natural Experiment.
The effect of education on level of income.
What is the Explanatory variable in the natural experiment example?
Level of education.
What is the Response variable in the natural experiment example?
Income.
List some Extraneous variables in natural experiments.
- IQ
- Other skills individuals may have
- Personal circumstances.
What is an advantage of Natural Experiments?
Reflects real life behaviour.
What are disadvantages of Natural Experiments?
- Low validity
- Difficult to replicate.
What is a Simulation?
A way to model random events using random numbers and previously collected data.
What are the steps in conducting a Simulation?
- Choose a suitable method for getting random numbers
- Assign numbers to the data
- Generate random numbers
- Match random numbers to outcomes.
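The four steps above can be sketched as follows. The drink percentages are hypothetical "previously collected data", and Python's random module stands in for random number tables or a calculator.

```python
import random


def simulate(outcome_ranges, trials, seed=None):
    """Simulate outcomes by matching random numbers 0-99 to assigned ranges."""
    rng = random.Random(seed)  # step 1: chosen method for getting random numbers
    results = []
    for _ in range(trials):
        number = rng.randint(0, 99)  # step 3: generate a random number
        for outcome, (low, high) in outcome_ranges.items():
            if low <= number <= high:  # step 4: match the number to an outcome
                results.append(outcome)
                break
    return results


# Step 2: assign numbers to the data.
# Hypothetical past data: 60% buy tea, 30% coffee, 10% juice.
ranges = {"tea": (0, 59), "coffee": (60, 89), "juice": (90, 99)}
print(simulate(ranges, 10, seed=1))
```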
What is a Questionnaire?
A set of questions used to obtain data from the population/sample.
What types of questions can be included in a Questionnaire?
- Open questions
- Closed questions.
What are features of a good questionnaire?
- Easy to understand
- Uses simple language
- Avoids leading questions.
What is a problem with Questionnaires?
Non-response: some people do not respond to the questionnaire.
What is the Random Response Method?
Uses a random event to decide how to answer a question ensuring anonymity.
What is a Pilot Study?
A small-scale replica of the study to test the design and methods of the questionnaire.
What is an Interview?
Where you question each person individually, involving specific questions or topics.
What are Outliers?
Values that do not fit in with the pattern or trend of the data.
What does Cleaning Data involve?
- Identifying and correcting/removing incorrect data values or outliers
- Putting all data in the same format.
What is a Control Group?
Used in an experiment to ensure that the treatment given is causing the experimental results.
What are Matched Pairs?
Two groups of equally matched people used to test the effect of a particular factor.
Define Hypothesis.
A statement that can be tested by collecting and analysing data.
What are the stages of an Investigation?
- Planning
- Collecting Data
- Processing and Representing data
- Interpreting Results
- Evaluating Methods.
What is the first stage of an investigation?
Planning
In this stage, you choose a hypothesis, decide what data to collect (variables), and determine how to record the data (data collection tables).
What does the collecting data stage involve?
Choosing data sources (primary/secondary), collection methods (questionnaire/interviews), and control factors.
This stage is crucial for ensuring accurate and relevant data is gathered for analysis.
List the stages of an investigation.
- Planning
- Collecting Data
- Processing and Representing Data
- Interpreting Results
- Evaluating Methods
What are databases in the context of data representation?
Tables with a collection of data, often secondary data that is available online.
These databases usually contain real-life statistics and are essential for interpreting data.
What is a common inconsistency found in data tables?
Percentages do not add up to 100% due to rounding errors.
This is often encountered when individual percentages for columns/rows in tables have been rounded.
What type of data do two-way tables represent?
Bivariate data, which records information on two variables, one across the rows and one down the columns.
They are useful for analyzing relationships between the two variables.
What is a pictogram?
A representation using pictures or symbols to show a particular amount of data.
It always includes a key to indicate the amount each symbol represents.
What are the key features of simple bar charts?
- Bars are equal width
- Equal gaps between bars
- Frequency on y-axis
What distinguishes multiple bar charts from simple bar charts?
They can compare two or more sets of data with more than one bar for each class represented by different colours.
This allows for a clearer comparison between different data categories.
How are composite bar charts structured?
Single bars split into different sections for each category, used to compare different times/days/years.
The frequency of each component is calculated by subtracting the lower frequency of that component from the upper frequency.
Define stem and leaf diagrams.
A method of organizing data that retains all original data while presenting it simply, showing the shape of the distribution.
Each value is split into a ‘stem’ (first digits) and ‘leaf’ (last digit).
What is the purpose of pie charts?
To display data showing how something is shared or divided into categories, with each sector representing a proportion of the total data.
The angles in a pie chart must add up to 360 degrees.
True or False: Comparative pie charts can be used to compare two sets of data of different sizes.
True
What do population pyramids show?
Distribution of ages in a population, either in numbers or proportions/percentages.
They are used to compare two sets of data, usually genders or geographical areas.
What do choropleth maps represent?
Geographical areas split into different regions that are shaded based on frequency.
The darker the shading, the higher the frequency for that area.
What is cumulative frequency?
A running total of frequencies.
It helps in understanding the total number of occurrences up to a certain point in a dataset.
What is the formula for frequency density in histograms?
Frequency Density = Frequency / Class Width
This reflects the concentration of values within each range of the dataset.
How do you estimate the median from cumulative frequency diagrams?
Divide total frequency by 2, find that value on the y-axis, draw a horizontal line to the curve, and read off the value from the x-axis.
What is the formula to calculate frequency density?
FD = F/CW
FD stands for Frequency Density, F is Frequency, and CW is Class Width.
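A small sketch of FD = F/CW applied to a grouped frequency table. The class intervals and frequencies are made up for illustration.

```python
def frequency_densities(classes):
    """classes is a list of (lower bound, upper bound, frequency) tuples."""
    # Frequency density = frequency / class width for each class interval.
    return [freq / (upper - lower) for lower, upper, freq in classes]


# Hypothetical table with unequal class widths.
table = [(0, 10, 5), (10, 30, 12), (30, 40, 8)]
print(frequency_densities(table))  # [0.5, 0.6, 0.8]
```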
What are the steps to draw a histogram?
- Calculate class widths for each class interval
- Calculate frequency density for each class interval
- Draw a suitable scale on y-axis labelled frequency density
- Draw bars using frequency density data
Remember that the bars have no gaps in between.
What is the shape of a distribution?
It can be positively skewed, negatively skewed, or symmetrical.
What is the difference between a histogram and a frequency polygon?
A histogram uses bars, while a frequency polygon uses mid-points of class intervals plotted and joined with straight lines.
True or False: To compare histograms, they need to have different class intervals.
False
They need to have the same class intervals and frequency density scales.
What is the mode in a dataset?
The value that appears the most.
How do you find the median in discrete data?
- Put the numbers in order from smallest to largest
- Find the ½(n + 1)th value
- If the position is a decimal, average the two middle values.
n is the total frequency.
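A sketch of these steps, taking the median position as ½(n + 1). The function name and data are illustrative only.

```python
def median(values):
    """Median of discrete data using the ½(n + 1)th position."""
    ordered = sorted(values)      # put the numbers in order
    n = len(ordered)
    position = (n + 1) / 2        # 1-based position of the median
    if position.is_integer():
        return ordered[int(position) - 1]
    # Position ends in .5, so average the two values either side of it.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


print(median([7, 3, 9, 1, 5]))  # 5
print(median([7, 3, 9, 1]))     # 5.0
```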
What is the formula for the mean?
𝑥̅ = ∑𝑥/𝑛
Where 𝑥̅ is the mean, ∑𝑥 is the sum of data values, and 𝑛 is the number of data values.
What is a weighted mean?
A mean in which each value is multiplied by a weight reflecting its importance; used to combine different sets of data where one set is more important than another.
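The card gives no formula. The standard one is weighted mean = ∑(value × weight) / ∑(weights). A sketch with hypothetical marks and weightings:

```python
def weighted_mean(values, weights):
    """Sum of (value x weight) divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical course: exam mark 70 (weight 60), coursework mark 80 (weight 40).
print(weighted_mean([70, 80], [60, 40]))  # 74.0
```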
Fill in the blank: The _______ is the class with the highest frequency.
Modal Class
What can cause diagrams to be misleading?
- Shape of the diagram
- Axes and scales
Examples include scales not starting at zero, missing values, or unevenly scaled axes.
How do you estimate the median for grouped continuous data?
Use ½ n to find the median position and calculate using cumulative frequency.
What is the geometric mean?
The nth root of the product of all the values.
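As a quick illustration, the geometric mean of n values can be computed by multiplying them together and raising the product to the power 1/n. This sketch assumes Python 3.8 or later for math.prod and positive values.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))


print(geometric_mean([2, 8]))  # 4.0
```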
What happens to the mean if you add a value greater than the mean?
The mean increases.
What is the first step to transforming data?
Take away the same large number from all the values.
True or False: The median will always change if a new value is added.
False
The median may stay the same if the added value is equal to the median.
What is a common error when drawing frequency polygons?
Not using midpoints.
What should you do if the median position is a decimal?
Find the two values either side of that position, add them together, and divide by 2.
Fill in the blank: The sum of all values divided by the number of values is called the _______.
Mean
What is the formula for estimating the median using linear interpolation?
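Median ≈ L + ((½n − CF) / f) × w: add the lower bound of the median class (L) to the fraction of the way through that class, multiplied by the class width (w). CF is the cumulative frequency before the median class and f is the frequency of the median class.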
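A sketch of the usual linear interpolation estimate, median ≈ L + ((½n − CF) / f) × w, where L is the lower bound of the median class, CF the cumulative frequency before it, f its frequency, and w its width. The grouped table is made up for illustration.

```python
def interpolated_median(classes):
    """classes is a list of (lower, upper, frequency) in ascending order."""
    n = sum(freq for _, _, freq in classes)
    target = n / 2                      # median position for grouped data
    cumulative = 0                      # cumulative frequency before this class
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            width = upper - lower
            # Go (target - cumulative) / freq of the way through the class.
            return lower + (target - cumulative) * width / freq
        cumulative += freq
    raise ValueError("The table contains no data.")


# Hypothetical table: 20 values, so the median position is 10.
table = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]
print(interpolated_median(table))  # 16.0
```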
What happens to the mean if a value greater than the mean is added?
The mean increases.
What happens to the mean if a value less than the mean is removed?
The mean increases.
What happens to the mean if a value less than the mean is added?
The mean decreases.
What happens to the mean if a value greater than the mean is removed?
The mean decreases.
What is the mode?
The value that appears most frequently in the data.
List advantages of using the mode.
- Easy to use
- Always a value in the data
- Unaffected by extreme values
- Can be used with quantitative and qualitative data
List disadvantages of using the mode.
- There may not be a mode or may be more than one mode
- Cannot be used to calculate measures of spread
- Not always representative of the data
What is the median?
The middle value when the data is ordered.
List advantages of using the median.
- Easy to find when data is in order
- Unaffected by outliers/extreme values
- Best to use with skewed data
List disadvantages of using the median.
- May not be a data value
- Not always representative of the data
What is the mean?
The average of all the data values.
List advantages of using the mean.
- Uses all the data
- Can be used to calculate standard deviation and skew
List disadvantages of using the mean.
- May not be a data value
- Always affected by extreme values or outliers
What does the range measure?
How spread out the data is.
How is the range calculated?
Range = Largest Value - Smallest Value.
What is the Interquartile Range (IQR)?
The range of the middle 50% of the data when in order.
How is IQR calculated?
IQR = Upper Quartile - Lower Quartile.
What is the Lower Quartile (LQ)?
The value ¼ of the way through the data.
What is the Upper Quartile (UQ)?
The value ¾ of the way through the data.
How do you find LQ and UQ for discrete data?
LQ = ¼ (n+1)th value, UQ = ¾ (n+1)th value.
What is the Interpercentile Range (IPR)?
The difference between two percentiles.
What are deciles?
Values that divide the data into 10 equal parts.
What is the Interdecile Range?
The difference between the first and ninth deciles.
What does standard deviation (SD) measure?
How far, on average, the values are from the mean.
What is the formula for standard deviation?
σ = √(1/n ∑(x - x̅)²) or σ = √(∑x²/n - (∑x)²/n²).
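Both forms of the formula give the same answer. The sketch below checks this on a small made-up data set.

```python
import math


def sd_from_deviations(values):
    """sigma = sqrt( (1/n) * sum of (x - mean)^2 )."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_from_sums(values):
    """sigma = sqrt( sum(x^2)/n - (sum(x)/n)^2 )."""
    n = len(values)
    return math.sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sd_from_deviations(data))  # 2.0
print(sd_from_sums(data))        # 2.0
```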
What is a box plot?
A graphical representation of data that shows its distribution.
What are the five pieces of information included in a box plot?
- Minimum Value
- Lower Quartile (LQ)
- Median
- Upper Quartile (UQ)
- Maximum Value
What are outliers?
Values that are far from the rest of the data.
How are outliers identified?
Values that are more than 1.5 × IQR above the UQ or below the LQ.
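A sketch of the 1.5 × IQR rule. The quartiles are passed in directly (rather than computed) so the example does not depend on a particular quartile method; the data values are hypothetical.

```python
def outlier_limits(lq, uq):
    """Return the (lower, upper) limits beyond which values are outliers."""
    iqr = uq - lq
    return lq - 1.5 * iqr, uq + 1.5 * iqr


def find_outliers(values, lq, uq):
    """List the values lying outside the outlier limits."""
    low, high = outlier_limits(lq, uq)
    return [v for v in values if v < low or v > high]


print(outlier_limits(20, 30))                            # (5.0, 45.0)
print(find_outliers([4, 18, 22, 25, 31, 47], 20, 30))    # [4, 47]
```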
What is skewness?
Describes the shape of the distribution and how the data is spread out.
True or False: The mean is always affected by extreme values.
True.
Fill in the blank: The _______ is the average of all the data values.
Mean
Fill in the blank: The _______ is the value that appears most frequently in the data.
Mode
Fill in the blank: The _______ is the middle value when the data is ordered.
Median
Fill in the blank: The difference between the largest and smallest values is called the _______.
Range
Fill in the blank: The middle 50% of the data is represented by the _______.
IQR
What is the IQR used for?
Measure of spread
IQR stands for Interquartile Range, which measures the middle 50% of data.
What does skewness describe?
The shape of the distribution and how the data is spread out.
What indicates positive skewness?
Most values are at the lower end of the data, with a tail of a few higher values.
In positive skewness, what is the relationship between mean, median, and mode?
Mean > Median > Mode
What indicates negative skewness?
Most values are at the upper end of the data, with a tail of a few lower values.
In negative skewness, what is the relationship between mean, median, and mode?
Mean < Median < Mode
What signifies a symmetrical distribution?
Mean = Median = Mode
What does a normal distribution look like on a box plot?
Median is halfway between LQ and UQ.
How is skewness calculated using a formula?
Skewness = 3(mean - median) / standard deviation
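A one-line sketch of this formula with hypothetical summary statistics. A positive result suggests positive skew and a negative result negative skew.

```python
def skewness(mean, median, sd):
    """Skewness = 3(mean - median) / standard deviation."""
    return 3 * (mean - median) / sd


print(skewness(52, 50, 4))  # 1.5
print(skewness(48, 50, 4))  # -1.5
```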
What does a positive skewness value indicate?
Positive skew.
What does a negative skewness value indicate?
Negative skew.
When comparing data sets, what measures should be used?
Average (mean/median/mode) and spread (range/IQR/SD) or skewness.
What does a lower standard deviation indicate?
Values are closer to the mean and therefore similar.
What are scatter diagrams used for?
To show if there is a relationship between two variables.
What is the explanatory variable in a scatter diagram?
The independent variable plotted on the x-axis.
What is the response variable in a scatter diagram?
The dependent variable plotted on the y-axis.
What indicates a positive correlation?
As one variable increases, so does the other.
What indicates a negative correlation?
As one variable increases, the other decreases.
What is a causal relationship?
When one variable causes a change in another.
What does the line of best fit (LOBF) represent?
A straight line drawn through the middle of the points on a scatter diagram.
In the equation of the LOBF, y = ax + b, what does ‘a’ represent?
The gradient.
In the equation of the LOBF, y = ax + b, what does ‘b’ represent?
The y-intercept.
What is the purpose of interpolation?
To make predictions within the range of data given.
What is the purpose of extrapolation?
To predict values outside of the range of values given.
How is Spearman’s Rank Correlation Coefficient (SRCC) calculated?
SRCC = 1 - (6 * ∑d²) / (n(n² - 1))
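A sketch of the formula applied to two sets of ranks, where d is the difference between each pair of ranks. The ranks are made up, and the sketch assumes there are no tied ranks.

```python
def spearman_rank(ranks_x, ranks_y):
    """SRCC = 1 - 6 * sum(d^2) / (n(n^2 - 1)), for untied ranks."""
    n = len(ranks_x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical rankings of five items by two judges; sum of d^2 is 10.
print(spearman_rank([1, 2, 3, 4, 5], [3, 2, 1, 5, 4]))  # 0.5
```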
What does a value of SRCC near 1 indicate?
Strong positive correlation.
What does Pearson’s Product Moment Correlation Coefficient (PMCC) measure?
The strength of linear correlation between two variables.
What is a time series graph used for?
To spot trends over time.
What is plotted on the x-axis of a time series graph?
Time.
What is a time series?
A set of data collected over a period of time at equal intervals.
What is the purpose of time series graphs?
To spot trends, usually going up, down, or fluctuating.
What does a trend line show?
The general trend of the data.
What are moving averages?
An average worked out for a given number of successive observations.
Why are moving averages used?
To smooth out fluctuations so the underlying trend is easier to see and the trend line is easier to draw.
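A sketch of a 4-point moving average, where each average covers four successive observations. The quarterly figures are hypothetical.

```python
def moving_averages(values, window):
    """Average each run of `window` successive observations."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical quarterly sales.
print(moving_averages([10, 14, 12, 16, 20, 18], 4))  # [13.0, 15.5, 16.5]
```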
What are seasonal variations?
A pattern that repeats at a specific point every cycle.
How is seasonal variation calculated?
Seasonal Variation = Actual Value - Trend Value.
What is the Estimated Mean Seasonal Variation (EMSV)?
The average of all the seasonal variations for the same point in each cycle.
How can future values be predicted in time series?
Using the trend line and estimated mean seasonal variations.
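A sketch that ties these cards together: seasonal variation is actual minus trend, the EMSV averages those variations for the same point in each cycle, and a prediction adds the EMSV to the trend-line value. All figures are hypothetical, and the prediction step assumes the trend value has already been read off the trend line.

```python
def estimated_mean_seasonal_variation(actuals, trends):
    """Average of (actual - trend) for the same point in each cycle."""
    variations = [a - t for a, t in zip(actuals, trends)]
    return sum(variations) / len(variations)


def predict(trend_value, emsv):
    """Predicted value = trend-line value + estimated mean seasonal variation."""
    return trend_value + emsv


# Hypothetical first-quarter figures from three successive years.
emsv = estimated_mean_seasonal_variation([120, 130, 140], [110, 118, 129])
print(emsv)                # 11.0
print(predict(140, emsv))  # 151.0
```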
What is simple probability?
A measure of how likely an event is to happen.
How can probabilities be expressed?
As fractions, decimals, or percentages.
What is an outcome in probability?
A possible result of an experiment or trial.
What does P(event) represent?
For equally likely outcomes, the number of successful outcomes divided by the total number of possible outcomes.
What is expected frequency?
The number of times you expect an event to happen.
How is experimental probability estimated?
Using results of previous trials to predict future probabilities.
What is risk in probability?
The likelihood of a negative event occurring.
What are the two types of risk?
- Absolute Risk
- Relative Risk
What does a sample space represent?
A list of all the possible outcomes.
What is a sample space diagram?
A table used to represent the outcomes of two events.
What is a Venn diagram?
Uses overlapping circles to represent all outcomes of two or three events.
What are mutually exclusive events?
Events that cannot happen at the same time.
What is the addition law in probability?
P(A or B) = P(A) + P(B) − P(A and B). The subtraction is needed for events that are not mutually exclusive and can happen together.
What are independent events?
Events where the outcome of one does not affect the outcome of the other.
Fill in the blank: The formula for the probability of two mutually exclusive events A and B is P(A or B) = P(A) + P(B). What is the additional component for non-mutually exclusive events? _______
− P(A and B)
What are independent events?
Unconnected events where the outcome of one does not affect the other
Example: Flipping a coin and rolling a dice.
What is the Multiplication Law for independent events A and B?
P(A and B) = P(A) × P(B)
For 3 independent events A, B, and C: P(A and B and C) = P(A) × P(B) × P(C)
How do you calculate P(at least 1)?
P(at least 1) = 1 - P(none)
This formula helps determine the probability of at least one occurrence.
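A sketch of P(at least 1) = 1 − P(none) for repeated independent trials, using the multiplication law to get P(none). Exact fractions are used to avoid rounding; the dice example is illustrative.

```python
from fractions import Fraction


def p_at_least_one(p_success, trials):
    """1 - P(no successes) over independent trials."""
    p_none = (1 - p_success) ** trials  # multiplication law for independent events
    return 1 - p_none


# Probability of at least one six in three rolls of a fair dice.
print(p_at_least_one(Fraction(1, 6), 3))  # 91/216
```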
What do tree diagrams represent?
Each branch shows an outcome, and the probabilities on each set of branches add up to 1
Multiply along the branches to get each end result, then add the end results for the outcomes you want.
What happens to the denominator in a tree diagram with replacement?
The denominator stays the same for the second set of branches
The question indicates if the item has been replaced.
What is conditional probability?
The probability of an event given that another event has already happened; the first outcome affects the chances of the second
Example: Taking a white ball first changes the probability of the second draw.
What notation is used for conditional probability?
P(B | A)
It represents the probability of B given that A has happened.
What is the formula for conditional probability?
P(B | A) = P(A and B) / P(A)
This can also be used to test if two events are independent: A and B are independent if P(B | A) = P(B).
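A sketch of P(B | A) = P(A and B) / P(A) using exact fractions. The bag of balls is a hypothetical example, drawn without replacement.

```python
from fractions import Fraction


def conditional_probability(p_a_and_b, p_a):
    """P(B | A) = P(A and B) / P(A)."""
    return p_a_and_b / p_a


# Hypothetical bag: 3 white and 2 black balls, two drawn without replacement.
p_first_white = Fraction(3, 5)
p_both_white = Fraction(3, 5) * Fraction(2, 4)
print(conditional_probability(p_both_white, p_first_white))  # 1/2
```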
What are simple index numbers used for?
To compare price changes over time
They compare the price change of an item with its base year price.
What does an index number greater than 100 indicate?
The value has increased
An index number less than 100 indicates a decrease.
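The cards do not state the formula. The standard definition is index number = (price ÷ base year price) × 100. A sketch with hypothetical prices in pence:

```python
def index_number(price, base_price):
    """Simple index number relative to a base year value of 100."""
    return price * 100 / base_price


print(index_number(150, 120))  # 125.0, a 25% increase on the base year
print(index_number(90, 120))   # 75.0, a 25% decrease on the base year
```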
What does the Retail Price Index (RPI) measure?
The rate of change of prices of everyday goods
RPI is calculated monthly by comparing prices to the same month of the previous year.
What is the Consumer Price Index (CPI)?
Official measure of inflation used by the UK Government
It does not include mortgage payments and is weighted to reflect consumer spending.
What does Gross Domestic Product (GDP) represent?
The value of goods and services produced in a country in a given time
A fall in GDP for two successive quarters indicates a recession.
What are weighted index numbers?
They take into account proportions similar to the weighted mean
Weightings reflect the importance of different items.
What do chain base index numbers compare?
Prices from each year with that of the previous year
They show how values change from year to year.
What are crude rates?
Rates that tell how many times a particular event occurs per 1000 of the population
Examples include crude birth and death rates.
What is the formula for calculating crude rates?
Crude Rate = (number of births/deaths / total population) × 1000
Crude rates can be misleading when comparing different age distributions.
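A sketch of the crude rate formula with made-up figures for a town.

```python
def crude_rate(events, population):
    """Number of events (births or deaths) per 1000 of the population."""
    return events * 1000 / population


# Hypothetical town: 450 deaths in a population of 60 000.
print(crude_rate(450, 60000))  # 7.5
```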
What is a standard population?
A hypothetical population of 1000 used to represent the whole population
It takes into account age, gender, and income distributions.
What does a standardized rate allow you to do?
Compare rates fairly between populations with different age distributions
It uses the standard population for realistic comparisons.
What is a probability distribution?
A list of all possible outcomes with their expected probabilities
Example: Flipping a fair coin gives heads with probability ½ and tails with probability ½.
What is a binomial distribution?
A probability distribution for the number of successes in a fixed number of trials, each with only two possible outcomes
Examples include flipping a coin (heads or tails) or rolling a six (success or failure).
What conditions must be met for a binomial distribution?
- Fixed number of trials (n)
- Each trial has 2 outcomes (success or failure)
- Trials are independent
- Probability of success is constant
If these conditions are met, the binomial distribution is applicable.
How do you find probabilities using the binomial distribution?
Use (p + q)^n and identify the outcomes and their probabilities
Expand (p + q)^n where n is the number of trials.
What is Pascal’s Triangle used for?
To find coefficients of a binomial distribution
The coefficients follow the pattern of Pascal’s triangle.
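What is the probability of getting exactly 3 heads in 5 flips of a fair coin?
10 × P(Heads)³ × P(Tails)² = 10 × (½)³ × (½)² = 10/32
P(Heads) = ½ and P(Tails) = ½; the coefficient 10 is the number of ways of getting 3 heads in 5 flips.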
What pattern do the coefficients of a binomial distribution follow?
Pascal’s triangle
What is the first row of Pascal’s triangle?
1
How do you find the numbers in Pascal’s triangle?
By adding the 2 numbers directly above
What is the expansion of (p + q)^4?
1p^4 + 4p^3q^1 + 6p^2q^2 + 4pq^3 + 1q^4
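The coefficients 1, 4, 6, 4, 1 are the row of Pascal's triangle for n = 4. A sketch that builds a row by repeatedly adding the two numbers directly above (the function name is hypothetical):

```python
def pascal_row(n):
    """Return the coefficients of (p + q)^n from Pascal's triangle."""
    row = [1]
    for _ in range(n):
        # Each inner number is the sum of the two numbers directly above it.
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row


print(pascal_row(4))  # [1, 4, 6, 4, 1]
```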
What does the nCr button on a calculator represent?
The number of ways of getting r successes in n trials, where n = number of trials and r = number of successes
How do you calculate the coefficient for 5 trials with 3 successes using nCr?
Type ‘5’, ‘nCr’, ‘3’, ‘=’ to get 10
To find a range of probabilities, what should you do?
Work out their individual probabilities and then add them up
How do you calculate the probability of ‘at least 1 success’?
Work out the probability of 0 successes and subtract from 1
What is the mean (or expected value) of the binomial distribution B(n, p)?
np
For B(6, ½), what is the mean?
3
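A sketch pulling the binomial cards together: the nCr coefficient, a single probability, "at least 1 success", and the mean np. It assumes Python 3.8 or later for math.comb and uses exact fractions.

```python
from fractions import Fraction
from math import comb


def binomial_probability(n, r, p):
    """P(exactly r successes in n trials) = nCr * p^r * q^(n - r)."""
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)


half = Fraction(1, 2)
print(comb(5, 3))                            # 10
print(binomial_probability(5, 3, half))      # 5/16
print(1 - binomial_probability(6, 0, half))  # 63/64, at least 1 success in 6 trials
print(6 * half)                              # 3, the mean of B(6, 1/2)
```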
What shape does a normal distribution curve have?
Bell-shaped
What does a larger standard deviation result in for a normal distribution curve?
A lower, wider curve
What is the notation for a normal distribution?
N(μ, σ²)
What do μ and σ² represent in the normal distribution notation?
μ = mean, σ² = variance
What are the conditions for normal distribution?
- Data is continuous
- Distribution is symmetrical
- Mode, median, and mean are approximately equal
Approximately what percentage of data values lie within 1 standard deviation of the mean?
68%
Approximately what percentage of data values lie within 2 standard deviations of the mean?
95%
Approximately what percentage of data values lie within 3 standard deviations of the mean?
99.7%
For a data set with mean=30 and SD=3, what is the range for 68% of the sample?
27-33
How do you sketch a normal distribution?
Draw a bell-shaped curve centered on the mean and ending at 3 SD from the mean
What is the formula to calculate the number of SDs from the mean?
(value - mean) / standard deviation
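A sketch of this calculation, using the mean of 30 and SD of 3 from the earlier card; the individual values are made up.

```python
def standardised_score(value, mean, sd):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


print(standardised_score(36, 30, 3))  # 2.0
print(standardised_score(24, 30, 3))  # -2.0
```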
What is the purpose of standardised scores?
To compare how far above or below the average individual values are
What does a positive standardised score indicate?
The value is above the mean
What does a negative standardised score indicate?
The value is below the mean
What does a standardised score of zero indicate?
The value is equal to the mean
What does quality assurance involve?
Checking samples to ensure products are of the same quality and standard
What is a control chart?
A time series chart used for quality assurance
What are the lines on a control chart?
- Target Value (middle line)
- Upper and Lower Warning Lines (inner 2 lines)
- Upper and Lower Action Limits (outer 2 lines)
What happens if a sample average/range is above/below the warning line?
Another sample is taken and checked for problems
What happens if a sample average/range is outside the action limits?
Production is stopped immediately and machinery is reset