Introduction to statistics Flashcards
What is a statistical unit?
A statistical unit is something you get a measurement from, e.g. if you are doing a study on the height of 200 people, then each person you get a height value for is a statistical unit
Why is it important to know what a statistical unit is?
- When doing studies, our analyses are based on the statistical units and the number of observations
How would you create a statistical unit for a study?
- Create a list of all possible measurements to be taken for the study
What are some examples of measurements from statistical units?
If the statistical units are people:
- Age (years)
- Gender
- Height (cm)
- Time to get to uni (minutes)
If a scientist measured the height, age and weight of 200 statistical units, what would these measurements be called collectively?
The height, age and weight values of the statistical units would be known collectively as the observations on those variables.
What is the difference between a variable and a variate?
A variable describes the measurement (e.g. height), whereas a variate is the actual value of a measurement (e.g. 182 cm).
Mark is 20 years old, is 182 cm tall and takes 23 minutes to get to uni. What is the statistical unit and what are the observations?
- Mark is the statistical unit
- His age, height and the time it takes him to get to uni are the observations.
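A minimal Python sketch of the Mark example, assuming each statistical unit is stored as one record: the field names play the role of variables and the stored values are the variates. The `Student` class and its field names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class Student:
    """One statistical unit: a person measured in the study (hypothetical class)."""
    name: str
    age_years: int          # variable: age
    height_cm: float        # variable: height
    time_to_uni_min: float  # variable: time to get to uni


mark = Student(name="Mark", age_years=20, height_cm=182, time_to_uni_min=23)

# The observations recorded for this unit, keyed by variable name.
# Each value (20, 182, 23) is a variate.
observations = {k: v for k, v in asdict(mark).items() if k != "name"}
print(observations)  # {'age_years': 20, 'height_cm': 182, 'time_to_uni_min': 23}
```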
What is the reason we collect data?
Data is collected to test (support or refute) an idea or a theory and to make comparisons.
What is statistics?
- It is a science that is concerned with the processing, analysis and description of data.
- Collecting, presenting and transforming data to assist decision-makers
What is a population in statistics?
A population consists of ALL the members of a group about which you want to draw a conclusion
What is a sample?
A sample is the portion or subset of the population selected for analysis
What is a parameter in statistics?
- A parameter is a numerical measure that describes a characteristic of a population
- A summary value that represents some feature of a population.
- The value of a parameter is constant for the population and has no error associated with it.
What is a statistic?
- A statistic is a numerical measure that describes a characteristic of a sample
- If the population is sampled, parameters must be estimated by a corresponding statistic.
- Statistics are associated with a certain level of uncertainty or error.
Give an Example of a parameter.
The population mean student grade for Biology in 2002
Give an example of a statistic.
The average student grade for the sample of students in Biology in 2002.
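A minimal sketch of the parameter/statistic distinction, using a small made-up population of Biology grades (hypothetical values, not real 2002 data). The population mean is computed from every member, so it is fixed; the sample mean comes from a random subset, so it usually differs from the parameter by some sampling error.

```python
import random
import statistics

# Hypothetical Biology grades (%) for an entire (tiny) population of students.
population = [62, 75, 81, 58, 90, 70, 66, 77, 84, 55]
parameter = statistics.mean(population)  # population mean: constant, no sampling error

rng = random.Random(2002)              # seeded so the example is repeatable
sample = rng.sample(population, k=4)   # subset of the population selected for analysis
statistic = statistics.mean(sample)    # sample mean: used to estimate the parameter

print(f"parameter = {parameter}, statistic = {statistic}, "
      f"error = {statistic - parameter:+.2f}")
```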
What is a variable?
- The property of, or observation made on, a statistical unit (SU) within a population (or sample).
- The value of this property will vary from SU to SU.
Give an example of a variable.
- The student’s grade for Biology in 2002.
What is a variate?
A particular value of a variable
Give an example of a variate.
- Student five had a grade of 75% for Biology in 2002.
What are the two branches of statistics?
- Descriptive Statistics
- Inferential statistics
What is Descriptive statistics?
- A descriptive statistic is a summary statistic that quantitatively describes or summarises features of a collection of information
What does Descriptive statistics involve?
- Collecting data (surveys)
- Presenting data (tables and graphs)
- Characterising data (e.g. the sample mean)
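A short sketch of characterising data, assuming a small hypothetical survey of travel times to uni. It uses Python's standard `statistics` module to produce a few common summary statistics.

```python
import statistics

# Hypothetical survey data: time to get to uni (minutes) for 7 students.
times = [23, 15, 40, 23, 31, 18, 27]

summary = {
    "n": len(times),
    "mean": statistics.mean(times),
    "median": statistics.median(times),
    "stdev": statistics.stdev(times),  # sample standard deviation (divides by n - 1)
}

for name, value in summary.items():
    print(f"{name}: {round(value, 2)}")
```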
What is Inferential statistics?
Drawing conclusions about a population based on sample data, i.e. estimating a parameter based on a statistic
What data and/or techniques are used in Inferential statistics?
- Estimation
- Hypothesis testing
Give an example of Estimation
Estimate the population mean weight (parameter) using the sample mean weight (statistic)
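A sketch of point estimation with hypothetical weights (kg): the sample mean is used as the estimate of the population mean, and the standard error s/√n is included as one common way of expressing how uncertain that estimate is.

```python
import math
import statistics

# Hypothetical sample of weights (kg).
weights = [98, 102, 101, 97, 104, 99, 103, 100]

n = len(weights)
x_bar = statistics.mean(weights)   # point estimate of the population mean (the parameter)
s = statistics.stdev(weights)      # sample standard deviation
standard_error = s / math.sqrt(n)  # typical size of the error in x_bar as an estimate

print(f"estimate = {x_bar} kg, standard error = {standard_error:.3f} kg")
```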
Give an example of Hypothesis testing
Test the claim that the population mean weight is 100 kg
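A sketch of how the 100 kg claim could be tested with a one-sample t statistic, assuming hypothetical sample data and that a t-test is appropriate (a random sample of roughly normally distributed weights). Only the test statistic is computed here; a p-value needs the t distribution, which Python's standard library does not provide, so a statistics package (such as SPSS) or a t table would be used for that step.

```python
import math
import statistics

weights = [98, 102, 101, 97, 104, 99, 103, 100]  # hypothetical sample (kg)
mu_0 = 100  # population mean claimed by the null hypothesis

n = len(weights)
x_bar = statistics.mean(weights)
s = statistics.stdev(weights)

# One-sample t statistic: distance of the sample mean from the claim,
# measured in standard errors.
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
df = n - 1  # degrees of freedom

print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

With these numbers t ≈ 0.58 on 7 degrees of freedom, well below the two-sided 5% critical value of about 2.365, so this hypothetical sample would not give evidence against the claim.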
What is the general process of statistics?
Theory –> Question to answer / Hypothesis to test –> Design research study –> Collect data (measurements, observations) –> Organise and make sense of the numbers using either descriptive or inferential statistics
What are some important sources of data collection?
- Data distributed by an organisation or individual
- Designed experiment
- Survey
- Observational study
How can data sources be classified?
- Primary sources: designing an experiment, conducting a survey or carrying out an observational study
- Secondary sources: mostly government or industrial, but also individual sources
What are the two types of data?
- Categorical (Qualitative)
- Numerical (Quantitative)
What is Numerical data?
Numerical data is measured on a natural numerical scale, for example:
- Age
- Weight
- Temperature
What is Categorical data?
Categorical data can only be named or categorised, for example:
- Gender
- Satisfaction with a meal (very good, good, average, …)
- Level of education (primary, secondary, tertiary)
What are the two types of Categorical data?
- Ordinal
- Nominal
What does “Bias” mean in statistics?
The systematic difference between the expected (long-run average) value of a sample statistic and the true population value (parameter)
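A standard textbook illustration of bias (not from these notes): the variance formula that divides by n is a biased estimator of the population variance, while dividing by n - 1 is unbiased. The sketch averages each formula over every possible sample of size 2 drawn with replacement from a tiny hypothetical population, which gives each estimator's exact expected value. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import product
import statistics

# A tiny, made-up population (hypothetical values for illustration).
population = [Fraction(x) for x in (2, 4, 6, 8)]
true_variance = statistics.pvariance(population)  # the parameter (divides by N)


def expected_estimate(estimator, n):
    """Exact expected value of an estimator: its average over every
    ordered sample of size n drawn with replacement."""
    samples = product(population, repeat=n)
    return statistics.mean(estimator(s) for s in samples)


n = 2
biased = expected_estimate(statistics.pvariance, n)   # divides by n
unbiased = expected_estimate(statistics.variance, n)  # divides by n - 1

print(true_variance, biased, unbiased)  # 5 5/2 5
```

Here the divide-by-n formula averages 5/2 against a true variance of 5 (a bias of -5/2), while the n - 1 formula averages exactly 5.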
What is nominal data?
- A type of Categorical data
- No natural or implied order
- Example – Mode of transport to uni today: car, bus, etc.
- No response is considered better
What is Ordinal data?
- A type of Categorical data
- There is an implied order
- Example – Rating a meal –> good, very good, etc.
- Definite order
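A sketch of the nominal/ordinal difference using Python enums, with category labels taken from the examples above. Ordering is only defined for the ordinal type, so an order-based summary such as the median rating makes sense there, while the nominal type is summarised by its mode. The numeric codes 1 to 3 are assumed ranks only; the gaps between them carry no meaning.

```python
from collections import Counter
from enum import Enum, IntEnum


class Transport(Enum):
    """Nominal: labels with no natural order."""
    CAR = "car"
    BUS = "bus"


class MealRating(IntEnum):
    """Ordinal: ordered categories (codes are ranks only)."""
    AVERAGE = 1
    GOOD = 2
    VERY_GOOD = 3


# Nominal data: the mode (most common category) is a sensible summary.
trips = [Transport.CAR, Transport.BUS, Transport.CAR]
most_common_mode = Counter(trips).most_common(1)[0][0]

# Ordering nominal categories is not defined, so Python refuses to compare them.
try:
    Transport.CAR < Transport.BUS
    nominal_is_ordered = True
except TypeError:
    nominal_is_ordered = False

# Ordinal data: sorting is meaningful, so a median category exists.
ratings = [MealRating.GOOD, MealRating.VERY_GOOD, MealRating.AVERAGE,
           MealRating.GOOD, MealRating.VERY_GOOD]
median_rating = sorted(ratings)[len(ratings) // 2]  # odd n: the middle value

print(most_common_mode, nominal_is_ordered, median_rating.name)
```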
What are the two types of numerical data?
- Continuous
- Discrete
What is continuous data?
- A type of numerical data
- Data that can take on any real value (within a range)
- Measured characteristics (infinite number of possible values)
- Example – Time to travel to work –>53.234 minutes
What is Discrete data?
- A type of Numerical data
- Countable number of responses (finite number of items)
- Tends to take integer values (0, 1, 2, 3, …, 999)
- Example – number of students in today’s lecture (can you have half a person?)
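A sketch of one practical consequence of the discrete/continuous split, using hypothetical values: discrete counts repeat, so they can be tallied directly, whereas continuous measurements such as 53.234 minutes rarely repeat and are usually grouped into intervals first. The 5-minute bin width is an arbitrary choice for illustration.

```python
import math
from collections import Counter

# Discrete data (hypothetical): number of students in each of 8 lectures.
students_per_lecture = [42, 40, 42, 45, 40, 42, 38, 45]
discrete_table = Counter(students_per_lecture)  # tally each distinct count

# Continuous data (hypothetical): travel times to work in minutes.
travel_times = [53.234, 47.9, 51.05, 60.4, 49.99, 55.5]
bin_width = 5  # minutes; arbitrary choice
# Label each value by the lower edge of its 5-minute interval, then tally.
continuous_table = Counter(math.floor(t / bin_width) * bin_width for t in travel_times)

print(discrete_table)
print(continuous_table)
```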
What is the independent variable?
- The variable(s) considered to cause the change in the dependent variable.
- Implies cause and effect.
- Also known as the explanatory variable(s).
How do you identify the independent variable?
- In regression, it is the variable (continuous or quantitative) used to predict the dependent variable.
- In experimental designs (ANOVA) it is the variable (ordinal or nominal) that is manipulated in the experiment.
What is the dependent variable?
- The variable assumed to be influenced (or affected) by the independent variable.
- Implies cause and effect.
- Also called the variable of interest.
Give an example of a dependent variable.
ANOVA - In an experiment to monitor growth rates in plants (dependent variable), fertiliser rates (independent variable) were manipulated.
What are Extraneous variables?
All the other variables that are not of direct interest but which may have some impact.
What is a Controlled Variable?
- An extraneous variable that has been controlled so that its impact is eliminated and any effect will be constant for each statistical unit (case or observation).
What are Irrelevant Variables?
Variables having no effect on the study.
What are Confounded Variables?
- The researcher does not control these variables. They are usually not known to the researcher and may have a differential effect on the values of the variable of interest.
How do Confounded Variables affect the study?
- These variables are confounded or interrelated with the variables under study. This may distort the study’s findings.
What are Lurking Variables?
- Variable A seems to vary with variable B but this relationship is due to variable C affecting both A and B.
What is the first step of Data Analysis?
- Integrate the statistics into the process of scientific investigation.
What is the second step of Data Analysis?
- Statistical tests should be considered very early in the process and not left until the end.
What is the third step of Data Analysis?
- Decide what the question is – what is the variable of interest, what is the research question.
What is the fourth step of Data Analysis?
- Formulate a question (or hypotheses).
What is the fifth step of Data Analysis?
- Design the experiment (or sampling routine) so that you can test the hypotheses.
What is the sixth step of Data Analysis?
- A pilot study or the collection of ‘dummy’ data can aid in your investigations.
What is the seventh step of Data Analysis?
- Use the key that we will develop throughout this course and ‘Applied Statistics’ to select an appropriate test.
What is the eighth step of Data Analysis?
- Carry out the test using the ‘dummy’ or pilot data – input data, use SPSS and interpret the output.
What is the ninth step of Data Analysis?
- If there are no problems, continue; otherwise, redesign the experiment.
What is the tenth step of Data Analysis?
- Carry out the experiment, test the ‘real’ data and write up the results.
What are the eight things that you need to find when faced with a study or experiment?
- The variable of interest (Dependent)
- This variable’s type and scale
- The explanatory variable(s) (Independent)
- The type and scale of these variables
- The statistical unit
- The population
- The sample
- The aim of the experiment (research question)
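A sketch of the eight-item checklist as a Python dataclass (the class and field names are hypothetical), filled in only partly for the plant growth and fertiliser experiment, so that `missing()` reports which items still need to be identified.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class StudyChecklist:
    """The eight things to identify for a study (hypothetical structure)."""
    dependent_variable: Optional[str] = None            # variable of interest
    dependent_type_and_scale: Optional[str] = None
    independent_variables: Optional[str] = None         # explanatory variable(s)
    independent_types_and_scales: Optional[str] = None
    statistical_unit: Optional[str] = None
    population: Optional[str] = None
    sample: Optional[str] = None
    aim: Optional[str] = None                            # research question

    def missing(self) -> list[str]:
        """Names of checklist items not yet filled in."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


study = StudyChecklist(
    dependent_variable="plant growth rate",
    dependent_type_and_scale="numerical, continuous",
    independent_variables="fertiliser rate",
    aim="Does fertiliser rate affect plant growth?",
)
print(study.missing())
# ['independent_types_and_scales', 'statistical_unit', 'population', 'sample']
```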