Introduction to Statistics Flashcards
What is ‘statistics’?
Statistics is the study of a representative sample in order to find out what happens in an entire population.
When do we use statistics? What is the objective?
When we want to study a characteristic but can only access a sample of the population, not everyone on the planet. The objective is to explain the variability and so reduce uncertainty.
What is a ‘variable’?
A variable is a characteristic that varies. Why this variable varies is what we study in statistics.
What is probability theory?
This is the theory underlying statistics: the idea that we can make an educated guess about the population based on our sample.
Why do we use descriptive statistics?
To familiarise ourselves with the data and to make sure there are no errors, mistyped values or typos in it. This is part of data cleaning.
What is a categorical variable?
Qualitative - words. Can be nominal or ordinal
What is a numeric variable?
Quantitative - numbers. Can be discrete or continuous
What is an ordinal variable? Give an example.
Words, but ranked. For example, income: poor, average, wealthy
What is a nominal variable? Give an example.
Words, with no order or rank. For example, gender, eye colour, etc
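The difference between ordinal and nominal categories can be sketched in Python. This is only a minimal illustration: the names `INCOME_ORDER` and `sort_by_rank` and the sample values are hypothetical and chosen for the example.

```python
# Ordinal: the categories have a meaningful order, so each one can be given a rank.
INCOME_ORDER = ["poor", "average", "wealthy"]  # hypothetical ranking, lowest to highest


def sort_by_rank(values, order):
    """Sort ordinal values by their position in the given category order."""
    rank = {category: position for position, category in enumerate(order)}
    return sorted(values, key=lambda value: rank[value])


incomes = ["wealthy", "poor", "average", "poor"]  # made-up sample
ranked_incomes = sort_by_rank(incomes, INCOME_ORDER)

# Nominal: there is no order, so we can only ask which distinct categories occur.
eye_colours = ["blue", "brown", "green", "brown"]  # made-up sample
distinct_eye_colours = set(eye_colours)
```

Sorting only makes sense for the ordinal variable; for the nominal one, any ordering of the categories would be arbitrary.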
What is a binary variable, and what is a variable with more than two categories?
A binary variable has only TWO choices, e.g. gender recorded as male or female. It can be nominal or ordinal, but it must have exactly two categories.
A variable with more than two choices (sometimes called polytomous) can also be nominal or ordinal. ‘Ordinal’ describes whether the categories are ranked, not how many of them there are.
What is a discrete variable?
Distinct values: whole numbers, not decimals or precise measurements. No ranking is required. For example, shoe size, or age in whole years (1 yr, 2 yrs, 3 yrs, etc.).
What is a continuous variable?
Specific numeric value which is on a scale (interval) and can be ranked. For example, weight, height, IQ, BMI, etc.
If we have categorical data, what kind of descriptive statistic should we use?
We are interested in the frequency, i.e. how many people are in each category. Hence, we need to create a frequency table. We can also create a bar chart or a pie chart. This lets us visualise the data and see what we have.
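As a rough sketch, a frequency table for a categorical variable can be built with `collections.Counter` from the Python standard library. The sample data and the text "bar chart" are made up for illustration.

```python
from collections import Counter

# Made-up sample of a categorical variable (eye colour).
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]

# Count how many people fall into each category.
frequency_table = Counter(eye_colours)

# Print the table with a crude text bar chart, most common category first.
for category, count in frequency_table.most_common():
    print(f"{category:<6} {count:>2} {'#' * count}")
```

A statistics package would normally draw the bar chart or pie chart for us; the counts behind the chart are the same.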
A frequency table shows us the ‘percentage’, the ‘valid percentage’ and the ‘cumulative percentage’. What does each of these percentages mean, and which one should we pay the most attention to?
The percentage is calculated over all cases, including the missing values and outliers.
The valid percentage is the percentage after removing the missing values.
The cumulative percentage adds the percentages of each category from the top of the table to the bottom, culminating in 100%. This is more useful when the variable of analysis is ranked or ordinal, as it makes it easy to get a sense of what percentage of cases fall at or below each rank. (Wikipedia)
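The three percentages can be worked through on a small made-up example. This sketch assumes that missing values are marked with `None` and that the cumulative percentage is built from the valid percentages; the variable names are hypothetical.

```python
from collections import Counter

# Ranked categories, lowest to highest, and made-up responses (None = missing).
ORDER = ["poor", "average", "wealthy"]
responses = ["poor", "average", None, "wealthy", "average",
             "poor", "average", None, "poor", "average"]

counts = Counter(r for r in responses if r is not None)
n_total = len(responses)        # all cases, including missing values
n_valid = sum(counts.values())  # only cases with a recorded answer

rows = []
cumulative = 0.0
for category in ORDER:
    percent = 100 * counts[category] / n_total        # missing values in the base
    valid_percent = 100 * counts[category] / n_valid  # missing values removed
    cumulative += valid_percent                       # running total down the table
    rows.append((category, counts[category], percent, valid_percent, cumulative))

for category, count, percent, valid_percent, cum in rows:
    print(f"{category:<8} {count:>2} {percent:6.1f} {valid_percent:6.1f} {cum:6.1f}")
```

With two of the ten answers missing, the percentage column sums to 80% while the valid percentage column sums to 100%.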
If we use a frequency table and realise there is a missing value or a piece of incorrect data, what are our two options?
- Remove the participant with the missing or incorrect data. This loses the information on all of that participant's other variables, so we want to avoid it. Remember to keep an original copy of the data document.
- Fill in the value with a prediction of what it is supposed to be.
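The two options above can be sketched as follows. The records are made up, and using the mean of the observed values as the "predicted value" is an assumption made for this example; other prediction methods exist.

```python
from statistics import mean

# Made-up records; None marks a missing weight.
participants = [
    {"id": 1, "weight_kg": 70.0},
    {"id": 2, "weight_kg": None},
    {"id": 3, "weight_kg": 80.0},
]

# Option 1: drop every participant with a missing value.
complete_cases = [p for p in participants if p["weight_kg"] is not None]

# Option 2: fill the gap with a predicted value
# (assumption: the mean of the observed weights stands in for the prediction).
observed_mean = mean(p["weight_kg"] for p in complete_cases)
imputed = [
    {**p, "weight_kg": observed_mean if p["weight_kg"] is None else p["weight_kg"]}
    for p in participants
]
```

Both options build new lists and leave `participants` unchanged, which mirrors the advice to keep the original data.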