Section A.1: Introduction to Statistics Flashcards
What is data?
Data are measurements or observations collected as a source of information
What is statistics?
(A, I, P, O)
Statistics is a branch of mathematics that deals with the
- Analysis
- Interpretation
- Presentation
- Organisation
of data.
What are the two types of statistical analysis?
Descriptive analysis, and inferential analysis
Descriptive statistics
Descriptive statistics refers to the statistical measures used to summarise and describe the basic properties of data. It is used in univariate, bivariate, and multivariate analysis.
Types of statistical measures in descriptive statistics (2)
Measures of Central Tendency
Measures of Dispersion/Variability
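Both kinds of measure can be computed with Python's standard `statistics` module. The sketch below is illustrative only: the `describe` helper and the ages are made up, and it uses the sample (n - 1) versions of variance and standard deviation.

```python
import statistics

def describe(values):
    """Return common measures of central tendency and dispersion."""
    return {
        # Measures of central tendency
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Measures of dispersion/variability
        "range": max(values) - min(values),
        "sample_variance": statistics.variance(values),  # divides by n - 1
        "sample_std_dev": statistics.stdev(values),      # square root of the above
    }

# Hypothetical univariate data: ages of five students
ages = [21, 23, 23, 25, 28]
print(describe(ages))
```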
Inferential statistics
Inferential statistics refers to the processes that make inferences about a population based on a representative sample of the population.
Define statistical analysis
Statistical analysis refers to the analysis of data to gather insights and discover underlying patterns and trends.
Define: Population Distribution
Population Distribution refers to the spread of a characteristic or variable across an entire population
What is meant by sample?
Sample refers to a subset of a population that is selected to represent the entire population
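A minimal sketch of the population/sample relationship, assuming simple random sampling is used to obtain a representative sample. The population of exam scores is randomly generated for illustration; the sample mean is an estimate of the population mean, which is the kind of inference inferential statistics makes.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the example is reproducible

# Hypothetical population: exam scores for all 500 students in a school
population = [rng.randint(40, 100) for _ in range(500)]

# Simple random sample of 50 students, drawn without replacement
sample = rng.sample(population, k=50)

print("population mean:", statistics.mean(population))
print("sample mean:    ", statistics.mean(sample))
```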
Categorical Variable
A categorical variable is a variable whose values are categories or features rather than numbers; any variable that is not numerical is categorical. It is qualitative (characterised by qualities or features) and descriptive in nature.
Categorical variable data types (3)
Nominal - has no natural order or ranking
Binary data - data which has exactly two possible values (Eg. True or False)
Ordinal - has a natural order or ranking
Nominal data does not have a natural order or ranking (T/F)
True, it does not have a natural order or ranking.
Eg. Gender, eye colour.
Ordinal data
Ordinal data is data that has a natural order or ranking.
Eg. Education level, size (small, medium, large)
Binary Data
A type of nominal data which has exactly two possible values
Eg. A or B, Yes or No.
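One way to see the difference between the three categorical types is how they can be represented in code. This is a sketch using Python's `enum` module; the `EyeColour` and `Size` classes are hypothetical examples. Ordinal categories carry a rank, so they can be compared and sorted, while nominal categories cannot.

```python
from enum import Enum, IntEnum

class EyeColour(Enum):
    """Nominal: categories with no natural order (comparing with < raises TypeError)."""
    BROWN = "brown"
    BLUE = "blue"
    GREEN = "green"

class Size(IntEnum):
    """Ordinal: categories with a natural order, encoded as ranks."""
    SMALL = 1
    MEDIUM = 2
    LARGE = 3

# Binary: exactly two possible values
is_smoker = True

# Ordinal values can be compared and sorted meaningfully
print(Size.SMALL < Size.LARGE)  # True
print(sorted([Size.LARGE, Size.SMALL, Size.MEDIUM]))
```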
Numerical variable - data types (2)
Discrete and Continuous
Discrete data
Data that can only take on distinct, countable values, usually whole numbers (unlike continuous data).
Eg. Number of siblings
Continuous data
Continuous data is data that can take on ANY value in a range (unlike discrete data).
Eg. Height, temperature
Data Dimension
Data dimension refers to the number of variables or features in a dataset.
“A dataset can have one, two, or more dimensions depending on the number of variables being measured.”
Data Dimension types (3)
Univariate or one-dimensional: One variable (age)
Bivariate or two-dimensional: Two variables (age vs. height)
Multivariate (three-dimensional or more): 3 or more variables (age vs. height vs. weight)
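A small sketch of counting a dataset's dimension, assuming each observation is stored as a dictionary with one key per variable and every observation records the same variables. The helper names and data are hypothetical.

```python
def data_dimension(records):
    """Number of variables per observation (assumes every record has the same keys)."""
    return len(records[0]) if records else 0

def dimension_type(records):
    """Classify a dataset as univariate, bivariate, or multivariate."""
    d = data_dimension(records)
    if d == 1:
        return "univariate"
    if d == 2:
        return "bivariate"
    return "multivariate" if d >= 3 else "empty"

univariate = [{"age": 21}, {"age": 34}]
bivariate = [{"age": 21, "height_cm": 170}, {"age": 34, "height_cm": 182}]
multivariate = [{"age": 21, "height_cm": 170, "weight_kg": 65}]

for data in (univariate, bivariate, multivariate):
    print(data_dimension(data), dimension_type(data))
```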
Variable
A variable is a characteristic or attribute that can take on different values for different observations
Independent variable
An independent variable is a variable that is manipulated or controlled by the researcher in a study to measure its effect on the dependent variable/s. It is plotted on the x-axis.
Dependent variable
A dependent variable is a variable whose response is measured or predicted in a study. It is plotted on the y-axis.
What are bar graphs used for?
Bar graphs are used to visualise and compare groups of categorical data: the frequency or distribution of observations in each group, and/or the magnitudes among groups.
What are bar graphs not used for? (2)
Continuous numerical data (a histogram is used for that), and categories that are not independent of each other.
What is a scatter plot used/useful for?
Scatter plots are useful for visualising and comparing patterns, trends, and relationships between two numerical (continuous) variables, and for identifying outliers.
What is a line graph useful for?
Line graphs are useful for visualising and comparing patterns, trends, and relationships in continuous numerical variables over time.
What is a histogram used for?
A histogram is used to visualise grouped continuous data to compare the ranges of values and see the shape of distribution among those groups.
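Before a histogram is drawn, the continuous values are grouped into ranges (bins) and counted. A sketch of that grouping step with equal-width, half-open bins, so a value exactly on a boundary goes into the upper bin. The `histogram_counts` helper and the heights are hypothetical; the bars are printed as text instead of plotted.

```python
def histogram_counts(values, start, bin_width):
    """Count values in equal-width bins [start + k*w, start + (k+1)*w)."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)   # index of the bin this value falls in
        lower = start + k * bin_width       # lower edge of that bin
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical continuous data: heights in cm
heights_cm = [151.2, 158.9, 162.5, 164.0, 167.3, 171.8, 172.4, 179.9, 183.1]
bin_width = 10
counts = histogram_counts(heights_cm, start=150, bin_width=bin_width)

for lower, count in counts.items():
    print(f"{lower}-{lower + bin_width}: {'#' * count}")
```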
What is a pie chart used for?
A pie chart is used to visualise categorical univariate data. It shows the proportion of observations in different categories/groups in relation to a whole.
Data Exploration
Data exploration is the initial analysis of data to understand its basic properties, patterns, and relationships
Techniques for visualising univariate categorical data (3)
Frequency table, Bar chart, Pie chart
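All three techniques start from the same counts. This sketch builds a frequency table with `collections.Counter` and prints, for each category, a text bar (bar chart) and the slice angle a pie chart would use (relative frequency x 360 degrees). The `frequency_table` helper and the transport data are hypothetical.

```python
from collections import Counter

def frequency_table(observations):
    """Return (category, frequency, relative frequency) rows, most common first."""
    counts = Counter(observations)
    total = len(observations)
    return [(cat, n, n / total) for cat, n in counts.most_common()]

# Hypothetical univariate categorical data: how each student travels to school
transport = ["car", "bus", "car", "walk", "car", "bus", "bike", "car"]

for cat, freq, rel in frequency_table(transport):
    bar = "#" * freq        # bar chart: bar length proportional to frequency
    pie_angle = rel * 360   # pie chart: slice angle in degrees
    print(f"{cat:<5}{freq:>3}{rel:>8.1%}{pie_angle:>6.0f} deg  {bar}")
```

Ties in `most_common()` keep the order in which categories were first seen, so "walk" is listed before "bike".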
Predictive modelling
Predictive modelling refers to techniques and processes used to generate a predictive model, which can predict the value of a dependent variable when only the value(s) of the predictor/independent variable/s are known.
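Simple linear regression is one common example of a predictive model (chosen here for illustration; the card does not name a specific technique). This sketch fits a straight line by ordinary least squares and then predicts the dependent variable from a new value of the independent variable. The helper names and study-hours data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(model, x_new):
    """Predict the dependent variable for a new independent-variable value."""
    intercept, slope = model
    return intercept + slope * x_new

hours = [1, 2, 3, 4, 5]          # independent (predictor) variable
scores = [52, 57, 61, 68, 72]    # dependent variable
model = fit_line(hours, scores)
print(predict(model, 6))         # predicted score after 6 hours of study
```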
Standard deviation calculation steps
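- Find the mean of all values in the data set
- Subtract the mean from each value to get its deviation
- Square each deviation
- Get the sum of the squared deviations
- Divide by the number of values (n) for a population, or by n - 1 for a sample; this gives the variance
- Get the square root of the variance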
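A sketch of the standard deviation calculation in Python: it finds the mean, squares each value's deviation from the mean, sums the squares, divides by n (population) or n - 1 (sample) to get the variance, and takes the square root. The data set is a made-up example.

```python
import math

def standard_deviation(values, sample=False):
    """Population standard deviation by default; sample=True divides by n - 1."""
    n = len(values)
    mean = sum(values) / n                              # mean of all values
    squared_devs = [(v - mean) ** 2 for v in values]    # squared deviations
    total = sum(squared_devs)                           # sum of squared deviations
    variance = total / (n - 1 if sample else n)         # variance
    return math.sqrt(variance)                          # standard deviation

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data))               # 2.0
print(standard_deviation(data, sample=True))
```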
Techniques for visualising and analysing numerical univariate data (3)
Descriptive statistics, box plots, histograms
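A box plot is drawn from a five-number summary, with points beyond 1.5 x IQR from the quartiles usually shown as outliers. This sketch uses `statistics.quantiles` with `method="inclusive"`; quartile conventions differ between tools, so other software may give slightly different box edges. The helper name and data are hypothetical.

```python
import statistics

def box_plot_summary(values):
    """Whisker ends, quartiles, and 1.5*IQR outliers, as drawn in a box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                                   # interquartile range
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inliers = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "whisker_low": inliers[0],    # smallest value that is not an outlier
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inliers[-1],  # largest value that is not an outlier
        "outliers": outliers,
    }

# Hypothetical numerical univariate data
values = [3, 7, 8, 5, 12, 14, 21, 13, 18, 40]
print(box_plot_summary(values))
```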