Section A.1: Introduction to Statistics Flashcards

1
Q

What is data?

A

Data are measurements or observations collected as a source of information

2
Q

What is statistics?
(A, I, P, O)

A

Statistics is a branch of mathematics which deals with the
- Analysis
- Interpretation
- Presentation
- Organisation
of data.

3
Q

What are the two types of statistical analysis?

A

Descriptive analysis, and inferential analysis

4
Q

Descriptive statistics

A

Descriptive statistics refers to the statistical measures used to summarise and describe the basic properties of data. It is used in univariate, bivariate, and multivariate analysis.

5
Q

Types of statistical measures in descriptive statistics (2)

A

Measures of Central Tendency
Measures of Dispersion/Variability
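
As a rough illustration of the two families of measures, the sketch below computes a few of each with Python's standard `statistics` module. The `ages` list is hypothetical data invented for this example, and the measures shown are common choices rather than a complete list.

```python
import statistics

# Hypothetical sample of ages, invented for illustration.
ages = [21, 23, 23, 25, 30, 34, 41]

# Measures of central tendency: where the "middle" of the data lies.
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
mode_age = statistics.mode(ages)

# Measures of dispersion/variability: how spread out the data are.
age_range = max(ages) - min(ages)
population_variance = statistics.pvariance(ages)
population_sd = statistics.pstdev(ages)

print("mean:", mean_age, "median:", median_age, "mode:", mode_age)
print("range:", age_range, "variance:", population_variance, "sd:", population_sd)
```

The population versions (`pvariance`, `pstdev`) divide by the number of values; `statistics.variance` and `statistics.stdev` divide by one less and are the usual choice when the data are a sample.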

6
Q

Inferential statistics

A

Inferential statistics refers to the processes that make inferences about a population based on a representative sample of the population.
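
A minimal sketch of the idea, assuming a simulated population: a simple random sample is drawn and its mean is used to estimate the population mean. The population values, the sample size of 200, and the fixed seed are all assumptions made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated heights in cm.
population = [random.gauss(170, 10) for _ in range(10_000)]

# Draw a simple random sample (without replacement) from the population.
sample = random.sample(population, 200)

# Use the sample mean as an estimate of the population mean.
sample_mean = statistics.mean(sample)
population_mean = statistics.mean(population)
estimation_error = abs(sample_mean - population_mean)

print("sample mean:", round(sample_mean, 2))
print("population mean:", round(population_mean, 2))
```

In practice the population mean is unknown, which is why the inference is made from the sample; it is computed here only to show how close the estimate is.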

7
Q

Define statistical analysis

A

Statistical analysis refers to the analysis of data to gather insights and discover underlying patterns and trends.

8
Q

Define: Population Distribution

A

Population Distribution refers to the spread of a characteristic or variable across an entire population

9
Q

What is meant by sample?

A

Sample refers to a subset of a population that is selected to represent the entire population

10
Q

Categorical Variable

A

A categorical variable is a variable that is character- or feature-based, that is, any variable that is not numerical. It is qualitative (characterised by qualities or features) and descriptive in nature.

11
Q

Categorical variable data types (3)

A

Nominal - has no natural order or ranking
Binary - has exactly two possible values (eg. True, False)
Ordinal - has a natural order or ranking

12
Q

Nominal data does not have a natural order or ranking (T/F)

A

True, it does not have a natural order or ranking.
Eg. Gender, eye colour.

13
Q

Ordinal data

A

Ordinal data is data that has a natural order or ranking.
Eg. Education level, size (small, medium, large)

14
Q

Binary Data

A

A type of nominal data which has exactly two possible values
Eg. A or B, Yes or No.

15
Q

Numerical variable - data types (2)

A

Discrete and Continuous

16
Q

Discrete data

A

Data that can only take on certain values, usually whole numbers, NOT like continuous data.
Eg. Number of siblings

17
Q

Continuous data

A

Continuous data is data that can take on ANY value in a range, NOT like discrete data.
Eg. Height, temperature

18
Q

Data Dimension

A

Data dimension refers to the number of variables or features in a dataset.

“A dataset can have one, two, or more dimensions depending on the number of variables being measured.”

19
Q

Data Dimension types (3)

A

Univariate or one-dimensional: One variable (age)
Bivariate or two-dimensional: Two variables (age vs. height)
Multivariate (three-dimensional or more): 3 or more variables (Age vs. height vs. weight)

20
Q

Variable

A

A variable is a characteristic or attribute that can take on different values for different observations

21
Q

Independent variable

A

An independent variable is a variable that is manipulated or controlled by the researcher in a study to measure its effect on the dependent variable/s. It is plotted on the x-axis.

22
Q

Dependent variable

A

A dependent variable is a variable whose response is being measured or predicted in a study. It is plotted on the y-axis.

23
Q

What are bar graphs used for?

A

Bar graphs are used to visualise and compare groups of categorical data: the frequency or distribution of observations across the groups, and/or the magnitudes among them.

24
Q

What are bar graphs not used for? (2)

A

Continuous numerical data (use a histogram for that), and categories that are not independent of each other.

25
Q

What is a scatter plot used/useful for?

A

Scatter plots are useful for visualising and comparing patterns, trends, and relationships in numerical continuous data, and identifying outliers.

26
Q

What is a line graph useful for?

A

Line graphs are useful for visualising and comparing patterns, trends, and relationships among continuous numerical variables over time.

27
Q

What is a histogram used for?

A

A histogram is used to visualise grouped continuous data to compare the ranges of values and see the shape of distribution among those groups.
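
The grouping step behind a histogram can be sketched as follows: each continuous value is assigned to a fixed-width bin, and the count per bin gives the shape of the distribution. The `heights` data, the bin width of 10, and the helper name `bin_start` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical heights in cm, invented for illustration.
heights = [151.2, 158.9, 160.0, 163.4, 167.8, 169.9, 171.5, 174.3, 178.0, 185.6]

bin_width = 10  # assumed bin width in cm


def bin_start(value, width):
    """Return the lower edge of the half-open bin [start, start + width) containing value."""
    return int(value // width) * width


# Count how many values fall into each bin.
counts = Counter(bin_start(h, bin_width) for h in heights)

# Print a simple text histogram, one row per bin.
for start in sorted(counts):
    print(f"{start}-{start + bin_width}: {'#' * counts[start]}")
```

Unlike the bars of a bar graph, the bins are adjacent ranges on one numerical axis, so the choice of bin width changes how the shape appears.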

28
Q

What is a pie chart used for?

A

A pie chart is used to visualise categorical univariate data. It shows the proportion of observations in each category/group in relation to the whole.

29
Q

Data Exploration

A

Data exploration is the initial analysis of data to understand its basic properties, patterns, and relationships

30
Q

Techniques for visualising univariate categorical data (3)

A

Frequency table, Bar chart, Pie chart
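
A frequency table is the basis for the other two techniques: bar heights come from the counts and pie slices from the proportions. A minimal sketch, assuming a hypothetical list of eye colours:

```python
from collections import Counter

# Hypothetical nominal data, invented for illustration.
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]

# Frequency table: number of observations per category.
frequency = Counter(eye_colours)

# Proportions: each category's share of the whole (what a pie chart shows).
total = sum(frequency.values())
proportions = {colour: count / total for colour, count in frequency.items()}

for colour, count in frequency.most_common():
    print(f"{colour:<6} {count}  {proportions[colour]:.0%}")
```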

31
Q

Predictive modelling

A

Predictive modelling refers to the techniques and processes used to generate a predictive model, which can predict the value of a dependent variable when only the value of the predictor/independent variable/s is known.
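
One simple example of such a model is a straight line fitted by least squares. The sketch below is only an illustration of the idea, not the only approach; the `hours` and `scores` data and the function name `predict` are hypothetical.

```python
import statistics

# Hypothetical data: hours studied (independent) and test score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 61, 63, 70]

mean_x = statistics.mean(hours)
mean_y = statistics.mean(scores)

# Least-squares slope and intercept for the line y = intercept + slope * x.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores)) / sum(
    (x - mean_x) ** 2 for x in hours
)
intercept = mean_y - slope * mean_x


def predict(x):
    """Predict the dependent variable from a known independent variable value."""
    return intercept + slope * x


print("predicted score for 6 hours:", predict(6))
```

Predicting for 6 hours extrapolates beyond the observed range of 1 to 5, so the result should be read with more caution than a prediction inside that range.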

32
Q

Standard deviation calculation steps

A

  1. Find the deviation of each value in the data set from the mean
  2. Square all of these values
  3. Get the sum of those values
  4. Divide by the number of values in the data set (this gives the variance)
  5. Get the square root of that value
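
The calculation can be sketched step by step as follows, taking the first step to be each value's deviation from the mean and dividing by the number of values (the population standard deviation). The `values` list is hypothetical data chosen so the arithmetic is easy to follow.

```python
import math
import statistics

# Hypothetical data set, invented for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: deviation of each value from the mean.
mean = sum(values) / len(values)
deviations = [v - mean for v in values]

# Step 2: square each deviation.
squared_deviations = [d ** 2 for d in deviations]

# Step 3: sum the squared deviations.
sum_of_squares = sum(squared_deviations)

# Step 4: divide by the number of values (this is the population variance).
variance = sum_of_squares / len(values)

# Step 5: take the square root.
standard_deviation = math.sqrt(variance)

print("standard deviation:", standard_deviation)
print("matches statistics.pstdev:", math.isclose(standard_deviation, statistics.pstdev(values)))
```

For a sample rather than a whole population, step 4 usually divides by one less than the number of values, which is what `statistics.stdev` does.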
33
Q

Techniques for visualising and analysing numerical univariate data (3)

A

Descriptive statistics, box plots, histograms