Week 3 Flashcards

1
Q

What does descriptive statistics focus on?

A

Descriptive statistics focuses directly on summarising
and presenting data.

2
Q

What portrays the data?

A

Tables and data visualisation (e.g. graphs) portray the
data.

3
Q

What are some examples of charts?

A

Examples of charts include histograms, scatter plots,
and line graphs, among many more.

4
Q

What are descriptive measures used for?

A

Descriptive measures are used to summarise data.

5
Q

What is descriptive statistics commonly divided into?

A

Descriptive statistics is commonly divided into
measures of central tendency and measures of
variability.

6
Q

What do measures of central tendency focus on?

A
  • Measures of central tendency focus on the average or
    middle values.
7
Q

What do measures of variability focus on?

A

Measures of variability focus on the dispersion of data.

8
Q

What are examples of measures of central tendency?

A

Measures of central tendency describe the center
position of a distribution for a dataset.
* Examples of measures of central tendency include the
mean, median, and mode.

9
Q

Do measures of central tendency give a whole picture of the dataset?

A

Measures of central tendency give only a partial
picture of a dataset.

10
Q

What do measures of variability aid in?

A

Measures of variability (or measures of spread) aid
in analysing how dispersed a data distribution is.
* For example, while the mean of the data may be 65 out
of 100, there can still be data points at both 1 and 100.

11
Q

What are examples of measures of variability?

A

Measures of variability help communicate dispersion by
describing the shape and spread of the dataset.
* Variance, range, and quartiles are examples of measures
of variability.

12
Q

What are variables?

A

Variables are factors that can take on more than one
value.

13
Q

How do we generally calculate a descriptive statistic?

A

Generally, we calculate a descriptive statistic for a variable by
summarising that variable's distribution.

14
Q

What does distribution refer to?

A

Distribution refers to the different values that can be assumed,
and their frequency (i.e. how often each value occurs).

15
Q

What do we care about for discrete data?

A

For discrete data, we care especially about:
– Commonly occurring values (e.g. mode).
– Unusual values (e.g. outliers).
– Patterns in between.

16
Q

What is univariate descriptive statistics?

A

Univariate descriptive statistics describes only one variable
at a time.

17
Q

What are some types of variables?

A

Types of variables: Discrete, Binary, Categorical, Continuous.

18
Q

What are some good practices with graphs and tables?

A
  • Graphs and tables should be able to stand on their
    own.
  • Titles should clearly explain what the graph or table
    is about.
  • Notes aim to inform the reader about data source,
    sample size.
  • Notes can also be used to explain abbreviations,
    symbols, or mention further details.
19
Q

What is a one-way table?

A

The one-way table is the tabular equivalent of a bar chart.
* It displays categorical data in the form of frequency
counts and/or relative frequencies.
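
A one-way table can be sketched in a few lines of Python. This is only an illustration: the `responses` data and the column labels are made up, and `collections.Counter` is one convenient way to get the frequency counts.

```python
from collections import Counter

# Hypothetical categorical sample (illustrative data, not from the course).
responses = ["bus", "car", "bus", "bike", "car", "bus"]

counts = Counter(responses)            # frequency count per category
total = sum(counts.values())
relative = {category: n / total for category, n in counts.items()}

# Print the one-way table, most frequent category first.
print(f"{'Travel':<8}{'Count':>6}{'Share':>8}")
for category, n in counts.most_common():
    print(f"{category:<8}{n:>6}{relative[category]:>8.1%}")
```

Because the table carries both counts and shares, a reader can move between numbers and percentages in either direction.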

20
Q

What are distributional features of interest?

A

Distributional features of interest:
– Commonly occurring values.
– Unusual values.
– Patterns in between.

21
Q

What are good practices with tables?

A

* Clearly identify the variables included in the rows and
columns.
* Variable names must be meaningful.
* Order the rows to aid interpretation:
– Relevant values/information at the top.
* We should be able to convert a percentage table back into
numbers and vice versa.

22
Q

What should tables visually do?

A

Visually, tables should:
* Avoid vertical lines.
* Use a minimum of horizontal lines to clarify meaning:
– Separating headings from the remaining data.
– Separating table contents from the title and notes.

23
Q

What are pie charts good at?

A

Pie charts are good at presenting data when:
– The variable is discrete (with mutually exclusive categories).
– There is a small number of categories (not too many more than 6).
– There is real variation across categories.
– The categories are exhaustive: the total adds up to 100%.

24
Q

Why are pie charts under attack?

A

Pie charts are under attack these days:
– Hard to compare angles (as opposed to lengths in bar charts).
– Require a legend and a mix of colours – distracting.

25
Q

What are bar charts good at?

A

Bar charts display some measure for discrete categories.
* They enable direct comparison:
– Bar height is proportional to the size of the category it
represents.
* No scale for x-axis because it is the category name.
* Space between bars: categories are discrete.
* When the bars show relative frequencies, summing across all
bars should equal 100%.

26
Q

What are graphs and tables powerful at?

A
  • Graphs and tables are a powerful way of communicating
    complex information clearly and accurately.
27
Q

What is a frequency distribution?

A

A frequency distribution is a tabular summary of a
dataset showing the frequency (or number) of items in
each of several nonoverlapping classes.

28
Q

What is the objective of frequency distribution?

A

The objective is to provide insights about the data that
cannot be quickly obtained by inspecting the original raw
data.

29
Q

What is frequency distribution for qualitative data?

A

For qualitative data, it means simply counting the
number of times each value occurs.

30
Q

What is the frequency distribution for quantitative data?

A
  • For quantitative data, this means either counting values
    (discrete data) or grouping the values (continuous data).
31
Q

What is a symmetric frequency distribution?

A

Symmetric:
A distribution is symmetric when it can be split into two
identical (mirror-image) halves.

32
Q

What is skewness in a frequency distribution?

A

Skewness:
Level of asymmetry in which an elongated tail extends in
either the right-hand direction (i.e. positive skewness) or
the left-hand direction (i.e. negative skewness).

33
Q

What is kurtosis in a frequency distribution?

A

Kurtosis:
Degree of peakedness or steepness in a distribution.

34
Q

What is the effect of distribution shape on descriptive measures?

A

The shape of the distribution has an influence on
virtually all the statistical descriptive measures.
* When a distribution is perfectly symmetrical, the mean
and the median values are the same.
* When a distribution is skewed, this equivalence
disappears.
In that case, the median tends to be more representative
of the dataset than the mean.
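
A small Python sketch of this effect, using made-up numbers and the standard-library `statistics` module:

```python
from statistics import mean, median

# Illustrative data only: the second list has one large value,
# which creates a long right-hand tail (positive skew).
symmetric = [1, 2, 3, 4, 5]
skewed = [1, 2, 3, 4, 50]

print(mean(symmetric), median(symmetric))  # mean and median coincide
print(mean(skewed), median(skewed))        # mean is pulled towards the tail
```

In the skewed list the median stays at the middle value while the mean moves towards the extreme observation.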

35
Q

What is the mean?

A

The mean is also known as the average, expected value,
expectation, mathematical expectation, or first moment.
* Expected value is a key concept in economics, finance,
and many other subjects.
* There are several types of means in statistics.
* Three widely used means are the geometric mean (GM),
harmonic mean (HM), and arithmetic mean (AM).
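
As a rough illustration, the three means can be compared with Python's `statistics` module (Python 3.8 or later for `geometric_mean`). The two input values are arbitrary.

```python
from statistics import mean, geometric_mean, harmonic_mean

values = [2, 8]  # arbitrary positive numbers for illustration

am = mean(values)            # (2 + 8) / 2
gm = geometric_mean(values)  # square root of 2 * 8
hm = harmonic_mean(values)   # 2 / (1/2 + 1/8)

print(am, gm, hm)
```

For positive data the three are ordered HM ≤ GM ≤ AM, with equality only when all values are the same.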

36
Q

What is the arithmetic mean?

A

Arithmetic mean: the sum of all of the numbers divided by
the number of numbers.
* Similarly, the mean of a sample 𝑥1, 𝑥2, …, 𝑥𝑛, usually
denoted by x̄, is the sum of the sampled values divided by
the number of items in the sample:
x̄ = (𝑥1 + 𝑥2 + … + 𝑥𝑛) / 𝑛

37
Q

What is the median?

A

The median is the value that separates a set of values into
two equal halves.
* The median is the middle value in an ordered list of the
data.
* In the case a dataset contains extreme values, the median
is generally the preferred measure of central location.
* If there is an odd number of items, the median is the
value of the middle item.
* If there is an even number of items, the median is the
average of the two middle items.
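
A short sketch with Python's `statistics.median`, which sorts the data internally. The lists are illustrative; for an even number of items this function averages the two middle values.

```python
from statistics import median

odd_count = [7, 1, 5]        # sorted: 1, 5, 7 -> middle item is 5
even_count = [7, 1, 5, 3]    # sorted: 1, 3, 5, 7 -> average of 3 and 5

print(median(odd_count))
print(median(even_count))
```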

38
Q

What is the mode?

A

The mode is the element that occurs most often in a
dataset.
* A dataset may have zero, one (unimodal), two (bimodal),
or more (multimodal) modes.
* For example, the mode of the sample [1, 3, 6, 6, 6, 6, 7, 7,
12, 12, 17] is 6.
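
The example above can be checked with Python's `statistics` module. `multimode` (Python 3.8+) returns every most-frequent value, which helps with bimodal data; the second list is an added illustration.

```python
from statistics import mode, multimode

sample = [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]
print(mode(sample))          # the single most frequent value

bimodal = [1, 1, 2, 2, 3]    # illustrative bimodal data
print(multimode(bimodal))    # all values tied for most frequent
```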

39
Q

What is variance?

A
  • Variance is a measure of how far a set of numbers is
    spread out from their average value.
  • Variance has a central role in statistics, where some ideas
    that use it include descriptive statistics, statistical
    inference, hypothesis testing, among others.
  • A disadvantage of the variance for practical applications
    is that its units are the square of the units of the random
    variable.
40
Q

What is the variance an average of?

A

The variance is the average of the squared differences
between each data value and the mean.
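
A minimal sketch of this definition in Python, with made-up data. Dividing by the number of values gives the population variance (`statistics.pvariance`); the sample variance (`statistics.variance`) divides by n − 1 instead.

```python
from statistics import mean, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

mu = mean(data)
squared_diffs = [(x - mu) ** 2 for x in data]  # squared distance to the mean
variance = sum(squared_diffs) / len(data)      # average of those squares

print(variance, pvariance(data))  # the manual result matches pvariance
```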

41
Q

What is standard deviation?

A

The standard deviation is a measure of the amount of
variation or dispersion of the values in a dataset.
* The standard deviation is approximately the average
distance between all individual values in a dataset and its
centre (i.e. the mean).
* A low standard deviation indicates that the values tend to
be close to the mean (also called the expected value) of a
dataset.

42
Q

What does a high standard deviation indicate?

A

A high standard deviation indicates that the values are
spread out over a wider range around the mean.

43
Q

What is the standard deviation of a random variable?

A

The standard deviation of a random variable, sample,
statistical population, dataset, or probability distribution
is the square root of its variance.
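
A quick check of this relationship in Python, using the population versions from the `statistics` module and illustrative data:

```python
import math
from statistics import pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

sd = pstdev(data)                       # population standard deviation
root_of_variance = math.sqrt(pvariance(data))

print(sd, root_of_variance)
```

Unlike the variance, the standard deviation is expressed in the same units as the data.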

44
Q

What is the range?

A
  • The range is the difference between the smallest and the
    largest values in the dataset, as follows:
    Range = 𝑥max − 𝑥min
  • It is the simplest measure of variability.
  • It is very sensitive to the smallest and largest data values.
45
Q

What is the mean absolute deviation?

A

The mean absolute deviation (MAD) measures the
average absolute distance (or deviation) of values in a
dataset from the dataset mean.
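
A small sketch of the calculation in Python; the data values are arbitrary.

```python
from statistics import mean

data = [2, 4, 6, 8]  # illustrative values

center = mean(data)
abs_deviations = [abs(x - center) for x in data]  # distances from the mean
mad = mean(abs_deviations)                        # average distance

print(mad)
```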

46
Q

What are percentiles?

A

If the value A is the pth percentile value for a dataset, then
at least p% of the values are less than or equal to A and at
least (100 − 𝑝)% of the values are greater than or equal to
A.
* To determine the pth percentile, the first step is to
arrange the data in ascending order.

47
Q

What are quartiles?

A

Quartiles are specific percentiles dividing the data into
four parts.
* The first (lower) quartile corresponds to the 25th
percentile (Q1).
* The second quartile corresponds to the 50th percentile,
which is also the median (Q2).
* The third (upper) quartile corresponds to the 75th
percentile (Q3).
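
Quartiles can be computed with `statistics.quantiles` (Python 3.8+). Note that several conventions for interpolating quartiles exist and can give slightly different answers; the `method="inclusive"` choice here is an assumption for illustration, not necessarily the convention used in the course.

```python
from statistics import median, quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # illustrative values

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

print(q1, q2, q3)
print(q2 == median(data))  # the second quartile is the median
```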

48
Q

What is the interquartile range?

A

The interquartile range of a dataset is the difference
between the third and the first quartile, calculated as
follows:
IQR = Q3 − Q1
* It is the range for the middle 50% of the data.
* It overcomes the sensitivity to extreme data values.
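
The sketch below compares the range and the IQR on two made-up datasets that differ only in one extreme value. The helper `iqr` is hypothetical and uses the "inclusive" quartile convention of `statistics.quantiles` as an assumption.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range, Q3 - Q1, using the 'inclusive' quartile method."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]  # one extreme value

for values in (clean, with_outlier):
    data_range = max(values) - min(values)
    print(data_range, iqr(values))
```

The range jumps when the extreme value is added, while the IQR is unchanged.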

49
Q

What is a distribution?

A

A distribution is simply a collection of data or scores (e.g.
z-score, t score) of a variable.
* The values of a distribution are commonly ordered (e.g.
from smallest to largest).
* Distributions are commonly depicted using data
visualisation tools (e.g. charts).

50
Q

What represents probability in a continuous probability distribution?

A

In a continuous probability distribution, it is the area, instead of
height, that represents probability.
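
One way to see this is with `statistics.NormalDist` (Python 3.8+). The probability of an interval is the area under the curve over it, obtained here as a difference of cumulative probabilities; the curve's height is a density and can even exceed 1. The specific distributions below are chosen only for illustration.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)

# Area under the curve between -1 and 1 (a probability).
p_within_one_sd = standard.cdf(1) - standard.cdf(-1)
print(round(p_within_one_sd, 4))

# Height of the curve (a density, not a probability).
narrow = NormalDist(mu=0, sigma=0.1)
print(narrow.pdf(0))  # greater than 1, so it cannot be a probability
```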

51
Q

What is the normal distribution?

A

The Normal (also known as Gaussian) probability
distribution is the most popular distribution for
describing a continuous random variable in statistics.
* It plays a crucial role in the theory of sampling and is
widely used in statistical inference.
* Many natural phenomena have patterns that resemble
the normal distribution (e.g. body weight, shoe size, IQ,
etc).

52
Q

What is the assumption of normality, and which parameters define the Normal distribution?

A

Many statistics are based on the assumption of normality.
* In terms of parameters, the Normal distribution contains
a mean 𝜇 and a variance 𝜎² (and, consequently, standard
deviation 𝜎), determining the centre and width of the
distribution, respectively.
* The highest point on the Normal curve is at the mean,
which is also the median and the mode.

53
Q

What does the shape of normal distribution resemble?

A

The shape of the Normal distribution resembles a bell
(i.e. a bell-shaped curve).

54
Q

What is the central limit theorem?

A

The central limit theorem is one of the most important
theorems in statistics.
* As sample size increases, the sampling distribution of the
sample mean rapidly approaches the bell shape of a normal
distribution, regardless of the shape of the population.
* A sample size of 𝑛 ≥ 30 is considered large.
* Whenever the population has a normal distribution, the
sampling distribution of the sample mean is normal for any
sample size.
* In small-sample cases (𝑛 < 30), the sampling distribution of
the sample mean will be normal so long as the population is
normal.
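
A rough simulation can make the theorem concrete. The sketch below is only an illustration under stated assumptions: the population is exponential with mean 1 (strongly right-skewed), the sample size is 30, and the seed and number of repetitions are arbitrary.

```python
import random
from statistics import mean, stdev

random.seed(0)  # arbitrary seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from a right-skewed exponential population (mean 1)."""
    return mean([random.expovariate(1.0) for _ in range(n)])

n = 30
sample_means = [sample_mean(n) for _ in range(2000)]

centre = mean(sample_means)   # should be near the population mean, 1
spread = stdev(sample_means)  # should be near 1 / sqrt(30), about 0.18

# Share of sample means within one standard deviation of the centre;
# for a bell-shaped distribution this is roughly 68%.
share = mean([abs(m - centre) <= spread for m in sample_means])
print(round(centre, 3), round(spread, 3), round(share, 3))
```

Even though individual draws are far from normal, the sample means cluster symmetrically around the population mean.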

55
Q

What is a standard normal distribution?

A
  • A random variable that has a normal distribution with a
    mean of zero and a standard deviation of one is said to
    have a standard normal probability distribution.
  • The letter z is commonly used to designate the standard
    normal random variable. A value of this variable is called
    a z-score.
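
A z-score is obtained by subtracting the mean and dividing by the standard deviation. The sketch below standardises some made-up scores; after the transformation the values have mean 0 and standard deviation 1.

```python
import math
from statistics import mean, pstdev

scores = [50, 60, 70, 80, 90]  # illustrative raw scores

mu = mean(scores)
sigma = pstdev(scores)  # population standard deviation

z_scores = [(x - mu) / sigma for x in scores]
print([round(z, 2) for z in z_scores])
```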
56
Q

What is the law of large numbers?

A
  • The law of large numbers (LLN) consists of a theorem that
    describes the result of performing the same experiment a large
    number of times.
  • According to the LLN, the mean of the results obtained from a
    large number of trials should be close to the expected value
    and tends towards the expected value as more trials are
    performed.
  • The LLN is relevant due to the fact that it guarantees stable
    long-term results for the averages of some random events.
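
A simple simulation illustrates the idea, using rolls of a fair six-sided die (expected value 3.5). The seed and trial counts are arbitrary choices, and `average_roll` is a hypothetical helper.

```python
import random

random.seed(1)  # arbitrary seed so the run is repeatable

def average_roll(trials):
    """Average of `trials` rolls of a fair six-sided die."""
    return sum(random.randint(1, 6) for _ in range(trials)) / trials

for trials in (10, 1_000, 100_000):
    print(trials, average_roll(trials))
```

With few rolls the average can be far from 3.5; with many rolls it settles close to it.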
57
Q

What is hypothesis testing?

A

* In hypothesis testing, a statement – call it a hypothesis – is
made about some characteristic of a particular population.
* A sample is then taken in an effort to establish whether or
not the statement is true.
* If the sample produces results that would be highly
unlikely under an assumption that the statement is true,
then we’ll conclude that the statement is false.
▪ The null hypothesis H0 is the statement to be tested.
▪ The alternative hypothesis Ha is the opposite of what is
stated in the null hypothesis.
58
Q

How to establish the hypothesis?

A

* The status quo (no change) position serves as the null
hypothesis.
* Compelling sample evidence to the contrary would have
to be produced before we’d conclude that a change in
prevailing conditions has occurred.
* This approach usually involves a decision that needs to
be made if the null hypothesis is rejected.
* Example:
– H0: The machine continues to function properly.
– Ha: The machine is not functioning properly.

59
Q

What does it mean to accept or reject a hypothesis?

A

* Failing to reject a null hypothesis shouldn’t be taken to
mean that we necessarily agree that the claim is true.
* We’re simply concluding that there’s not enough sample
evidence to convince us that it’s false.
* It’s for this reason we’ve chosen to use the phrase “fail to
reject” rather than “accept” the claim.

60
Q

What is the p-value?

A

The p-value measures the probability that, if the null
hypothesis is true (as an equality), we would randomly
produce a sample result at least as unlikely as the sample
result that we actually produced.
* p-value decision rule: if the p-value is less than α (the
significance level), reject the null hypothesis.
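
The decision rule can be sketched for one specific case: a two-sided z-test, where the test statistic is assumed to follow a standard normal distribution under the null hypothesis. The test statistic value, the significance level of 0.05, and the helper name `two_sided_p_value` are all illustrative assumptions.

```python
from statistics import NormalDist

def two_sided_p_value(z_stat):
    """P(|Z| >= |z_stat|) for a standard normal Z (two-sided z-test)."""
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))

alpha = 0.05   # assumed significance level
z_stat = 2.5   # hypothetical test statistic from a sample

p_value = two_sided_p_value(z_stat)
reject_null = p_value < alpha  # the decision rule

print(round(p_value, 4), reject_null)
```

A smaller test statistic such as 1.0 gives a p-value above 0.05, in which case we would fail to reject the null hypothesis.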

61
Q

What is the possibility of hypothesis error?

A

* Whenever we make a judgment about a population
parameter based on sample information, there’s a chance
we could be wrong.
* In hypothesis testing, in fact, we can identify two types of
potential errors: a Type I error (rejecting a null hypothesis
that is true) and a Type II error (failing to reject a null
hypothesis that is false).
* α, the significance level, measures the maximum
probability of making a Type I error.

62
Q

What is bootstrapping?

A

Bootstrapping is a statistical procedure that resamples a
single dataset to create many simulated samples.
* This process allows us to calculate standard errors, build
confidence intervals, and perform hypothesis testing.
* Both bootstrapping and traditional methods use samples
to draw inferences about populations.
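
A minimal sketch of the resampling idea, assuming a percentile-style 95% confidence interval for the mean. The data, seed, number of resamples, and the helper name `bootstrap_means` are illustrative choices, not part of the course material.

```python
import random
from statistics import mean

random.seed(42)  # arbitrary seed so the run is repeatable

data = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]  # hypothetical sample

def bootstrap_means(values, n_resamples=5000):
    """Means of resamples drawn with replacement, each the size of `values`."""
    return [
        mean(random.choices(values, k=len(values)))
        for _ in range(n_resamples)
    ]

boot = sorted(bootstrap_means(data))

# Percentile interval: cut off 2.5% of the resampled means in each tail.
lower = boot[int(0.025 * len(boot))]
upper = boot[int(0.975 * len(boot))]

print(mean(data), (lower, upper))
```

The spread of the resampled means also serves as an estimate of the standard error of the sample mean.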

63