Statistics Flashcards
Descriptive Statistics?
Descriptive statistics is what we can say about a sample by observing the sample itself. This is somewhat limited and mostly consists of summaries of the data, e.g. aggregates on a column in a database table.
Inferential Statistics
Inferential statistics is what we can say about a population based on what we know about a sample. That means that we can infer (deduce or conclude from evidence rather than from explicit statements) properties of the population from a smaller sample.
In statistics what is ‘Probability’?
Probability is what we can generally say about samples from a population.
So if we know 10 % of the population are left-handed, we can expect about 10 % of a randomly drawn sample to be left-handed.
In Probability Theory:
What does the experiment yield?
One possible outcome from the sample space.
The sample space for tossing a coin is {head, tail}
In Probability Theory
What is a ‘Sample Space S’
A set of possible outcomes of an experiment.
The sample space for tossing a coin is {head, tail}
In Probability Theory
What is an ‘Event E’
An event is a set of possible outcomes of an experiment (a subset of the sample space), e.g. the event head when we toss a coin.
In Probability Theory
What is a ‘Probability of Outcome P(s)’
The probability of an outcome is always between 0 and 1 (inclusive), and the probabilities of all possible outcomes sum to 1.
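A minimal sketch of this rule, using the coin toss from the earlier cards and assuming the coin is fair (the variable names are illustrative, not part of the original cards):

```python
from fractions import Fraction

# Assumption: a fair coin, so both outcomes are equally likely.
sample_space = ["head", "tail"]
p = {s: Fraction(1, len(sample_space)) for s in sample_space}

# Every probability lies between 0 and 1.
all_in_range = all(0 <= prob <= 1 for prob in p.values())

# The probabilities of all possible outcomes sum to 1.
total = sum(p.values())

print(all_in_range, total)  # True 1
```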
Descriptive Statistics
In Descriptive Statistics, which are the two different areas?
Centrality and variability
Centrality: mean, median, mode
Variability: e.g. standard deviation
Descriptive Statistics
What is the Mean, or average and what kind of data is it most useful for?
The mean / average is the sum of the values divided by the number of values.
Most useful for numeric data without extreme outliers. For binary (0/1) data the mean is the proportion of 1s; for categorical data in general it is not meaningful.
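As a small sketch, the mean can be computed by hand or with Python's standard `statistics` module. The numbers are made up for illustration; the second list adds one extreme value to show how strongly a single outlier pulls the mean.

```python
from statistics import mean

values = [2, 4, 4, 5, 10]           # made-up sample
manual_mean = sum(values) / len(values)

with_outlier = values + [95]        # same sample plus one extreme value
outlier_mean = mean(with_outlier)

print(manual_mean, mean(values))    # 5.0 5
print(outlier_mean)                 # 20
```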
In Descriptive Statistics
What is the Median
What is the median in a data set with an even number of values?
The exact middle value of the sorted data set.
If n is even, the median is the mean of the two middle elements.
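The two cases can be sketched as follows. `median_manual` is a hypothetical helper written for this card; the standard library's `statistics.median` gives the same results.

```python
from statistics import median

def median_manual(values):
    # Sort first: the median is defined on the ordered data.
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle element.
        return ordered[mid]
    # Even n: the mean of the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [7, 1, 3]        # sorted: 1, 3, 7
even_data = [7, 1, 3, 5]    # sorted: 1, 3, 5, 7

print(median_manual(odd_data), median_manual(even_data))  # 3 4.0
print(median(odd_data), median(even_data))                # 3 4.0
```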
In Descriptive Statistics
What is the Mode
The mode is the most frequent element.
1, 1, 2, 3, 4: mode = 1
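The same example, sketched in Python by counting how often each element occurs:

```python
from collections import Counter
from statistics import mode

data = [1, 1, 2, 3, 4]

# Count occurrences and take the most frequent element.
counts = Counter(data)
most_common_value, frequency = counts.most_common(1)[0]

print(most_common_value, frequency, mode(data))  # 1 2 1
```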
Standard Deviation
A measure of the amount of variation in a set of values.
A low standard deviation indicates that the values are close to the mean: the distribution is narrow.
A high standard deviation indicates that the values are spread out over a wider range.
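A rough sketch of the idea, using two made-up data sets with the same mean but different spread. `population_sd` is a hypothetical helper that divides by n (the population form); the sample form divides by n - 1 instead.

```python
import math
from statistics import pstdev

def population_sd(values):
    # Square root of the average squared distance from the mean.
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

narrow = [9, 10, 10, 11]   # values close to the mean of 10
wide = [0, 5, 15, 20]      # same mean of 10, much more spread out

sd_narrow = population_sd(narrow)
sd_wide = population_sd(wide)

print(sd_narrow < sd_wide)  # True
```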
In Descriptive Statistics
Does Standard Deviation describe variability or centrality?
Variability: the dispersion of the data
Centrality: where the centre of the data lies (mean, median, mode)
What is Correlation Analysis concerned with
Correlation analysis is concerned with relations between variables, e.g. if one goes up, what happens to the other?
What is a Correlation Coefficient
A correlation coefficient is a statistical measure of the degree to which one variable Y is (linearly) related to another variable X.
What does a correlation coefficient range between, and what do the values mean?
The correlation coefficient ranges from -1 to 1, where 1 indicates perfect positive correlation, 0 indicates no linear correlation, and -1 indicates perfect negative correlation.
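To make the range concrete, here is a minimal sketch of Pearson's correlation coefficient on made-up data. `pearson_r` is a hypothetical helper written for this card, not a library function.

```python
import math

def pearson_r(xs, ys):
    # Covariance of x and y divided by the product of their spreads.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])   # y rises with x: close to 1
r_negative = pearson_r(x, [10, 8, 6, 4, 2])   # y falls as x rises: close to -1
r_none = pearson_r(x, [1, 3, 2, 3, 1])        # no linear trend: 0

print(r_positive, r_negative, r_none)
```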
Does correlation imply causation?
No
Inferential Statistics
Used to infer about the population based on our knowledge about a sample.
Null-Hypothesis
In inferential statistics, the null hypothesis is a general statement or default position that there is no relationship between two measured phenomena
An example:
Hypothesis: drinking large amounts of alcohol makes you fall over.
Null-Hypothesis: people will fall over the same amount whether they drink alcohol or not.
What is the approach most often taken with regard to the Null-Hypothesis and the Hypothesis?
We usually try to reject the null-hypothesis, showing that the idea of no relationship is unlikely given the data, rather than directly confirming our hypothesis.
What does a 95% confidence interval mean?
A confidence interval is a range, computed from a sample, that is likely to contain the true population value (e.g. the population mean).
Given observations x1, ..., xn and a 95 % confidence level: if we repeated the sampling many times, about 95 % of the intervals computed this way would contain the true population mean.
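One common way to compute such an interval for a mean is sketched below. It assumes a normal approximation (a z value of about 1.96 for 95 %); for small samples a t-distribution is usually preferred. The function name and the data are made up for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    # Standard error of the mean: sample sd divided by sqrt(n).
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for a 95 % level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return m - z * se, m + z * se

# Made-up observations x1, ..., xn (e.g. heights in cm).
heights = [172, 168, 180, 175, 169, 178, 174, 171, 177, 170, 176, 173]

low95, high95 = mean_confidence_interval(heights, 0.95)
low99, high99 = mean_confidence_interval(heights, 0.99)

print(low95, high95)
print(low99, high99)  # the 99 % interval is wider than the 95 % one
```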
Significance
A result has statistical significance when it would be very unlikely to occur if the null hypothesis were true.
In hypothesis testing, we can make Type 1 and Type 2 errors
what is a type 1 error
Rejecting a null-hypothesis that is actually true - a false positive, e.g. being told “You are pregnant” when you are not.
In hypothesis testing, we can make Type 1 and Type 2 errors
what is a type 2 error
Failing to reject a null-hypothesis that is actually false - a false negative, e.g. being told “You are not pregnant” when you are.
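Both error types can be illustrated with a small simulation. This is only a sketch under stated assumptions: normally distributed data with a known standard deviation of 1, a two-sided z-test at the 5 % level, and an arbitrary true mean of 0.3 for the case where the null hypothesis is false. All names and numbers are illustrative.

```python
import math
import random

random.seed(0)      # fixed seed so the run is repeatable

CRITICAL_Z = 1.96   # two-sided critical value for a 5 % significance level
N = 30              # observations per simulated experiment
TRIALS = 2000       # number of simulated experiments

def rejects_null(true_mean):
    # Draw N values and test the null hypothesis "the population mean is 0".
    sample = [random.gauss(true_mean, 1.0) for _ in range(N)]
    z = (sum(sample) / N) * math.sqrt(N)   # (mean - 0) / (sd / sqrt(N)), sd = 1
    return abs(z) > CRITICAL_Z

# Null hypothesis is true: every rejection is a type 1 error.
type1_rate = sum(rejects_null(0.0) for _ in range(TRIALS)) / TRIALS

# Null hypothesis is false: every failure to reject is a type 2 error.
type2_rate = sum(not rejects_null(0.3) for _ in range(TRIALS)) / TRIALS

print(type1_rate, type2_rate)
```

The type 1 rate should land near the chosen 5 % level, while the type 2 rate depends on the sample size and on how far the true mean is from 0.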
What is a dependent variable
A variable (most often denoted Y) whose value depends on that of another variable. In an experiment it is the variable that we measure rather than manipulate.
Independent Variable
A variable (often denoted X) whose variation does not depend on that of another variable. In an experiment it is the variable that we are trying to manipulate.
In a correlation study which of the two would you apply to parametric data
Pearson’s r
Spearman’s rho
Pearson’s r
In a correlation study which of the two would you apply to non-parametric data
Pearson’s r
Spearman’s rho
Spearman’s rho
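The difference between the two coefficients can be sketched on made-up data where Y always increases with X but not along a straight line. Spearman's rho is computed here as Pearson's r applied to ranks; the helpers are hypothetical and the ranking assumes there are no tied values.

```python
import math

def pearson_r(xs, ys):
    # Covariance divided by the product of the spreads.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    # Rank 1 for the smallest value; assumes no ties.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rho(xs, ys):
    # Spearman's rho is Pearson's r computed on the ranks.
    return pearson_r(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]     # monotone but clearly non-linear

r = pearson_r(x, y)         # below 1: the relation is not a straight line
rho = spearman_rho(x, y)    # close to 1: the ordering is perfectly preserved

print(r, rho)
```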