Describing Data And Stats Tests Flashcards

1
Q

What is a complete population

A

A complete population (often simply called a “population”) is the entire set of individuals or items that share one or more characteristics of interest. It is the total group about which information is sought and from which samples are drawn.

2
Q

What is a sample population

A

A sample population, often simply called a “sample”, is a subset of individuals, items, or data points selected from a larger population for a study or analysis. The main goal of using a sample is to make inferences or generalizations about the entire population without having to study every individual in it.
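As a rough illustration of the population/sample relationship, the sketch below simulates a hypothetical population and draws a random sample from it with sample(). The variable names, the population size, and the height values are assumptions made up for this example.

```r
# Hypothetical population: 10,000 simulated heights in cm (illustrative only)
set.seed(42)
population <- rnorm(10000, mean = 170, sd = 10)

# Draw a simple random sample of 50 individuals without replacement
sample_heights <- sample(population, size = 50)

# The sample mean serves as an estimate of the population mean
mean(population)
mean(sample_heights)
```

The two means will usually be close but not identical; that gap is the sampling error that inferential statistics tries to account for.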

3
Q

what are the different types of data

A
4
Q

What are the different skewness of data

A
5
Q

what is numeric data *

A

Numeric data, also known as quantitative data, is data expressed as numbers that can be measured or counted.

6
Q

what is discrete data

A

Discrete data consists of distinct, separate values. These values are countable and often represent whole numbers.

Discrete data can only take specific values within a range and cannot be subdivided meaningfully (e.g., you can’t have 2.5 cars).

7
Q

what is continuous data

A

Continuous data can take any value within a given range. The values are measured rather than counted, and there are infinitely many possible values within any range.
Examples: Height, weight, temperature, time.
Characteristics: Continuous data can be divided into finer and finer increments, so it can be represented with decimals and fractions (e.g., 5.75 feet, 72.3 degrees).
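One way to see the discrete/continuous distinction in R is through how the values are stored and summarised. This is a minimal sketch with made-up values; the variable names are assumptions for illustration.

```r
# Discrete: number of cars per household (whole numbers only; made-up values)
cars <- c(0L, 1L, 2L, 2L, 3L)

# Continuous: heights in feet (any value within a range; made-up values)
heights <- c(5.75, 5.90, 6.02, 5.33)

# Counts are naturally tabulated; measurements are naturally summarised
table(cars)
summary(heights)

# Storage types reflect the distinction in this example
is.integer(cars)
is.double(heights)
```

In practice discrete data is often stored as doubles too, so the storage type is a hint rather than a rule.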

8
Q

what is categorical data*

A

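Categorical data refers to variables whose values fall into distinct groups or categories rather than numeric measurements. The categories may have no natural order (nominal data) or may have a meaningful order (ranked, or ordinal, data).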

9
Q

what is ranked data

A

Ranked data refers to data in which the values have been ordered or ranked according to their magnitude or some other criterion. The order is meaningful, but the gaps between ranks are not necessarily equal.

Examples: Finishing position in a race (1st, 2nd, 3rd), satisfaction rating (Low, Medium, High).

10
Q

what is nominal data

A

Nominal data is a type of categorical (qualitative) data whose categories do not have a specific order or ranking. Each category is simply different from the others, and there is no inherent order or hierarchy among them.

Gender (Male, Female, Other)
Marital Status (Single, Married, Divorced, Widowed)
Types of Pet (Dog, Cat, Fish, Bird)
Eye Color (Blue, Green, Brown, Hazel)
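In R, categorical data is usually stored as a factor, and the difference between nominal and ranked data corresponds to unordered versus ordered factors. The sketch below uses made-up values; the satisfaction scale is a hypothetical example of ranked data.

```r
# Nominal: categories with no order (made-up values)
eye_colour <- factor(c("Blue", "Brown", "Green", "Brown"))

# Ranked (ordinal): categories with an explicit order (made-up values)
satisfaction <- factor(c("Low", "High", "Medium", "Low"),
                       levels = c("Low", "Medium", "High"),
                       ordered = TRUE)

is.ordered(eye_colour)
is.ordered(satisfaction)

# Frequency counts work for both kinds of factor
table(eye_colour)

# Order comparisons are only meaningful for ordered factors
satisfaction < "High"
```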

11
Q

What are the different types of modality*

A

In statistics, modality refers to the number of peaks or modes in the distribution of data. The mode is the value that appears most frequently in a dataset. The types of modality describe the shape of the distribution by its number of modes: unimodal (one peak), bimodal (two peaks), and multimodal (more than two peaks).

12
Q

what are the measures of central tendency*

A

Measures of central tendency are statistical metrics used to describe the center point or typical value of a dataset. They provide a summary of the dataset by identifying a central value around which the other data points are distributed. The three most common measures of central tendency are the mean, median, and mode.
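The three measures can be computed in R as sketched below. The scores are made up, and stat_mode() is a hypothetical helper written for this example, since base R has no built-in function for the statistical mode.

```r
# Made-up scores for illustration
scores <- c(2, 3, 3, 5, 7, 10)

mean(scores)    # arithmetic average: 5
median(scores)  # middle of the sorted values: 4

# Note: base R's mode() returns the storage type, not the statistical mode,
# so this hypothetical helper returns the most frequent value(s) instead
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}
stat_mode(scores)  # 3
```

If several values tie for the highest count, stat_mode() returns all of them, which is one simple way to spot bimodal or multimodal data.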

13
Q

how are variance and standard deviation used to measure the spread of data

A
14
Q

How can range be used to measure spread

A
15
Q

when should the different measures of spread be used

A
16
Q

how are descriptive statistics reported

A
17
Q

what is descriptive statistics

A
18
Q

what is inferential data

A
19
Q

what is hypothesis testing

A
20
Q

how can you be sure that your inference matches reality

A
21
Q

what is a null hypothesis vs an alternative hypothesis

A
22
Q

How can error types be reduced

A
23
Q

What is the t test used for

A
24
Q

What is the equation for the one sample t test

A
25
Q

what is the difference between accuracy and precision of the mean

A
26
Q

what is the equation for the two sample t test*

A

what is pooled SE

27
Q

what are degrees of freedom and significance values

A
28
Q

how can R be used to carry out one sample t test *

A

One-Sample t-test

To compare the mean of a single group to a known value, use the t.test() function with the mu argument set to the population mean you are testing against.
E.g.
# One-sample t-test example
# Assumes 'data' is a numeric vector of observations
t.test(data, mu = 5) # Tests whether the mean of 'data' differs significantly from 5
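To show what t.test() does in the one-sample case, the sketch below recomputes the t statistic and two-sided p-value by hand and compares them with the function's output. The measurements and the names x and mu0 are made up for illustration.

```r
# Made-up measurements for illustration
x <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 4.8)
mu0 <- 5  # hypothesised population mean

# t = (sample mean - mu0) / (sample SD / sqrt(n)), with n - 1 degrees of freedom
n <- length(x)
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(n))
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)  # two-sided p-value

# The built-in test should agree with the manual calculation
result <- t.test(x, mu = mu0)
c(manual = t_manual, t_test = unname(result$statistic))
c(manual = p_manual, t_test = result$p.value)
```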

29
Q

what are the assumptions of the t test

A
30
Q

what is normality *

A

Normality refers to a dataset or distribution following a normal distribution, also known as a Gaussian distribution or bell curve. A normal distribution is a symmetric, bell-shaped probability distribution that is fully described by its mean and standard deviation; its mean, median, and mode are all equal.
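One common way to check normality in R is the Shapiro-Wilk test, shapiro.test(). The sketch below applies it to two simulated datasets; the data and variable names are assumptions for illustration, and this is only one of several possible checks (a histogram or Q-Q plot is another).

```r
set.seed(1)
normal_like <- rnorm(100, mean = 50, sd = 5)  # simulated, roughly bell-shaped
skewed      <- rexp(100, rate = 1)            # simulated, right-skewed

# Shapiro-Wilk test: the null hypothesis is that the data are normally
# distributed, so a small p-value suggests a departure from normality
p_normal <- shapiro.test(normal_like)$p.value
p_skewed <- shapiro.test(skewed)$p.value

p_normal
p_skewed
```

A large p-value does not prove the data are normal; it only means the test found no strong evidence against normality.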

31
Q

how can you check for normality

A
32
Q

what is the F-test used for

A
33
Q

what is equal variance *

A

Equal variance, also known as homoscedasticity, is a key assumption in many statistical analyses and refers to the condition where the variability of a variable is consistent across different levels of another variable. This means that the spread or dispersion of the residuals (errors) or responses is approximately the same across all levels of the independent variable(s).
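For two groups, the equal-variance assumption can be examined in R with the F-test, var.test(). This is a minimal sketch on simulated data; the group names and parameters are assumptions for illustration.

```r
set.seed(2)
group_a <- rnorm(30, mean = 10, sd = 2)  # simulated
group_b <- rnorm(30, mean = 12, sd = 2)  # simulated, same SD by construction

# F-test of the null hypothesis that the two population variances are equal
f_result <- var.test(group_a, group_b)

f_result$statistic  # F = ratio of the two sample variances
f_result$p.value    # a small p-value suggests unequal variances
```

The F-test itself assumes the data in each group are approximately normal, so it is usually paired with a normality check.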

34
Q

how do you carry out the Wilcoxon-rank sum test

A
35
Q

how do you write and present statistics

A
36
Q

how are figures presented

A
37
Q

When would you use mode, median, mean

A

Mean: Best used when the data is symmetric and there are no outliers. Commonly used in financial and economic data analysis.
Median: Ideal for skewed distributions or when there are outliers. Often used in real estate to describe housing prices.
Mode: Most useful for categorical data to identify the most common category. Frequently used in market research to find the most preferred product.
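The effect of an outlier on the mean versus the median can be seen in a small example. The prices and product responses below are made up for illustration.

```r
# Made-up house prices in thousands; the last value is an outlier
prices <- c(200, 210, 220, 230, 240, 1500)

mean(prices)    # pulled upwards by the outlier
median(prices)  # stays close to the typical value: 225

# Mode for categorical data: the most common category (made-up responses)
preferred <- c("A", "B", "A", "C", "A", "B")
names(which.max(table(preferred)))  # "A"
```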

38
Q

how can R be used to carry out independent samples t test *

A

Independent samples t-test example

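To compare the means of two independent groups, pass the two group vectors to t.test() as separate arguments. If the values are in one column and the group labels in another, use the formula form with a tilde instead, e.g. t.test(value ~ group, data = df), where value, group, and df are placeholder names. By default R runs Welch's t-test, which does not assume equal variances; set var.equal = TRUE for the pooled-variance version.
E.g.

# Assumes 'group1' and 'group2' are numeric vectors holding the two groups
t.test(group1, group2)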
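t.test() accepts two independent groups either as two vectors or as a formula on a long-format data frame. The sketch below shows both forms on made-up data; the data frame df and its column names are assumptions for illustration.

```r
# Made-up measurements for two independent groups
group1 <- c(5.1, 4.9, 5.6, 5.8, 6.0)
group2 <- c(6.2, 6.8, 5.9, 7.1, 6.5)

# Form 1: two numeric vectors
res_vectors <- t.test(group1, group2)

# Form 2: formula (value ~ group) on a long-format data frame
df <- data.frame(
  value = c(group1, group2),
  group = rep(c("g1", "g2"), each = 5)
)
res_formula <- t.test(value ~ group, data = df)

# Both calls run Welch's t-test by default (var.equal = FALSE)
res_vectors$p.value
res_formula$p.value
```

The two forms give the same result here; which one is more convenient depends on how the data is laid out.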

39
Q

how can you use R to carry out paired samples t test

A

To compare the means of two related groups (e.g., before and after treatment), call t.test() with both group vectors and paired = TRUE. The two vectors must be the same length and in the same subject order.
E.g.
# Paired samples t-test example
# Assumes 'before' and 'after' are numeric vectors of matched measurements
t.test(before, after, paired = TRUE)
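A paired t-test is equivalent to a one-sample t-test on the within-pair differences, tested against zero. The sketch below checks this on made-up before/after scores.

```r
# Made-up matched measurements for six subjects
before <- c(72, 80, 65, 90, 77, 85)
after  <- c(70, 75, 66, 84, 74, 80)

# Paired test on the two vectors
paired_res <- t.test(before, after, paired = TRUE)

# Equivalent one-sample test on the differences, against a mean of 0
diff_res <- t.test(before - after, mu = 0)

c(paired = unname(paired_res$statistic), differences = unname(diff_res$statistic))
c(paired = paired_res$p.value, differences = diff_res$p.value)
```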