Class 2 - Variables, Descriptive Stats Flashcards

1
Q

Define “Sampling Error”

A

A Sampling Error refers to the discrepancy or difference between a sample statistic and the true population parameter it represents. Sampling error arises because researchers typically collect data from a sample rather than an entire population, and the characteristics of the sample may not perfectly reflect those of the population.

2
Q

Define “Measurement Error”

A
3
Q

What is “Sampling Bias?”

A

Sampling Bias refers to a systematic error that occurs when the sample selected for a study is not representative of the larger population from which it was drawn.

4
Q

What is “Measurement Error” or a “Confound”?

A

Measurement Error refers to the discrepancy between the true value of a variable and the value that is obtained through measurement or observation.

5
Q

What is “Internal Validity”?

A

Internal Validity refers to the extent to which a study accurately measures or tests the relationship between variables without being influenced by extraneous factors or confounding variables.

6
Q

What is “External Validity?”

A

External Validity refers to the extent to which the findings of a study can be generalized or applied to populations, settings, or conditions beyond the specific context in which the study was conducted.

7
Q

What is a “Confidence Interval”?

A

A Confidence Interval is a statistical measure that quantifies the uncertainty surrounding an estimate or parameter calculated from sample data.

8
Q

What is a “Continuous Variable”?

A

A Continuous Variable is one that can take on any value within a certain range or interval. Continuous variables are measured on a continuous scale and can theoretically have an infinite number of possible values. Examples of continuous variables include age, weight, height, reaction time, and scores on psychological tests. Continuous variables are typically analyzed using Descriptive Statistics (mean, standard deviation) and Inferential Statistics (t-tests, regression analysis, ANOVA).

9
Q

What is an “Ordinal Variable”?

A

An Ordinal Variable is one that represents ordered categories or ranks. They have both discrete categories like Categorical Variables, and a natural ordering or hierarchy among the categories. Examples of Ordinal Variables: Likert scale responses, educational attainment, and socioeconomic status categories. Ordinal variables can be analyzed using similar techniques as categorical variables, but additional methods that account for the ordinal nature of the data, such as nonparametric tests and ordinal regression, may also be appropriate.

10
Q

What is a “Categorical Variable?”

A

A Categorical Variable is one that represents categories or groups with distinct labels or names. These variables have a finite number of discrete categories, and there is no inherent order or ranking among the categories.

11
Q

What are “Summary Statistics”?

A

Summary Statistics refer to numerical measures that provide a concise summary or overview of the characteristics of a dataset. They are often used to describe the central tendency, variability, and distribution of the data, and they help researchers understand and interpret the patterns and relationships within the dataset.

12
Q

What is a “Descriptive Statistic”?

A

A Descriptive Statistic is a numerical measure that summarizes and describes the characteristics of a dataset. Descriptive statistics are used to organize, summarize, and present data in a meaningful and understandable way, providing researchers with insights into the patterns, trends, and distributions within the data.

13
Q

What is “Central Tendency”?

A

Central Tendency refers to a statistical measure that represents the typical or central value of a dataset. Measures of central tendency provide insights into the central or average value around which the data values tend to cluster, and they help researchers understand the central focus or tendency of the data distribution. The three main measures of central tendency are the Mean, the Median, and the Mode.
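All three measures of central tendency can be computed with Python's standard-library `statistics` module. A minimal sketch, using a small made-up dataset of quiz scores:

```python
import statistics

# Hypothetical dataset: quiz scores for 7 students (made-up values)
scores = [4, 7, 7, 8, 9, 10, 11]

mean = statistics.mean(scores)      # arithmetic average: 56 / 7 = 8
median = statistics.median(scores)  # middle value of the sorted data: 8
mode = statistics.mode(scores)      # most frequent value: 7 (appears twice)
```

Note that the three measures need not agree: here the mode (7) differs from the mean and median (both 8).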

14
Q

What is the “Probable Sample Effect”?

A

The probable sample effect refers to the potential impact of random sampling variability on the observed results or findings of a study. This effect arises because researchers typically collect data from a sample rather than the entire population, and the characteristics of the sample may not perfectly reflect those of the population.

15
Q

What is an “Interquartile Range”?

A

The interquartile range (IQR) is a measure of statistical dispersion that quantifies the spread or variability of a dataset. The IQR provides information about the range of values within which the middle 50% of the data fall. It is calculated as the difference between the third quartile (Q3) and the first quartile (Q1) of the dataset.
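A minimal sketch of the IQR calculation using the standard-library `statistics.quantiles` function, on a made-up dataset (the "inclusive" method interpolates between data points, matching the common linear-interpolation convention):

```python
import statistics

# Hypothetical dataset (made-up values)
data = [1, 3, 5, 7, 9, 11, 13, 15]

# quantiles(..., n=4) returns the three quartile cut points [Q1, Q2, Q3]
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # spread of the middle 50% of the data
```

For this dataset Q1 = 4.5 and Q3 = 11.5, giving an IQR of 7.0. Note that different quartile conventions (e.g. the default "exclusive" method) give slightly different cut points for small samples.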

16
Q

What is a “Standard Deviation”?

A

The standard deviation is a measure of the dispersion or variability of a dataset. It quantifies the average distance of data values from the mean, providing insights into the spread of data around the central tendency. The standard deviation is calculated by: determining the mean of the dataset (summing all the data values and dividing by the total number of values), determining the deviation of each data value from the mean (subtracting the mean from each value), squaring each deviation, calculating the variance (averaging the squared deviations), and finally calculating the standard deviation (the square root of the variance). The standard deviation is widely used in behavioural research for several purposes, such as describing variability, assessing data quality, comparing groups or conditions, and inferential statistics.
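The calculation steps above can be sketched directly in Python, using a small made-up dataset (this computes the population standard deviation, dividing by N; sample standard deviation would divide by N - 1):

```python
import math

# Hypothetical dataset (made-up values)
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n                     # step 1: mean = 40 / 8 = 5
deviations = [x - mean for x in data]    # step 2: deviation of each value from the mean
squared = [d ** 2 for d in deviations]   # step 3: square each deviation
variance = sum(squared) / n              # step 4: variance = average squared deviation
std_dev = math.sqrt(variance)            # step 5: square root of the variance
```

For this dataset the squared deviations sum to 32, so the variance is 4.0 and the standard deviation is 2.0.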

17
Q

What is a “Coefficient of Variation” (CV)?

A

The coefficient of variation (CV) is a statistical measure that quantifies the relative variability of a dataset compared to its mean. It is calculated as the ratio of the standard deviation to the mean, expressed as a percentage. The coefficient of variation provides a standardized measure of variability that allows researchers to compare the dispersion of data across different datasets, regardless of their scale or units of measurement. A low coefficient of variation indicates that the data values are relatively close to the mean, suggesting that the dataset has low variability or dispersion. A high coefficient of variation, however, indicates that the data values are more spread out from the mean, suggesting that the dataset has high variability or dispersion. It is particularly useful for comparing the variability of different variables or groups within a study, as well as for comparing variability across studies or populations. Additionally, the coefficient of variation can be used to assess the stability or consistency of measurements and to identify outliers or extreme values within a dataset.
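A minimal sketch of the CV, illustrating why it is scale-free: the two made-up datasets below are on very different scales (one is just the other multiplied by 25), yet their coefficients of variation are identical.

```python
import statistics

# Two hypothetical datasets on different scales (made-up values)
reaction_times_ms = [250, 300, 350, 400]
test_scores = [10, 12, 14, 16]

def cv(data):
    """Coefficient of variation: (population) standard deviation / mean, as a %."""
    return statistics.pstdev(data) / statistics.mean(data) * 100
```

`cv(reaction_times_ms)` and `cv(test_scores)` both come out to about 17.2%, so the two variables are equally variable in relative terms despite their different units.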

18
Q

What are “Measures of Variance”?

A

A measure of variance is a statistical metric that quantifies the spread or dispersion of data values around a central tendency, such as the mean or median. Variance measures provide insights into the extent to which individual data points deviate from the central value, offering researchers valuable information about the variability within a dataset. Common measures of variance include: Variance, Standard Deviation, Interquartile Range (IQR), and Range. Measures of variance are essential in behavioural research for summarizing the variability of data, identifying patterns or trends, and making comparisons between groups or conditions.

19
Q

What is a “Mean”?

A

The mean is a measure of central tendency that represents the arithmetic average of a set of data values. It is one of the most commonly used descriptive statistics and provides a summary of the typical value or central tendency of the data distribution. The mean is calculated by summing all the data values in the dataset, and then dividing by the total number of values. The mean provides a “balance point” or “center of gravity” for the dataset, as it represents the average value around which the data values tend to cluster. It is influenced by the magnitude of each data value and is sensitive to extreme values or outliers in the dataset.

20
Q

What is a “Mode”?

A

The mode is a measure of central tendency that represents the most frequently occurring value or values in a dataset. Unlike the mean and median, which represent the average and middle values of a dataset, respectively, the mode identifies the value that occurs with the highest frequency. The mode is used in behavioural research for several purposes, such as: describing central tendency, identifying patterns, summarizing categorical data, and describing skewed distributions (where it can be a more reliable measure of the “typical value” than the mean or median).
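A minimal sketch of the mode on categorical data, using made-up survey responses; `statistics.mode` returns the single most common value, and `statistics.multimode` returns all values tied for the highest frequency:

```python
import statistics

# Hypothetical categorical data: preferred study method (made-up responses)
responses = ["flashcards", "rereading", "flashcards",
             "practice tests", "flashcards", "practice tests"]

most_common = statistics.mode(responses)        # "flashcards" (3 occurrences)
tied_modes = statistics.multimode([1, 1, 2, 2, 3])  # [1, 2]: both occur twice
```

Note that the mode is the only one of the three central-tendency measures that is defined for unordered categorical data like these responses.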

21
Q

What is a “Median”?

A

The median is a measure of central tendency that represents the middle value of a dataset, when the data values are arranged in ascending or descending order. Unlike the mean, which is the arithmetic average of all data values, the median is not influenced by extreme values or outliers in the dataset. The median divides the dataset into two equal halves, with 50% of the data values falling below the median and 50% falling above it. To calculate the median: Arrange the data values in ascending order. If the dataset has an odd number of values, the median is the middle value. If the dataset has an even number of values, the median is the average of the two middle values.
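The calculation steps above (sort, then take the middle value, or average the two middle values) can be sketched as a small function on made-up data:

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                  # odd number of values: take the middle one
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average the two middle values

# Hypothetical examples (made-up values)
odd_case = median([3, 1, 2])        # sorted: [1, 2, 3] -> middle value 2
even_case = median([4, 1, 3, 2])    # sorted: [1, 2, 3, 4] -> (2 + 3) / 2 = 2.5
```

This matches the behaviour of the standard library's `statistics.median`.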

22
Q

What is meant by the “Range” of a variable?

A

The range of a variable refers to the difference between the maximum and minimum values observed in a dataset. It provides a simple measure of the spread or dispersion of data values within a variable. Mathematically, the range is calculated as: Range = Maximum value - Minimum value. It is particularly useful for describing the spread of numerical data and understanding the extent of variability or diversity among data values.

23
Q

What is the “Binomial Probability Example”?

A

A Binomial Probability Example refers to a situation where there are only two possible outcomes for each trial, typically labeled as success and failure. The binomial distribution describes the probability of getting a certain number of successes in a fixed number of independent trials. A researcher could use a binomial probability calculation to determine the likelihood of observing a specific number of participants who show improvement in the treatment group, given a certain probability of success. Using the binomial probability formula, the researcher can calculate the probability of observing a certain number of successes (participants showing improvement) out of the total number of trials (participants in the treatment group). This calculation can help the researcher assess the likelihood of observing different outcomes and make statistical inferences about the effectiveness of a therapy.
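The binomial probability formula described above can be sketched with the standard-library `math.comb`; the numbers below (10 participants, a 0.5 chance of improvement) are a hypothetical illustration, not from the source:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with
    success probability p: C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical: probability that exactly 7 of 10 participants improve,
# if each improves independently with probability 0.5
prob_7_of_10 = binomial_pmf(7, 10, 0.5)  # C(10, 7) / 2**10 = 120 / 1024
```

Summing this function over a range of k values gives the probability of observing "k or more" (or "k or fewer") successes.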

24
Q

What is a “p value”?

A

A p-value is a statistical measure that quantifies the strength of evidence against a null hypothesis. It represents the probability of observing the observed data, or more extreme results, assuming that the null hypothesis is true. A small p-value indicates strong evidence against the null hypothesis, suggesting that the observed results are unlikely to occur if the null hypothesis is true. Conversely, a large p-value indicates weak evidence against the null hypothesis. The p-value provides a standardized measure for evaluating the strength of evidence against the null hypothesis. It helps researchers make informed decisions about the validity of their findings and draw conclusions based on statistical inference. However, it is important to interpret p-values in the context of the study design, the significance level chosen, and other relevant factors to ensure appropriate interpretation and decision-making.
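A minimal sketch of a p-value as "probability of the observed result or more extreme, under the null": an exact one-sided binomial test on a hypothetical coin-flip experiment (the numbers are made up for illustration):

```python
from math import comb

def binomial_p_value(k, n, p0=0.5):
    """One-sided p-value: probability of observing k or more successes
    in n trials, assuming the null-hypothesis success rate p0."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

# Hypothetical: 9 of 10 flips came up heads; how surprising is this
# if the coin is fair (null hypothesis: p = 0.5)?
p = binomial_p_value(9, 10)  # P(X >= 9) = (10 + 1) / 1024, about 0.011
```

Since this p-value is below the conventional 0.05 threshold, the result would typically be called statistically significant evidence against the fair-coin null hypothesis.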

25
Q

What is “NHST”?

A

NHST stands for Null Hypothesis Significance Testing. It is a widely used statistical method in behavioural research for making inferences about population parameters based on sample data. NHST consists of 5 steps: 1) formulation of the null hypothesis; 2) formulation of the alternative hypothesis; 3) calculation of a test statistic from the sample data, which is used to assess the likelihood of observing the data under the null hypothesis; 4) calculation of the p-value, which represents the probability of obtaining the observed data or more extreme results, assuming that the null hypothesis is true; and 5) comparison of the p-value against the chosen significance level (denoted α) to determine whether the result is considered statistically significant.
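The five steps can be sketched with a one-sample z-test (a test not named in the card; chosen here because it needs only the standard library). All the numbers are hypothetical:

```python
import math

# Hypothetical: 36 participants, sample mean 105, against a known
# population mean of 100 with population standard deviation 15 (made-up values)
sample_mean, pop_mean, pop_sd, n = 105.0, 100.0, 15.0, 36

# Steps 1-2: H0: mu = 100; H1: mu != 100 (two-sided)

# Step 3: test statistic (z = difference in means / standard error)
z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))  # 5 / 2.5 = 2.0

# Step 4: two-sided p-value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))  # about 0.0455 for z = 2

# Step 5: compare against the significance level alpha
alpha = 0.05
significant = p_value < alpha
```

Here the p-value (about 0.0455) falls just below α = 0.05, so the null hypothesis would be rejected at the 5% level.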

26
Q

What is “CI”? (Confidence Interval)

A

A confidence interval is a statistical range, calculated from sample data, that is used to estimate the range of values within which the true population parameter is likely to fall, with a certain level of confidence.
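A minimal sketch of an approximate 95% confidence interval for a mean, using made-up data and the normal critical value 1.96 (for small samples a t critical value would be more appropriate):

```python
import math
import statistics

# Hypothetical sample (made-up values)
sample = [12, 15, 14, 10, 13, 16, 11, 14, 13, 12]

mean = statistics.mean(sample)                            # 13
sem = statistics.stdev(sample) / math.sqrt(len(sample))   # standard error of the mean

# Approximate 95% CI: mean +/- 1.96 standard errors
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem
```

The interpretation: if the sampling procedure were repeated many times, about 95% of intervals constructed this way would contain the true population mean.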

27
Q

What is “Slope”?

A
28
Q

What is the “Five Number Summary”?

A

The five values that summarize a distribution: Minimum, First Quartile (Q1), Median, Third Quartile (Q3), Maximum.
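A minimal sketch computing the five-number summary with the standard library, on a made-up dataset (using the "inclusive" quartile method; other conventions give slightly different quartiles for small samples):

```python
import statistics

# Hypothetical dataset (made-up values)
data = [1, 3, 5, 7, 9, 11, 13, 15]

# quantiles(..., n=4) returns [Q1, median, Q3]
q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number = (min(data), q1, med, q3, max(data))  # (1, 4.5, 8.0, 11.5, 15)
```

These five values are exactly what a box plot displays: the box spans Q1 to Q3, the line inside marks the median, and the whiskers reach toward the minimum and maximum.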
