unit 1 - chapter 1 - sampling data Flashcards

1
Q

descriptive statistics (observed data)

A

Characteristics
Tables
Graphs
Measurements
Observed data

2
Q

inferential statistics (unobserved data)

A

Statistical modeling
Hypothesis testing
Confidence intervals
Predictive analytics
Unobserved data

3
Q

statistic (number describing a sample)
count:
mean:
sd:
variance:
correlation coefficient:
error:

A

Count/size: n
Mean: x̄ (x-bar)
Standard deviation: s
Variance: s^2
Correlation coefficient: r
Error: e

4
Q

parameter (number describing a population)
count:
mean:
sd:
variance:
correlation coefficient:
error:

A

Count/size: N
Mean: mu (μ)
Standard deviation: sigma (σ)
Variance: sigma squared (σ^2)
Correlation coefficient: rho (ρ)
Error: epsilon (ε)
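
A minimal sketch of how the sample notation (n, x-bar, s, s^2) and the population notation (N, mu, sigma, sigma^2) map onto calculations. The data values are made up for illustration, and the same list is treated first as a sample and then as a whole population only to show that the two standard deviation formulas differ (dividing by n - 1 versus N).

```python
import statistics

# Made-up values, used only for illustration.
data = [4, 8, 6, 5, 3, 7]

# Treating the data as a SAMPLE: statistics (n, x-bar, s, s^2).
n = len(data)
x_bar = statistics.mean(data)
s = statistics.stdev(data)              # sample standard deviation, divides by n - 1
s_squared = statistics.variance(data)   # sample variance

# Treating the same data as a whole POPULATION: parameters (N, mu, sigma, sigma^2).
N = len(data)
mu = statistics.mean(data)
sigma = statistics.pstdev(data)             # population standard deviation, divides by N
sigma_squared = statistics.pvariance(data)  # population variance

print(n, x_bar, s, s_squared)
print(N, mu, sigma, sigma_squared)
```

The means agree, but the sample variance comes out larger than the population variance because of the n - 1 divisor.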

5
Q

the goal of sampling

A
  1. The sample represents the population, so the statistic reflects its corresponding parameter
    –> Think of it like: x-bar taking a picture of mu (the statistic/sample taking a picture of the parameter/population)
    –> Our sample should reflect what our population is like
  2. To increase the likelihood the sample represents the population: use a systematic, random sampling method
    Example: a sample survey
    Counterexample: asking the first 5 people to take a survey (convenient, but not random)
    Example: picking names out of a hat to choose whom to survey at random
6
Q

on exam - various methods of sampling - simple random sample

A

Each method has pros and cons. The easiest method to describe is called a simple random sample.
Any group of n individuals is equally likely to be chosen as any other group of n individuals if the simple random sampling technique is used. In other words, each sample of the same size has an equal chance of being selected.
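
A minimal sketch of a simple random sample, assuming a hypothetical roster of 100 students. The names, the sample size, and the seed are illustrative only; `random.sample` draws without replacement, so every group of n individuals is equally likely.

```python
import random

# Hypothetical population: a roster of 100 students.
population = [f"student_{i}" for i in range(1, 101)]

# A seeded generator makes the example repeatable; the seed is arbitrary.
rng = random.Random(42)

n = 10
# Draw n distinct individuals; every group of size n has the same chance.
sample = rng.sample(population, n)

print(sample)
```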

7
Q

on exam - various methods of sampling - stratified sample

A

To choose a stratified sample, divide the population into groups called strata and then take a proportionate number from each stratum.

For example, you could stratify (group) your college population by department and then choose a proportionate simple random sample from each stratum (each department) to get a stratified random sample.

To choose a simple random sample from each department, number each member of the first department, number each member of the second department, and do the same for the remaining departments.

Then use simple random sampling to choose proportionate numbers from the first department and do the same for each of the remaining departments.
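
The steps above can be sketched as follows. This is an illustration under assumptions: the department names and sizes are made up, `stratified_sample` is a hypothetical helper, and the proportionate count for each stratum is rounded, so in general the counts may not add up to exactly the requested total.

```python
import random


def stratified_sample(strata, total_n, seed=0):
    """Take a proportionate simple random sample from each stratum."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportionate share of the total sample for this stratum.
        k = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical college population, grouped (stratified) by department.
departments = {
    "Math": [f"math_{i}" for i in range(50)],
    "Biology": [f"bio_{i}" for i in range(30)],
    "History": [f"hist_{i}" for i in range(20)],
}

chosen = stratified_sample(departments, total_n=10)
for name, members in chosen.items():
    print(name, len(members), members)
```

With departments of 50, 30, and 20 members and a total sample of 10, the strata contribute 5, 3, and 2 members.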

8
Q

on exam - various methods of sampling - cluster sample

A

To choose a cluster sample, divide the population into clusters (groups) and then randomly select some of the clusters.

All the members from these clusters are in the cluster sample. For example, if you randomly sample four departments from your college population, the four departments make up the cluster sample.

To do this, divide your college faculty by department. The departments are the clusters. Number each department, and then choose four different numbers using simple random sampling. All members of the four departments with those numbers are the cluster sample.
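
A minimal sketch of the cluster procedure, assuming seven made-up departments of different sizes. The key contrast with stratified sampling is that whole clusters are chosen at random and every member of a chosen cluster is included.

```python
import random

# Hypothetical faculty, divided into clusters by department.
departments = {
    "Art": ["art_1", "art_2"],
    "Biology": ["bio_1", "bio_2", "bio_3"],
    "Chemistry": ["chem_1", "chem_2"],
    "English": ["eng_1", "eng_2", "eng_3", "eng_4"],
    "History": ["hist_1"],
    "Math": ["math_1", "math_2", "math_3"],
    "Physics": ["phys_1", "phys_2"],
}

rng = random.Random(1)  # arbitrary seed for repeatability

# Randomly select four of the clusters (departments).
chosen_departments = rng.sample(sorted(departments), 4)

# Every member of each chosen department is in the cluster sample.
cluster_sample = [m for dept in chosen_departments for m in departments[dept]]

print(chosen_departments)
print(cluster_sample)
```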

9
Q

on exam - various methods of sampling - systematic sample

A

To choose a systematic sample, randomly select a starting point and take every nth piece of data from a listing of the population.

For example, suppose you have to do a phone survey. Your phone book contains 20,000 residence listings. You must choose 400 names for the sample.

Number the population 1–20,000 and then use a simple random sample to pick a number that represents the first name in the sample.

Then choose every fiftieth name thereafter until you have a total of 400 names (you might have to go back to the beginning of your phone list). Systematic sampling is frequently chosen because it is a simple method.
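
The phone book example can be sketched as follows. `systematic_sample` is a hypothetical helper and the seed is arbitrary; the step of 50 comes from 20,000 / 400, and the modulo arithmetic handles going back to the beginning of the list.

```python
import random


def systematic_sample(population_size, sample_size, seed=0):
    """Pick a random start, then take every k-th listing, wrapping around."""
    rng = random.Random(seed)
    step = population_size // sample_size        # 20,000 // 400 = 50
    start = rng.randint(1, population_size)      # random first listing number
    # Listings are numbered 1..population_size; wrap past the end of the list.
    return [((start - 1 + i * step) % population_size) + 1
            for i in range(sample_size)]


picks = systematic_sample(20_000, 400)
print(picks[:5])
```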

10
Q

on exam - various methods of sampling - NON-RANDOM - convenience sample

A

A type of sampling that is non-random is convenience sampling.

Convenience sampling involves using results that are readily available. For example, a computer software store conducts a marketing study by interviewing potential customers who happen to be in the store browsing through the available software. The results of convenience sampling may be very good in some cases and highly biased (favor certain outcomes) in others.

11
Q

the good, the bad and the ugly of sampling

A

Pros of sampling: save money, save time, increase practicality (population too large, population changing, collecting data destroys product), reduce monotony, increase accuracy

Cons of sampling (errors): confirmation bias, skewing of data to get a specific result, a sample that doesn’t represent the whole population, difficulty drawing good data, sampling biases, misuse of data, security/privacy issues, reliability of data, training/understanding

12
Q

the ripple effect of sampling

A

Samples affect…
The test type
The test form
Power
Confidence
Critical values

13
Q

qualitative data

A

Qualitative data are the result of categorizing or describing attributes of a population. Qualitative data are also often called categorical data.

Hair color, blood type, ethnic group, the car a person drives, and the street a person lives on are examples of qualitative (categorical) data.

Qualitative (categorical) data are generally described by words or letters.
For instance, hair color might be black, dark brown, light brown, blonde, gray, or red. Blood type might be AB+, O-, or B+. Researchers often prefer to use quantitative data over qualitative (categorical) data because it lends itself more easily to mathematical analysis.

14
Q

quantitative data

A

Quantitative data are always numbers. Quantitative data are the result of counting or measuring attributes of a population.
Amount of money, pulse rate, weight, number of people living in your town, and number of students who take statistics are examples of quantitative data.

Quantitative data may be either discrete or continuous.

15
Q

attributes of data: qualitative data

responses:
result:
format:
tallied:
processed:

A

Responses: descriptive or categorical placeholders
Result: frequency counts or proportions
Format: discrete (discrete: value falls on exact point)
Tallied: counted
Processed: descriptive analysis and/or coarse statistical analysis

16
Q

discrete or continuous

A

Discrete data is a numerical type of data made up of whole, concrete numbers with specific, fixed values determined by counting.

Continuous data includes fractional and decimal values, with varying data values measured over an interval.

Discrete is counting.
Continuous is measuring.

Continuous data is a type of numerical data that can take any of the unlimited number of possible values between two realistic points.

17
Q

attributes of data: quantitative data

responses:
result:
format:
tallied:
processed:

A

Responses: numerical scale
Result: numerical datasets
Format: discrete or continuous (discrete: value falls on exact point) (continuous: value can fall anywhere on continuum)
Tallied: counted or measured
Processed: descriptive analysis and/or dynamic statistical analysis

18
Q

exceptions to the rule: qualitative vs quantitative

A

Qualitative
Descriptive or categorical placeholders
SSN
Jersey Numbers
RED ID
Serial Number

Quantitative
Numerical scale

Discrete - counted (values fall on exact points)
Person’s age
Pie chart

Continuous - measured (values fall anywhere on a continuum)
Age of customer, product age
Line chart

19
Q

levels of data: nominal

A

Order does not matter
Can’t rank
Qualitative/categorical data

EX: student major, marital status, car model, product color
EX: blue, red, green, purple, brown
It doesn’t matter how the values are listed, because the order doesn’t change the numbers

Categories, colors, names, labels and favorite foods along with yes or no responses are examples of nominal level data. Nominal scale data are not ordered.
Smartphone companies are another example of nominal scale data. The data are the names of the companies that make smartphones, but there is no agreed upon order of these brands, even though people may have personal preferences. Nominal scale data cannot be used in calculations.

20
Q

levels of data: ordinal

A

Order matters (ranked)
Ranking scale =/= referenced attribute

EX: Olympic podium, hurricane scale, letter grades, movie ratings
EX: movie ratings - G, PG, PG-13, R, NC-17
Based on important factors that differ on scales of each separate factor

Data that is measured using an ordinal scale is similar to nominal scale data but there is a big difference. The ordinal scale data can be ordered. An example of ordinal scale data is a list of the top five national parks in the United States. The top five national parks in the United States can be ranked from one to five but we cannot measure differences between the data.
Another example of using the ordinal scale is a cruise survey where the responses to questions about the cruise are “excellent,” “good,” “satisfactory,” and “unsatisfactory.” These responses are ordered from the most desired response to the least desired. But the differences between two pieces of data cannot be measured.
Like the nominal scale data, ordinal scale data cannot be used in calculations.

21
Q

levels of data: interval

A

Equal scaled distances
Can’t make ratio arguments
Arbitrary zero - at 0, whatever we are measuring is still there (it’s not gone)

EX: voltage, calendar time, IQ score, temperature (C/F)
EX: 0 degrees F means it’s cold, not that temperature is nonexistent

Data that is measured using the interval scale is similar to ordinal level data because it has a definite ordering but there is a difference between data. The differences between interval scale data can be measured though the data does not have a starting point.
Temperature scales like Celsius (C) and Fahrenheit (F) are measured by using the interval scale. In both temperature measurements, 40° is equal to 100° minus 60°. Differences make sense. But 0 degrees does not because, in both scales, 0 is not the absolute lowest temperature. Temperatures like -10° F and -15° C exist and are colder than 0.
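
A short sketch of why ratio arguments fail on an interval scale. The two temperatures are arbitrary illustrative values and `c_to_f` is a hypothetical helper using the standard Celsius to Fahrenheit conversion. The "twice as hot" ratio in Celsius disappears in Fahrenheit, while the difference converts consistently (scaled by 9/5).

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


low_c, high_c = 20.0, 40.0
low_f, high_f = c_to_f(low_c), c_to_f(high_c)

# Ratios depend on where the arbitrary zero sits, so they are not meaningful.
ratio_c = high_c / low_c
ratio_f = high_f / low_f

# Differences are meaningful: they only rescale between the two units.
diff_c = high_c - low_c
diff_f = high_f - low_f

print(ratio_c, ratio_f)
print(diff_c, diff_f)
```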

22
Q

levels of data: ratio

A

Equal scaled distances
Can make ratio arguments (200 is 2x of 100)
Absolute zero (at 0 there is nothing there) - common in business-level data
EX: speed, customer acquisition cost, market share, income

Data that is measured using the ratio scale takes care of the ratio problem and gives you the most information. Ratio scale data is like interval scale data, but it has a 0 point and ratios can be calculated. For example, four multiple choice statistics final exam scores are 80, 68, 20 and 92 (out of a possible 100 points). The exams are machine-graded.
The data can be put in order from lowest to highest: 20, 68, 80, 92.
The differences between the data have meaning. The score 92 is more than the score 68 by 24 points. Ratios can be calculated. The smallest score is 0. So 80 is four times 20. The score of 80 is four times better than the score of 20.

23
Q

likert scale data (what is it?)

A

1-10 scale
On a scale of 1-10, how sad are you now?
Rate your experience

Responses are rank-ordered, so the data is treated as ordinal (or sometimes interval)

24
Q

mean, median and mode

A

To find the mean, add up the values in the data set and then divide by the number of values that you added.

To find the median, list the values of the data set in numerical order and identify which value appears in the middle of the list. If the list has an even number of values, the median is the mean of the two middle values.

To find the mode, identify which value in the data set occurs most often.
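
A minimal sketch of the three calculations using Python's standard `statistics` module. The data values are made up for illustration; the list has an even number of values, so the median is the average of the two middle values.

```python
import statistics

# Made-up data set for illustration.
values = [3, 7, 7, 2, 9, 7, 4, 2]

mean = statistics.mean(values)      # sum of the values divided by how many there are
median = statistics.median(values)  # middle of the sorted list: 2, 2, 3, 4, 7, 7, 7, 9
mode = statistics.mode(values)      # the value that occurs most often

print(mean, median, mode)
```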

25
Q

probability

A

Probability is a mathematical tool used to study randomness. It deals with the chance (the likelihood) of an event occurring. For example, if you toss a fair coin four times, the outcomes may not be two heads and two tails.
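
A small sketch of the coin example, assuming a fair coin so that all sequences of four tosses are equally likely. It lists every possible sequence and counts how many contain exactly two heads.

```python
from fractions import Fraction
from itertools import product

# All possible sequences of four tosses of a fair coin.
outcomes = list(product("HT", repeat=4))

# Sequences with exactly two heads (and therefore two tails).
two_heads = [o for o in outcomes if o.count("H") == 2]

# Equally likely outcomes, so probability = favorable / total.
p_two_heads = Fraction(len(two_heads), len(outcomes))

print(len(outcomes), len(two_heads), p_two_heads)
```

Only 6 of the 16 sequences have two heads and two tails, so that result happens less than half the time.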

26
Q

population and parameter

A

In statistics, we generally want to study a population. You can think of a population as a collection of persons, things, or objects under study. To study the population, we select a sample.

A parameter is a numerical characteristic of the whole population that can be estimated by a statistic. For example, if we consider all math classes to be the population, then the average number of points earned per student over all the math classes is an example of a parameter.

27
Q

statistic and sample

A

From the sample data, we can calculate a statistic. A statistic is a number that represents a property of the sample. For example, if we consider one math class to be a sample of the population of all math classes, then the average number of points earned by students in that one math class at the end of the term is an example of a statistic.

The statistic is an estimate of a population parameter, in this case the mean.

28
Q

qualitative data

A

Qualitative data are the result of categorizing or describing attributes of a population. Qualitative data are also often called categorical data.

Hair color, blood type, ethnic group, the car a person drives, and the street a person lives on are examples of qualitative (categorical) data.
Qualitative (categorical) data are generally described by words or letters.

For instance, hair color might be black, dark brown, light brown, blonde, gray, or red. Blood type might be AB+, O-, or B+. Researchers often prefer to use quantitative data over qualitative (categorical) data because it lends itself more easily to mathematical analysis.

29
Q

quantitative data

A

Quantitative data are always numbers. Quantitative data are the result of counting or measuring attributes of a population.
Amount of money, pulse rate, weight, number of people living in your town, and number of students who take statistics are examples of quantitative data.
Quantitative data may be either discrete or continuous.

30
Q

All data that are the result of counting are called __.

Data that are not only made up of counting numbers, but that may include fractions, decimals, or irrational numbers, are called ___

A

All data that are the result of counting are called quantitative discrete data.

Data that are not only made up of counting numbers, but that may include fractions, decimals, or irrational numbers, are called quantitative continuous data

31
Q

how to display qualitative/categorical data

A

pie charts (100%)
bar graphs
pareto charts

ALL CAN ADD UP TO 100%

In a pie chart, categories of data are represented by wedges in a circle and are proportional in size to the percent of individuals in each category.
In a bar graph, the length of the bar for each category is proportional to the number or percent of individuals in each category. Bars may be vertical or horizontal.
A Pareto chart consists of bars that are sorted into order by category size (largest to smallest).
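
A sketch of the numbers behind these displays, assuming a made-up list of survey responses. Counting each category gives the bar lengths, sorting from largest to smallest gives the Pareto order, and the percentages are what the pie wedges would represent.

```python
from collections import Counter

# Made-up categorical responses (for example, favorite product color).
responses = ["red", "blue", "red", "green", "blue",
             "red", "red", "blue", "green", "red"]

counts = Counter(responses)

# Pareto order: categories sorted from largest count to smallest.
pareto_order = counts.most_common()

# Percent of individuals in each category; together they cover 100%.
total = sum(counts.values())
percents = {category: 100 * n / total for category, n in pareto_order}

print(pareto_order)
print(percents)
```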

32
Q

sampling errors vs nonsampling errors

A

When you analyze data, it is important to be aware of sampling errors and nonsampling errors. The actual process of sampling causes sampling errors. For example, the sample may not be large enough. Factors not related to the sampling process cause nonsampling errors. A defective counting device can cause a nonsampling error.

33
Q

common problems with samples

A

In statistics, a sampling bias is created when a sample is collected from a population and some members of the population are not as likely to be chosen as others.

  1. A sample must be representative of the population. A sample that is not representative of the population is biased.
  2. SELF-SELECTED SAMPLES - Responses only by people who choose to respond, such as call-in surveys, are often unreliable.
  3. SAMPLE SIZE ISSUES - Samples that are too small may be unreliable. Larger samples are better, if possible. In some situations, small samples are unavoidable and can still be used to draw conclusions.
  4. UNDUE INFLUENCE - Collecting data or asking questions in a way that influences the response.
  5. NON-RESPONSE - When subjects do not respond or refuse to participate, the collected responses may no longer be representative of the population.
  6. CAUSALITY - A relationship between two variables does not mean that one causes the other to occur.
  7. SELF-FUNDED - A study performed by a person or organization in order to support their claim. Is the study impartial? Read the study carefully to evaluate the work.
  8. MISLEADING USE OF DATA - improperly displayed graphs, incomplete data, or lack of context
  9. CONFOUNDING - When the effects of multiple factors on a response cannot be separated.
34
Q

frequency, relative frequency, cumulative relative frequency

(check notes for more info and how to)

A

A frequency is the number of times a value of the data occurs.

A relative frequency is the ratio (fraction or proportion) of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes. To find the relative frequencies, divide each frequency by the total number of data values in the sample (for example, the total number of students surveyed).

Cumulative relative frequency is the accumulation of the previous relative frequencies. To find the cumulative relative frequencies, add all the previous relative frequencies to the relative frequency for the current row.
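
A minimal sketch of a frequency table built from made-up data. Exact fractions are used instead of decimals so the last cumulative relative frequency comes out to exactly 1, which is a useful check on the table.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate

# Made-up data values for illustration.
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]

freq = Counter(data)    # frequency: how many times each value occurs
values = sorted(freq)   # one table row per distinct value
total = len(data)

# Relative frequency: each frequency divided by the total number of data values.
rel = [Fraction(freq[v], total) for v in values]

# Cumulative relative frequency: running total of the relative frequencies.
cum = list(accumulate(rel))

for v, r, c in zip(values, rel, cum):
    print(v, freq[v], r, c)
```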