Chapter 1 Sampling and Data Flashcards

1
Q

Statistics

A

The study of gathering, describing, and analyzing data; the word also refers to the actual numeric descriptions of sample data.

Ex. Surveying students about how much time they spend studying statistics each week.

2
Q

Probability

A

The chance that something will happen, or how likely it is that some event will occur.

Ex. Chance of getting an A on a test.

3
Q

Experiment

A

A procedure that can be repeated and that has a set of possible results.

4
Q

Population

A

Population - The whole group that is being studied; a particular group of interest.

Ex. All males in the world; all females in the world; all children aged 6-9.

5
Q

Parameter

A

Parameter - A measure that describes the entire population; a numerical description of a population’s characteristic.

Ex. The mean height of all men in the world; the mean IQ of all females in the U.S.A.; 75% of all children aged 6-9 play games.

6
Q

Sample

A

Sample - A selection taken from a larger group (the “population”) that will, hopefully, let you find out things about the larger group. Basically, a subset of the population from which data is collected.

Ex. We asked 100 males what their favorite movie is.

7
Q

Statistic

A

Statistic - A measure that describes a sample of a population; a numeric description of a particular sample characteristic.

Ex. Of 100 females asked, 47% dislike candy.
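
The difference between a parameter and a statistic can be sketched in code. The population below is made up for illustration (10,000 simulated heights); the names `population`, `population_mean`, and `sample_mean` are assumptions, not part of the original cards.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights in cm of 10,000 people (simulated, not real data).
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population)

# Statistic: computed from a sample of 100 members drawn from that population.
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

The two numbers are usually close but not identical; the statistic estimates the parameter.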

8
Q

Variable

A

Variable - A characteristic that is being counted, measured, or categorized, usually denoted by capital letters such as X and Y. Variables may be numerical or categorical.

9
Q

Numerical variables

A

Numerical variables - Take on values with equal units, such as weight in pounds or time in hours.

10
Q

Categorical variables

A

Categorical variables - place the person or thing into a category.

Ex: Political affiliation, eye color, gender, ethnicity

11
Q

Discrete random variables

A

Discrete random variables - Can take only a limited set of values. Many discrete random variables take on only non-negative integer values.

Ex. X = sum of the dots when two dice are rolled. X is discrete and takes on integer values from 2 to 12 only.

Ex. X = number of foul shots out of ten made by a person from the crowd in a halftime contest, Foul Shots. The contestant gets a free turkey for each shot made. X is a discrete variable with integer values from 0 to 10.
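
The two-dice example can be checked by listing every outcome. This is a minimal sketch; the names `outcomes` and `counts` are illustrative.

```python
from collections import Counter
from itertools import product

# Every equally likely outcome of rolling two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# X = sum of the dots; count how many outcomes produce each value of X.
counts = Counter(a + b for a, b in outcomes)

for total in sorted(counts):
    probability = counts[total] / len(outcomes)
    print(total, counts[total], f"{probability:.3f}")
```

X takes only the integer values 2 through 12, which is what makes it discrete.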

12
Q

Categorical variables

A

Categorical variables - Assign each population member to a designated category. The count of members falling into a category is a discrete random variable that takes on only non-negative integer values.

13
Q

Continuous random variables

A

Continuous random variables - Can take on any numerical value within their range.

Ex. X = weight in grams of widgets coming off the line at Factory A. X is a continuous random variable.

14
Q

Data

A

Data - The information gathered; the actual values of the variable. They may be numbers or they may be words.

15
Q

Datum

A

Datum - A single value.

16
Q

What are the 4 levels of measurement into which data (variables) can be classified?

A

What are the 4 levels of measurement into which data (variables) can be classified? - Nominal scale level, ordinal scale level, interval scale level, and ratio scale level.

17
Q

Nominal scale level

A

Nominal scale level - Data measured using a nominal scale is qualitative. Categories, colors, names, labels, and favorite foods, along with yes or no responses, are examples of nominal level data. Nominal scale data are not ordered. A nominal number is a number used only as a name, or to identify something (not as an actual value or position). In other words, a nominal scale describes a variable with categories that do not have a natural order or ranking.

Ex. genotype, blood type, zip code, gender, race, eye color, political party, jersey number

18
Q

Ordinal scale level

A

Ordinal scale level - Data measured using an ordinal scale is similar to nominal scale data, with one big difference: ordinal scale data can be ordered. An ordinal number tells us the position of something in a list. In other words, an ordinal scale is one where the order matters but not the difference between values.

Ex. education level (“high school”, “BS”, “MS”, “PhD”), satisfaction rating (“extremely dislike”, “dislike”, “neutral”, “like”, “extremely like”), socioeconomic status (“low income”, “middle income”, “high income”), income level (“less than 50K”, “50K-100K”, “over 100K”)

19
Q

Interval scale level

A

Interval scale level - Data measured using the interval scale is similar to ordinal level data because it has a definite ordering, but in addition the differences between data values are meaningful and can be measured. The data does not, however, have a true starting (zero) point. In other words, an interval scale is one where there is order and the difference between two values is meaningful.

Ex. temperature (Fahrenheit), temperature (Celsius), pH, SAT score (200-800), credit score (300-850)

20
Q

Ratio scale level

A

Ratio scale level - Data measured using the ratio scale gives you the most information. Ratio scale data is like interval scale data, but it has a true zero point, so ratios can be calculated. In other words, ratio scales are like interval scales except they have true zero points.

Ex. Kelvin Scale, dose amount, reaction rate, flow rate, concentration, weight, length
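
A short sketch of why temperature in Celsius is interval level while Kelvin is ratio level. The two temperatures (20 and 10 degrees Celsius) are arbitrary illustrative values.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15

a_c, b_c = 20.0, 10.0
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences are meaningful on both scales: a 10-degree gap either way.
diff_c = a_c - b_c
diff_k = a_k - b_k

# Ratios disagree: Celsius has no true zero, so "twice as hot" is not meaningful there.
ratio_c = a_c / b_c
ratio_k = a_k / b_k

print(f"difference: {diff_c:.2f} C, {diff_k:.2f} K")
print(f"ratio: {ratio_c:.3f} (Celsius), {ratio_k:.3f} (Kelvin)")
```

Only the Kelvin ratio reflects a physical comparison, because 0 K is a true zero point.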

21
Q

Qualitative data (Categorical data)

A

Qualitative data (Categorical data) - Represent characteristics such as a person’s gender, marital status, hometown, or the types of movies they like. Categorical data can take on numerical values (such as “1” indicating male and “2” indicating female), but those numbers don’t have meaning. You couldn’t add them together, for example. Ordinal data mixes numerical and categorical data. The data fall into categories, but the numbers placed on the categories have meaning. For example, rating a restaurant on a scale from 0 to 4 stars gives ordinal data. Ordinal data are often treated as categorical, where the groups are ordered when graphs and charts are made.

Ex. Your friends’ favorite holiday destination; the most common given names in your town; how people describe the smell of a new perfume; favorite baseball team; town I live in; “He has lots of energy.”

22
Q

Quantitative data

A

Quantitative data (Numerical data) - Can be discrete or continuous. These data have meaning as a measurement, such as a person’s height, weight, IQ, or blood pressure; or they’re a count, such as the number of stock shares a person owns, how many teeth a dog has, or how many pages you can read of your favorite book before you fall asleep.

23
Q

Quantitative Discrete Data

A

Quantitative Discrete Data - Can only take certain values (like whole numbers). “How many” data represent items that can be counted; they take on possible values that can be listed out. The list of possible values may be fixed (also called finite), or it may go from 0, 1, 2, on to infinity (making it countably infinite). For example, the number of heads in 100 coin flips takes on values from 0 through 100 (finite case), but the number of flips needed to get 100 heads takes on values from 100 (the fastest scenario) on up to infinity. Its possible values are listed as 100, 101, 102, 103 . . . (representing the countably infinite case).

Ex. Number of classes; Population; He has 4 legs; He has 2 brothers

24
Q

Quantitative Continuous Data

A

Quantitative Continuous Data - Can take any value (within a range). “How much” data represent measurements; their possible values cannot be counted and can only be described using intervals on the real number line. For example, the exact amount of gas purchased at the pump for cars with 20-gallon tanks represents nearly continuous data from 0.00 gallons to 20.00 gallons, represented by the interval [0, 20], inclusive. (Okay, you can count all these values, but why would you want to? In cases like these, statisticians bend the definition of continuous a wee bit.) The lifetime of a C battery can be anywhere from 0 to infinity, technically, with all possible values in between. Granted, you don’t expect a battery to last more than a few hundred hours, but no one can put a cap on how long it can go (remember the Energizer Bunny?).

Ex. He weighs 25.5 kg (weight); he is 565 mm tall (height); he is 6 years old (age).

25
Q

What are the two types of statistics?

A

What are the two types of statistics? - Descriptive and inferential

26
Q

Descriptive Statistics

A

Descriptive Statistics - Gathering, sorting, and summarizing data using numbers and graphs. The data may cover every member of a population, or it may come from a sample.

Ex. 65% of seniors at a local high school applying to college plan to major in business.

Ex. The average height of a particular class

Ex. The mean weight of 11-year-olds who go to a certain middle school

27
Q

Inferential Statistics

A

Inferential Statistics - Uses sample data to draw a conclusion about the population. We use inferential statistics when the population is too large to ask or measure every member, when we only have access to some members of the population (wildlife research), when there is risk involved (medical studies), or when we can’t find clean data about the past (do jets at the local airport hurt health?).

Ex. Based on a survey, 22% of all men dislike football.

28
Q

Types of sampling (Sample should be representative of the population under study)

Sampling with vs Sampling without replacement

A

Sampling with replacement - Once a member of the population is picked, that member goes back into the population and may be chosen more than once.

Sampling without replacement - A member of the population may be chosen only once and is not put back into the population.
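
Both kinds of sampling can be sketched with Python's standard `random` module: `choices` draws with replacement and `sample` draws without. The five names are hypothetical.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable
population = ["Ann", "Ben", "Cal", "Dee", "Eve"]  # hypothetical members

# With replacement: each pick goes back, so the same member may appear again.
with_replacement = rng.choices(population, k=5)

# Without replacement: each member can be picked at most once.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Drawing 5 of 5 without replacement always returns every member exactly once, in shuffled order; drawing with replacement may repeat some members and miss others.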

29
Q

Representative Sample

A

Representative Sample - A subset of the population that has the same characteristics as the population.

Ex. A classroom of 30 students with 15 males and 15 females could generate a representative sample of six students: three males and three females.

A sample should be representative of the population under study. How do you select a sample in a way that avoids bias? The key word is random. A random sample is a sample selected by equal opportunity; that is, every possible sample of the same size as yours had an equal chance of being selected from the population. What random really means is that no group in the population is favored in or excluded from the selection process.

30
Q

Sampling Error

A

Sampling Error - A statistical error that occurs when the sample selected does not represent the entire population, so the results found in the sample differ from the results that would be obtained from the entire population. It is the natural variation that results from selecting a sample to represent a larger population; this variation decreases as the sample size increases, so selecting larger samples reduces sampling error.

Ex. Suppose we are a group of researchers administering a survey to learn how much money a specific group of people spends when purchasing a vehicle. We interview 1000 people, and 2 of the 1000 happen to be millionaires. Because vehicle spending is directly related to a person’s income, we risk, purely by chance, collecting data from significant outliers of the population.
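
The claim that larger samples reduce sampling error can be illustrated with a small simulation. Everything here is an assumption for illustration: the spending figures are simulated, and `typical_error` is a hypothetical helper that averages how far the sample mean lands from the population mean.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: vehicle spending (in dollars) for 20,000 buyers.
population = [rng.uniform(5_000, 60_000) for _ in range(20_000)]
true_mean = statistics.mean(population)

def typical_error(sample_size, trials=200):
    """Average distance between the sample mean and the population mean."""
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)
        errors.append(abs(statistics.mean(sample) - true_mean))
    return statistics.mean(errors)

small = typical_error(10)
large = typical_error(1000)

print(f"typical error with n=10:   {small:,.0f}")
print(f"typical error with n=1000: {large:,.0f}")
```

The error for samples of 1000 comes out much smaller than for samples of 10, though it never reaches exactly zero.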

31
Q

Sampling Bias

A

Sampling Bias - A bias in which a sample is collected in such a way that not all members of the intended population are equally likely to be selected; some members have a lower sampling probability than others.

32
Q

Simple random sampling

A

Simple random sampling - A straightforward method for selecting a random sample: give each member of the population a number, then use a random number generator to select a set of labels. These randomly selected labels identify the members of your sample. Basically, it is where a subset or group of units (a sample) is selected from a larger group (a population). Each unit of the population has an equal chance of being selected in the sample, and each unit is chosen entirely by chance.
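
The number-the-members procedure can be sketched as follows. The population of 50 placeholder members and the sample size of 5 are assumptions for illustration.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population of 50 members; each list index serves as that member's label.
population = [f"member_{i}" for i in range(1, 51)]
labels = range(len(population))

# Use the random number generator to pick 5 distinct labels.
chosen_labels = rng.sample(labels, k=5)

# The chosen labels identify the members of the sample.
srs = [population[label] for label in chosen_labels]
print(srs)
```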

33
Q

Stratified random sampling

A

Stratified random sampling - A method for selecting a random sample used to ensure that subgroups of the population are represented adequately: divide the population into groups (strata), then use simple random sampling to identify a proportionate number of individuals from each stratum. A stratified sample is used when you decide that some variable impacts results so much that you must account for it.

Ex. You might be taking a survey of high school students and you think that seniors (12th graders) as a whole will have a much different opinion than freshmen (9th graders). In your methodology, instead of asking 80 random high school students, ask 20 seniors, 20 juniors, 20 sophomores, and 20 freshmen (if the school is evenly split between the 4 grades). Your sample is no longer a simple random sample (SRS); it is now a stratified sample.
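
A minimal sketch of the high school example, assuming a made-up school with 200 students in each of the four grades. `stratified_sample` is a hypothetical helper, not a library function.

```python
import random

rng = random.Random(3)  # fixed seed so the sketch is repeatable
grades = ["freshman", "sophomore", "junior", "senior"]

# Hypothetical school: 200 students per grade, stored as (name, grade) pairs.
students = [(f"{grade}_{i}", grade) for grade in grades for i in range(200)]

def stratified_sample(members, per_stratum):
    """Group members by stratum, then take a simple random sample from each group."""
    strata = {}
    for name, stratum in members:
        strata.setdefault(stratum, []).append(name)
    sample = []
    for stratum, names in strata.items():
        sample.extend((name, stratum) for name in rng.sample(names, per_stratum))
    return sample

sample = stratified_sample(students, 20)
print(len(sample), "students sampled")
```

Because the grades are equal in size here, taking 20 from each is proportionate; with unequal strata the per-stratum count would scale with each stratum's share of the population.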

34
Q

Cluster random sampling

A

Cluster random sampling - A method for selecting a random sample: divide the population into groups (clusters), then use simple random sampling to select a set of clusters. Every individual in the chosen clusters is included in the sample. Cluster sampling is a technique used when it becomes hard to study a target population spread across a wide area and simple random sampling cannot be applied. A cluster sample is a probability sample where each sampling unit is a collection or cluster of elements.

Ex. A researcher wants to survey the academic performance of high school students in Japan. He can divide the whole population of Japan into different clusters (cities). Then the researcher selects a number of clusters, based on his research, through simple or systematic random sampling.
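
A minimal sketch of cluster sampling, assuming 10 made-up cities with 30 students each. The city and student names are placeholders.

```python
import random

rng = random.Random(5)  # fixed seed so the sketch is repeatable

# Hypothetical clusters: 10 cities, each holding 30 students.
clusters = {
    f"city_{c}": [f"city_{c}_student_{s}" for s in range(30)]
    for c in range(10)
}

# Simple random sampling selects the clusters, not the individuals.
chosen_cities = rng.sample(sorted(clusters), k=3)

# Every individual in a chosen cluster is included in the sample.
sample = [student for city in chosen_cities for student in clusters[city]]

print(chosen_cities)
print(len(sample), "students sampled")
```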

35
Q

Systematic sampling

A

Systematic sampling - A type of probability sampling method in which sample members from a larger population are selected according to a random starting point but with a fixed, periodic interval. This interval, called the sampling interval, is calculated by dividing the population size by the desired sample size. To select the sample, list the members of the population and use simple random sampling to select a starting point in the list. Let k = (number of individuals in the population)/(number of individuals needed in the sample). Choose every kth individual in the list, starting with the one that was randomly selected. If necessary, return to the beginning of the population list to complete your sample. In some instances the most practical way of sampling is to select every nth item in a list; this type of sampling is known as systematic sampling.

Ex. Suppose a supermarket wants to study the buying habits of its customers. Using systematic sampling, it can choose every 10th or 15th customer entering the supermarket and conduct the study on this sample.
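
The every-kth-individual procedure can be sketched as follows, assuming a made-up list of 100 customer IDs and a desired sample of 10, so k = 100/10 = 10.

```python
import random

rng = random.Random(11)  # fixed seed so the sketch is repeatable

population = list(range(1, 101))  # hypothetical customer IDs 1..100
sample_size = 10
k = len(population) // sample_size  # sampling interval

# Random starting point anywhere in the list.
start = rng.randrange(len(population))

# Take every kth individual, wrapping back to the start of the list if needed.
sample = [population[(start + i * k) % len(population)] for i in range(sample_size)]

print("interval k =", k)
print(sample)
```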

36
Q

Convenience nonrandom sampling (accidental sampling or grab sampling)

A

Convenience nonrandom sampling - (also called accidental sampling or grab sampling) A nonrandom method of selecting a sample; this method selects individuals who are easily accessible and may result in biased data. A convenience sample is a type of non-probability sample taken from a group of people who are easy to contact or reach.

  • It’s relatively easy to get a sample.
  • It’s inexpensive, compared to other methods.
  • Participants are readily available.

For example, standing at a mall or a grocery store and asking people to answer questions would produce a convenience sample. Often used in advisement.

37
Q

Frequency

A

Frequency - The number of times a value of the data occurs, or the number of individuals in each group. This could be how many times an event happened, or happened within a given time frame. It could be how often a continuous variable, such as a stock price, was in a certain range in a given amount of time. In other words, it is how often something occurs.

Example: Sam played football on Saturday Morning, Saturday Afternoon, and Thursday Afternoon. The frequency was 2 on Saturday, 1 on Thursday, and 3 for the whole week.

Another example, if ten students score 80 in statistics, then the score of 80 has a frequency of 10. Frequency is often represented by the letter f.
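
Counting frequencies for the football example can be sketched with `collections.Counter`. The list `sessions` records the day of each game Sam played.

```python
from collections import Counter

# One entry per game: Saturday morning, Saturday afternoon, Thursday afternoon.
sessions = ["Saturday", "Saturday", "Thursday"]

# Frequency of each day.
frequency = Counter(sessions)

# Frequency for the whole week is the total count.
total = sum(frequency.values())

print(dict(frequency))
print("whole week:", total)
```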

38
Q

Relative Frequency

A

Relative Frequency - The ratio of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes.

In other words, how often something happens divided by all outcomes.

Ex. 92 people were asked how they got to work: 35 used a car, 42 took public transport, 8 rode a bicycle, and 7 walked. The relative frequencies (to 2 decimal places) are: Car: 35/92 = 0.38, Public Transport: 42/92 = 0.46, Bicycle: 8/92 = 0.09, and Walking: 7/92 = 0.08. Another example: your team has won 9 games from a total of 12 games played. The frequency of winning is 9; the relative frequency of winning is 9/12 = 75%.
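
The commuting example can be reproduced in a few lines. The counts come from the card; the variable names are illustrative.

```python
# Frequencies from the commuting survey of 92 people.
counts = {"Car": 35, "Public Transport": 42, "Bicycle": 8, "Walking": 7}
total = sum(counts.values())

# Relative frequency = frequency / total number of outcomes.
relative = {mode: n / total for mode, n in counts.items()}
rounded = {mode: round(value, 2) for mode, value in relative.items()}

for mode, value in rounded.items():
    print(f"{mode}: {counts[mode]}/{total} = {value:.2f}")
```

The unrounded relative frequencies always add up to 1; the rounded ones can be off by a hundredth, as they are here.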

39
Q

Cumulative Relative Frequency

A

Cumulative Relative Frequency - The term applies to a set of observations ordered from smallest to largest. The cumulative relative frequency is the sum of the relative frequencies for all values that are less than or equal to the given value. In other words, it is the running total of a relative frequency and all relative frequencies so far in a frequency distribution.
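
A running total of relative frequencies can be sketched with `itertools.accumulate`. The ten data values are made up for illustration.

```python
from collections import Counter
from itertools import accumulate

data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]  # hypothetical observations

counts = Counter(data)
values = sorted(counts)  # order the distinct values from smallest to largest

# Relative frequency of each value, then the running (cumulative) total.
relative = [counts[v] / len(data) for v in values]
cumulative = list(accumulate(relative))

for v, rel, cum in zip(values, relative, cumulative):
    print(f"value {v}: relative {rel:.2f}, cumulative {cum:.2f}")
```

The last cumulative relative frequency is always 1 (up to floating-point rounding), since every observation is less than or equal to the largest value.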