Chapter 1 & 2 (Intro to Data and Visualizations) Flashcards
Descriptive statistics
numbers used to summarize and describe data; do not involve generalizing beyond the data at hand
Sample
a finite set of observations drawn from a population (or from a larger set of data) and used to make inferences about it
Population
larger set of data from which a sample is drawn; usually cannot be observed in full, so it is often treated as theoretical
Inferential statistics
mathematical procedures whereby we convert information about a sample into intelligent guesses about the population, assuming sampling is random
Simple random sampling
Every member of the population has an equal chance of being selected as part of the sample, and the selection of each member is independent of the selection of the others
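As a rough illustration of simple random sampling, the sketch below draws 10 members from a made-up population of 100 ID numbers. The population, the sample size, and the seed are assumptions chosen only for the example.

```python
import random

# Hypothetical population: ID numbers 1 through 100 (illustrative only).
population = list(range(1, 101))

# A fixed seed makes the draw reproducible; the value 42 is arbitrary.
rng = random.Random(42)

# random.sample draws without replacement, and every member of the
# population is equally likely to end up in the sample.
sample = rng.sample(population, k=10)
print(sample)
```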
What is the importance of sample size?
Random samples, especially small ones, are not always representative of the population; larger samples tend to be more representative
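One way to see the effect of sample size is a small simulation: draw many random samples of two different sizes and compare how much their means vary. The population below is simulated, and the sizes (5 and 500), the number of repeats, and the seed are assumptions for illustration.

```python
import random
import statistics

rng = random.Random(7)  # arbitrary seed for reproducibility

# Hypothetical population of 10,000 scores (simulated, not real data).
population = [rng.gauss(100, 15) for _ in range(10_000)]

def spread_of_sample_means(n, repeats=200):
    """Standard deviation of the means of `repeats` random samples of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(5)
spread_large = spread_of_sample_means(500)

# Means of small samples scatter much more widely around the population mean.
print(f"n=5:   {spread_small:.2f}")
print(f"n=500: {spread_large:.2f}")
```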
Random assignment
random division of the sample into groups (for example, a treatment group and a control group)
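A minimal sketch of random assignment, assuming a sample of 20 participants split evenly into two groups. The participant labels and group names are hypothetical.

```python
import random

# Hypothetical sample of 20 participants, already drawn from the population.
participants = [f"P{i}" for i in range(1, 21)]

rng = random.Random(0)  # arbitrary seed for reproducibility

# Shuffle a copy so the original order is left untouched.
shuffled = participants[:]
rng.shuffle(shuffled)

# First half goes to one condition, second half to the other.
half = len(shuffled) // 2
treatment = shuffled[:half]
control = shuffled[half:]

print("treatment:", treatment)
print("control:  ", control)
```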
What is the difference between failing to randomize assignment and having a non-random sample?
Failing to randomize invalidates the experimental findings while a non-random sample just restricts the generalizability of the results
Stratified random sampling
(1) Identify the members of the population that belong to each stratum (subgroup) (2) Randomly sample from each stratum so that the sizes of the subgroups in the sample are proportional to their sizes in the population
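The two steps can be sketched as follows, assuming a made-up population of 1,000 people in three strata. The stratum names, their sizes, and the helper `stratified_sample` are hypothetical and only meant to show proportional allocation.

```python
import random
from collections import defaultdict

# Hypothetical population of (member_id, stratum) pairs:
# 600 undergrads, 300 grads, 100 staff.
population = (
    [(i, "undergrad") for i in range(0, 600)]
    + [(i, "grad") for i in range(600, 900)]
    + [(i, "staff") for i in range(900, 1000)]
)

def stratified_sample(population, sample_size, rng):
    # Step 1: group the population members by stratum.
    strata = defaultdict(list)
    for member_id, stratum in population:
        strata[stratum].append(member_id)

    # Step 2: sample from each stratum in proportion to its share of the
    # population. Rounding can make the total differ slightly from
    # sample_size when the proportions do not divide evenly.
    sample = {}
    for stratum, members in strata.items():
        k = round(sample_size * len(members) / len(population))
        sample[stratum] = rng.sample(members, k)
    return sample

sample = stratified_sample(population, sample_size=50, rng=random.Random(1))
print({stratum: len(members) for stratum, members in sample.items()})
```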
Variables
properties or characteristics of some event, object, or person (observations) that can take on different values or amounts
What is the number of levels of an independent variable?
the number of experimental conditions
Qualitative vs. Quantitative variables
Qualitative/nominal/categorical variables have no numerical ordering but are often coded or represented with numbers, while quantitative variables are measured in numbers with some kind of unit
Discrete vs. Continuous variables
Discrete variables take separate, countable values (typically whole numbers on the scale), while continuous variables can take any value within a range, including decimals, and are not made of discrete steps
What is the value of the area under the normal distribution bell curve?
1
What is the probability of any exact value of x in a normal distribution?
0; the more precise the value of x, the closer the probability is to 0
What does the area under the normal distribution curve and bounded between two given points on the x-axis represent?
the probability that a randomly chosen number will fall between the two points
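The three normal-curve cards above can be checked numerically with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The standard normal distribution and the specific cut points below are assumptions chosen for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0.0, sigma=1.0)

def area_between(dist, a, b):
    """Area under the curve between a and b, i.e. P(a <= X <= b)."""
    return dist.cdf(b) - dist.cdf(a)

# Area between two points: probability of falling within one SD of the mean.
p_within_one_sd = area_between(z, -1.0, 1.0)  # roughly 0.68

# A very wide interval captures essentially all of the area (close to 1).
p_nearly_everything = area_between(z, -10.0, 10.0)

# Narrower and narrower intervals around x = 1 have probabilities that
# shrink toward 0, which is why an exact value has probability 0.
shrinking = [area_between(z, 1.0 - w, 1.0 + w) for w in (0.1, 0.01, 0.001)]

print(p_within_one_sd, p_nearly_everything, shrinking)
```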
Positively skewed distribution
“skewed to the right,” the longer tail extends in the positive direction
Bimodal distribution
a distribution with two peaks
What are the different kinds of kurtosis in a distribution?
Leptokurtic (long tails; has more scores in its tails) and platykurtic (short tails; has fewer scores in its tails)
Probability distribution
specifies the probability of different events (or combinations) in a population
Event
a specific combination of attributes observed in a particular observation; a generalization of a specific attribute to a combination of attributes
Distribution
specifies the likelihood of different events in the population using a probability
Objective probability
the probability of an event is the relative frequency of that event occurring if the situation is observed frequently (Frequentist); can be measured and repeated
Subjective probability
the probability of an event is the belief about the likelihood that the given event will occur (Bayesian)
Probability density function (PDF)
describes how likely each value of a variable is; called a “density” because for a continuous variable the probability of any exact value is 0, so probability is measured over ranges of values
What do you call a PDF for a discrete variable?
probability mass function (PMF)
What notation (f) is used to express the value of the PDF for a variable x at value v?
f_X(v) = P(X = v), the probability of X taking on the value v
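For a discrete variable the notation can be made concrete. The sketch below builds the PMF of the sum of two fair six-sided dice, an example chosen here for illustration, so that `pmf[v]` plays the role of f_X(v) = P(X = v).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice, reduced to their sum.
sums = [a + b for a, b in product(range(1, 7), repeat=2)]

# Count how often each sum occurs, then divide by the number of outcomes.
counts = Counter(sums)
pmf = {value: Fraction(count, len(sums)) for value, count in counts.items()}

# pmf[v] is P(X = v); exact fractions avoid floating-point rounding.
for value in sorted(pmf):
    print(value, pmf[value])
```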
Frequency tables
shows the frequencies of various response categories and their relative frequencies (proportion of responses in each category)
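A frequency table can be built with `collections.Counter`. The survey responses below are invented for the example.

```python
from collections import Counter

# Hypothetical survey responses (made up for illustration).
responses = [
    "agree", "agree", "neutral", "disagree", "agree",
    "neutral", "agree", "disagree", "agree", "neutral",
]

counts = Counter(responses)
total = sum(counts.values())

# Map each category to (frequency, relative frequency), most common first.
table = {category: (n, n / total) for category, n in counts.most_common()}

print(f"{'category':<10}{'freq':>6}{'rel. freq':>12}")
for category, (n, rel) in table.items():
    print(f"{category:<10}{n:>6}{rel:>12.2f}")
```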
Pie chart
each category is represented by a slice in the pie where the area of the slice is proportional to the percentage of responses in the category (relative frequency × 100)
When is a pie chart effective?
when displaying the relative frequencies of a small number of categories
Datum (plural data)
information that is relevant for a decision or for drawing a conclusion; can be seen as a single attribute of a single observation, with many such values aggregated together to produce a dataset
Dataset
a collection of data, usually of different types
When does information become data?
when it is used to answer a question
How are good and bad data determined?
by how suitable the data are for answering the question