Lesson 2.1: Data, Variables & Samples Flashcards
Central Tendencies and Variability
population
Entire group of individuals or objects to be studied
sample
Subset of population that is being studied
individual
Person or object that is part of the population being studied
statistic
Numerical summary of a sample
parameter
Numerical summary of a population
inferential statistics
Uses methods that take a result from a sample, extend it to the
population, and measure the reliability of the result
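The difference between a parameter, a statistic, and an inferential step can be sketched in R. This is a minimal illustration on simulated data; the population of ages, the sample size of 50, and the variable names are all assumptions, not part of the lesson.

```r
# Hypothetical population: ages of 1,000 individuals (simulated, not real data)
set.seed(1)
population <- round(rnorm(1000, mean = 40, sd = 10))

# Parameter: numerical summary of the whole population
mu <- mean(population)

# Statistic: numerical summary of a sample of 50 individuals
smp   <- sample(population, 50)
x_bar <- mean(smp)

# Inferential step: extend the sample result to the population and
# attach a measure of reliability (here a 95% confidence interval)
ci <- t.test(smp)$conf.int

mu
x_bar
ci
```

In practice the parameter is usually unknown; it is computed here only because the population is simulated.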
variable
Variables are characteristics of an individual within a population
Qualitative / Categorical Variables
- Allow for classification of individuals based on some attribute
- Arithmetic operations on these data are not meaningful
Quantitative / Numerical Variables
Provide a numerical measure of individuals
Qualitative Variables
Types (3)
- Dichotomous
- Nominal
- Ordinal
Qualitative Variables
Dichotomous
- Only 2 values
- e.g. present/absent, alive/dead
Qualitative Variables
Nominal
- Unordered
- e.g. blood type (A, B, AB, O)
Qualitative Variables
Ordinal
- Ordered
- e.g. pain rating scale (mild to severe)
Quantitative Variables
Discrete
- Only certain values are possible, with gaps between them
- e.g. sick days per year
Quantitative Variables
Continuous
- Any value within a range is possible (no gaps between values)
- e.g. blood glucose levels
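A short R sketch of the two quantitative types, using made-up values for the two examples on the cards. Note that how R stores a number (integer or double) is only a rough proxy for the measurement type, so treat this as an illustration rather than a rule.

```r
# Made-up values for illustration only
sick_days <- c(0L, 2L, 5L, 1L, 3L)           # discrete: whole-number counts
glucose   <- c(5.4, 6.12, 4.87, 7.03, 5.9)   # continuous: any value in a range

is.integer(sick_days)   # TRUE
is.double(glucose)      # TRUE

# Discrete data take few distinct values, so a frequency table is natural
table(sick_days)

# Continuous data are usually summarized over a range instead
summary(glucose)
```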
Quantitative / Numerical
Interval Scale
- Numerical data measured using an ordered scale
- Difference between measurements is meaningful
- But does NOT have a true zero
Example: How satisfied are you with the airline's service?
- Scale: 10, 8, 6, 4
- ACT or SAT scores
Quantitative / Numerical
Ratio Scale
- Numerical data measured using an ordered scale
- Difference between measurements is meaningful
- And involves a true zero point
Most numerical data are ratio type because they have a true zero
- Example: Height, Weight
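Why the true zero matters can be shown with a little arithmetic in R. Temperature in Celsius is used here as the interval-scale example; that choice and all the numbers are assumptions for illustration, not taken from the cards.

```r
# Ratio scale: weight has a true zero, so ratios are meaningful
weight_kg <- c(40, 80)
weight_kg[2] / weight_kg[1]     # 2: twice as heavy

# Interval scale: 0 degrees Celsius is not "no temperature",
# so the same ratio is misleading
temp_c <- c(10, 20)
temp_c[2] / temp_c[1]           # 2, but 20 C is not "twice as hot" as 10 C

# Kelvin has a true zero, and the ratio there is very different
temp_k <- temp_c + 273.15
temp_k[2] / temp_k[1]           # about 1.035

# Differences are meaningful on both scales
diff(temp_c)                    # 10
diff(temp_k)                    # 10
```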
Categorical Variables in R
Factors
- nominal or ordinal variables
- factor(…) converts string variables to factors
- factor(status, ordered=TRUE) for ordinal variables
- levels (categories) are listed alphabetically by default
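A minimal sketch of the default behaviour, using a made-up status vector. The argument is spelled out as `ordered = TRUE`, its full name in `factor()`; the shorter `order=TRUE` also works because R matches argument names partially.

```r
# Hypothetical patient statuses stored as character strings
status <- c("Poor", "Improved", "Excellent", "Poor", "Improved")

# Nominal factor: levels are sorted alphabetically by default
status_f <- factor(status)
levels(status_f)            # "Excellent" "Improved" "Poor"

# Ordered factor without explicit levels: the order is still alphabetical,
# which is rarely the order wanted for a rating
status_o <- factor(status, ordered = TRUE)
status_o[1] > status_o[3]   # TRUE: "Poor" ranks above "Excellent" alphabetically
```

The last comparison shows why an ordinal variable usually needs its levels listed explicitly.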
Categorical Variables in R
Factors: specific order
R code
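factor(variable, ordered=TRUE, levels = c(…))
- e.g. status <- factor(status, ordered=TRUE, levels=c("Poor", "Improved", "Excellent"))
- stores the levels as integer codes in level order (here 1, 2, 3 for Poor, Improved, Excellent; the alphabetical default would give 1, 2, 3 for E, I, P)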
class(…)
- displays the data type
- e.g. ordered factor, factor, character (strings), numeric, etc.
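A short sketch of an ordered factor with explicit levels, together with `class()`. The status values are made up; with the levels given as Poor, Improved, Excellent, the underlying integer codes follow that order.

```r
# Hypothetical statuses for four patients
status <- c("Poor", "Improved", "Excellent", "Poor")

# Ordinal factor with an explicit, meaningful order
status <- factor(status, ordered = TRUE,
                 levels = c("Poor", "Improved", "Excellent"))

as.integer(status)      # 1 2 3 1  (Poor = 1, Improved = 2, Excellent = 3)
status[1] < status[3]   # TRUE: Poor ranks below Excellent

# class() reports the data type
class(status)           # "ordered" "factor"
class(c("a", "b"))      # "character"
class(3.14)             # "numeric"
```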
Bias
Types (3)
Sampling Bias
- The technique used to obtain the individuals in the sample tends to favor one part of the population
Nonresponse Bias
- Individuals selected to be in the sample who do not respond may have different opinions from those who do
Response Bias
- Answers on the survey do not represent the true feelings of the respondents
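Sampling bias can be illustrated with a small simulation. Everything here is hypothetical: the two shifts, the sleep values, and the idea that the survey is only handed out during the day are assumptions chosen to make the effect visible.

```r
set.seed(42)

# Hypothetical population: 500 day-shift and 500 night-shift workers
# with different average hours of sleep (simulated values)
shift <- rep(c("day", "night"), each = 500)
sleep <- c(rnorm(500, mean = 7.5, sd = 1),
           rnorm(500, mean = 6.0, sd = 1))

# Simple random sample: every individual is equally likely to be chosen
fair_idx <- sample(1000, 100)

# Biased sample: only day-shift workers can be selected
biased_idx <- sample(which(shift == "day"), 100)

mean(sleep)               # population mean (parameter)
mean(sleep[fair_idx])     # usually close to the parameter
mean(sleep[biased_idx])   # pulled toward the day-shift average
```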
Sampling in R
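set.seed()
- sets the seed of the random number generator so that random sampling is reproducible
- need to call set.seed(…) again before each sampling call to get the same result every time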
sampling
sample(x, y)
sample(x, y, replace=TRUE)
- draws a random sample of size y from the values in x
- sample(1:10, 5) = 5 values sampled from the range 1:10, each value at most once
- sample(1:10, 5, replace=TRUE) = a selected value is put back and can be selected again
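A minimal sketch of seeding and sampling. The seed value 123 and the variable names are arbitrary choices for illustration.

```r
# Same seed, same call: the draw is reproducible
set.seed(123)
a <- sample(1:10, 5)            # 5 distinct values from 1:10
set.seed(123)
b <- sample(1:10, 5)
identical(a, b)                 # TRUE

# Without resetting the seed, the next draw will generally differ
d <- sample(1:10, 5)

# With replacement, the same value may appear more than once
r <- sample(1:10, 5, replace = TRUE)

# Without replacement, the sample cannot be larger than x:
# sample(1:10, 15) stops with an error, sample(1:10, 15, replace = TRUE) works
big <- sample(1:10, 15, replace = TRUE)
```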
nrow(DataFile)
returns the number of rows in DataFile
floor(…)
rounds the result of a calculation down to the nearest integer
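The cards do not say how these two functions are used together. One common pattern, assumed here, is to compute a whole-number sample size from a fraction of the rows. The built-in `mtcars` data set stands in for DataFile, and the 70% fraction is a hypothetical choice.

```r
# mtcars is a data set that ships with R; it stands in for DataFile
n <- nrow(mtcars)             # 32 rows

# 70% of 32 is 22.4, which is not a valid sample size; floor() rounds down
n_sub <- floor(0.7 * n)       # 22

# Draw that many row numbers at random and keep those rows
set.seed(1)
idx <- sample(n, n_sub)
subset_rows <- mtcars[idx, ]
nrow(subset_rows)             # 22

# floor() always rounds toward the lower integer, also for negative numbers
floor(-2.5)                   # -3
```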