Midterm 1 Flashcards
Data Science Lifecycle Step 1
Frame the problem
Data Science Lifecycle Step 2
Collect raw data
Data Science Lifecycle Step 3
Process the data
Data Science Lifecycle Step 4
Explore the data
Data Science Lifecycle Step 5
Perform in-depth analysis
Data Science Lifecycle Step 6
Communicate results
Association
Any relation or link between two variables
Causality
One thing causes the other
Categorical
Each value is from a fixed inventory
Numerical
Each value is a number (not a code)
Values
Can be numerical or categorical, with many subtypes within each
Distribution
For each different value of the variable, the frequency of individuals that have that value
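A distribution can be computed by counting how many individuals have each distinct value. The sketch below uses Python's standard `collections.Counter` on a small, made-up list of colors (the data is hypothetical, chosen only for illustration).

```python
from collections import Counter

# Hypothetical sample: one categorical variable, one value per individual
colors = ["red", "blue", "red", "green", "blue", "red"]

# Distribution: for each distinct value, the number of individuals with that value
distribution = Counter(colors)

for value, count in sorted(distribution.items()):
    print(value, count)
```

Dividing each count by `len(colors)` would give the distribution as proportions instead of counts.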
Randomize
If you assign individuals to treatment and control at random, then the two groups are likely to be similar apart from the treatment.
Random ≠ haphazard: in probability theory the two are not the same, regardless of what the dictionary says.
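Random assignment means a chance process, not a person's judgment, decides who goes into each group. A minimal sketch, assuming the individuals are just labeled 0 to 9; the function name `assign_at_random` is hypothetical, and the shuffling comes from Python's standard `random` module.

```python
import random

def assign_at_random(individuals, seed=None):
    """Hypothetical helper: split individuals into treatment and control at random."""
    rng = random.Random(seed)      # seeded generator so the example is reproducible
    shuffled = list(individuals)
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

treatment, control = assign_at_random(range(10), seed=0)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Picking individuals "by feel" would be haphazard, not random: the shuffle is what gives every individual the same chance of landing in either group.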
Assignment statements
Statements that don’t have a value; they perform an action, binding a name to a value
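A short Python illustration: the assignment performs an action (binding a name), while the right-hand side is an expression that has a value. The check with the built-in `compile` shows that an assignment cannot be nested where a value is expected; the names `x`, `y`, and `z` are arbitrary.

```python
x = 27        # action: bind the name x to the value 27
y = x * 2     # the right-hand side is an expression; its value (54) is bound to y
print(y)

# An assignment statement has no value, so it cannot appear inside an expression.
# Compiling code that nests one in parentheses raises a SyntaxError.
try:
    compile("z = (x = 3)", "<example>", "exec")
    nested_assignment_allowed = True
except SyntaxError:
    nested_assignment_allowed = False

print(nested_assignment_allowed)
```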
f(27)
Call expression: f is the function to call and 27 is the argument passed to it; read it as “call f on 27”
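A sketch of a call expression in Python, assuming a hypothetical function `f` that adds one to its argument; built-in functions such as `abs` and `max` are called the same way.

```python
def f(n):
    """Hypothetical example function: return one more than its argument."""
    return n + 1

result = f(27)        # call f on 27: f is the function, 27 is the argument
print(result)         # 28

print(abs(-27))       # built-in function, one argument: 27
print(max(27, 3))     # built-in function, two arguments: 27
```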
t.select(label)
constructs a new table with just the specified columns
t.drop(label)
constructs a new table in which the specified columns are omitted
t.sort(label)
constructs a new table with rows sorted by the specified column
t.where(label, condition)
constructs a new table with just the rows that match the condition
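The four Table methods above share one idea: each builds a new table and leaves the original untouched. The sketch below imitates that behavior with plain Python, representing a table as a list of rows (dicts keyed by column label). The helpers `select`, `drop`, `sort`, and `where` here are hypothetical stand-ins, not the `datascience` library's implementation, and `where` takes an ordinary Python function as its condition, which is an assumption made for simplicity. The `flowers` data is made up.

```python
# Hypothetical stand-ins for the Table methods; a table is a list of row dicts.
# Each helper returns a new list and never modifies the table it is given.

def select(table, *labels):
    """New table with just the specified columns."""
    return [{label: row[label] for label in labels} for row in table]

def drop(table, *labels):
    """New table with the specified columns omitted."""
    return [{k: v for k, v in row.items() if k not in labels} for row in table]

def sort(table, label, descending=False):
    """New table with rows sorted by the specified column (ascending by default)."""
    return sorted(table, key=lambda row: row[label], reverse=descending)

def where(table, label, condition):
    """New table with just the rows whose value in `label` satisfies `condition`."""
    return [row for row in table if condition(row[label])]

flowers = [
    {"Petals": 8, "Name": "lotus", "Weight": 10},
    {"Petals": 34, "Name": "sunflower", "Weight": 5},
    {"Petals": 5, "Name": "rose", "Weight": 6},
]

print(select(flowers, "Name"))
print(drop(flowers, "Weight"))
print(sort(flowers, "Petals"))
print(where(flowers, "Petals", lambda p: p > 6))
print(flowers[0])  # the original table is unchanged
```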
Integers
An integer of any size; an int never has a decimal point
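A quick Python check of both halves of the card: ints can be arbitrarily large, and writing a decimal point produces a float instead. The last two lines show that `/` always returns a float, while `//` on two ints returns an int.

```python
big = 2 ** 100          # Python ints have no fixed size limit
print(big)              # 1267650600228229401496703205376
print(type(big))        # <class 'int'>

print(type(7))          # <class 'int'>   no decimal point
print(type(7.0))        # <class 'float'> the decimal point makes it a float
print(7 / 2)            # 3.5  true division gives a float
print(7 // 2)           # 3    floor division of two ints gives an int
```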