Midterm 1 Flashcards

1
Q

Data Science Lifecycle Step 1

A

Frame the problem

2
Q

Data Science Lifecycle Step 2

A

Collect raw data

3
Q

Data Science Lifecycle Step 3

A

Process the data

4
Q

Data Science Lifecycle Step 4

A

Explore the data

5
Q

Data Science Lifecycle Step 5

A

Perform in-depth analysis

6
Q

Data Science Lifecycle Step 6

A

Communicate results

7
Q

Association

A

any relation or link

8
Q

Causality

A

One thing causes the other

9
Q

Categorical

A

Each value is from a fixed inventory

10
Q

Numerical

A

Each value is a number (not a code)

11
Q

Values

A

can be numerical or categorical, and of many subtypes within these

12
Q

Distribution

A

For each different value of the variable, the frequency of individuals that have that value

13
Q

Randomize

A

If you assign individuals to treatment and control at random, then the two groups are likely to be similar apart from the treatment
Random ≠ haphazard: in probability theory "random" has a precise meaning, regardless of what the dictionary says
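A minimal sketch of random assignment, using only Python's standard library. The helper name `assign_groups`, the participant labels, and the seed are made up for illustration; the point is that a shuffle, not a person's judgment, decides who lands in each group.

```python
import random

def assign_groups(individuals, seed=None):
    """Randomly split individuals into treatment and control halves."""
    rng = random.Random(seed)       # seeded generator so the split is reproducible
    shuffled = list(individuals)    # copy so the caller's list is left untouched
    rng.shuffle(shuffled)           # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

participants = [f"person_{i}" for i in range(10)]   # hypothetical individuals
treatment, control = assign_groups(participants, seed=0)
print(treatment)
print(control)
```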

14
Q

Assignment statements

A

Statements don’t have a value; they perform an action

15
Q

f(27)

A

f is the function to call; 27 is the argument to the function. Read it as “Call f on 27”
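A small sketch of a call expression next to an assignment statement. The body of `f` is an assumption for illustration (the card does not say what `f` does); here it simply doubles its argument.

```python
def f(x):
    """Hypothetical function for illustration: doubles its argument."""
    return 2 * x

result = f(27)   # the call expression f(27) has a value; the assignment binds it to `result`
print(result)
```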

16
Q

t.select(label)

A

constructs a new table with just the specified columns

17
Q

t.drop(label)

A

constructs a new table in which the specified columns are omitted

18
Q

t.sort(label)

A

constructs a new table with rows sorted by the specified column

19
Q

t.where(label, condition)

A

constructs a new table with just the rows that match the condition

20
Q

Integers

A

an integer of any size
an int never has a decimal point

21
Q

Floats

A

has an optional fractional part
always has a decimal point
may use scientific notation
they have limited size (but the limit is huge)
they have limited precision of about 15-16 significant decimal digits
after arithmetic the final few decimal places can be wrong
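The int and float behavior above can be checked with a short standard-library sketch. The particular numbers are arbitrary examples.

```python
import math
import sys

big = 2 ** 100                     # ints can be arbitrarily large and stay exact
total = 0.1 + 0.2                  # neither operand is exactly representable in binary

print(type(big))                   # int: no decimal point
print(total == 0.3)                # False: the final decimal places differ
print(math.isclose(total, 0.3))    # True: compare floats with a tolerance instead
print(sys.float_info.dig)          # decimal digits a float can reliably hold
print(sys.float_info.max)          # the (huge) size limit for floats
print(type(1e3))                   # scientific notation produces a float
```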

22
Q

string

A

a sequence of characters of any length, e.g. ‘A’

23
Q

Arrays

A

A collection of things
sequence of values of the same type (Arrays -> Columns)

24
Q

Ranges

A

A range is an array of consecutive numbers

25
Q

np.arange(end)

A

An array of increasing integers from 0 up to, but not including, end

26
Q

np.arange(start,end)

A

An array of increasing integers from start up to, but not including, end

27
Q

np.arange(start,end,step)

A

A range with step between consecutive values
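A quick sketch of the three forms of `np.arange`, assuming NumPy is installed and imported as `np`. The variable names are made up for illustration.

```python
import numpy as np

first = np.arange(5)            # start defaults to 0: 0, 1, 2, 3, 4
middle = np.arange(2, 6)        # includes start, excludes end: 2, 3, 4, 5
stepped = np.arange(0, 10, 3)   # step of 3 between values: 0, 3, 6, 9

print(first, middle, stepped)
```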

28
Q

NOTE: The range always includes start but excludes end

A
29
Q

Table.read_table(filename)

A

reads a table from a spreadsheet file (typically a CSV)

30
Q

Numerical Attribute types

A

Each value is from a numerical scale
Numerical measurements are ordered
Differences are meaningful

31
Q

Categorical Attribute types

A

Each value is from a fixed inventory
May or may not have an ordering
Categories are the same or different

32
Q

Use line plots for sequential data if

A

Your x axis has an order
Sequential differences in y values are meaningful
There’s only one y-value for each x-value
Usually x-axis is time or distance

33
Q

Use scatter plots for non-sequential data

A

When you’re looking for associations

34
Q

Binning

A

counting the number of numerical values that lie within ranges, called bins

35
Q

Bins

A

defined by their lower bounds (inclusive)
The upper bound is the lower bound of the next bin

36
Q

Histogram

A

Chart that displays the distribution of a numerical value / attribute
Uses bins; there is one bar corresponding to each bin
Uses the area principle
The area of each bar is the percent of individuals in the corresponding bin

37
Q

Height formula

A

height = (% in bin) / (width of bin)

38
Q

Area of bar formula

A

area = % in bin = height × width of bin
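A worked sketch of the height and area formulas in plain Python. The helper `histogram_heights`, the ages, and the bin edges are invented for illustration; each bin includes its lower bound and excludes its upper bound.

```python
def histogram_heights(values, edges):
    """Return (percent_in_bin, height) pairs for bins [edges[i], edges[i+1])."""
    results = []
    for lower, upper in zip(edges, edges[1:]):
        count = sum(1 for v in values if lower <= v < upper)  # lower bound inclusive
        percent = 100 * count / len(values)
        height = percent / (upper - lower)   # percent per unit on the x-axis
        results.append((percent, height))
    return results

ages = [3, 7, 12, 15, 18, 25, 31, 38, 9, 14]   # made-up data
edges = [0, 10, 20, 40]                         # bins of unequal width
bars = histogram_heights(ages, edges)
for (percent, height), lower, upper in zip(bars, edges, edges[1:]):
    print(f"[{lower}, {upper}): {percent}% of values, bar height {height}")
```

The last bin holds the same percent as the first but is twice as wide, so its bar is half as tall: area, not height, carries the percent.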

39
Q

Scatter plot

A

relation between numerical variables

40
Q

Line graph

A

sequential data (over time)

41
Q

Bar chart

A

distribution of categorical data

42
Q

Histogram

A

distribution of numerical data

43
Q

Grouped Table

A

One combo of grouping variables per row
Any number of grouping variables
Aggregate values of all other columns in table
Missing combos absent

44
Q

Pivot table

A

One combo of grouping variables per entry
Two grouping variables: columns and rows
Aggregate values of values column
Missing combos = 0 (or empty string)
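The difference in how missing combos are handled can be sketched in plain Python, without the course's table library. The rows and variable names are made up; a `Counter` stands in for a grouped table and a nested dict for a pivot grid.

```python
from collections import Counter

# Made-up rows: (flavor, size)
rows = [("vanilla", "small"), ("vanilla", "large"),
        ("chocolate", "small"), ("vanilla", "small")]

# Grouped view: one entry per combination that actually occurs.
grouped = Counter(rows)

# Pivot view: a full grid, with 0 where a combination never occurs.
flavors = sorted({flavor for flavor, _ in rows})
sizes = sorted({size for _, size in rows})
pivot = {flavor: {size: grouped.get((flavor, size), 0) for size in sizes}
         for flavor in flavors}

print(dict(grouped))
print(pivot)
```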

45
Q

Probability

A

Lowest value: 0
Chance of event that is impossible
Highest value: 1 (or 100%)
Chance of event that is certain
Complement: if an event has chance 70%, then the chance that it doesn’t happen is
100% - 70% = 30%
1 - 0.7 = 0.3

46
Q

Equally likely outcomes

A

P(A) = (number of outcomes that make A happen) / (total number of outcomes)
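A small sketch of the counting formula, using a fair six-sided die as an assumed example. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]                        # faces of a fair die
event = [face for face in outcomes if face > 4]      # outcomes that make A happen
p_event = Fraction(len(event), len(outcomes))        # 2 out of 6
p_complement = 1 - p_event                           # chance that A does not happen

print(p_event, p_complement)
```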

47
Q

Multiplication Rule

A

Chance that two events A and B both happen
P(A and B) = P(A happens) x P(B happens given that A has happened)
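As an assumed example of the chance that two events both happen: drawing two cards from a standard 52-card deck without replacement and getting two aces.

```python
from fractions import Fraction

p_first_ace = Fraction(4, 52)                 # 4 aces among 52 cards
p_second_ace_given_first = Fraction(3, 51)    # 3 aces left among 51 cards
p_two_aces = p_first_ace * p_second_ace_given_first

print(p_two_aces)
```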

48
Q

Addition Rule

A

If event A can happen in exactly one of two ways, then
P(A) = P(first way) + P(second way)
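As an assumed example: getting exactly one head in two fair coin tosses can happen in exactly one of two ways, heads then tails or tails then heads.

```python
from fractions import Fraction

half = Fraction(1, 2)
p_heads_then_tails = half * half    # first way
p_tails_then_heads = half * half    # second way
p_exactly_one_head = p_heads_then_tails + p_tails_then_heads

print(p_exactly_one_head)
```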

49
Q

Pivot

A

Cross-classified according to two categorical variables
Produces a grid of counts or aggregated values
Two required arguments
First (A): variable that forms column labels of grid
Second (B): variable that forms row labels of grid
Two optional arguments (include both or neither)
values = column_label_to_aggregate
collect = function_to_aggregate_with

50
Q

Lists

A

sequence of values of different types (Lists -> Rows)

51
Q

Groups

A

collect rows by some column