Data Science Flashcards

1
Q

quantitative data

A

data used to measure the amount of something (e.g., mass)

1
Q

categorical data

A

data used to classify instead of measure (e.g., the species of an animal)

2
Q

Definitions area

A

the left-hand pane of the Pyret editor, where you write and save your code

3
Q

Interactions area

A

the right-hand pane of the Pyret editor, where you try out expressions and see the output

4
Q

Pyret decimals

A

must have a digit before the decimal point: write 0.5, not .5

5
Q

bar chart

A

Count
Visual representation of how often each value occurs (its frequency)
One bar for every category

6
Q

Pie chart

A

Percentage
Visual representation of RELATIVE frequency
One slice for every category
Max 7 slices, generally 5
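
The difference between a bar chart's counts and a pie chart's relative frequencies can be sketched in a few lines. This is a minimal Python sketch (not Pyret), and the species list is invented for illustration.

```python
from collections import Counter

# Hypothetical species column, invented for illustration.
species = ["cat", "dog", "cat", "rabbit", "dog", "cat"]

# Bar chart heights: how many times each category occurs.
counts = Counter(species)

# Pie chart slices: each count as a percentage of all rows.
percentages = {name: 100 * n / len(species) for name, n in counts.items()}

print(counts)
print(percentages)
```

The counts depend on how many rows there are, while the percentages always add up to 100.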

7
Q

stacked bar chart

A

a bar chart whose bars are split by a second column to show more detail (e.g., count of each species, broken down by sex)

8
Q

data cycle

A

ask questions, consider data, analyze data, interpret data (mnemonic: QCAI)

9
Q

lookup questions

A

answered by looking up a single value in a table

10
Q

arithmetic questions

A

answered by computing a value within a single column, such as the average, max, or min of that column
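
A minimal Python sketch (not Pyret) contrasting a lookup question with an arithmetic question. The table, its column names, and all of its values are invented for illustration.

```python
# Hypothetical table: each dictionary is one row. All values are made up.
animals = [
    {"name": "Ada", "species": "cat", "pounds": 6},
    {"name": "Rex", "species": "dog", "pounds": 48},
    {"name": "Milo", "species": "cat", "pounds": 9},
]

# Lookup question: "What species is the animal in the second row?"
second_species = animals[1]["species"]

# Arithmetic questions: compute one answer from a single column.
pounds = [row["pounds"] for row in animals]
heaviest = max(pounds)
average_pounds = sum(pounds) / len(pounds)

print(second_species, heaviest, average_pounds)
```

The lookup reads one cell; the arithmetic questions need every value in the column.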

11
Q

statistical questions

A

ask about the relationship between two columns

12
Q

null hypothesis

A

a statistical hypothesis proposing that there is no real effect or relationship in a given set of observations, so any apparent pattern is due to chance

13
Q

random samples

A

a subset of a population in which each member has an equal chance of being chosen. The larger the random sample, the more closely it tends to represent the population
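
A minimal Python sketch (not Pyret) of drawing random samples, using the standard library's `random` module. The population of 100 ID numbers and the fixed seed are assumptions made so the example is repeatable.

```python
import random

# Hypothetical population: 100 ID numbers, invented for illustration.
population = list(range(1, 101))

# A fixed seed keeps the sketch repeatable; real sampling would not fix it.
rng = random.Random(0)

# sample() picks without replacement, giving every member an equal chance.
small_sample = rng.sample(population, 5)
large_sample = rng.sample(population, 50)


def mean(values):
    """Average of a list of numbers."""
    return sum(values) / len(values)


print(mean(population), mean(small_sample), mean(large_sample))
```

Comparing each sample's mean with the population's mean is one way to see how well a sample represents the whole.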

14
Q

grouped samples

A

a subset of the population in which each member of the subset was chosen for a specific reason

15
Q

file extension purpose

A

tells your computer which application created or can open the file, and which icon to use for the file

16
Q

what does CSV stand for?

A

comma-separated values

17
Q

histograms

A

shows the number of rows that fall within certain intervals (or “bins”) along the horizontal axis
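
A minimal Python sketch (not Pyret) of how rows are counted into bins. The weights and the bin width of 10 are assumptions chosen for illustration.

```python
# Hypothetical weights column, invented for illustration.
weights = [3, 7, 12, 15, 18, 22, 24, 29, 41]
bin_width = 10

# Count how many rows fall in each interval [start, start + bin_width).
bins = {}
for w in weights:
    start = (w // bin_width) * bin_width
    bins[start] = bins.get(start, 0) + 1

# Print one text bar per bin, including bins that contain no rows.
for start in range(0, max(weights) + 1, bin_width):
    count = bins.get(start, 0)
    print(f"{start:>2} to {start + bin_width:>2}: {'#' * count}")
```

Changing `bin_width` changes the shape of the histogram, even though the data stay the same.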

18
Q

⬛️⬛️
⬛️⬛️⬛️⬛️

A

?

19
Q

type of histogram
📉
(imagine this as a histogram)
⬛️
⬛️⬛️
⬛️⬛️⬛️
⬛️⬛️⬛️⬛️⬛️⬛️⬛️

A

skew right

20
Q

type of histogram
📈
(imagine this as a histogram)
⬛️
⬛️⬛️
⬛️⬛️⬛️
⬛️⬛️⬛️⬛️⬛️⬛️

A

Skew left

21
Q

what does it mean to be an outlier

A

A data point that lies far from the rest of the data. To decide, compare it to the other data. But it is important to think about all extreme data points, not just outliers

22
Q

mean

A

the average: the sum of the values divided by how many values there are
best for a symmetric, medium-to-large dataset

23
Q

median

A

Half the values are smaller and half are larger. The middle number, or the average of the two middle numbers
If the data is asymmetric (skewed), use the median

24
Q

mode

A

the number or numbers that occur most often in a dataset
in a small dataset, the mode may be the most useful measure of center
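
The three measures of center can be compared with Python's standard `statistics` module. This is a minimal Python sketch (not Pyret), and the ages are invented, with one extreme value included on purpose.

```python
import statistics

# Hypothetical ages, invented for illustration, with one extreme value (25).
ages = [1, 2, 2, 3, 4, 5, 25]

mean_age = statistics.mean(ages)      # sum of values divided by how many
median_age = statistics.median(ages)  # middle value of the sorted list
modes = statistics.multimode(ages)    # every value tied for most frequent

print(mean_age, median_age, modes)
```

The extreme value pulls the mean well above the median, which is why the median is preferred for asymmetric data.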

25
Q

how many quartiles are in a box plot?

A

3 (Q1, the median, and Q3), which split the data into four quarters

26
Q

histogram and box plot shape

A

the longer whisker of the box plot points in the same direction as the skew of the histogram

27
Q

standard deviation

A

the most useful way to summarize spread of quantitative columns

28
Q

how to calculate standard deviation

A

roughly, the typical distance of the values from the mean

29
Q

standard deviation equation

A

sqrt([sum of the squared distances from the mean] / [number of values - 1])
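
A minimal Python sketch (not Pyret) that follows the equation one step at a time. The values are invented for illustration.

```python
import math

# Hypothetical values, invented for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: the mean.
mean = sum(values) / len(values)

# Step 2: the squared distance of each value from the mean.
squared_distances = [(v - mean) ** 2 for v in values]

# Step 3: add them up, divide by (number of values - 1), take the square root.
sample_sd = math.sqrt(sum(squared_distances) / (len(values) - 1))

print(mean, sum(squared_distances), sample_sd)
```

Dividing by one less than the number of values gives the sample standard deviation, which is the version this card describes.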

30
Q

explanatory variable

A

the independent variable, plotted on the x-axis of a scatter plot

31
Q

response variable

A

the dependent variable, plotted on the y-axis of a scatter plot

32
Q

r

A

correlation statistic
between -1 and +1
-1 = strongest negative correlation
+1 = strongest positive correlation
0 = no linear correlation
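
A minimal Python sketch (not Pyret) of how r can be computed, assuming the usual Pearson formula. The function name `correlation` and the data are invented for illustration.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much each varies on its own.
    together = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return together / (spread_x * spread_y)


xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2, 4, 6, 8, 10])    # y rises steadily with x
r_down = correlation(xs, [10, 8, 6, 4, 2])  # y falls steadily with x
r_flat = correlation(xs, [3, 1, 4, 1, 3])   # no upward or downward trend

print(r_up, r_down, r_flat)
```

The first result is close to +1, the second close to -1, and the third close to 0.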

33
Q

what is the regression line also known as?

A

Line of best fit, least-squares line, predictor, trendline
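
A minimal Python sketch (not Pyret) of fitting a least-squares line and using it as a predictor. The function name `least_squares_line` and the data are invented for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    # The line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = least_squares_line(xs, ys)
predicted = slope * 5 + intercept  # prediction for a new x value of 5

print(slope, intercept, predicted)
```

With real data the points rarely sit exactly on a line, so the fitted line gives an estimate rather than an exact value.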

34
Q

definition of a row

A

cat-row = row-n(animals-table, #)
(replace # with the index of the row you want)

35
Q

look up identify

A

cat-row["species"]
(cat-row must already be defined)

36
Q

how to make a function

A

fun name(parameters): body end

37
Q

what is an example for functions?

A

a sample call paired with its expected result; it shows what the function does for a specific input

38
Q

example example

A

fun f(x): x / 2 end
examples:
  f(2) is 2 / 2
  f(10) is 10 / 2
end

39
Q

what functions need a helper function

A

image-scatter-plot, build-column

40
Q

what function to make a specific table

A

sort, filter, or build-column
filter(build-column(animals-table, "kilos", kilogram), is-heavy)
(kilogram and is-heavy are helper functions that must already be defined)

41
Q

syntax errors

A

typos and other mistakes in how the code is written. Usually easy to spot; the code will not run

42
Q

runtime error

A

the app runs for a bit, then crashes at a specific point in the code

43
Q

logic error

A

the app runs completely but simply produces the wrong output
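
The three kinds of error can be seen side by side. This is a minimal Python sketch (not Pyret), and the function names are invented for illustration; the runtime and syntax errors are caught so the sketch itself keeps running.

```python
def average_buggy(values):
    # Logic error: this runs without complaint but divides by the wrong number.
    return sum(values) / (len(values) - 1)


def average(values):
    return sum(values) / len(values)


print(average_buggy([2, 4, 6]), average([2, 4, 6]))

# Runtime error: the program starts, then fails at this specific call.
try:
    average([])
except ZeroDivisionError as err:
    runtime_error = err

# Syntax error: malformed code is rejected before any of it runs.
try:
    compile("def broken(:", "<example>", "exec")
except SyntaxError as err:
    syntax_error = err

print(type(runtime_error).__name__, type(syntax_error).__name__)
```

Only the logic error goes unreported by the language, which is why written examples of expected results are useful for catching it.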

44
Q

four categories of dirty data

A

missing data, inconsistent types, inconsistent units/invalid range, inconsistent naming

45
Q

missing data

A

some cells have data and some are empty

46
Q

inconsistent types

A

a column where the values have different data types. (eg: 2, two)

47
Q

inconsistent unit/ invalid range

A

a column where the data types are the same but the values use different units or fall outside the valid range

48
Q

inconsistent naming

A

inconsistent spelling and capitalization in entries
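
A minimal Python sketch (not Pyret) of one way to clean up inconsistent naming, assuming the only problems are capitalization and stray spaces. The entries are invented for illustration.

```python
# Hypothetical species entries typed inconsistently, invented for illustration.
raw_species = ["Cat", "cat ", "CAT", "dog", " Dog"]

# Trim surrounding spaces and use lowercase everywhere.
cleaned_species = [name.strip().lower() for name in raw_species]

print(sorted(set(raw_species)))
print(sorted(set(cleaned_species)))
```

Before cleaning, a chart would treat all five spellings as separate categories; afterwards only two categories remain. Misspellings would need a different fix, such as a lookup of known corrections.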

49
Q

selection bias

A

when the participants selected are not representative of the group being studied

50
Q

bias in the study design

A

when the study was not designed carefully and ended up not measuring exactly what was asked

51
Q

poor choice of summary

A

summarizing the data with the wrong statistic (e.g., using the mean when the median is more appropriate)

52
Q

confounding variables

A

correlation does not imply causation. A confounding variable is an outside influence other than the one being studied

53
Q

intentionally using the wrong chart

A

misleads the audience; can hide gaps in the data, giving an inaccurate picture

54
Q

changing the scale of a chart

A

can exaggerate or shrink differences, making the data look a certain way
