Data Science Flashcards by Annika Cat

quantitative data

data that measures an amount of something (e.g., mass)

categorical data

data used to classify rather than measure (e.g., the species of an animal)

Definitions area

where you write the code in Pyret

Interactions area

where the output appears; you can also type expressions there to run them immediately

Pyret decimals

must have a digit before the decimal point (write 0.5, not .5)
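Here is a tiny Pyret sketch of this rule. The names half and whole are made up for illustration.

```pyret
half = 0.5      # valid: there is a digit before the decimal point
# half = .5     # would be rejected by Pyret as a syntax error

# Pyret stores decimals like 0.5 as exact numbers, so this is exactly 1
whole = half * 2
```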

bar chart

shows counts
a visual representation of each value's frequency
one bar for every category

Pie chart

shows percentages
a visual representation of RELATIVE frequency
one slice for every category
at most 7 slices, ideally around 5

stacked bar chart

a bar chart whose bars are split by a second column to show more detail (e.g., count of each species, broken down by sex)

data cycle

Ask questions, Consider data, Analyze data, Interpret data (mnemonic: QCAI)

lookup questions

answered by looking up a single value in a table

arithmetic questions

answered by computing a value from a single column (e.g., finding its average, maximum, or minimum)
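To make the difference concrete, here is a Pyret sketch that uses a small made-up table in place of animals-table. It relies on core Pyret table methods (.row-n, .get-column). Elsewhere, this deck writes row lookups in function form, as row-n(table, n).

```pyret
# Made-up stand-in for animals-table
pets = table: species, pounds
  row: "cat", 6
  row: "rabbit", 3.5
  row: "dog", 28
end

# Lookup question: read one value from one row
first-row = pets.row-n(0)
first-pounds = first-row["pounds"]

# Arithmetic question: compute over a whole column (here, its maximum)
weights = pets.get-column("pounds")
heaviest = fold(fun(acc, w): num-max(acc, w) end, weights.first, weights.rest)
```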

statistical questions

asks about the relationship between two columns

null hypothesis

the hypothesis that there is no real effect or relationship, so any pattern in the given observations is due to chance

random samples

a subset of a population in which each member has an equal chance of being chosen. The larger the random sample, the more accurate it tends to be

grouped samples

a subset of the population in which each member of the subset was chosen for a specific reason

file extension purpose

tells your computer which application created the file or can open it, and which icon to use for it

what does CSV stand for?

comma-separated values

histograms

shows the number of rows that fall within certain intervals (or “bins”) along the horizontal axis
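The following Pyret sketch shows how the count for one bin could be computed. count-in-bin is a hypothetical helper. Treating each bin as including its lower edge but not its upper edge is an assumption, and Pyret's own histogram function may place boundary values differently.

```pyret
# Count the values that fall in one bin: low <= v < high
fun count-in-bin(nums, low, high):
  nums.filter(fun(v): (v >= low) and (v < high) end).length()
end
```

For example, with the values 1, 5, 7, 12, 15, the bin from 0 to 10 holds 3 values and the bin from 10 to 20 holds 2.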

type of histogram
📉
(imagine a histogram whose bars are tallest on the left and shrink into a long tail on the right)

skewed right

type of histogram
📈
(imagine a histogram whose bars start in a long, low tail on the left and are tallest on the right)

skewed left

what does it mean to be an outlier?

an outlier is a value that is unusually far from the rest of the data; decide by comparing it to the other values. It is still important to think about all extreme data points, not just outliers

mean

the average: add up all the values and divide by how many there are
best for a symmetric, medium-to-large dataset
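Here is the definition above as a Pyret sketch. The name my-mean is hypothetical. It was chosen so it won't clash with any mean function your starter file may already provide.

```pyret
fun my-mean(nums):
  doc: "Add up all the values, then divide by how many there are"
  fold(fun(acc, x): acc + x end, 0, nums) / nums.length()
end

examples:
  my-mean([list: 2, 4, 9]) is 5
end
```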

median

the middle value when the data are sorted, or the average of the two middle values if there is an even number of them; half the values are smaller and half are larger
If the data are asymmetric (skewed), use the median
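A Pyret sketch of the median follows. my-median is a hypothetical name, and it sorts a copy of the list before picking the middle.

```pyret
fun my-median(nums):
  doc: "Middle value of the sorted list, or the average of the two middle values"
  sorted = nums.sort()
  n = sorted.length()
  mid = num-floor(n / 2)
  if num-modulo(n, 2) == 1:
    sorted.get(mid)                                   # odd count: one middle value
  else:
    (sorted.get(mid - 1) + sorted.get(mid)) / 2       # even count: average the two
  end
end
```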

mode

the number(s) that occur most often in a dataset; in a small dataset, the mode will likely be the most accurate measure of center
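A dataset can have more than one mode, so this Pyret sketch returns a list. my-modes is a hypothetical name. The approach of counting each value by filtering is chosen for clarity, not speed.

```pyret
fun my-modes(nums):
  doc: "The value(s) that occur most often in nums"
  fun count-of(v):
    nums.filter(fun(x): x == v end).length()
  end
  highest = fold(fun(acc, v): num-max(acc, count-of(v)) end, 0, nums)
  most-common = nums.filter(fun(v): count-of(v) == highest end)
  # keep each mode only once, in order of first appearance
  fold(fun(acc, v):
      if acc.member(v): acc else: acc.append([list: v]) end
    end, empty, most-common)
end
```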

how many quartiles are in a box plot?

histogram and box plot shape

the longer whisker points in the same direction as the skew

standard deviation

the most useful way to summarize the spread of a quantitative column

how to calculate standard deviation

find roughly how far the values are from the mean, on average (the exact steps are in the standard deviation equation)

standard deviation equation

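$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
where $\bar{x}$ is the mean and $n$ is the number of values: square each value's distance from the mean, add up the squares, divide by n - 1, then take the square root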
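Here is the equation above as a Pyret sketch. my-stdev is a hypothetical name. Like the equation, it divides by n - 1, which gives the sample standard deviation.

```pyret
fun my-stdev(nums):
  doc: "sqrt(sum of squared distances from the mean / (n - 1))"
  n = nums.length()
  avg = fold(fun(acc, x): acc + x end, 0, nums) / n
  squares = nums.map(fun(x): num-sqr(x - avg) end)   # squared distance of each value
  num-sqrt(fold(fun(acc, s): acc + s end, 0, squares) / (n - 1))
end
```

For [list: 2, 4, 6], the mean is 4, the squared distances add up to 8, and 8 / 2 = 4, so the standard deviation is 2.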

explanatory variable

the independent variable, plotted on the x-axis of a scatter plot

response variable

the dependent variable, plotted on the y-axis of a scatter plot

correlation

a statistic (r) between -1 and +1
-1 = strongest negative correlation
+1 = strongest positive correlation
0 = no (linear) correlation
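The following is a Pyret sketch of computing r. The card doesn't say which formula the course uses, so this assumes the usual Pearson correlation coefficient. my-r and its helper add-up are hypothetical names.

```pyret
fun my-r(xs, ys):
  doc: "Pearson correlation coefficient of two equal-length lists of numbers"
  fun add-up(nums): fold(fun(acc, v): acc + v end, 0, nums) end
  x-mean = add-up(xs) / xs.length()
  y-mean = add-up(ys) / ys.length()
  dx = xs.map(fun(x): x - x-mean end)      # distances from the x mean
  dy = ys.map(fun(y): y - y-mean end)      # distances from the y mean
  # map2 pairs the i-th x distance with the i-th y distance
  together = add-up(map2(fun(a, b): a * b end, dx, dy))
  together / num-sqrt(add-up(dx.map(num-sqr)) * add-up(dy.map(num-sqr)))
end
```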

what is the regression line also known as?

line of best fit, least squares line, predictor, trendline
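This Pyret sketch shows how the least squares line could be computed with the standard formulas: slope = sum(dx * dy) / sum(dx * dx) and intercept = y-mean - slope * x-mean, where dx and dy are distances from the means. The function name least-squares and the object it returns are hypothetical.

```pyret
fun least-squares(xs, ys):
  doc: "Slope and intercept of the least squares line y = slope * x + intercept"
  fun add-up(nums): fold(fun(acc, v): acc + v end, 0, nums) end
  x-mean = add-up(xs) / xs.length()
  y-mean = add-up(ys) / ys.length()
  dx = xs.map(fun(x): x - x-mean end)
  dy = ys.map(fun(y): y - y-mean end)
  slope = add-up(map2(fun(a, b): a * b end, dx, dy)) / add-up(dx.map(num-sqr))
  {slope: slope, intercept: y-mean - (slope * x-mean)}
end
```

For the points (1, 3), (2, 5), (3, 7), this gives a slope of 2 and an intercept of 1.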

how to define a row

cat-row = row-n(animals-table, n)
(n is the row's index; rows are numbered starting from 0)

how to look up a value in a row

cat-row["species"] (cat-row must already be defined)

how to make a function

fun name(parameters): body end
(e.g., fun gt(...): ... end defines a function named gt)

what is an example for a function?

a sample input paired with its expected output; examples show what the function does

example of an examples block

fun f(x): x / 2 end
examples:
  f(2) is 2 / 2
  f(10) is 10 / 2
end

which functions need a helper function?

image-scatter-plot and build-column (each takes a helper function as an input)

which functions make a new, more specific table?

filter, sort, or build-column; they can be nested, e.g., filter(build-column(animals-table, "kilos", kilogram), is-heavy), where kilogram and is-heavy are helper functions
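Here is the nested call above rewritten with core Pyret table methods, as a sketch. The table is a made-up stand-in for animals-table. The cards only name kilogram and is-heavy, so their bodies here are hypothetical: an approximate pound-to-kilogram factor and a 10 kg cutoff chosen for illustration.

```pyret
# Made-up stand-in for animals-table
animals = table: species, pounds
  row: "cat", 6
  row: "dog", 44
  row: "rabbit", 4
end

# Hypothetical helper bodies
fun kilogram(r): r["pounds"] * 0.4536 end    # approximate pounds -> kilograms
fun is-heavy(r): r["kilos"] > 10 end         # illustrative cutoff

# Method form of filter(build-column(animals-table, "kilos", kilogram), is-heavy)
heavy-animals = animals.build-column("kilos", kilogram).filter(is-heavy)
```

The column has to be built first because is-heavy reads the "kilos" column, which does not exist in the original table.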

syntax errors

usually caused by typos and easy to spot; the code will not run at all

runtime error

the program starts running but crashes at a specific point in the code

logic error

the program runs to completion but produces the wrong output
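Logic errors are the hardest of the three to notice, because nothing crashes. This Pyret sketch shows how an examples block catches one. Both function names are hypothetical.

```pyret
# Logic error: this runs fine, but multiplies when it should divide
fun pounds-to-kg-buggy(p): p * 2.2 end

# Fixed version; writing these examples for the buggy one would have flagged it
fun pounds-to-kg(p): p / 2.2 end

examples:
  pounds-to-kg(22) is 10
  pounds-to-kg(0) is 0
end
```

By contrast, writing .5 instead of 0.5 would be a syntax error: Pyret refuses to run the file at all.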

four categories of dirty data

missing data, inconsistent types, inconsistent units/invalid range, inconsistent naming

missing data

some cells contain data and others are empty

inconsistent types

a column where the values have different data types (e.g., 2 and "two")

inconsistent unit/ invalid range

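values with the same data type that represent different units (e.g., some weights in pounds and others in kilograms), or values outside the range that makes sense for the column (e.g., a negative age)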

inconsistent naming

inconsistent spelling and capitalization in entries
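Inconsistent capitalization can be cleaned by building a new column. This Pyret sketch uses a made-up table, and the column name species-clean is hypothetical. Lower-casing fixes capitalization only; misspellings still have to be corrected separately.

```pyret
# Made-up table with inconsistent capitalization in the species column
survey = table: species
  row: "Cat"
  row: "cat"
  row: "CAT"
end

# Normalize capitalization by lower-casing every entry into a new column
cleaned = survey.build-column("species-clean",
  fun(r): string-to-lower(r["species"]) end)
```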

selection bias

occurs when the participants selected are not representative of the group being studied

bias in the study design

occurs when the study is not designed carefully and ends up not measuring what the question actually asks

poor choice of summary

using the wrong summary statistic (e.g., the mean instead of the median for skewed data)

confounding variables

an outside influence, other than the one being studied, that affects the results; this is one reason correlation does not imply causation

intentionally using the wrong chart

misleads the audience; the chart can hide gaps in the data, making it inaccurate

changing the scale of a chart

can make the data look a certain way, exaggerating or hiding differences

(59 cards)