Data Science Flashcards
quantitative data
Used to measure an amount of something (e.g., mass)
categorical data
Used to classify rather than measure (e.g., the species of an animal)
Definitions area
where you write the code in Pyret
Interactions area
Where the output appears
Pyret decimals
Must have a digit before the decimal point, so numbers less than 1 start with 0 (e.g., 0.5, not .5)
bar chart
count
Visual representation of each value’s frequency
One bar for every category
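The counting behind a bar chart can be sketched in Python (not Pyret, and not Pyret's own chart functions), assuming a made-up "species" column stored as a plain list:

```python
from collections import Counter

# Made-up "species" column from a hypothetical animals table
species = ["dog", "cat", "dog", "rabbit", "cat", "dog"]

# A bar chart draws one bar per category; each bar's height is that category's count
counts = Counter(species)
for category, count in counts.items():
    print(f"{category:<7}{'#' * count}")
```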
Pie chart
Percentage
Visual representation of RELATIVE frequency
One slice for every category
Generally no more than 5 slices; 7 at most
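Relative frequency is each category's count divided by the total number of rows. A minimal Python sketch, assuming the same made-up "species" list (this is not Pyret's pie-chart function):

```python
from collections import Counter

# Made-up "species" column from a hypothetical animals table
species = ["dog", "cat", "dog", "rabbit", "cat", "dog"]
counts = Counter(species)
total = len(species)

# Relative frequency = count / total; a pie slice shows this share of the whole
percentages = {category: 100 * count / total for category, count in counts.items()}
for category, pct in percentages.items():
    print(f"{category}: {pct:.1f}%")
```

Because every row falls into exactly one category, the percentages always add up to 100.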
stacked bar chart
A bar chart whose bars are split by the values of a second column (e.g., the count of each species, broken down by sex)
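The counts behind a stacked bar chart come from pairs of values in two columns. A Python sketch, assuming made-up (species, sex) rows:

```python
from collections import Counter

# Made-up rows with two categorical columns: (species, sex)
rows = [("dog", "male"), ("dog", "female"), ("cat", "female"),
        ("dog", "male"), ("cat", "male"), ("cat", "female")]

# One bar per species, split into one segment per sex
stacked = Counter(rows)
for species in sorted({s for s, _ in rows}):
    segments = {sex: stacked[(species, sex)] for sex in ("female", "male")}
    print(species, segments)
```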
data cycle
Ask questions, consider data, analyze data, interpret data (mnemonic: QCAI)
lookup questions
answered by looking up a single value in a table
arithmetic questions
Answered by computing a value from a single column, such as its average, maximum, or minimum
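An arithmetic question reduces one column to a single number. A Python sketch, assuming a made-up "pounds" column:

```python
from statistics import mean

# Made-up "pounds" column from a hypothetical animals table
pounds = [12.5, 40.0, 7.25, 22.0, 18.25]

# Each answer uses only this one column
print("average:", mean(pounds))
print("max:", max(pounds))
print("min:", min(pounds))
```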
statistical questions
Asks about the relationship between two columns
null hypothesis
A statistical hypothesis proposing that there is no real relationship or effect in a set of observations, so any pattern seen is due to chance
random samples
A subset of a population in which each member has an equal chance of being chosen. The larger the random sample, the more accurately it tends to represent the population.
grouped samples
a subset of the population in which each member of the subset was chosen for a specific reason
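The two kinds of sample can be contrasted in a short Python sketch, assuming a made-up population of (name, species) pairs and using "all cats" as the hypothetical reason for the grouped sample:

```python
import random

# Made-up population of (name, species) pairs
population = [("Ada", "cat"), ("Bo", "dog"), ("Cy", "cat"), ("Di", "rabbit"),
              ("Ed", "dog"), ("Flo", "cat"), ("Gus", "dog"), ("Hal", "rabbit")]

# Random sample: random.sample picks k distinct members,
# each member equally likely to be chosen
rng = random.Random(42)  # seeded only so the run is reproducible
random_sample = rng.sample(population, k=4)

# Grouped sample: members chosen for a specific reason (here, being a cat)
grouped_sample = [animal for animal in population if animal[1] == "cat"]

print(random_sample)
print(grouped_sample)
```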
file extension purpose
Tells your computer which application created or can open the file, and which icon to use for it
what does CSV stand for?
comma-separated values
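In a CSV file, the first line usually holds the column names and commas separate the values in each row. A Python sketch reading a tiny made-up CSV with the standard `csv` module:

```python
import csv
import io

# Made-up CSV text: a header row, then one line per row
text = """name,species,pounds
Sasha,cat,6.5
Boo-boo,dog,11.3
"""

# csv.DictReader turns each row into a dict keyed by the header names
rows = list(csv.DictReader(io.StringIO(text)))
for row in rows:
    # Values are read as strings, so numbers must be converted
    print(row["name"], row["species"], float(row["pounds"]))
```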
histograms
shows the number of rows that fall within certain intervals (or “bins”) along the horizontal axis
⬛️⬛️
⬛️⬛️⬛️⬛️
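Binning can be sketched in Python, assuming a made-up numeric column and bins of width 5 that include their lower edge and exclude their upper edge (tools differ on this edge convention):

```python
# Made-up numeric column (ages in years)
ages = [1, 3, 4, 6, 7, 8, 11, 12, 15, 19]
bin_width = 5

# Count how many rows fall into each interval [start, start + bin_width)
bins = {}
for age in ages:
    start = (age // bin_width) * bin_width
    bins[start] = bins.get(start, 0) + 1

for start in sorted(bins):
    print(f"{start}-{start + bin_width}: {'#' * bins[start]}")
```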
type of histogram
📉
(imagine this as a histogram)
⬛️
⬛️⬛️
⬛️⬛️⬛️
⬛️⬛️⬛️⬛️⬛️⬛️⬛️
Skewed right (long tail toward higher values)
type of histogram
📈
(imagine this as a histogram)
⬛️
⬛️⬛️
⬛️⬛️⬛️
⬛️⬛️⬛️⬛️⬛️⬛️
Skewed left (long tail toward lower values)
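One way to check skew numerically is to compare the mean and the median: the tail tends to pull the mean toward it. A Python sketch with made-up data (a typical pattern, not a guarantee for every dataset):

```python
from statistics import mean, median

# Made-up data with a long tail toward higher values (skewed right)
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Made-up data with a long tail toward lower values (skewed left)
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]

# Typically: skewed right -> mean > median; skewed left -> mean < median
print(mean(right_skewed), median(right_skewed))
print(mean(left_skewed), median(left_skewed))
```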
What does it mean to be an outlier?
A value that lies far from the rest of the data; to decide, compare it to the other values. It is important to think about all extreme data points, not just outliers.
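"Far from the rest" needs a rule to become a calculation. The sketch below swaps in one common convention, the 1.5 × IQR rule, which these cards do not specify; the data is made up and `statistics.quantiles` needs Python 3.8 or later:

```python
from statistics import quantiles

# Made-up data with one value far from the rest
weights = [10, 11, 12, 12, 13, 14, 15, 60]

# 1.5 x IQR rule (an assumed convention): flag values more than
# 1.5 interquartile ranges below Q1 or above Q3
q1, _, q3 = quantiles(weights, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [w for w in weights if w < low or w > high]
print(outliers)
```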
mean
Average: the sum of all the values divided by the number of values
symmetric medium-large dataset