Data Science Notes Flashcards
What is a Python list?
A Python list is an ordered sequence of values. It can hold items of any type, including a mix of types.
Lists are mutable, meaning that you can reorder the items, reassign an item, and add or remove items in place.
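A minimal sketch of list mutability. The variable name and values are made up for illustration.

```python
# A list can mix types and be changed in place.
items = [3, "apple", 2.5]

items[1] = "banana"   # reassign an existing element
items.append(True)    # add a new element at the end
items.reverse()       # reorder the items in place

print(items)
```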
What is NumPy?
NumPy provides the ndarray object for efficient storage and manipulation of dense data arrays in Python.
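As a rough illustration, an ndarray stores values of a single dtype and supports element-wise arithmetic without an explicit loop. The array contents here are arbitrary.

```python
import numpy as np

# Build a one-dimensional ndarray of floats.
values = np.array([1.0, 2.0, 3.0, 4.0])

# Arithmetic is applied element by element.
doubled = values * 2

print(values.dtype, values.shape)
print(doubled)
```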
What is pandas?
- This library provides the DataFrame object for efficient storage and manipulation of labeled/columnar data in Python.
- pandas is a high-level tool for data manipulation and transformation.
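A small sketch of a DataFrame with labeled columns and one typical transformation (a grouped mean). The column names and values are invented for the example.

```python
import pandas as pd

# Each dictionary key becomes a labeled column.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo"],
    "temp_c": [4.0, 19.0, 6.0],
})

# Group rows by city and average the temperature within each group.
mean_by_city = df.groupby("city")["temp_c"].mean()

print(mean_by_city)
```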
What is a distribution?
A distribution describes all the values a random variable can take and how likely each value is. A random variable assigns a value to the outcome of a random process, for example the result of a single coin flip.
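What is a binomial distribution?
A binomial distribution describes the number of successes (for example, heads) in a fixed number of independent trials. It is called binomial because each trial has only two possible outcomes, such as heads or tails.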
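One way to make this concrete is to compute the probability of each possible number of heads with the standard binomial formula, P(k) = C(n, k) * p^k * (1 - p)^(n - k). The choice of 10 flips and a fair coin is an assumption made only for this sketch.

```python
from math import comb

n = 10    # number of coin flips (assumed for illustration)
p = 0.5   # probability of heads on each flip (fair coin assumed)

# Probability of getting exactly k heads, for k = 0..n.
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(k, round(prob, 4))
```

The probabilities cover every possible count of heads, so they add up to 1.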
What is a discrete distribution?
A discrete distribution takes only distinct, countable values, such as heads or tails, rather than any real number in a continuous range.
What is broadcasting?
Broadcasting is a set of rules for applying binary ufuncs (addition, subtraction, etc.) to arrays of different shapes.
For example: [1, 2, 3] + 5 = [6, 7, 8]
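The same example written with NumPy, as a short sketch:

```python
import numpy as np

a = np.array([1, 2, 3])

# The scalar 5 is broadcast across every element of a.
result = a + 5

print(result)  # [6 7 8]
```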
What are the rules of broadcasting?
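- If the arrays have a different number of dimensions, left-pad the shape of the array with fewer dimensions with 1s.
- If the sizes do not match in a dimension, the array whose size is 1 in that dimension is stretched to match the other.
- If the sizes do not match in a dimension and neither is 1, an error is raised.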
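The three rules can be sketched with two small cases: one where the shapes are compatible and one where they are not. The array names and shapes are chosen only for illustration.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,), padded on the left to (1, 4)

# Sizes of 1 are stretched in each dimension, giving shape (3, 4).
grid = col + row
print(grid.shape)

# Shapes (3, 2) and (3,) -> (1, 3): the last dimension is 2 vs 3 and
# neither is 1, so NumPy raises a ValueError.
try:
    np.ones((3, 2)) + np.ones(3)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(mismatch_raised)
```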
What is fancy indexing?
Fancy indexing means passing an array of indices to access multiple array elements at once.
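A minimal sketch with NumPy, using made-up values:

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])
idx = [0, 3, 4]

# Passing a list (or array) of indices returns all of those elements at once.
picked = data[idx]

print(picked)  # [10 40 50]
```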
What is central tendency?
Central tendency describes the central or typical value of the data, measured by the mean, median, or mode. Deviation, by contrast, describes how spread out the data are around the mean.
What is deviation?
Deviation is most commonly measured with the standard deviation. A small standard deviation indicates the data are close to the mean, while a large standard deviation indicates that the data are more spread out from the mean.
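A short sketch that computes the measures of central tendency and the standard deviation with Python's standard statistics module. The sample values are invented, and the population standard deviation (pstdev) is used here as one possible choice.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)       # arithmetic average
median = statistics.median(data)   # middle value of the sorted data
mode = statistics.mode(data)       # most frequent value
spread = statistics.pstdev(data)   # population standard deviation

print(mean, median, mode, spread)
```

For a sample drawn from a larger population, statistics.stdev (which divides by n - 1) is the usual alternative.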
What are descriptive statistics?
Descriptive statistics summarize and identify patterns in the data at hand, but they don’t support conclusions that go beyond that data.
What are inferential statistics?
Inferential statistics allow us to test hypotheses and draw conclusions (inferences) about a population based on a sample taken from it.
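What is a correlation matrix (pandas’ corr method)?
A correlation matrix shows the correlation between each pair of columns. Its values lie between -1 and 1. A value of -1 indicates the strongest possible negative correlation, meaning that as one variable increases the other decreases. A value of 1 indicates the strongest possible positive correlation, meaning the two variables increase together, and a value near 0 indicates little or no linear relationship.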
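A small sketch of DataFrame.corr on invented data, where one column rises with x and another falls as x rises. The column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y_up": [2, 4, 6, 8],     # increases with x
    "y_down": [8, 6, 4, 2],   # decreases as x increases
})

# corr() returns the pairwise correlation matrix (Pearson by default).
corr = df.corr()

print(corr)
```

In this matrix the entry for x and y_up is close to 1, the entry for x and y_down is close to -1, and every column has a correlation of 1 with itself.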
Descriptive Statistics
Descriptive statistics are a collection of statistical tools which are used to quantitatively describe or summarize a collection of data. Descriptive statistics aim to summarize, and as such can be distinguished from inferential statistics, which are more predictive in nature.