Module 5 - Analyze the Data Using Statistics Flashcards
Differentiate between a sample and a population
- A population is the whole set of values or individuals you are interested in, while a sample is a subset of the population.
- Measurements on the entire population are often too complex or impossible, so representative samples are used to draw conclusions about the population.
- Numbers calculated from a population are called parameters, while numbers calculated from a sample are called statistics.
- Population parameters are more precise and accurate than sample statistics, because they are calculated from every member of the population.
- Researchers use samples to learn about populations.
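As a minimal sketch of the parameter/statistic distinction, the snippet below builds a hypothetical population of 1,000 simulated heights (not real data), then compares the mean of the whole population with the mean of a random sample of 50. The population size, sample size, and seed are assumptions chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: 1,000 simulated heights in cm (not real data).
rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(1000)]

# Parameter: calculated from every member of the population.
population_mean = statistics.fmean(population)

# Statistic: calculated from a random sample of 50 members.
sample = rng.sample(population, 50)
sample_mean = statistics.fmean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

The two values are usually close but not identical, which is why the sample has to be representative before it is used to draw conclusions about the population.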
are used to describe or summarize the values and observations of a data set.
Descriptive statistics
Explain why this is an example of descriptive statistics: A fitness tracker logged a person’s daily steps and heart rate for a 10-day period. If the person met their fitness goals on 6 of the 10 days, then they were successful 60% of the time. Over that 10-day period, you could observe that the person’s heart rate reached a maximum of 140 beats per minute (bpm) but averaged 72 bpm. These observations are descriptive statistics that describe and simplify the data set.
(Explain)
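The fitness tracker example can be sketched in a few lines. The step counts, the 10,000-step goal, and the heart rate readings below are hypothetical values, chosen only so that they reproduce the figures in the card (6 of 10 days, a maximum of 140 bpm, an average of 72 bpm).

```python
import statistics

# Hypothetical 10-day log; values are illustrative, not real measurements.
daily_steps = [12000, 8000, 10500, 9500, 11000, 10000, 7000, 13000, 9000, 10200]
heart_rates = [140, 60, 62, 64, 66, 68, 70, 58, 66, 66]  # bpm, one reading per day
STEP_GOAL = 10000  # assumed fitness goal

# Share of days on which the goal was met.
days_met = sum(1 for steps in daily_steps if steps >= STEP_GOAL)
success_rate = days_met / len(daily_steps)

# Summary values that describe the heart rate data.
max_hr = max(heart_rates)
mean_hr = statistics.fmean(heart_rates)

print(f"goal met on {days_met} of {len(daily_steps)} days ({success_rate:.0%})")
print(f"max heart rate: {max_hr} bpm, average: {mean_hr:.0f} bpm")
```

Each number only summarizes the 10 days that were recorded; none of them predicts what happens on day 11.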
might include the total number of data points in a data set, the range of values that exist for those numeric data points, or the number of times a given value appears in a data set.
Basic descriptive statistics
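A short sketch of the three basic descriptive statistics named above (number of data points, range of values, and how often a given value appears), using a small made-up list of numbers:

```python
from collections import Counter

# Made-up data set for illustration.
values = [3, 7, 7, 2, 9, 7, 3]

count = len(values)                      # total number of data points
value_range = max(values) - min(values)  # spread between largest and smallest
frequency = Counter(values)              # how many times each value appears

print(f"count: {count}")
print(f"range: {value_range}")
print(f"occurrences of 7: {frequency[7]}")
```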
may also answer questions about the occurrence of trends.
Descriptive statistics
The answers to these questions can be provided in numerical or graphical formats.
Descriptive statistics
How are the results of descriptive statistics often represented?
pie charts, bar charts or histograms.
Descriptive statistics describe the current or historical state of the observed population. What do they not allow for?
- comparison of groups
- conclusions to be drawn
- predictions to be made about data sets that are not in the population
is the process of collecting, analyzing and interpreting the data gathered from a sample to generalize or predict something about a population.
Inferential statistics
are a helpful way to make quick judgements about the quality of a sample.
Graphs of descriptive statistics
Which inferential analyses are commonly used in big data analytics?
Cluster analysis, Association analysis, Regression analysis
Used to find groups of observations that are similar to each other
Cluster analysis
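One way to see what "finding groups of similar observations" means is a tiny one-dimensional k-means sketch. K-means is only one of many clustering techniques and is an assumption here, since the card does not name a method. The function name `kmeans_1d`, the data, and the starting centers are all hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group numbers around the nearest center, then move each center
    to the mean of its group; repeat a fixed number of times."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Index of the center closest to this value.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ended up empty.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Made-up observations with two obvious groups.
values = [1, 2, 3, 10, 11, 12]
centers, clusters = kmeans_1d(values, centers=[0, 5])
print(centers)
print(clusters)
```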
Used to find co-occurrences of values for different variables
Association analysis
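As a rough illustration of co-occurrence, the sketch below counts how often pairs of items appear together in a handful of made-up shopping baskets. Real association analysis goes further than raw pair counts, so treat this as a simplified assumption rather than a full method.

```python
from collections import Counter
from itertools import combinations

# Made-up baskets; each set holds the items bought together.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
]

# Count every pair of items that occurs in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support: the share of baskets that contain the pair.
support_bread_milk = pair_counts[("bread", "milk")] / len(baskets)

print(pair_counts.most_common())
print(f"support for bread + milk: {support_bread_milk:.0%}")
```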
Used to quantify the relationship, if any, between the variations of one or more variables
Regression analysis
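A minimal sketch of quantifying a relationship between two variables: a straight line fitted by ordinary least squares. Simple linear regression is assumed here as the most basic case, the helper name `fit_line` is hypothetical, and the data is made up so that it lies exactly on y = 2x + 1.

```python
import statistics


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    # Covariation of x and y, and variation of x, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

The slope tells you how much y changes, on average, for each one-unit change in x.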
What are the questions to consider when picking a data visualization?
- How many variables are you going to show?
- How many data points are in each variable?
- Does your data span a period of time, or are you comparing data points at a single point in time?
Use this visualization when you have a continuous set of data, the number of data points is high, and/or you would like to show a trend in the data over time.
Line Charts