Module 5 - Analyze the Data Using Statistics Flashcards

1
Q

Differentiate between a sample and a population.

A
  • A population is the whole set of values or individuals you are interested in, while a sample is a subset of the population.
  • Measurements on the entire population are often too complex or impossible, so representative samples are used to draw conclusions about the population.
  • The numbers obtained from a population are called parameters, while the numbers obtained from a sample are called statistics.
  • Population parameters are more precise and accurate than sample statistics because they are calculated from all the data rather than estimated from a subset.
  • Researchers use samples to learn about populations.
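
The difference between a parameter and a statistic can be sketched in a few lines of Python. The population below is simulated and purely hypothetical (1,000 heights in cm); the names `population` and `sample` are illustrative only.

```python
import random
import statistics

# Hypothetical population: 1,000 simulated heights in cm (not real data).
rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(1000)]

# Parameter: calculated from every member of the population.
population_mean = statistics.mean(population)

# Statistic: calculated from a random sample of 50 members.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The sample mean usually lands close to the population mean but rarely matches it exactly, which is why a statistic is treated as an estimate of the parameter.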
2
Q

are used to describe or summarize the values and observations of a data set.

A

Descriptive statistics

3
Q

Explain why the following are descriptive statistics. A fitness tracker logged a person’s daily steps and heart rate for a 10-day period. If the person met their fitness goals on 6 of the 10 days, they were successful 60% of the time. Over that 10-day period, the person’s heart rate reached a maximum of 140 beats per minute (bpm) and averaged 72 bpm. These observations are descriptive statistics that describe and simplify the data set.

A

(Explain)
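
One way to see why these figures are descriptive is to compute them directly from the log. The sketch below uses made-up daily values chosen only so that they reproduce the numbers in the card (6 of 10 days, 140 bpm maximum, 72 bpm average); the step goal of 10,000 is also an assumption.

```python
import statistics

# Hypothetical 10-day log; values are invented to match the card's figures.
daily_steps = [10500, 8200, 11000, 9900, 12050, 10010, 7600, 10800, 9400, 10300]
step_goal = 10000  # assumed daily goal, not stated in the card
heart_rate_bpm = [140, 60, 62, 64, 66, 68, 70, 58, 66, 66]

# Each value below only summarizes the observed data; nothing is predicted.
days_goal_met = sum(steps >= step_goal for steps in daily_steps)
success_rate = days_goal_met / len(daily_steps)
max_heart_rate = max(heart_rate_bpm)
mean_heart_rate = statistics.mean(heart_rate_bpm)

print(f"goal met on {days_goal_met} of {len(daily_steps)} days ({success_rate:.0%})")
print(f"heart rate: max {max_heart_rate} bpm, mean {mean_heart_rate} bpm")
```

Every result describes the 10 days that were recorded and says nothing about day 11, which is the line between descriptive and inferential statistics.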

4
Q

might include the total number of data points in a data set, the range of values that exist for those numeric data points, or the number of times a given value appears in a data set.

A

Basic descriptive statistics
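
As a rough illustration, the three basic statistics named in this card (count, range, and frequency) can be computed with the Python standard library. The `values` list is hypothetical.

```python
from collections import Counter

# Hypothetical data set for illustration only.
values = [3, 7, 7, 2, 9, 7, 3, 5]

count = len(values)                       # total number of data points
value_range = (min(values), max(values))  # smallest and largest value
frequency = Counter(values)               # how often each value appears

print("count:", count)
print("range:", value_range)
print("times 7 appears:", frequency[7])
```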

5
Q

may also answer questions about the occurrence of trends.

A

Descriptive statistics

6
Q

The answers to these questions can be provided in numerical or graphical formats.

A

Descriptive statistics

7
Q

Results of descriptive statistics are often represented in which chart types?

A

pie charts, bar charts or histograms.

8
Q

Descriptive statistics describe the current or historical state of the observed population. What do they not allow for?

A
  • comparison of groups
  • conclusions to be drawn
  • predictions to be made about data sets that are not in the population
9
Q

is the process of collecting, analyzing and interpreting the data gathered from a sample to generalize or predict something about a population.

A

Inferential statistics

10
Q

are a helpful way to make quick judgements about the quality of a sample.

A

Graphs of descriptive statistics

11
Q

Which inferential analyses are very commonly used in big data analytics?

A

Cluster analysis, Association analysis, Regression analysis

12
Q

Used to find groups of observations that are similar to each other

A

Cluster analysis

13
Q

Used to find co-occurrences of values for different variables

A

Association analysis
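
A very small sketch of the idea, assuming a market-basket setting (a common illustration of association analysis, not something the card specifies): count how often pairs of items occur together. The `baskets` data is invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; each set is one transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
]

# Count every pair of items that appears together in a basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support: the share of baskets that contain the pair.
for pair, n in pair_counts.most_common():
    print(pair, n, f"support={n / len(baskets):.0%}")
```

Real association analysis adds further measures on top of these counts, but co-occurrence counting is the starting point.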

14
Q

Used to quantify the relationship, if any, between the variation of one variable and the variations of one or more other variables

A

Regression analysis
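
A minimal sketch of the simplest case, a least-squares line fitted to two variables. The temperature and sales figures are hypothetical, and the variable names are illustrative.

```python
# Hypothetical data: average outside temperature (°C) and ice cream sales.
temperatures = [20, 22, 25, 27, 30]
sales = [112, 118, 137, 144, 161]

n = len(temperatures)
mean_x = sum(temperatures) / n
mean_y = sum(sales) / n

# Least-squares estimates for the line: sales = intercept + slope * temperature
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temperatures, sales))
s_xx = sum((x - mean_x) ** 2 for x in temperatures)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

print(f"sales = {intercept:.2f} + {slope:.2f} * temperature")
```

The slope quantifies the relationship: it estimates how much sales change for each one-degree change in temperature in this made-up data.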

15
Q

What are the questions to consider when picking a data visualization?

A
  • How many variables are you going to show?
  • How many data points are in each variable?
  • Is your data over time or are you comparing data points at a single point in time?
16
Q

Use this visualization when you have a continuous set of data, the number of data points is high, and/or you would like to show a trend in the data over time.

A

Line Charts

17
Q

Give some examples of the uses of Line Charts

A
  • Quarterly sales for the past five years
  • Number of customers per week in the first year of a new retail shop
  • Change in a stock’s price on one day, from opening to closing bell
18
Q

What are the best practices for Line Charts?

A
  • Label the axes.
  • Plot time on the x-axis (horizontal) and the data values on the y-axis (vertical). Use a solid line (rather than a broken line) to emphasize continuity of the data.
  • Keep the number of data sets to a minimum. There should be a very good reason for plotting more than four lines. If needed, add a legend to help the audience understand what they are viewing.
  • Remove or minimize gridlines to reduce distraction. Consider using no gridlines except to emphasize certain values or time periods.
  • Modify the y-axis starting point to obtain something close to a 45-degree slope in one or more of the lines. This ensures you emphasize the change in the data without introducing distortions that dramatize the visualization.
19
Q

This type of chart is probably the most common chart type used to display the values of a specific variable across similar categories.

A

Column Chart

20
Q

Give some examples of the uses of Column Charts

A
  • Populations of the BRICS nations (Brazil, Russia, India, China, and South Africa)
  • Last year’s sales for the top four car companies
  • Average student test scores for six math classes
21
Q

What are the best practices for Column Charts?

A
  • Label the axes.
  • If changes over time are being shown, time should be plotted on the x-axis.
  • If time is not part of the data, consider ordering the data so that column heights ascend or descend.
  • Fill the columns with a solid color. To highlight one column, consider using an accent color and make all the other columns the same color.
  • Column charts are best when there are no more than seven categories on the horizontal axis. This will help the viewer clearly see the value for each column.
  • Start the value of the y-axis at zero to accurately reflect the full value of each column.
  • The spacing between columns should ideally be roughly half the width of a column.
22
Q

are similar to column charts except they are positioned horizontally and hence used slightly differently

A

Bar Charts

23
Q

What are some examples of Bar Charts?

A
  • Gross domestic product (GDP) of the 25 highest-producing nations in a given year
  • Number of cars sold by each sales representative in a group
  • Exam scores for each student in a math class
24
Q

What are some best practices of Bar Charts?

A
  • Label the axes.
  • Consider ordering the bars so that the lengths go from longest to shortest. The meaning of the data shown will most likely determine whether the longest bar should be on the bottom or the top for greatest impact or easiest understanding.
  • Fill the bars with a solid color. To highlight one bar, consider using an accent color and make all the other bars the same color.
  • Start the value of the x-axis at zero to accurately reflect the full value of each bar.
  • The spacing between bars should ideally be roughly half the width of a bar.
25
Q

are used to show the composition of a total. Segments of different sizes visually represent percentages of that total. The sum of the segments must equal 100%.

A

Pie charts

26
Q

What are some examples of Pie Charts?

A
  • Annual expenses for a corporation (e.g., rent, administrative, utilities, production)
  • A country’s energy sources (e.g., oil, coal, gas, solar, wind)
  • Survey results for a group’s favorite type of movie (e.g., action, romance, comedy, drama, science fiction)
27
Q

What are some best practices for Pie Charts?

A
  • Limit the number of categories so that the viewer can easily differentiate between segments and their meaning in relation to each other. After ten or more segments, the slices begin to lose meaning and impact.
  • If necessary, consolidate smaller segments into one segment with a label such as “Other” or “Miscellaneous”.
  • Use a different color or gray scale for each segment.
  • Order the segments clockwise according to size.
  • Make sure the values of all segments sum to 100%.
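
The consolidation, ordering, and 100% checks above can be done on the data before any chart is drawn. The sketch below is one possible approach; the expense percentages and the 5% cutoff for "Other" are assumptions made for illustration.

```python
# Hypothetical expense shares (percent of total); not real figures.
expenses = {
    "production": 42.0,
    "rent": 25.0,
    "administrative": 18.0,
    "utilities": 9.0,
    "travel": 3.0,
    "training": 2.0,
    "postage": 1.0,
}
threshold = 5.0  # assumed cutoff below which a segment is folded into "Other"

# Keep the larger segments and consolidate the small ones.
major = {name: share for name, share in expenses.items() if share >= threshold}
other = sum(share for share in expenses.values() if share < threshold)

# Order segments by size, with the consolidated segment last.
segments = sorted(major.items(), key=lambda item: item[1], reverse=True)
if other > 0:
    segments.append(("Other", other))

# The segments must still add up to 100%.
total = sum(share for _, share in segments)
assert abs(total - 100.0) < 1e-9

for name, share in segments:
    print(f"{name:15s} {share:5.1f}%")
```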
28
Q

are very popular for visualizing correlations, or to show the distribution of many data points. They are also useful for demonstrating clustering or identifying outliers in the data.

A

Scatter Plots

29
Q

What are some examples of Scatter Plots?

A
  • Comparing life expectancy to GDP for each country in a group
  • Comparing the daily sales of ice cream at a given location to the average outside temperature
  • Comparing the weight to the height of each person in a group
30
Q

What are some best practices for scatter plots?

A
  • Label the axes.
  • Make sure the data set is large enough to provide visualization for clustering or outliers.
  • Start the value of the y-axis at zero to accurately reflect the full values of the data. The value of the x-axis will depend on the data. For example, age ranges of ice cream customers might be labeled on the x-axis, and there would be no need to start at zero years of age.
  • If the scatter plot shows a correlation between values on the x- and y-axes, consider adding a trend line.
  • Do not include more than two trend lines.
31
Q

is a data point that does not fit the rest of the data. It lies outside of a cluster and does not follow the same pattern.

A

outlier

32
Q

In the data analysis process, outliers that are the result of mistakes can lead to?

A

anomalies

33
Q

Why is investigating anomalies a very important part of the data cleaning process?

A

It ensures that the data can be analyzed effectively and that the analysis generates accurate and valid results.

34
Q

With small data sets, it may be relatively easy to spot outliers by?

A

sorting or filtering the data.

35
Q

Two common types of data visualization used to find outliers are?

A

scatter plots and box plots.
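
Box plots conventionally flag points that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles. That rule is not stated in the card; it is the usual box-plot convention, sketched here on hypothetical data.

```python
import statistics

# Hypothetical measurements; 58 is deliberately far from the rest.
data = [12, 14, 15, 15, 16, 17, 18, 19, 20, 58]

# Quartiles (Python's default "exclusive" method) and the IQR.
q1, _median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Conventional box-plot fences: 1.5 * IQR beyond each quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```

A flagged point still needs investigating: it may be a data-entry mistake or a genuine but unusual observation.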

36
Q

is a very powerful data analysis tool within Excel and is great when you need to find information in a large spreadsheet or if you are consistently looking for the same type of information.

A

VLOOKUP

37
Q

VLOOKUP is an abbreviation of?

A

vertical lookup

38
Q

A VLOOKUP function consists of 4 key pieces of information which are:

A
  1. The value to search for
  2. The range to search in
  3. The column in the range that contains the value you want the function to return
  4. An indication of whether the function should accept an approximate match (TRUE, in the function) or only an exact match (FALSE) for the value being searched for. VLOOKUP defaults to an approximate match if FALSE is not specified in the function.
39
Q

How does VLOOKUP search a table?

A

VLOOKUP searches for a value in the leftmost column of a table and, when the value is found, returns information from the same row but in another column.

40
Q

How does XLOOKUP differ from VLOOKUP?

A

With XLOOKUP, you can look in any column (not only the leftmost in a table) for a search term and return a result from the same row. One difference is that XLOOKUP defaults to returning an exact match, whereas VLOOKUP defaults to closest match unless the FALSE keyword is used. In this course, you may use either VLOOKUP or XLOOKUP to obtain the desired results if they are both available in the spreadsheet tool you are using.
Note: XLOOKUP is not backward compatible, so worksheets using XLOOKUP may not be usable in earlier versions of Excel.
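
To illustrate the "any column" point, here is a rough Python analogue of an exact-match XLOOKUP. It is only a sketch of the behavior, not Excel's implementation, and the employee data is hypothetical.

```python
def xlookup(lookup_value, lookup_array, return_array, if_not_found=None):
    """Return the item of return_array aligned with the first exact match."""
    for key, result in zip(lookup_array, return_array):
        if key == lookup_value:
            return result
    return if_not_found


# Hypothetical table stored as separate columns.
employee_ids = [101, 102, 103]
names = ["Ana", "Ben", "Chloe"]
departments = ["Sales", "IT", "HR"]

# The search column (names) sits to the right of the result column (IDs),
# which a plain VLOOKUP cannot handle.
print(xlookup("Ben", names, employee_ids))               # 102
print(xlookup("Dan", names, departments, "not found"))   # not found
```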

41
Q

What is the syntax of the VLOOKUP function?

A

=VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])
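
The four arguments can be mimicked in Python to show how exact and approximate matching differ. This is a rough analogue under simplifying assumptions, not Excel's implementation: the table is a list of rows, a failed lookup returns `None` where Excel would show `#N/A`, and approximate matching assumes the first column is sorted in ascending order. The grade table is hypothetical.

```python
from bisect import bisect_right


def vlookup(lookup_value, table_array, col_index_num, range_lookup=True):
    """Search the first column of table_array; return the col_index_num-th cell."""
    first_column = [row[0] for row in table_array]
    if range_lookup:
        # Approximate match: last row whose key is <= lookup_value.
        # Only meaningful when the first column is sorted ascending.
        position = bisect_right(first_column, lookup_value) - 1
        if position < 0:
            return None
    else:
        # Exact match only.
        try:
            position = first_column.index(lookup_value)
        except ValueError:
            return None
    # col_index_num is 1-based, as in the spreadsheet function.
    return table_array[position][col_index_num - 1]


# Hypothetical table: minimum score in column 1, letter grade in column 2.
grades = [(0, "F"), (60, "D"), (70, "C"), (80, "B"), (90, "A")]

print(vlookup(85, grades, 2))         # B (approximate: 80 is the closest key <= 85)
print(vlookup(85, grades, 2, False))  # None (no exact key 85)
print(vlookup(70, grades, 2, False))  # C
```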