Chapter 10 Flashcards
What is big data?
Big data refers to a group of data that is too large to be processed through conventional methods, characterized by high volume and velocity of collection, and variety in type and quality.
Nominal Variables
A nominal variable’s categories have no ordering and exist in name only (e.g. ‘grapes’, ‘oranges’, ‘bananas’).
Ordinal variables
An ordinal variable’s categories have a specific ordering (e.g. ‘agree’, ‘neutral’, ‘disagree’).
What are the two broad types of variables? What is the difference?
Quantitative: numerical
Categorical: name/category
What are the two types of categorical variables?
Nominal: unordered, exist in name only
Ordinal: part of a set order
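The nominal/ordinal distinction can be sketched in pandas, assuming its Categorical type is an acceptable stand-in for a categorical variable. The category values reuse the examples from the cards above; the variable names are illustrative only.

```python
import pandas as pd

# Nominal: categories exist in name only, so no ordering is declared.
fruit = pd.Categorical(["grapes", "oranges", "bananas", "grapes"])

# Ordinal: the same kind of object, but with an explicit category order.
opinion = pd.Categorical(
    ["agree", "disagree", "neutral", "agree"],
    categories=["disagree", "neutral", "agree"],
    ordered=True,
)

print(fruit.ordered)    # False
print(opinion.ordered)  # True
print(opinion.min(), opinion.max())  # disagree agree
```

Only the ordered version supports comparisons such as min and max, which is the practical difference between the two types.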
What are the two types of quantitative variables?
Continuous: can take any value along a continuum; typically real numbers; typically measurements.
Discrete: can take only separate, countable values, typically integers. Usually represent countable items/groups of items.
What is data visualization? Why does it matter?
Data visualization is the display of data in a visual format, such as a chart or graph, meant to convey information to people more easily.
Data shown in a text-only format often does not convey ideas or information very clearly.
What is cardinality?
Why must you consider this?
The number of unique elements in a data set.
Consider cardinality when choosing a visual method by which to represent data, as some methods (such as pie charts) are only suited to low-cardinality data, while others (such as scatter plots and histograms) may be better for high-cardinality data.
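As a minimal sketch, cardinality can be checked in pandas by counting the distinct values in each column. The data frame below is made-up sample data, and its column names are hypothetical.

```python
import pandas as pd

# Made-up sample data for illustration only.
df = pd.DataFrame({
    "fruit": ["grapes", "oranges", "bananas", "grapes", "oranges", "grapes"],
    "weight_g": [5.1, 140.2, 118.0, 4.8, 151.7, 5.3],
})

# nunique() counts the distinct values in each column.
cardinality = df.nunique()
print(cardinality["fruit"])     # 3
print(cardinality["weight_g"])  # 6
```

Here the fruit column has low cardinality, while the weight column has as many unique values as rows.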
What is an example of a data visualization method suitable for low-cardinality?
What about high-cardinality?
Pie charts are an example of data visualization suitable for low-cardinality.
Scatter plots and histograms are examples of visualizations suitable for higher-cardinality data.
What is a good visual method for representing categorical data?
Bar graphs
What command would import the “pandas” module using the alias “pd”?
import pandas as pd
What is matplotlib?
What does it replicate?
matplotlib is a module used for plotting data in Python.
It replicates the plotting capabilities of MATLAB, an engineering-oriented programming language.
This matplotlib function specifies the title of a plot:
plt.title()
This matplotlib function specifies the x-axis label of a plot:
plt.xlabel()
This matplotlib function specifies the y-axis label of a plot:
plt.ylabel()
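The three labelling functions can be seen together in a short sketch of a bar graph for categorical data. It assumes the conventional alias plt for matplotlib.pyplot; the counts are made-up, and the Agg backend line is an assumption added so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt

# Made-up counts for illustration only.
fruits = ["grapes", "oranges", "bananas"]
counts = [3, 2, 1]

plt.bar(fruits, counts)      # one bar per category
plt.title("Fruit counts")    # title of the plot
plt.xlabel("Fruit")          # x-axis label
plt.ylabel("Count")          # y-axis label
```

To view the result, remove the backend line and call plt.show(), or write the figure to a file with plt.savefig().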