Advancing into Data Analytics with Excel, Python, and R: Flashcards
What is a variable in data analysis?
A variable is any characteristic, number, or quantity that can be measured, counted, or categorized and that can take different values across observations.
How are variables classified by level of measurement?
Variables are classified as nominal, ordinal, interval, or ratio.
Define a nominal variable.
A nominal variable is a categorical variable without any intrinsic ordering, such as gender or color.
What distinguishes an ordinal variable from a nominal variable?
An ordinal variable has categories with a clear, meaningful order, such as rankings or letter grades, but the distances between categories are not necessarily equal.
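To make the nominal/ordinal distinction concrete, here is a minimal sketch using only the Python standard library. The eye-color data, the `LEVELS` satisfaction scale, and the helper names `nominal_mode` and `ordinal_median` are made up for illustration. Nominal data supports counting (the mode); ordinal data also supports ordering, but that order has to come from the scale itself, not from alphabetical sorting.

```python
from collections import Counter
from statistics import median_low

def nominal_mode(values):
    """Most frequent category; counting is the main summary for nominal data."""
    return Counter(values).most_common(1)[0][0]

def ordinal_median(values, order):
    """Median category of ordinal data, ranked by position in `order`.

    median_low always returns an actual rank, so the result is a real
    category even when the number of values is even.
    """
    ranks = [order.index(v) for v in values]
    return order[median_low(ranks)]

# Nominal: no intrinsic order (hypothetical sample)
eye_colors = ["brown", "blue", "brown", "green", "hazel", "brown"]
print(nominal_mode(eye_colors))  # brown

# Ordinal: hypothetical satisfaction scale, listed from lowest to highest
LEVELS = ["low", "medium", "high"]
responses = ["medium", "high", "low", "high", "medium"]

print(sorted(responses))                    # alphabetical: ['high', 'high', 'low', 'medium', 'medium']
print(sorted(responses, key=LEVELS.index))  # by scale: ['low', 'medium', 'medium', 'high', 'high']
print(ordinal_median(responses, LEVELS))    # medium
```

The alphabetical sort puts "high" before "low", which is meaningless for this scale; sorting by position in `LEVELS` restores the intended order.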
Give an example of an interval variable.
Temperature measured in Celsius or Fahrenheit: differences between values are meaningful (20 °C is 10 degrees warmer than 10 °C), but the zero point is arbitrary, so ratios between values are not.
How does a ratio variable differ from an interval variable?
A ratio variable has all the properties of an interval variable plus a true zero point, so ratio comparisons such as ‘twice as much’ are meaningful. Examples include height and weight.
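A short worked example shows why ratio comparisons need a true zero. The sketch below uses hypothetical temperatures and weights, converts each pair to a different unit, and checks whether the ratio survives. On an interval scale (Celsius) the ratio changes with the unit, so it carries no meaning; on a ratio scale (kilograms) it does not change. The kilogram-to-pound factor is an approximation.

```python
def c_to_f(c):
    """Celsius to Fahrenheit (both interval scales with arbitrary zeros)."""
    return c * 9 / 5 + 32

def c_to_k(c):
    """Celsius to Kelvin (Kelvin has a true zero at absolute zero)."""
    return c + 273.15

KG_TO_LB = 2.20462  # approximate conversion factor

warm, cool = 20.0, 10.0  # degrees Celsius
print(warm / cool)                  # 2.0, yet 20 °C is not "twice as hot"
print(c_to_f(warm) / c_to_f(cool))  # 1.36 (68 °F / 50 °F): the ratio changed with the unit
print(c_to_k(warm) / c_to_k(cool))  # about 1.035, measured from absolute zero

heavy, light = 80.0, 40.0  # kilograms
print(heavy / light)                            # 2.0
print((heavy * KG_TO_LB) / (light * KG_TO_LB))  # 2.0: the ratio is unit-independent
```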
What is the purpose of exploratory data analysis (EDA)?
EDA aims to summarize the main characteristics of a dataset, often with visual methods, in order to uncover patterns, spot anomalies, check assumptions, and generate hypotheses for further testing.
Name two common data visualization techniques used in EDA.
Histograms and scatter plots.
How does a histogram help in data analysis?
A histogram displays the distribution of a single numeric variable by showing how many data points fall into each of a series of adjacent intervals (bins).
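The counting behind a histogram can be sketched in a few lines of standard-library Python. The helper `histogram_counts` and the `ages` data are hypothetical; the function assigns each value to a bin of fixed width starting at `start` and keeps empty bins, since a histogram draws those as zero-height bars.

```python
def histogram_counts(data, start, bin_width):
    """Count values in adjacent bins [start + k*w, start + (k+1)*w).

    Returns (left_edge, count) pairs, including empty bins.
    """
    if min(data) < start:
        raise ValueError("start must be <= the smallest value")
    # Use the same floor division for the bin count and the bin index
    n_bins = int((max(data) - start) // bin_width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // bin_width)] += 1
    return [(start + k * bin_width, c) for k, c in enumerate(counts)]

ages = [23, 25, 31, 35, 36, 38, 41, 44, 52, 58, 61]  # hypothetical sample
for left, count in histogram_counts(ages, start=20, bin_width=10):
    # Labels assume whole-number data, so a width-10 bin covers left..left+9
    print(f"{left}-{left + 9}: {'#' * count}")
```

In practice, matplotlib's `hist` function or Excel's built-in histogram chart does this binning for you; the sketch only makes the counting explicit.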
What insight does a scatter plot provide?
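A scatter plot shows each observation as a point positioned by two continuous variables, revealing the direction, form, and strength of their relationship, as well as clusters and outliers.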
Define descriptive statistics.
Descriptive statistics are summary measures that quantitatively describe the main features of a dataset, such as its center and spread.
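One compact set of descriptive statistics is the five-number summary: minimum, first quartile, median, third quartile, and maximum. The sketch below computes it with the standard library's `statistics.quantiles`; the `sample` data and the helper name `five_number_summary` are illustrative. Quartile conventions differ between tools, and `method="inclusive"` should correspond to the interpolation used by Excel's QUARTILE.INC.

```python
from statistics import quantiles

def five_number_summary(data):
    """Minimum, quartiles, and maximum of a numeric sample."""
    # method="inclusive" interpolates between sorted values so that the
    # minimum and maximum act as the 0th and 100th percentiles
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return {"min": min(data), "q1": q1, "median": q2, "q3": q3, "max": max(data)}

sample = [2, 4, 4, 5, 7, 9, 10, 12, 15]  # hypothetical values
print(five_number_summary(sample))
```

pandas' `DataFrame.describe()` reports these same values alongside the count, mean, and standard deviation.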
What are measures of central tendency?
Measures that describe the center of a dataset; the most common are the mean, median, and mode.
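All three measures of central tendency are available in Python's `statistics` module (the Excel counterparts are AVERAGE, MEDIAN, and MODE.SNGL). The order-size data below is hypothetical.

```python
from statistics import mean, median, mode

order_sizes = [1, 2, 2, 3, 4, 4, 4, 5]  # hypothetical items per order

print(mean(order_sizes))    # 3.125 (sum 25 divided by 8 values)
print(median(order_sizes))  # 3.5 (average of the middle pair, 3 and 4)
print(mode(order_sizes))    # 4 (appears three times)
```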
Explain the difference between mean and median.
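The mean is the arithmetic average (the sum of all values divided by their count), while the median is the middle value when the data points are sorted (or the average of the two middle values when the count is even). Because the mean uses every value, extreme values pull it toward them; the median is resistant to such outliers.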
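A quick way to see the difference is to add one extreme value to a small sample and compare how far each measure moves. The salary figures are made up for illustration.

```python
from statistics import mean, median

salaries = [42_000, 45_000, 47_000, 50_000, 52_000]  # hypothetical
with_outlier = salaries + [1_000_000]                # one extreme value added

print(mean(salaries), median(salaries))          # mean 47,200 vs. median 47,000
print(mean(with_outlier), median(with_outlier))  # mean jumps to 206,000; median moves only to 48,500
```

A single outlier more than quadruples the mean but shifts the median by only 1,500, which is why the median is often preferred for skewed data such as incomes.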
What are measures of variability in descriptive statistics?
Measures that describe the spread, or dispersion, of data; common examples are the range, variance, and standard deviation.
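The sketch below computes the range, variance, and standard deviation of a hypothetical sample. It shows both the population formulas (divide by n, like Excel's VAR.P and STDEV.P) and the sample formulas (divide by n - 1, like VAR.S and STDEV.S); which one is appropriate depends on whether the data is the whole population or a sample from it.

```python
from statistics import pstdev, pvariance, stdev, variance

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements; mean is 5

data_range = max(values) - min(values)  # 9 - 2 = 7

# Sum of squared deviations from the mean is 32
print(data_range)         # range: 7
print(pvariance(values))  # population variance: 32 / 8 = 4
print(pstdev(values))     # population standard deviation: 2
print(variance(values))   # sample variance: 32 / 7, about 4.571
print(stdev(values))      # sample standard deviation: about 2.138
```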
Why is standard deviation important?
Standard deviation quantifies how spread out data points are around the mean. It is the square root of the variance, so it is expressed in the same units as the data, which makes it easier to interpret than the variance.
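Two datasets can share the same mean yet differ greatly in spread, and the standard deviation captures that difference in the data's own units. The delivery times below are hypothetical; the loop prints the mean, the variance (in squared minutes), and the standard deviation (in minutes) for each set.

```python
from statistics import mean, pstdev, pvariance

# Two hypothetical sets of delivery times, in minutes, with the same mean
steady = [48, 49, 50, 51, 52]
erratic = [30, 40, 50, 60, 70]

for name, times in [("steady", steady), ("erratic", erratic)]:
    # pstdev is the square root of pvariance, so it is in minutes,
    # while the variance is in squared minutes
    print(f"{name}: mean = {mean(times)} min, "
          f"variance = {pvariance(times)} min^2, "
          f"std dev = {pstdev(times):.2f} min")
```

Both sets average 50 minutes, but the erratic set's standard deviation is ten times larger, which a summary based on the mean alone would hide.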
How can Excel be used in exploratory data analysis?
Excel provides PivotTables, charts, and built-in functions such as AVERAGE, MEDIAN, and STDEV.S for summarizing data, visualizing it, and performing basic statistical analysis.
Describe the role of PivotTables in Excel.
PivotTables let users group, aggregate (for example, sum, count, or average), and rearrange the rows and columns of a dataset to produce a summary report without writing formulas.
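The core operation behind a PivotTable is grouping records by one or more fields and aggregating a value within each group. As a rough Python analogue, the hypothetical `pivot_sum` helper below builds a "Sum of amount" table with regions as rows and products as columns from made-up sales records. In practice, pandas' `pivot_table` function (with its `index`, `columns`, `values`, and `aggfunc` arguments) is the usual tool for this.

```python
from collections import defaultdict

def pivot_sum(records, row_field, col_field, value_field):
    """Sum value_field for each (row_field, col_field) pair, like a PivotTable's 'Sum of' value."""
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        table[rec[row_field]][rec[col_field]] += rec[value_field]
    # Convert nested defaultdicts to plain dicts for display
    return {row: dict(cols) for row, cols in table.items()}

sales = [  # hypothetical transaction records
    {"region": "East", "product": "Widget", "amount": 120},
    {"region": "East", "product": "Gadget", "amount": 80},
    {"region": "West", "product": "Widget", "amount": 200},
    {"region": "West", "product": "Widget", "amount": 50},
    {"region": "East", "product": "Widget", "amount": 30},
]

print(pivot_sum(sales, "region", "product", "amount"))
# {'East': {'Widget': 150, 'Gadget': 80}, 'West': {'Widget': 250}}
```

Combinations with no data (here, West and Gadget) are simply absent, where a PivotTable would show an empty cell.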
What is the benefit of using Excel charts in EDA?
Excel charts help reveal trends and patterns in data, making the data easier to interpret and the findings easier to communicate.
How does understanding variables aid in data analysis?
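Knowing each variable's type helps in selecting appropriate analysis methods and statistical tests. For example, a mean is meaningful for interval and ratio data but not for nominal data, where the mode is used instead.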
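As an illustration of how the variable type constrains the analysis, the hypothetical `central_tendency` helper below returns only the measures of center that suit a given level of measurement, following the common textbook convention: the mode for nominal data, the mode and median for ordinal data, and all three for interval and ratio data. The level is passed in by hand, since code cannot reliably infer it from the values alone.

```python
from statistics import mean, median, median_low, mode

def central_tendency(values, level):
    """Return only the measures of center that are meaningful for `level`."""
    if level == "nominal":
        return {"mode": mode(values)}
    if level == "ordinal":
        # Assumes categories are already coded as ordered integers (e.g. 1 = low, 3 = high);
        # median_low returns an actual category code rather than a midpoint
        return {"mode": mode(values), "median": median_low(values)}
    if level in ("interval", "ratio"):
        return {"mode": mode(values), "median": median(values), "mean": mean(values)}
    raise ValueError(f"unknown level of measurement: {level!r}")

print(central_tendency(["red", "blue", "red"], "nominal"))
print(central_tendency([1, 2, 2, 3, 3, 3], "ordinal"))
print(central_tendency([10, 20, 20, 50], "ratio"))
```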
What is the significance of data visualization in EDA?
Data visualization makes complex datasets easier to grasp, helping analysts identify patterns, trends, and outliers.