G5. Descriptive Statistics Flashcards

1
Q

What is the role of descriptive statistics with regard to the analysis of data collections?

A

To visualize and summarize the sample distribution, providing a concise and meaningful summary of the key characteristics of the data and thereby allowing us to make tentative assumptions about the population distribution.

1
Q

What type of questions can be answered using descriptive statistics? Which are the mathematical tools used for that?

A

Factual queries that summarize and present the main features of the data. The tools are measures of central tendency, variability (dispersion), and distribution shape, together with frequencies and proportions, percentiles and quartiles, correlations between variables, and other summary measures.
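
As a rough illustration of the tools listed above, the following sketch computes one measure from each family with pandas. The DataFrame, its column names, and its values are invented for this example and do not come from the course material.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "height_cm": [150, 160, 165, 170, 180],
    "weight_kg": [50, 58, 63, 68, 80],
})
heights = df["height_cm"]

# Central tendency
center = {"mean": heights.mean(), "median": heights.median()}

# Variability (dispersion): sample standard deviation and range
spread = {"std": heights.std(), "range": heights.max() - heights.min()}

# Percentiles and quartiles (pandas interpolates linearly by default)
quartiles = heights.quantile([0.25, 0.5, 0.75])

# Distribution shape: sample skewness (close to 0 for symmetric data)
skewness = heights.skew()

# Frequencies and proportions of each distinct value
proportions = heights.value_counts(normalize=True)

# Correlation between two attributes (Pearson by default)
pearson_r = df["height_cm"].corr(df["weight_kg"])

print(center, spread, quartiles.tolist(), skewness, pearson_r)
```

Each of these answers a factual question about the sample itself (for example "what is the typical height?"), not an inferential question about the population.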

2
Q

Which methods are provided by Python Pandas for getting acquainted with data collections content in a quantitative manner? What about R?

A

Python:
head() and tail()
info()
describe()
shape (an attribute, not a method)
dtypes (an attribute, not a method)
value_counts()
corr()
isnull() combined with sum(); heatmap() (from seaborn, not Pandas itself)

R:
head() and tail()
str()
summary()
dim()
class()
table()
cor()
is.na(), sum(), and heatmap()
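
A minimal sketch of the Pandas calls listed above, applied to a small made-up DataFrame. The column names and values are assumptions chosen only for illustration; the comments name the R counterpart from the list where there is one.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo", "Lima"],
    "temp_c": [4.0, 18.5, None, 22.0],
    "humidity": [80, 60, 75, 70],
})

first_rows = df.head(2)   # first two rows (R: head(df, 2))
last_rows = df.tail(2)    # last two rows (R: tail(df, 2))
df.info()                 # prints column types and non-null counts (R: str(df))
summary = df.describe()   # count, mean, std, min, quartiles, max (R: summary(df))
n_rows, n_cols = df.shape  # attribute, no parentheses (R: dim(df))
column_types = df.dtypes   # attribute, one dtype per column

# Frequency of each distinct value in a column (R: table(df$city))
city_counts = df["city"].value_counts()

# Pairwise Pearson correlations of the numeric columns (R: cor())
correlations = df[["temp_c", "humidity"]].corr()

# Number of missing values per column (R: colSums(is.na(df)))
missing_per_column = df.isnull().sum()

print(summary)
print(missing_per_column)
```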

3
Q

How is the method shape used for analysing data in a DataFrame? Is there an equivalent in R?

A

Purpose: Returns the number of rows and columns in a DataFrame as a (rows, columns) tuple.
Usage: df.shape (an attribute, so it is written without parentheses)
R: dim(df)
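
One possible use of shape during analysis is to check how many rows a cleaning step removes. The sketch below assumes a small made-up DataFrame; the column names are hypothetical.

```python
import pandas as pd

# Hypothetical data with two incomplete rows, invented for illustration.
df = pd.DataFrame({"a": [1, 2, None, 4], "b": ["x", "y", "z", None]})

rows_before, n_cols = df.shape       # shape is a (rows, columns) tuple
rows_after = df.dropna().shape[0]    # index 0 is the row count
rows_dropped = rows_before - rows_after

print(f"{rows_dropped} of {rows_before} rows contain missing values")
```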

4
Q

What issues have to be considered in order to be able to apply statistics to raw data collections?

A

Data Quality
Data Scale and Units
Sampling Bias
Data Transformation
Statistical Assumptions

5
Q

What is the role of the generation of graphics in the application of descriptive statistics for analysing data?

A

Data Exploration
Pattern Recognition
Outlier Detection
Correlation and Relationships
Distribution Analysis
Communicating Results

6
Q

Which are the strategies used for dealing with dirty data when applying descriptive statistics functions?

A

Data Cleaning (Handling Missing Values, Correcting Errors)
Outlier Detection and Handling (visual or statistical)
Data Transformation (Logarithmic, normalisation)
Handling duplicates
Formatting
Encoding categorical data (dummy variables)
Data imputation
Cross validation
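
Several of the strategies above can be sketched in a few lines of pandas. This is only one possible approach under stated assumptions: the data are invented, median imputation and the 1.5 × IQR outlier rule are choices made for this example, and other strategies may fit a given data set better.

```python
import numpy as np
import pandas as pd

# Hypothetical dirty data: one duplicated row, one missing value,
# and one suspiciously large value. Invented for illustration only.
raw = pd.DataFrame({
    "group": ["a", "b", "b", "a", "c", "c"],
    "value": [10.0, 12.0, 12.0, None, 11.0, 500.0],
})

# Handling duplicates: drop exact duplicate rows
clean = raw.drop_duplicates().reset_index(drop=True)

# Data imputation: fill missing values with the column median
clean["value"] = clean["value"].fillna(clean["value"].median())

# Statistical outlier detection: flag values outside 1.5 * IQR
q1, q3 = clean["value"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (clean["value"] < q1 - 1.5 * iqr) | (clean["value"] > q3 + 1.5 * iqr)

# Data transformation: log(1 + x) compresses the long right tail
clean["log_value"] = np.log1p(clean["value"])

# Categorical data: one dummy (indicator) column per category
dummies = pd.get_dummies(clean["group"], prefix="group")

print(clean.assign(outlier=is_outlier))
```

Whether a flagged outlier is removed, capped, or kept depends on whether it is an error or a genuine observation, so the sketch only marks it.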

7
Q

Why is it important to know the distribution of the values of a given attribute in a data analytics process?

A

It provides a basis for descriptive statistics, aids in data exploration, and guides subsequent analytical steps, such as choosing suitable summary measures and transformations. A thorough understanding of the distribution improves the accuracy and reliability of the insights gained from the data analytics process.
