Week 2 Flashcards
what is data visualisation?
the process of displaying data, often in large quantities, in a meaningful fashion to provide insights that support better decisions.
what are the three general principles for visualisation?
- design and layout matter
- avoid clutter
- there should be a reason behind using colours and they should be used effectively
what is a dashboard?
a visual representation of a set of key business measures
what are column and bar charts?
- a column chart is a vertical bar chart
- a bar chart uses horizontal bars
- a clustered column chart compares values across categories using vertical rectangles
- a stacked column chart displays the contribution of each value to the total by stacking the rectangles
- a 100% stacked column chart compares the percentage that each value contributes to a total
- column and bar charts are useful for comparing categorical or ordinal data and for illustrating differences between sets of values
what is a line chart?
it provides a useful means for displaying data over time
what is a pie chart?
displays the relative proportion of each data source to the total by partitioning a circle into pie-shaped areas.
what is an area chart?
combines the features of a pie chart with those of a line chart. it presents more information than a pie or line chart alone
what is a scatter chart?
it shows the relationship between two variables. to construct one, we need observations that consist of pairs of variables
what is a bubble chart?
a type of scatter chart in which the size of the data marker corresponds to the value of a third variable
what is a statistic?
a summary measure of data
what is descriptive statistics?
refers to methods of describing and summarising data using tabular, visual and quantitative techniques
what is a frequency distribution?
a table that shows the number of observations in each of several nonoverlapping groups
what is a histogram?
a graphical depiction of a frequency distribution for numerical data in the form of a column chart
how do you form a frequency distribution?
- choose the number of groups
- choose the width of each group
- set the upper and lower limits of each group
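A minimal Python sketch of those three choices. The data are made up, the group width of 10 is an assumption, and the helper name frequency_distribution is hypothetical:

```python
def frequency_distribution(data, lower, width, num_groups):
    """Count observations in equal-width, non-overlapping groups.

    Each group covers [lo, hi); the last group also includes its upper limit.
    """
    counts = [0] * num_groups
    for x in data:
        index = int((x - lower) // width)
        if index == num_groups and x == lower + width * num_groups:
            index = num_groups - 1  # put the top limit in the last group
        if 0 <= index < num_groups:
            counts[index] += 1
    return [
        (lower + i * width, lower + (i + 1) * width, counts[i])
        for i in range(num_groups)
    ]

# Made-up exam scores, for illustration only
scores = [52, 67, 71, 58, 88, 93, 75, 64, 79, 81, 100]
table = frequency_distribution(scores, lower=50, width=10, num_groups=5)
for lo, hi, count in table:
    print(f"{lo}-{hi}: {count}")
```

Every observation lands in exactly one group, so the counts add up to the number of observations.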
what is cumulative relative frequency?
The cumulative relative frequency represents the proportion of the total number of observations that fall at or below the upper limit of each group.
- A tabular summary of cumulative relative frequencies is called a cumulative relative frequency distribution.
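As a rough illustration, cumulative relative frequencies can be built from group counts with a running total. The counts below are made up:

```python
from itertools import accumulate

# Made-up group counts from a frequency distribution (illustration only)
counts = [2, 2, 3, 2, 1]
total = sum(counts)

relative = [c / total for c in counts]
# Running total of the counts, divided by the number of observations
cumulative_relative = [c / total for c in accumulate(counts)]

for count, rel, cum in zip(counts, relative, cumulative_relative):
    print(f"count={count}  relative={rel:.2f}  cumulative={cum:.2f}")
```

The last cumulative relative frequency is always 1, because every observation falls at or below the upper limit of the last group.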
what is cross tabulation?
a tabular method that displays the number of observations in a data set for different subcategories of two categorical variables
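A small sketch of a cross tabulation in Python, counting pairs of categories. The survey records are made up for illustration:

```python
from collections import Counter

# Made-up survey records: (region, preferred product), illustration only
records = [
    ("North", "A"), ("North", "B"), ("South", "A"),
    ("South", "A"), ("North", "A"), ("South", "B"),
]

counts = Counter(records)  # number of observations per (region, product) pair
regions = sorted({region for region, _ in records})
products = sorted({product for _, product in records})

print("Region " + " ".join(f"{p:>3}" for p in products))
for region in regions:
    row = " ".join(f"{counts[(region, p)]:>3}" for p in products)
    print(f"{region:<6} {row}")
```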
what is a population?
consists of all items of interest for a particular decision or investigation
what is a sample?
a subset of a population
what is the mean?
the sum of the observations divided by the number of observations
what are outliers?
observations that are radically different from the rest
what is the median?
the measure of location that specifies the middle value when the data are arranged from least to greatest
what is the mode?
the observation that occurs most often
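A quick sketch of the mean, median and mode using Python's statistics module. The data are made up and include one outlier on purpose:

```python
import statistics

# Made-up data with one outlier (90), for illustration only
data = [4, 5, 5, 6, 7, 90]

mean = statistics.mean(data)      # sum of observations / number of observations
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent observation

print(mean, median, mode)
```

The outlier pulls the mean well above the median, which is why the median is often preferred when outliers are present.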
what is the midrange?
the average of the greatest and least values in the data set
what is the range?
the difference between the maximum and minimum value in the data set
what is the interquartile range?
the difference between the third and first quartiles (Q3 - Q1)
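A short sketch of the range, midrange and interquartile range on made-up data. Quartile conventions differ between textbooks and software, so the Q1 and Q3 values from statistics.quantiles may not match other tools exactly:

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 14]  # made-up data, illustration only

data_range = max(data) - min(data)       # maximum minus minimum
midrange = (max(data) + min(data)) / 2   # average of maximum and minimum

# quantiles(n=4) returns the three quartile cut points [Q1, Q2, Q3]
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(data_range, midrange, iqr)
```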
what is the variance?
a measure of dispersion: the average of the squared deviations of the observations from the mean. its computation depends on all the data. the larger the variance, the more the data are spread out from the mean and the more variability one can expect in the observations.
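A minimal sketch of the variance calculation on made-up data. It shows both the population version (divide by n) and the sample version (divide by n - 1); which one applies depends on whether the data are a whole population or a sample:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data, illustration only

mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

population_variance = sum(squared_deviations) / len(data)
sample_variance = sum(squared_deviations) / (len(data) - 1)

# The statistics module gives the same results
print(population_variance, statistics.pvariance(data))
print(sample_variance, statistics.variance(data))
```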
what are the empirical rules?
- For many data sets encountered in practice, the percentage of observations within a given number of standard deviations of the mean is much higher than what Chebyshev’s theorem specifies. This is reflected in what are called the empirical rules:
1. Approximately 68% of the observations will fall within one standard deviation of the mean, or between x̄ - s and x̄ + s.
2. Approximately 95% of the observations will fall within two standard deviations of the mean, or within x̄ ± 2s.
3. Approximately 99.7% of the observations will fall within three standard deviations of the mean, or within x̄ ± 3s.
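A rough check of the empirical rules on simulated data. The bell-shaped (normal) data, the mean of 100 and the standard deviation of 15 are all assumptions made for this sketch:

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable
# Simulated bell-shaped data (assumed parameters, illustration only)
data = [random.gauss(100, 15) for _ in range(10_000)]

mean = statistics.mean(data)
s = statistics.stdev(data)

def share_within(k):
    """Fraction of observations within k standard deviations of the mean."""
    return sum(mean - k * s <= x <= mean + k * s for x in data) / len(data)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} s: {share:.3f}")
```

The shares should come out close to 68%, 95% and 99.7%. Data that are far from bell-shaped may not follow the rules this closely.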
what is a standardised value?
also known as the z-score. it provides a relative measure of the distance an observation is from the mean, independent of the units of measurement. for an observation x: z = (x - x̄) / s
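A small sketch showing that z-scores do not depend on the units of measurement. The heights are made up, and the helper name z_scores is hypothetical:

```python
import statistics

def z_scores(data):
    """Return (x - mean) / s for each observation, using the sample s."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - mean) / s for x in data]

# The same made-up heights in metres and in centimetres
heights_m = [1.60, 1.70, 1.75, 1.80, 1.90]
heights_cm = [x * 100 for x in heights_m]

z_m = z_scores(heights_m)
z_cm = z_scores(heights_cm)
for a, b in zip(z_m, z_cm):
    print(f"{a:.3f}  {b:.3f}")
```

Both columns print the same values, because changing the units rescales the deviation from the mean and the standard deviation by the same factor.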
what is the coefficient of variation (CV)?
The coefficient of variation (CV) provides a relative measure of the dispersion in data relative to the mean: CV = standard deviation / mean.
- The coefficient of variation provides a relative measure of risk to return. The smaller the coefficient of variation, the smaller the relative risk is for the return provided. The reciprocal of the coefficient of variation, called return to risk, is often used because it is easier to interpret. That is, if the objective is to maximize return, a higher return-to-risk ratio is often considered better.
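A sketch comparing two investments by CV and return to risk. The two funds and their yearly returns are hypothetical and chosen to have the same mean:

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(data) / statistics.mean(data)

# Hypothetical yearly returns with the same mean but different spread
fund_a = [0.04, 0.06, 0.05, 0.07, 0.03]
fund_b = [0.01, 0.12, -0.02, 0.10, 0.04]

cv_a = coefficient_of_variation(fund_a)
cv_b = coefficient_of_variation(fund_b)
return_to_risk_a = 1 / cv_a  # reciprocal of the CV
return_to_risk_b = 1 / cv_b

print(f"fund A: CV={cv_a:.2f}, return to risk={return_to_risk_a:.2f}")
print(f"fund B: CV={cv_b:.2f}, return to risk={return_to_risk_b:.2f}")
```

Fund A has the smaller CV and therefore the higher return-to-risk ratio, so it carries less relative risk for the same average return.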
what is skewness?
- it describes the lack of symmetry of data.
- Distributions that tail off to the right are called positively skewed; those that tail off to the left are said to be negatively skewed.
- The coefficient of skewness (CS) measures the degree of asymmetry of observations around the mean.
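A sketch of one common way to compute a coefficient of skewness: the average cubed deviation from the mean divided by the cubed standard deviation (population version). The cards above do not give the formula, so this choice is an assumption, and the data are made up:

```python
def coefficient_of_skewness(data):
    """Average cubed deviation from the mean, divided by the cubed
    population standard deviation (one common definition, assumed here)."""
    n = len(data)
    mean = sum(data) / n
    sd = (sum((x - mean) ** 2 for x in data) / n) ** 0.5
    return sum((x - mean) ** 3 for x in data) / n / sd ** 3

right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]     # long tail to the right
left_tailed = [-x for x in right_tailed]     # mirror image, tail to the left
symmetric = [1, 2, 3, 4, 5]

cs_right = coefficient_of_skewness(right_tailed)
cs_left = coefficient_of_skewness(left_tailed)
cs_symmetric = coefficient_of_skewness(symmetric)
print(cs_right, cs_left, cs_symmetric)
```

A positive value goes with a right tail, a negative value with a left tail, and a value near zero with roughly symmetric data.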
what is proportion?
Statistics such as means and variances are not appropriate for categorical data. Instead, we are generally interested in the fraction of data that have a certain characteristic. The formal statistical measure is called the proportion, usually denoted by p.
- a proportion is always between 0 and 1.
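A tiny sketch of a proportion for categorical data, using made-up survey answers:

```python
# Made-up yes/no survey answers, for illustration only
responses = ["yes", "no", "yes", "yes", "no"]

# Proportion = number of observations with the characteristic / total number
p = responses.count("yes") / len(responses)
print(p)
```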
what is covariance?
the measure of the linear association between two variables X and Y
what is correlation?
a measure of the linear relationship between two variables X and Y that does not depend on the units of measurement.
- it is measured by the correlation coefficient, which ranges from -1 to 1
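A sketch of sample covariance and correlation written out by hand, to show that changing the units changes the covariance but not the correlation. The study-time data and the helper names are made up:

```python
def covariance(xs, ys):
    """Sample covariance: average product of deviations, dividing by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = covariance(xs, xs) ** 0.5
    sd_y = covariance(ys, ys) ** 0.5
    return covariance(xs, ys) / (sd_x * sd_y)

# Made-up data: study time and test score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
minutes = [h * 60 for h in hours]  # same study time in different units

cov_hours = covariance(hours, scores)
cov_minutes = covariance(minutes, scores)
corr_hours = correlation(hours, scores)
corr_minutes = correlation(minutes, scores)
print(cov_hours, cov_minutes)
print(corr_hours, corr_minutes)
```

The covariance is 60 times larger when time is measured in minutes, while the correlation stays the same.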
what is probability?
the likelihood that an outcome will occur
what is the sample space?
the collection of all possible outcomes of an experiment
what are the two basic facts that govern probability?
- the probability associated with any outcome must be between 0 and 1
- the sum of the probabilities over all possible outcomes must be 1.0
what is an event?
a collection of one or more outcomes from a sample space
1. The probability of any event is the sum of the probabilities of the outcomes that comprise that event.
2. If A is any event, the complement of A, denoted A^c, consists of all outcomes in the sample space not in A.
- The probability of the complement of any event A is P(A^c) = 1 - P(A).
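A sketch of these rules for one roll of a die. The die is assumed to be fair, so every outcome gets probability 1/6; fractions are used to keep the arithmetic exact:

```python
from fractions import Fraction

# Sample space for one roll of a die; fairness is an assumption
sample_space = [1, 2, 3, 4, 5, 6]
prob = {outcome: Fraction(1, 6) for outcome in sample_space}

def probability(event):
    """Sum of the probabilities of the outcomes that make up the event."""
    return sum(prob[outcome] for outcome in event)

A = {2, 4, 6}                         # event: roll an even number
A_complement = set(sample_space) - A  # all outcomes not in A

print(probability(A), probability(A_complement))
```

The two probabilities add up to 1, which matches the complement rule.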
what is a random variable?
A numerical description of the outcome of an experiment. Formally, a random variable is a function that assigns a real number to each element of a sample space.
- they can be discrete or continuous and they could be known or empirical
what are discrete and continuous random variables?
- A discrete random variable is one for which the number of possible outcomes can be counted.
- A continuous random variable has outcomes over one or more continuous intervals of real numbers.
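A small sketch contrasting the two types. The dice and the waiting-time interval of 0 to 10 minutes are made-up examples:

```python
import random
from itertools import product

# Discrete: the sum of two dice. Its possible values can be listed and counted.
dice_sums = sorted({a + b for a, b in product(range(1, 7), repeat=2)})
print(dice_sums, len(dice_sums))

# Continuous: a waiting time that can be any real number in the interval [0, 10].
random.seed(1)  # fixed seed so the sketch is repeatable
waiting_time = random.uniform(0, 10)
print(waiting_time)
```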