Week 2 Flashcards

1
Q

what is data visualisation?

A

the process of displaying data, often in large quantities, in a meaningful fashion to provide insights that will support better decisions.

2
Q

what are the three general principles for visualisation?

A
  • design and layout matter
  • avoid clutter
  • use colours purposefully and effectively
3
Q

what is a dashboard?

A

a visual representation of a set of key business measures

4
Q

what are column and bar charts?

A
  • a column chart is a vertical bar chart
  • a bar chart, in the narrower sense, is the horizontal version
  • a clustered column chart compares values across categories using vertical rectangles
  • a stacked column chart displays the contribution of each value to the total by stacking the rectangles
  • a 100% stacked column chart compares the percentage that each value contributes to a total
  • column and bar charts are useful for comparing categorical or ordinal data and for illustrating differences between sets of values
5
Q

what is a line chart?

A

it provides a useful means for displaying data over time

6
Q

what is a pie chart?

A

displays the relative proportion of each data source to the total by partitioning a circle into pie-shaped areas

7
Q

what is an area chart?

A

combines the features of a bar chart with those of a line chart; area charts present more information than pie or line charts

8
Q

what is a scatter chart?

A

it shows the relationship between two variables. to construct one, we need observations that consist of pairs of variables

9
Q

what is a bubble chart?

A

a type of scatter chart in which the size of the data marker corresponds to the value of a third variable

10
Q

what is a statistic?

A

a summary measure of data

11
Q

what is descriptive statistics?

A

refers to methods of describing and summarising data using tabular, visual and quantitative techniques

12
Q

what is a frequency distribution?

A

a table that shows the number of observations in each of several nonoverlapping groups

13
Q

what is a histogram?

A

a graphical depiction of a frequency distribution for numerical data in the form of a column chart

14
Q

how do you form a frequency distribution?

A
  1. choose the number of groups
  2. choose the width of each group
  3. set the upper and lower limits of each group
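
A minimal Python sketch of these three choices, assuming equal-width groups. The function name `frequency_distribution`, its parameters, and the order data are made up for illustration; placing the upper limit of the last group inside that group is also an assumption, since conventions differ.

```python
def frequency_distribution(data, lower, width, n_groups):
    """Count observations in n_groups equal-width groups starting at `lower`.

    Each group covers [lo, hi) so that groups do not overlap; the last
    group also includes its upper limit.
    """
    counts = [0] * n_groups
    upper = lower + width * n_groups
    for x in data:
        if x < lower or x > upper:
            continue  # outside every group; ignored in this sketch
        index = min(int((x - lower) // width), n_groups - 1)
        counts[index] += 1
    return [(lower + i * width, lower + (i + 1) * width, counts[i])
            for i in range(n_groups)]


# Hypothetical data: ten order sizes
orders = [3, 7, 8, 12, 15, 15, 18, 21, 24, 30]
table = frequency_distribution(orders, lower=0, width=10, n_groups=3)
for lo, hi, count in table:
    print(f"{lo} to {hi}: {count}")
```

Plotting the counts as columns, one per group, would give the histogram from the previous card.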
15
Q

what is cumulative relative frequency?

A

The cumulative relative frequency represents the proportion of the total number of observations that fall at or below the upper limit of each group.
- A tabular summary of cumulative relative frequencies is called a cumulative relative frequency distribution.
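
As a rough illustration, the cumulative relative frequencies can be computed from group counts with a running total. The `groups` list below is hypothetical data, not taken from the course material.

```python
from itertools import accumulate

# Hypothetical frequency distribution: (upper limit of group, count)
groups = [(10, 3), (20, 4), (30, 3)]

total = sum(count for _, count in groups)
running = accumulate(count for _, count in groups)  # 3, 7, 10

# Proportion of all observations at or below each group's upper limit
cumulative_relative = [(upper, cum / total)
                       for (upper, _), cum in zip(groups, running)]

for upper, share in cumulative_relative:
    print(f"at or below {upper}: {share:.2f}")
```

The last value is always 1, because every observation falls at or below the upper limit of the final group.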

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

what is cross tabulation?

A

a tabular method that displays the number of observations in a data set for different subcategories of two categorical variables
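
A small sketch of a cross tabulation using only the standard library. The `sales` records and the variable names are invented for illustration.

```python
from collections import Counter

# Hypothetical records: (region, product) for each sale
sales = [("North", "A"), ("North", "B"), ("South", "A"),
         ("South", "A"), ("North", "A"), ("South", "B")]

counts = Counter(sales)  # number of observations per (region, product) pair
rows = sorted({region for region, _ in sales})
cols = sorted({product for _, product in sales})

# Nested dict: crosstab[region][product] -> count
crosstab = {r: {c: counts[(r, c)] for c in cols} for r in rows}

print("      " + "  ".join(cols))
for r in rows:
    print(f"{r:<6}" + "  ".join(str(crosstab[r][c]) for c in cols))
```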

17
Q

what is a population?

A

consists of all items of interest for a particular decision or investigation

18
Q

what is a sample?

A

a subset of a population

19
Q

what is the mean?

A

the sum of the observations divided by the number of observations

20
Q

what are outliers?

A

observations that are radically different from the rest

21
Q

what is the median?

A

the measure of location that specifies the middle value when the data are arranged from least to greatest

22
Q

what is the mode?

A

the observation that occurs most frequently

23
Q

what is the midrange?

A

the average of the greatest and least values in the data set
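
The four measures of location on cards 19 to 23 can be compared on one small data set. This is only a sketch with made-up numbers; the value 25 is included as an outlier to show how it affects the mean and midrange much more than the median or mode.

```python
import statistics

# Hypothetical data with one outlier (25)
data = [2, 4, 4, 5, 7, 9, 25]

mean = statistics.mean(data)            # sum of observations / number of observations
median = statistics.median(data)        # middle value of the sorted data
mode = statistics.mode(data)            # most frequent observation
midrange = (max(data) + min(data)) / 2  # average of greatest and least values

print(mean, median, mode, midrange)
```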

24
Q

what is the range?

A

the difference between the maximum and minimum value in the data set

25
Q

What is interquartile range?

A

the difference between the third and first quartiles (Q3 - Q1)
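
A short sketch of the range and interquartile range. Quartile conventions vary between textbooks and software; this example assumes the default "exclusive" method of Python's `statistics.quantiles` (Python 3.8+), which may not match the method used in the course. The data are hypothetical.

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13, 15]  # hypothetical observations

data_range = max(data) - min(data)

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```

Because the IQR uses only the middle half of the data, it is less affected by outliers than the range.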

26
Q

what is the variance?

A

a measure of dispersion whose computation depends on all the data. the larger the variance, the more the data are spread out from the mean and the more variability one can expect in the observations.
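
To make "depends on all the data" concrete, here is a sketch that computes the variance by hand and checks it against the standard library. It assumes the usual definitions: the population variance divides the sum of squared deviations by N, the sample variance by n - 1. The data are invented.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
n = len(data)
mean = sum(data) / n

# Every observation contributes a squared deviation from the mean
squared_deviations = [(x - mean) ** 2 for x in data]

population_variance = sum(squared_deviations) / n
sample_variance = sum(squared_deviations) / (n - 1)

# Cross-check against the standard library
assert abs(population_variance - statistics.pvariance(data)) < 1e-12
assert abs(sample_variance - statistics.variance(data)) < 1e-12

print(population_variance, sample_variance)
```

The standard deviation is the square root of the variance, which puts the measure back in the units of the data.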

27
Q

what are the empirical rules?

A
  • In practice, the percentages of observations that fall within a given number of standard deviations of the mean are generally much higher than what Chebyshev’s theorem specifies. These are reflected in what are called the empirical rules:
    1. Approximately 68% of the observations will fall within one standard deviation of the mean, or between x̄ - s and x̄ + s.
    2. Approximately 95% of the observations will fall within two standard deviations of the mean, or within x̄ ± 2s.
    3. Approximately 99.7% of the observations will fall within three standard deviations of the mean, or within x̄ ± 3s.
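
The empirical rules can be checked on simulated data. This sketch assumes roughly bell-shaped (normally distributed) data, generated here with `random.gauss`; the mean of 100, standard deviation of 15, and sample size are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable
data = [random.gauss(100, 15) for _ in range(10_000)]  # simulated data

x_bar = statistics.mean(data)
s = statistics.stdev(data)


def share_within(k):
    """Fraction of observations within k standard deviations of the mean."""
    return sum(abs(x - x_bar) <= k * s for x in data) / len(data)


shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.3f}")
```

For data that are strongly skewed or have heavy tails, the observed shares can differ noticeably from 68%, 95%, and 99.7%.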
28
Q

what is a standardised value?

A

also known as the z-score, it provides a relative measure of the distance an observation is from the mean, which is independent of the units of measurement.
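
A sketch of the z-score, assuming the usual formula z = (x - x̄) / s with the sample standard deviation. The height data and the helper name `z_scores` are hypothetical. Expressing the same heights in metres and in centimetres gives the same z-scores, which illustrates the independence from units.

```python
import statistics


def z_scores(data):
    """Standardise each observation: (x - mean) / sample standard deviation."""
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - x_bar) / s for x in data]


heights_m = [1.60, 1.70, 1.75, 1.80, 1.90]  # hypothetical heights in metres
heights_cm = [h * 100 for h in heights_m]   # same heights in centimetres

z_m = z_scores(heights_m)
z_cm = z_scores(heights_cm)

for a, b in zip(z_m, z_cm):
    print(f"{a:.3f}  {b:.3f}")
```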

29
Q

what is the coefficient of variation (CV)?

A

The coefficient of variation (CV) provides a relative measure of the dispersion in data relative to the mean.
- The coefficient of variation provides a relative measure of risk to return. The smaller the coefficient of variation, the smaller the relative risk is for the return provided. The reciprocal of the coefficient of variation, called return to risk, is often used because it is easier to interpret. That is, if the objective is to maximize return, a higher return-to-risk ratio is often considered better.
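
A sketch assuming the common definition CV = standard deviation / mean, with return to risk as its reciprocal. The two return series and the names `fund_a` and `fund_b` are made up; both have the same mean return, so the comparison isolates the difference in dispersion.

```python
import statistics


def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(data) / statistics.mean(data)


# Hypothetical annual returns (%) with the same mean but different spread
fund_a = [4.0, 5.0, 6.0, 5.0, 5.0]
fund_b = [1.0, 9.0, 2.0, 8.0, 5.0]

cv_a = coefficient_of_variation(fund_a)
cv_b = coefficient_of_variation(fund_b)

return_to_risk_a = 1 / cv_a
return_to_risk_b = 1 / cv_b

print(f"CV: {cv_a:.3f} vs {cv_b:.3f}")
print(f"return to risk: {return_to_risk_a:.2f} vs {return_to_risk_b:.2f}")
```

Here the first series has the smaller CV and therefore the higher return-to-risk ratio.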

30
Q

what is skewness?

A
  • it describes the lack of symmetry of data.
    distributions that tail off to the right are called positively skewed; those that tail off to the left are said to be negatively skewed.
  • The coefficient of skewness (CS) measures the degree of asymmetry of observations around the mean.
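
The card does not give a formula for CS. The sketch below assumes one common population form, CS = [(1/N) Σ (x - μ)³] / σ³; the course text may define it slightly differently. The data sets are invented to show the sign of the coefficient.

```python
import statistics


def coefficient_of_skewness(data):
    """Average cubed deviation from the mean, divided by sigma cubed."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)  # population standard deviation
    n = len(data)
    return sum((x - mu) ** 3 for x in data) / n / sigma ** 3


right_tailed = [1, 2, 2, 3, 3, 3, 10]      # tails off to the right
left_tailed = [-x for x in right_tailed]   # mirror image, tails off to the left
symmetric = [1, 2, 3, 4, 5]

print(coefficient_of_skewness(right_tailed))  # positive
print(coefficient_of_skewness(left_tailed))   # negative
print(coefficient_of_skewness(symmetric))     # zero
```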
31
Q

what is a proportion?

A

Statistics such as means and variances are not appropriate for categorical data. Instead, we are generally interested in the fraction of data that have a certain characteristic. The formal statistical measure is called the proportion, usually denoted by p.
- a proportion is always between 0 and 1.
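
A one-line computation makes this concrete. The survey responses below are hypothetical.

```python
# Hypothetical categorical data: answers to a yes/no survey question
responses = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes"]

# Proportion = number of observations with the characteristic / total number
p = responses.count("yes") / len(responses)

print(p)
```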

32
Q

what is covariance?

A

the measure of the linear association between two variables X and Y

33
Q

what is correlation?

A

a measure of the linear relationship between two variables X and Y that does not depend on the units of measurement.
- it is measured by the correlation coefficient
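
The difference between covariance and correlation shows up when the units change. This sketch assumes the sample formulas (dividing by n - 1) and the definition correlation = covariance / (s_x · s_y); the data and function names are invented for illustration.

```python
import statistics


def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the means, divided by n - 1."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance scaled by both standard deviations; always between -1 and 1."""
    return sample_covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


x = [1, 2, 3, 4, 5]  # hypothetical paired observations
y = [2, 4, 5, 4, 5]

cov = sample_covariance(x, y)
r = correlation(x, y)

# Change the units of x (for example metres to centimetres)
x_scaled = [100 * v for v in x]
cov_scaled = sample_covariance(x_scaled, y)
r_scaled = correlation(x_scaled, y)

print(cov, cov_scaled)  # covariance changes with the units
print(r, r_scaled)      # correlation does not
```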

34
Q

what is probability?

A

the likelihood that an outcome will occur

35
Q

what is the sample space?

A

the collection of all possible outcomes of an experiment

36
Q

what are the two basic facts that govern probability?

A
  1. the probability associated with any outcome must be between 0 and 1
  2. the sum of the probabilities over all possible outcomes must be 1.0
37
Q

what is an event?

A

a collection of one or more outcomes from a sample space
1. The probability of any event is the sum of the probabilities of the outcomes that comprise that event.
2. If A is any event, the complement of A, denoted Ac, consists of all outcomes in the sample space NOT in A.
- The probability of the complement of any event A is P(Ac) = 1 - P(A).
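
The rules on cards 35 to 37 can be checked on a simple experiment. This sketch assumes a fair six-sided die, so every outcome has probability 1/6; the event A and the helper `prob` are illustrative names only.

```python
from fractions import Fraction

# Sample space of one roll of a fair die: outcome -> probability
sample_space = {outcome: Fraction(1, 6) for outcome in range(1, 7)}


def prob(event):
    """Probability of an event = sum of the probabilities of its outcomes."""
    return sum(sample_space[outcome] for outcome in event)


A = {5, 6}                    # event: roll a 5 or a 6
A_c = set(sample_space) - A   # complement: all outcomes not in A

print(prob(A), prob(A_c))
print(sum(sample_space.values()))  # probabilities over all outcomes sum to 1
```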

38
Q

what is a random variable?

A

A numerical description of the outcome of an experiment. Formally, a random variable is a function that assigns a real number to each element of a sample space.
- they can be discrete or continuous and they could be known or empirical

39
Q

what are discrete and continuous random variables?

A
  • A discrete random variable is one for which the number of possible outcomes can be counted.
  • A continuous random variable has outcomes over one or more continuous intervals of real numbers.
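
As a rough example of a discrete random variable, take the experiment of flipping a fair coin twice and let X be the number of heads. The experiment and the names below are assumptions chosen for illustration; X is literally a function from outcomes to real numbers.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all equally likely outcomes of two coin flips
sample_space = list(product("HT", repeat=2))  # ('H','H'), ('H','T'), ...


def X(outcome):
    """Random variable: number of heads in the outcome."""
    return outcome.count("H")


# Probability distribution of X: value -> probability
counts = Counter(X(outcome) for outcome in sample_space)
distribution = {value: Fraction(n, len(sample_space))
                for value, n in counts.items()}

for value in sorted(distribution):
    print(value, distribution[value])
```

X takes only the values 0, 1, and 2, so its possible outcomes can be counted. A measurement such as waiting time, which can take any value in an interval, would instead be modelled as a continuous random variable.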