27. Biostatistics I: stochastic variable and probability distribution, normal distribution and its parameters. Estimations of expected value and standard deviation from sample. Flashcards

1
Q

What is an experiment?

A

The way of collecting or accessing data

2
Q

What is data?

A

qualitative or quantitative properties.

3
Q

What is qualitative data? Examples?

A

Categorical data that can be sorted into categories

→ for example, the names of the diseases, types of pathogens, or the severity of the condition

4
Q

What is quantitative data?

A

Numerical data that can be characterized by a number

→ E.g., the size of the rash or the duration of the sickness can be expressed by a number

5
Q

2 types of qualitative data

A
  1. Ordinal data (sortable)
  2. Nominal data (not sortable)
6
Q

example of ordinal (sortable) data

A

The severity of the disease: modest, medium, strong

7
Q

Example of Nominal (not sortable) data

A

the blood groups: A, B, AB, 0.

8
Q

2 types of quantitative data

A
  1. Continuous data (e.g., weight, height, blood pressure).
  2. Discrete data (e.g., number of children in the family).
9
Q

Classification of data

A
  • Qualitative (categorical) data: ordinal (sortable) or nominal (not sortable)
  • Quantitative (numerical) data: continuous or discrete
10
Q

What is absolute frequency?

A

the number of times a particular piece of data or a particular value appears during a trial or set of trials.

11
Q

What is relative frequency?

A

The frequency that equals the absolute frequency of the category divided by the total number of cases

12
Q

Relative frequencies are always numbers between _ and _

A

between 0 and 1
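
The two frequency definitions above can be sketched in a few lines of Python. This is only an illustration: the blood-group sample is made up, and the variable names (`blood_groups`, `absolute`, `relative`) are not from the source.

```python
from collections import Counter

# Hypothetical sample: blood groups observed in 20 patients (illustrative data only).
blood_groups = ["A", "0", "A", "B", "0", "A", "AB", "0", "A", "0",
                "B", "A", "0", "A", "0", "A", "B", "0", "A", "0"]

absolute = Counter(blood_groups)          # absolute frequency of each category
n = len(blood_groups)                     # total number of cases
relative = {group: count / n for group, count in absolute.items()}

for group in sorted(absolute):
    print(f"{group:>2}: absolute = {absolute[group]:2d}, relative = {relative[group]:.2f}")
```

Because every relative frequency is a count divided by the total, each one lies between 0 and 1, and together they add up to 1.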

13
Q

Does measured data always have errors?

A

YES!

14
Q

The final goal of the statistical methods is to draw __.

A

conclusions

15
Q

Logical inference gives __

A

a statement that is 100 % sure

16
Q

Statistical inference gives __

A

a statement of given probability (always less than 100 %).

(For example, if we state something with 95 % probability, it means that we would be wrong in about 5 cases out of 100; the inaccuracy is 5 %.)

17
Q

What is the reason for the inaccuracy?

A

in case of statistical inference

→ we are not able to take all of the circumstances into account.

18
Q

What does probability calculus give?

A

a mathematical description of laws of mass events in the material world that are not determined unambiguously by the circumstances.

(statistics are based on the principles of probability calculus.)

19
Q

What is a continuous parameter? Give an example

A

a numeric parameter that can take any value in a specified interval.

→ E.g., the pulse rate (the frequency of the heartbeat)

20
Q

What is population (fundamental ensemble)?

A

The set of all possible observations (e.g., from repeated measurements)

→ The number of elements in a population is N

21
Q

What is a variable?

A

Observation of a specific feature of the population

→ i.e., a single general element x of the population

22
Q

What is a sample?

A

An appropriate part of the population chosen for the examination → to draw the conclusions about the population

23
Q

What does sampling mean?

A

It means choosing n elements, ideally randomly, from the population.

Sampling happens, for example, when we measure several times. Sampling makes sense only if n can be much smaller than N (see Fig. 3).

24
Q

What is a frequency distribution?

A

A list or table that shows the frequencies or relative frequencies in each class

25
Q

What is histogram?

A

The graph consists of a series of rectangles, each with an area proportional to the frequency of data in the corresponding class interval represented on the horizontal axis.

26
Q

histogram.

Equal class widths are convenient, because in this case the frequency is proportional to the __ of the rectangle.

A

height
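
A rough text histogram shows why equal class widths are convenient. The sketch below assumes made-up pulse-rate data and a class width of 5; neither comes from the source, and the names `data`, `class_width` and `counts` are illustrative.

```python
import math

# Hypothetical measurements (e.g., pulse rates in beats per minute); illustrative only.
data = [62, 64, 66, 67, 68, 69, 70, 70, 71, 72,
        72, 73, 74, 74, 75, 76, 78, 79, 81, 84]

class_width = 5                                   # equal class widths (assumed)
start = math.floor(min(data) / class_width) * class_width

# Count how many values fall into each class [lower, lower + class_width).
counts = {}
for x in data:
    lower = start + ((x - start) // class_width) * class_width
    counts[lower] = counts.get(lower, 0) + 1

for lower in sorted(counts):
    # With equal widths, the bar height (number of '#') is proportional to the frequency.
    print(f"[{lower:3d}, {lower + class_width:3d}) {'#' * counts[lower]} ({counts[lower]})")
```

Each `#` stands for one measured value, so the total number of `#` symbols equals the number of measurements.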

27
Q

histogram.

Every small rectangle (or square) corresponds to (1)___

→ The total number of rectangle units equals ___

→ This is the total __

A
  1. one measured value
  2. the total number of measurements (n = 20).
  3. area under the frequency curve.
28
Q

histogram.

If the data size is increased and at the same time the class width (1)___

→ then the rough steps of the envelope observed initially gradually smooth into a (2)___ curve (Fig. 6.).

A
  1. decreased
  2. continuous
29
Q

Distribution of the population, theoretical distribution curve

Let us have a closer look at the tendency shown in Fig. 6.

→ If the population consists of a finite number of elements (N), then upon increasing the number of sample elements (n) the sample size will eventually reach the (1)___

→ the sample will contain (2)____

A
  1. population size
  2. all the elements of the population (n = N).
30
Q

Distribution of the population, theoretical distribution curve

Let us have a closer look at the tendency shown in Fig. 6.

the distribution of a sample with N elements yields ___

A

the distribution of the population itself
31
Q

Distribution of the population, theoretical distribution curve

Let us have a closer look at the tendency shown in Fig. 6.

What is theoretical distribution?

A

The distribution of the population, i.e., the distribution that describes all the possible data rather than only the elements of a sample
32
Q

Distribution of the population, theoretical distribution curve

Let us have a closer look at the tendency shown in Fig. 6.

→ What does the population distribution provide?

A

the probabilities of all the possible values of the variable

33
Q

Distribution of the population, theoretical distribution curve

Let us have an interval (a, b) on the coordinate axis.

→ The probability that a randomly chosen value falls within the interval (a, b) equals __

A

the area that lies under the distribution curve in this interval (from a to b)
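
The link between probability and area can be checked numerically. The sketch below assumes a normally distributed variable with µ = 70 and σ = 10 (values chosen only for illustration) and uses the cumulative distribution function from Python's `statistics.NormalDist` to obtain the area between a and b.

```python
from statistics import NormalDist

# Assumed example: a normally distributed variable with mu = 70 and sigma = 10.
dist = NormalDist(mu=70, sigma=10)

def probability_between(a, b):
    """Area under the distribution curve between a and b."""
    return dist.cdf(b) - dist.cdf(a)

p_narrow = probability_between(65, 75)    # interval around the peak
p_wide = probability_between(50, 90)      # wider interval, larger area
p_tail = probability_between(95, 105)     # same width as the first, but in the tail
print(p_narrow, p_wide, p_tail)
```

The tail interval gives a much smaller probability than an interval of the same width near the peak, widening the interval increases the probability, and an interval of zero width (a = b) has probability zero.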

34
Q

Distribution of the population, theoretical distribution curve

If in the interval (a, b) the distribution curve has small values, the area under the curve is (1)___

→ the corresponding probability of incidence of these values of the variable will be (2)___ (Fig. 7/1)

A
  1. small
  2. low
35
Q

Distribution of the population, theoretical distribution curve

if in the interval (a, b) the distribution curve has large values

→ the (1)___ and the (2)___ will be high (Fig. 7/2).

A
  1. area
  2. corresponding probability
36
Q

Distribution of the population, theoretical distribution curve

If the width of the interval is (1)___, the area under the curve increases, which means (2)___ probability of incidence for these values (Fig. 7/3).

A
  1. increased
  2. higher
37
Q

Distribution of the population, theoretical distribution curve

Similarly to the histograms, the total area under the curve equals ___

→ because the “interval” (a,b) which in this case spans ___ contains any randomly chosen value for sure.

A
  • 1
  • from –∞ to +∞
38
Q

Distribution of the population, theoretical distribution curve

Consequently, in the case of continuous variables, the probability that a randomly chosen value exactly matches a given number is ___.

A

zero

39
Q

Difference between theoretical distribution and histograms?

A
  • The theoretical distribution → describes all the possible data (i.e., the population)
  • the histogram concerns only the elements of a sample taken from the population (i.e., the data of the specific measurement).
40
Q

How did we obtain the theoretical distribution?

A

By increasing the number of elements in the sample, and thus the number of measured data.

41
Q

What is the principal theorem of mathematical statistics?

A

In the case of large samples, the empirical distribution function (i.e., the envelope of the histogram) approximates the theoretical distribution function very well.

The more frequently data occur within a certain interval in the sample, the more probable is the appearance of these values in the population as well.
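
A small simulation can illustrate this statement. It is only a sketch: the population (normal, µ = 70, σ = 10), the interval (60, 80) and the sample sizes are assumptions made for the example.

```python
import random
from statistics import NormalDist

# Assumed population: normal distribution with mu = 70 and sigma = 10.
population = NormalDist(mu=70, sigma=10)
a, b = 60, 80
theoretical = population.cdf(b) - population.cdf(a)   # area under the curve on (a, b)

rng = random.Random(1)                                # fixed seed for reproducibility
errors = {}
for n in (20, 2000, 200000):
    sample = [rng.gauss(70, 10) for _ in range(n)]
    relative_frequency = sum(a < x < b for x in sample) / n
    errors[n] = abs(relative_frequency - theoretical)
    print(f"n = {n:6d}: relative frequency = {relative_frequency:.4f}, "
          f"theoretical = {theoretical:.4f}")
```

For the largest sample, the relative frequency observed in the interval should agree closely with the theoretical probability of that interval.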

42
Q

The sample has to be representative with respect to the ___

A

population

43
Q

A characteristic parameter of a population is determined by ____

A

mathematical statistics through the examination of only a certain number (preferably few) of its elements.

44
Q

Sampling means choosing the elements to be examined (the sample) in a way that enables us later to draw reliable conclusions (inferences) about the whole population.

This is usually achieved by ___

A

random selection of sample elements

45
Q

Can the theoretical distribution have different shapes?

A

Yes

46
Q

What is Gaussian distribution?

A

A type of theoretical distribution which has a symmetric bell-shaped curve with one peak

It has the following parameters:

  • Expected value (µ)
  • Theoretical standard deviation (σ)
47
Q

What is this equation?

A

The mathematical expression of the Gaussian distribution function
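
The equation shown on this card is an image that is not preserved in the text. For reference, the Gaussian density is conventionally written as follows, with µ the expected value and σ the theoretical standard deviation; the symbol f(x) is an assumption about the notation used on the card.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

The maximum, at x = µ, has the height 1/(σ√(2π)), so the height of the curve is fixed by σ and is not a separate parameter.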

48
Q

Is the normal or Gaussian distribution a single distribution function? Why?

A

No

→ Due to its parameters it describes a whole family of them: the shape of the curves is similar, but their position, width and height may vary

49
Q

Gaussian distributions

the curve descends at the extremes toward the horizontal axis

→ Does the curve touch the horizontal axis?

A

Although the curve descends at the extremes toward the horizontal axis, it never actually touches it, no matter how far out one goes

50
Q

What are the parameters of Gaussian distribution?

A
  • Expected value (µ)
  • Theoretical standard deviation (σ)
51
Q

What is GAUSSIAN CURVE?

A

Histogram envelope of a population with Gaussian distribution if N = ∞ and Δx → 0

→ if the population contains an infinite number of elements and the class width approaches 0.

52
Q

Fig. 9. The bell-shaped Gaussian distribution and its parameters.

Identify 1. Give the definition

A

Theoretical standard deviation

→ measure of the width of the Gaussian curve.

→ The width of the Gaussian curve at half height is roughly 2σ (more precisely 2√(2 ln 2)·σ ≈ 2.355σ)

53
Q

Fig. 9. The bell-shaped Gaussian distribution and its parameters.

Identify 2

A

Inflexion points

54
Q

Fig. 9. The bell-shaped Gaussian distribution and its parameters.

Identify 3. Give the definition

A

Expected value

→ one of the parameters (µ) of the distribution, a value that corresponds to the maximum of the Gaussian curve.

→ In other words, it is the value above or below which half the cases fall, and it is also the point that divides the area under the curve in half.

55
Q

Is the height of the Gaussian curve an independent parameter?

A

No

56
Q

What is the standard normal distribution?

(second curve from the left in Fig. 8).

A

Among the infinite number of possible normal distributions there is a special one, for which …. (image)
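
The image with the defining condition is not preserved here. By the usual convention, the standard normal distribution is the one with µ = 0 and σ = 1, and any normally distributed value can be converted to it with z = (x − µ)/σ. The sketch below illustrates this under assumed example values (x = 85, µ = 70, σ = 10); the helper `standardize` is hypothetical.

```python
from statistics import NormalDist

standard = NormalDist()                 # defaults: mu = 0, sigma = 1

def standardize(x, mu, sigma):
    """Convert x from a normal distribution (mu, sigma) to a z-score."""
    return (x - mu) / sigma

# Assumed example: x = 85 from a distribution with mu = 70, sigma = 10.
z = standardize(85, mu=70, sigma=10)
original = NormalDist(mu=70, sigma=10)
print(z, standard.cdf(z), original.cdf(85))
```

The two cumulative probabilities agree, which is why a single table of the standard normal distribution is enough for every member of the family.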

57
Q

What is the central limit theorem of probability calculus?

A

Values that are influenced by many small, independent effects follow an approximately normal distribution.

(→ This explains why the majority of variables occurring in nature are normally distributed.)
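
The theorem can be made plausible with a simulation. In the sketch below each value is the sum of 48 small, independent effects drawn from a uniform distribution; the number of effects, their range and the sample size are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)                       # fixed seed for reproducibility

def value_from_many_small_effects(k=48):
    """Sum of k small, independent effects, each uniform on (-0.5, 0.5)."""
    return sum(rng.uniform(-0.5, 0.5) for _ in range(k))

values = [value_from_many_small_effects() for _ in range(20000)]
m, s = mean(values), stdev(values)

# For a normal distribution about 68.3 % of the values lie within one
# standard deviation of the mean.
within_one_sd = sum(abs(v - m) <= s for v in values) / len(values)
print(f"mean = {m:.3f}, s = {s:.3f}, within one s: {within_one_sd:.3f}")
```

Although each individual effect is uniformly distributed, the share of sums within one standard deviation of the mean comes out close to the value expected for a normal distribution.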

58
Q

Estimation of the parameters, statistical properties of the sample

The expected value is estimated most often by ___

A

the mean

59
Q

What is the mean?

A

the arithmetic mean (average) of the data (elements of the sample)

→ the most stable central tendency measure of the distribution

→ the least sensitive to the change of the sample

60
Q

What makes the mean of high importance?

A

the sum of all deviations from this number equals zero (because the sum of the negative deviations will be equal to the sum of positive deviations):
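
The formula that followed this sentence is not preserved. Assuming the usual notation, with x_i the sample elements, n their number and x̄ the mean, the statement can be written and verified as:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0,
\qquad \text{where } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
```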

61
Q

If we imagine the data spread across a board according to their values

→ the mean corresponds to ___

A

the position of the balance point of the distribution.

62
Q

What does this expression mean?

A

the empirical standard deviation (s),

63
Q

What is the empirical standard deviation (s)?

A

Estimate of the theoretical standard deviation σ (from the squares of the deviations of the points from the mean).

64
Q

What is this expression defined as?

A

variance

→ the square of the empirical standard deviation

65
Q

What is variance?

A

The square of the empirical standard deviation
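
The expressions on these cards are images that did not survive. The sketch below assumes the standard definitions, in which the sum of squared deviations from the mean is divided by n − 1 (the degrees of freedom); the sample values are made up.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical sample of n = 5 measurements (illustrative only).
sample = [68, 72, 75, 70, 65]
n = len(sample)

x_bar = mean(sample)                                    # estimate of the expected value
sum_of_squares = sum((x - x_bar) ** 2 for x in sample)  # squared deviations from the mean
var_manual = sum_of_squares / (n - 1)                   # variance with n - 1 degrees of freedom
s_manual = math.sqrt(var_manual)                        # empirical standard deviation

print(x_bar, var_manual, s_manual)
print(variance(sample), stdev(sample))                  # library versions use the same n - 1 divisor
```

Python's `statistics.variance` and `statistics.stdev` also divide by n − 1, so they reproduce the manual calculation.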

66
Q

What is this expression?

A

the degree of freedom.

67
Q

What is the degree of freedom?

A

the number of independent members or sample elements.

→ By subtracting the number of related elements from the total sample size we obtain the degree of freedom, that is, the number of independently chosen elements.

(Evidently, not every element can be independent if there is a relationship between them.)

68
Q

What are two types of error?

A

inaccuracy and distortion.

69
Q

What is inaccuracy?

A

the error which causes a random deviation from the true value in either the positive or the negative direction.

70
Q

What is distortion?

A

The error which causes the estimated value of the parameter to be systematically smaller or larger than the “true” value of the parameter.

71
Q

Can inaccuracy and distortion be measured?

A

Whereas inaccuracy can be estimated, distortion cannot.
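
The difference between the two kinds of error can be illustrated with simulated measurements. Everything in this sketch is assumed for the example: the true value, the size of the random error and the systematic offset.

```python
import random
from statistics import mean, stdev

rng = random.Random(7)          # fixed seed for reproducibility
true_value = 100.0              # hypothetical "true" value of the measured quantity
noise_sd = 2.0                  # size of the random error (inaccuracy), assumed
offset = 1.5                    # systematic error (distortion), assumed

# Instrument A: random error only. Instrument B: the same kind of random error plus an offset.
a = [true_value + rng.gauss(0, noise_sd) for _ in range(10000)]
b = [true_value + offset + rng.gauss(0, noise_sd) for _ in range(10000)]

# The scatter (s) estimates the inaccuracy from the data alone.
# The offset is visible here only because the true value is known in a simulation.
print(f"A: mean = {mean(a):.2f}, s = {stdev(a):.2f}")
print(f"B: mean = {mean(b):.2f}, s = {stdev(b):.2f}")
```

Both data sets show the same scatter, so nothing in the measured data of instrument B reveals its offset; this is the sense in which distortion cannot be estimated from the sample itself.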

72
Q

as the number of sample elements approaches infinity,

→ the mean approaches the ___

→ the empirical standard deviation approaches ___

A
  1. expected value
  2. the theoretical standard deviation with higher and higher accuracy.
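
This limiting behaviour can be observed in a simulation. The sketch assumes a normally distributed population with µ = 70 and σ = 10 and draws samples of increasing size; all numbers are illustrative.

```python
import random
from statistics import mean, stdev

mu, sigma = 70.0, 10.0           # assumed population parameters
rng = random.Random(0)           # fixed seed for reproducibility

results = {}
for n in (10, 1000, 100000):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    results[n] = (mean(sample), stdev(sample))
    print(f"n = {n:6d}: mean = {results[n][0]:.3f}, s = {results[n][1]:.3f}")
```

For the largest sample, the mean and the empirical standard deviation should lie very close to µ and σ.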
73
Q

The empirical standard deviation s is the measure of (1)___ of the data

→ It gives the (2) ___

A
  1. variability
  2. average deviation of the data from the mean
74
Q

What is reference range or normal range?

A

The interval, calculated from a large number of data, that contains exactly 95% of the elements of the sample
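
One common way to obtain such an interval, assuming the data are normally distributed, is mean ± 1.96·s. The source does not state this formula, so the sketch below is only an illustration with simulated laboratory values.

```python
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(3)                               # fixed seed for reproducibility
# Hypothetical laboratory values, simulated as normally distributed (assumption).
values = [rng.gauss(5.0, 0.5) for _ in range(50000)]

m, s = mean(values), stdev(values)
lower, upper = m - 1.96 * s, m + 1.96 * s            # central 95 % for a normal distribution

inside = sum(lower <= v <= upper for v in values) / len(values)
theoretical = NormalDist().cdf(1.96) - NormalDist().cdf(-1.96)
print(f"range: {lower:.2f} to {upper:.2f}, inside: {inside:.3f}, theory: {theoretical:.4f}")
```

The share of simulated values inside the interval should be close to the theoretical 95 %.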

75
Q

What is stochastic variable?

A

A random variable: a variable whose possible values depend on the outcomes of a certain random phenomenon.