Statistics Flashcards
What is variance?
Dispersion/variance is the mathematical expectation of the squared deviation of a random variable from its mean.
Variance is a measure of dispersion, meaning it expresses how far a set of numbers is spread out from their average value.
Variance (sigma^2) measures how far the values in a dataset lie from the mean, expressed as an average squared distance. The higher the variance, the more spread out the dataset. This measures magnitude.
Calculation:
The average of the squared differences from the mean, taken over all data points.
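A minimal Python sketch of this calculation, assuming the data are held in a plain list (the helper name `population_variance` is hypothetical). Dividing by n gives the population variance; the sample variance, as returned by the standard library's `statistics.variance`, divides by n - 1 instead.

```python
import statistics

def population_variance(values):
    """Average of the squared differences from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

data = [1, 3, 5, 7]                 # mean is 4.0
print(population_variance(data))    # 5.0, i.e. (9 + 1 + 1 + 9) / 4
# Cross-check against the standard library's population variance.
print(statistics.pvariance(data))
```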
What is covariance?
In probability theory and statistics, covariance is a measure of how changes in two variables are related.
Covariance measures how two random variables in a dataset change together. If the covariance of two variables is positive, they tend to move in the same direction; if it is negative, they tend to move in opposite directions; a covariance near zero indicates no linear relationship. This measures direction.
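The sign rule can be checked with a small Python sketch. The helper `sample_covariance` and the study-hours data are hypothetical; the function divides by n - 1 (the usual sample covariance), which affects the size of the result but not its sign.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of products of paired deviations, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: hours studied, exam score, and number of mistakes.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]    # rises as hours rise
errors = [9, 7, 6, 4, 2]        # falls as hours rise

print(sample_covariance(hours, score))   # positive: same direction
print(sample_covariance(hours, errors))  # negative: opposite directions
```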
What is correlation?
Correlation is the mutual relationship between two or more random variables: from the value of one random variable we can, with a certain probability, estimate the value of the other.
Covariance is a measure of the association between two random variables.
The correlation coefficient is a measure of the degree of association between two random variables.
Correlation is the degree to which two random variables in a dataset change together. This measures magnitude and direction. The covariance tells you whether the two variables move together and in which direction; the correlation coefficient tells you how strongly they move together.
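A sketch of the Pearson correlation coefficient, which divides the covariance by the product of the two standard deviations so that the result always lies between -1 and +1. The function name `pearson_r` and the data are hypothetical. Rescaling one variable (for example, changing its units) multiplies the covariance by the same factor but leaves the correlation unchanged, which is why the correlation coefficient is the better measure of degree.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1 / (n - 1) factors cancel, so the raw sums can be used directly.
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson_r(x, y))                     # 0.8
print(pearson_r(x, [100 * v for v in y]))  # 0.8: correlation ignores units
```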
What is a normal distribution?
A normal distribution, also called a Gaussian distribution, is symmetric about the mean. This means that half the data lies on one side of the mean and half on the other. Normal distributions occur, at least approximately, in many natural situations, such as the heights of a population.
When graphed, a normal distribution appears as a bell curve.
The mean, median, and mode are equal
All of them are located in the center of the distribution
About 68% of the data falls within one standard deviation of the mean
About 95% of the data falls within two standard deviations of the mean
About 99.7% of the data falls within three standard deviations of the mean
For the standard normal distribution, the expectation (mean) is 0 and the variance (sigma^2) is 1.
mean = median = mode
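The 68-95-99.7 figures can be reproduced in Python from the standard normal distribution, using the identity P(-k < Z < k) = erf(k / sqrt(2)) for Z with mean 0 and variance 1. The helper name `share_within` is hypothetical; `math.erf` is part of the standard library.

```python
import math

def share_within(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    # For Z ~ N(0, 1): P(-k < Z < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```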
What are the different types of hypotheses in hypothesis testing?
Hypothesis testing is the procedure used by statisticians and scientists to accept or reject statistical hypotheses.
Null hypothesis: It states that there is no relation between the predictor and outcome variables in the population. It is denoted by H0.
Example: There is no association between a patient’s BMI and diabetes.
Alternative hypothesis: It states that there is some relation between the predictor and outcome variables in the population. It is denoted by H1.
Example: There is an association between a patient’s BMI and diabetes.
What are Type I and Type II errors in statistics?
Errors in hypothesis testing.
A Type I error occurs when a true null hypothesis is rejected, that is, when a false alternative hypothesis is accepted.
A Type II error occurs when a false null hypothesis is accepted (not rejected), that is, when a true alternative hypothesis is rejected.
In hypothesis testing, a Type I error occurs when the null hypothesis is rejected even though it is true. It is also known as a false positive.
A Type II error occurs when the null hypothesis is not rejected even though it is false. It is also known as a false negative.
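A small simulation makes both error types concrete. It assumes a two-sided z-test of H0: mean = 0 with a known standard deviation of 1 and a 5% significance level (critical value 1.96); the function names, the sample size of 20, and the true effect of 0.5 are hypothetical choices. When H0 is true, every rejection is a Type I error, so the rejection rate should come out close to 0.05; when H0 is false, every failure to reject is a Type II error.

```python
import math
import random

def z_test_rejects(sample, mu0=0.0, sigma=1.0, z_crit=1.96):
    """Two-sided z-test with known sigma: True if H0 (mean == mu0) is rejected."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return abs(z) > z_crit

def rejection_rate(true_mu, n=20, trials=5000, seed=0):
    """Share of simulated experiments (samples from N(true_mu, 1)) that reject H0: mean == 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, 1.0) for _ in range(n)]
        if z_test_rejects(sample):
            rejections += 1
    return rejections / trials

# H0 is true (the real mean is 0): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mu=0.0)
# H0 is false (the real mean is 0.5): every failure to reject is a Type II error.
type_2_rate = 1 - rejection_rate(true_mu=0.5)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

With these settings the Type II rate should land near 0.39; a larger sample or a larger true effect would lower it, while the Type I rate stays near the chosen 5% level.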
Descriptive vs. Inferential Statistics
Descriptive statistics deals with describing the data collected through surveys or measurements, as well as organizing and summarizing it (graphical displays, arithmetic mean, standard deviation).
Inferential statistics is used to analyze samples, find patterns or differences within or between samples, and draw conclusions (it includes testing hypotheses with statistical tests).
Descriptive statistics describes the characteristics of a dataset. It is a simple technique to describe, show, and summarize data in a meaningful way. Its results apply only to the data actually collected, for example an entire population that was measured, and are not generalized beyond it.
Inferential statistics involves drawing conclusions about populations by examining samples.
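The difference can be sketched in Python with a simulated population. Everything here is hypothetical: the population of 10,000 heights is generated as normal(170, 10), and the sample size of 200 is arbitrary. The descriptive step summarizes data we have in full; the inferential step measures only a sample and uses it to estimate the population mean, which in a real study would be unknown.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: heights (cm) of 10,000 people, simulated as normal(170, 10).
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Descriptive statistics: summarize the data we actually have (here, the whole population).
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# Inferential statistics: measure only a random sample of 200 people
# and use it to estimate the population mean.
sample = rng.sample(population, 200)
sample_mean = statistics.fmean(sample)

print(f"population: mean = {pop_mean:.2f}, SD = {pop_sd:.2f}")
print(f"estimate from the sample: mean = {sample_mean:.2f}")
```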
Mean vs. Median vs. Mode
The mean of a dataset is its average value: the sum of the values divided by their count.
The median is the middle value of a dataset once it is sorted (the average of the two middle values when the count is even).
The mode of a set of values is the most frequently repeated value in the set.
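Python's standard `statistics` module computes all three measures directly; the data below are made up for illustration.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]
print(statistics.mean(data))    # 5   (30 / 6)
print(statistics.median(data))  # 4.0 (average of the middle values 3 and 5)
print(statistics.mode(data))    # 3   (appears twice)
```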
When do we use the mean?
When the distribution is symmetric and has no outliers.
It’s best to use the mean to describe the center of a dataset when the distribution is mostly symmetrical and there are no outliers.
When do we use the median?
It is best to use the median when the distribution is either skewed or there are outliers present.
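A short Python sketch of why an outlier pulls the mean but barely moves the median. The salary figures (in thousands) are hypothetical.

```python
import statistics

# Hypothetical salaries in thousands; the second list adds a single extreme outlier.
salaries = [40, 42, 45, 47, 50]
with_outlier = salaries + [500]

for label, values in (("without outlier", salaries), ("with outlier", with_outlier)):
    mean = statistics.mean(values)
    median = statistics.median(values)
    print(f"{label}: mean = {mean:.1f}, median = {median:.1f}")
# without outlier: mean = 44.8, median = 45.0
# with outlier: mean = 120.7, median = 46.0
```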
What is sample size?
Sample size is the number of individual observations (subjects or items) included in a sample or experiment.
What is standard deviation?
It tells us how much, on average, the elements of a set deviate from the set's arithmetic mean.
It’s a measure of how spread out the data is.
It is the square root of the variance.
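A minimal Python check that the standard deviation is the square root of the variance, using made-up data. This computes the population version (dividing by n); `statistics.stdev` would give the sample version, which divides by n - 1.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)                                # 5.0
variance = sum((x - mean) ** 2 for x in data) / len(data)   # 4.0 (population variance)
std_dev = math.sqrt(variance)                               # 2.0

# The standard library's population standard deviation agrees.
print(std_dev, statistics.pstdev(data))
```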
Quantitative vs. Qualitative Data
Qualitative (categorical) = gender, color, car type… (pie charts, bar charts)
Quantitative (numerical) = measured or counted numbers such as height or age (histograms)
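A small Python sketch of how the two kinds of data are summarized differently, using hypothetical survey answers: categories can only be counted, while numbers also support arithmetic summaries.

```python
from collections import Counter
import statistics

# Hypothetical survey answers.
car_types = ["sedan", "SUV", "sedan", "truck", "SUV", "sedan"]  # qualitative
ages = [23, 35, 31, 42, 28, 35]                                 # quantitative

# Categories can only be counted (and then shown, for example, in a pie chart).
print(Counter(car_types).most_common())  # [('sedan', 3), ('SUV', 2), ('truck', 1)]

# Numbers also support arithmetic summaries such as the mean and median.
print(statistics.mean(ages), statistics.median(ages))
```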
What is a scatterplot?
It’s used to visualize the relationship between paired data (two variables), plotting each pair as a point with one variable on each axis.
What is a population?
A population is the set of all members that share a particular common characteristic; the set of all people or things of interest in a given study.
The size of a population is determined by the number of its members.
The entire group of subjects about which we want information.