Hypothesis Testing, PCA & t-SNE Flashcards
How does a test become reliable?
By examining the consistency of results across time (test-retest), among various observers (inter-rater), and throughout different sections of the test (internal consistency). A reliable measurement is one that is reproducible; validity, by contrast, refers to whether the measurement is accurate.
What is a control group?
A control group is a group that is kept under normal conditions: while the treatment is applied to the rest of the study population, the control group does not receive it, which provides a baseline for comparison.
Where does random assignment fail?
Random assignment is sometimes impossible because the experimenters cannot control the treatment or independent variable. For example, if you want to see how people with and without depression perform on a test, you cannot assign subjects to those groups at random.
Do we need to randomize the treatments while experimenting?
Yes. When conducting the experiment, we want the treatment assignment to be as random as possible, with equal representation of all kinds of people in each group, so that systematic differences between groups do not confound the result.
Differentiate between control group and treatment group?
The treatment group receives the treatment whose effect the researcher is interested in. The control group receives either no treatment or a standard treatment whose effect is already known.
For the medical domain, how do we adjust the alpha value?
For instance, a medical scientist who develops a new treatment that may revolutionize the management of an illness and replace a standard therapy must be very certain that the new approach is superior to the old one. Due to the potential impact on the field and the negative consequences of making wrong decisions, it’s very important to take a conservative approach before claiming a difference. In this case, reducing the chance of making a type 1 error is more important and making a type 2 error is more acceptable because this would suggest no change in medical treatment. This can be accomplished by the use of a more stringent alpha level, such as 0.01 or 0.001.
For less critical research decisions, decreasing the chance of a type 2 error is more appropriate. This can be accomplished by using a more liberal alpha level, say 0.1, which makes it easier to reject the null hypothesis. For example, let's say a researcher wants to compare two hand soaps, both known to work, to see which one cleans better. Does it really matter if it is concluded that one is better than the other when in fact there is no major difference between them? Probably not, so in this case a type 1 error is more acceptable. In summary, it's the responsibility of the researcher to decide which error is less important and to set the alpha level accordingly.
How do we know if we have the proper sample size for the experiment?
The effect size (typically the difference between two groups), the population standard deviation (for continuous data), the required power of the experiment to detect the postulated effect, and the significance level are all parameters that must be known or estimated in order to calculate the sample size.
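A minimal sketch of such a calculation, assuming statsmodels is available (the library and the parameter values below are my illustrative choices, not from the flashcards):

```python
# Sketch: solving for the sample size with statsmodels' power analysis.
# effect_size, power, and alpha here are illustrative choices.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,  # standardized difference between the two groups (Cohen's d)
    power=0.8,        # desired probability of detecting the effect
    alpha=0.05,       # significance level
)
print(f"Required sample size per group: {n_per_group:.0f}")
```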
What happens if the random selection is skewed?
Skewed or biased data can distort our results. A common remedy for skewed data is to apply a log transformation to the set of values, which can reveal patterns and make the data usable for the statistical model.
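A short sketch of this remedy on synthetic right-skewed data (the data and libraries are my choices for illustration):

```python
# Sketch: checking skewness before and after a log transform.
# log1p computes log(1 + x), which handles zeros safely.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0, sigma=1, size=1000)  # right-skewed values

print(f"Skewness before: {skew(data):.2f}")
print(f"Skewness after log transform: {skew(np.log1p(data)):.2f}")
```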
Is it true that all patients have the same level of cancer/stage, or is that, too, randomized?
Even though each person's condition is unique, cancers of the same type and stage generally have comparable outlooks. When doctors discuss a patient's cancer, the stage is also a way for them to describe the extent of the disease.
Why is it necessary to know how many persons refused to participate in the study?
Our objective is to monitor the changes if mammography is introduced as part of health insurance. Once it is introduced, some people will refuse it, and those who refuse may differ systematically from those who accept. Hence we have to take everyone who was offered the screening into account when interpreting the results.
Do you always need equal sample sizes in control and treatment groups? Or what is the threshold for how different they can be?
It is not necessary to have an equal number of samples in each group; we can work with percentages and ratios for any group sizes, although equal sample sizes are easier to work with.
How can we know if our experiment is biased or not?
Bias can be introduced in many ways, for example through measurement bias or sampling bias. We use multiple hypothesis testing and the associated corrections to help guard against it.
What is the p-value?
The p-value, calculated as part of hypothesis testing, is the probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true. If the p-value is less than the significance level, the result is statistically significant and we can reject the null hypothesis. Here, the p-value for the mammography study is 0.012, which is less than the significance level of 0.05, so we conclude that the result is statistically significant and offering mammography reduces the death rate due to breast cancer.
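The flashcards do not give the study's raw counts, so the numbers below are hypothetical placeholders; the sketch only shows how a two-proportion z-test produces a p-value that is then compared to alpha:

```python
# Sketch: comparing death rates in two groups with a two-proportion z-test.
# The counts are hypothetical placeholders, NOT the actual study data.
from statsmodels.stats.proportion import proportions_ztest

deaths = [39, 63]            # deaths in treatment vs. control (hypothetical)
group_sizes = [31000, 31000]  # group sizes (hypothetical)

stat, p_value = proportions_ztest(deaths, group_sizes)
alpha = 0.05
print(f"p-value = {p_value:.3f}; reject H0: {p_value < alpha}")
```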
What is the distinction between the T and Z statistics?
When you don't know the population standard deviation, you use a T-Test with a T statistic instead of a Z score. The main difference between a Z score and a T statistic is that for the T statistic the population standard deviation is unknown and must be estimated from the sample.
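A short sketch of the difference, on synthetic data (the numbers are illustrative; scipy's one-sample t-test is one standard way to compute the t statistic):

```python
# Sketch: the same test statistic computed as a z-score (sigma known)
# and as a t-statistic (sigma estimated from the sample).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=5.2, scale=2.0, size=20)
mu0 = 5.0  # hypothesized population mean

# z-statistic: uses the (rarely known) population standard deviation
z = (sample.mean() - mu0) / (2.0 / np.sqrt(len(sample)))

# t-statistic: estimates the standard deviation from the sample itself
t = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(len(sample)))

t_scipy, p = stats.ttest_1samp(sample, mu0)  # scipy computes the same t
print(f"z = {z:.3f}, t = {t:.3f} (scipy: {t_scipy:.3f})")
```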
Should Null and Alternative be mutually exclusive and complementary?
Not necessarily: they will be mutually exclusive, but they don't have to be exhaustive and complementary.
Is there any way to anticipate how often a type 1 error will occur given a significance threshold of 0.05?
A significance level of 0.05 indicates that you are willing to accept a 5% chance that you are wrong when you reject the null hypothesis.
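This can be checked empirically with a small simulation (a sketch of my own, not from the flashcards): when both groups come from the same distribution, every rejection is a false positive, and roughly 5% of tests should reject at alpha = 0.05.

```python
# Sketch: simulating the type 1 error rate under a true null hypothesis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n_experiments = 0.05, 10_000

false_positives = 0
for _ in range(n_experiments):
    a = rng.normal(0, 1, size=30)
    b = rng.normal(0, 1, size=30)  # same distribution: H0 is true
    _, p = stats.ttest_ind(a, b)
    false_positives += p < alpha   # any rejection here is a false positive

print(f"Empirical type 1 error rate: {false_positives / n_experiments:.3f}")
```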
Is it true that some variations are too minor to matter if your sample size is large and you can detect a small difference?
The hypothesis test gains higher statistical power to identify tiny effects as the sample size grows. With a big enough sample size, the hypothesis test can identify effects that are so small as to be of almost no practical significance.
We reject null, but is there proof that the treatment works?
Hypothesis testing only determines whether there is sufficient evidence to reject the null hypothesis; rejecting it supports the alternative, but it is not absolute proof that the treatment works. If there is insufficient evidence, we cannot reject the null hypothesis.
Should we compare the results of changing alpha to 0.01 vs. 0.05?
Reducing the alpha level from 0.05 to 0.01 lowers the risk of a false positive (a Type I error), but it also makes it more difficult to detect differences with a t-test. As a result, any significant results you obtain would be more reliable, but there would be fewer of them.
Is there a chance that we don’t have any substantial false positives?
A reported p-value of 0 really means a value too small to display; it indicates the test is statistically significant and the null hypothesis is rejected (for example, the differences between your groups are significant). The chance of a false positive is never exactly zero, only very small.
What is the definition of a covariate? Is it the same as a confounding variable?
Confounders are variables that are related to both the intervention and the outcome, while covariates are variables that explain part of the variability in the outcome.
What’s the distinction between PC1 and PC2?
Each principal component is an eigenvector. PC1 (Principal Component 1) captures the majority of the variability in the data, while PC2 (Principal Component 2), which is orthogonal (uncorrelated) to PC1, captures less variability than PC1, and so on.
How are the principal components formulated?
Eigenvectors and eigenvalues are the linear algebra concepts that we need to compute from the covariance matrix of the original data to determine the principal components of the data.
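A minimal numpy sketch of this computation (synthetic data; the variable names are my own):

```python
# Sketch: principal components from the eigendecomposition of the
# covariance matrix, using plain numpy.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X_centered = X - X.mean(axis=0)          # PCA requires centered data

cov = np.cov(X_centered, rowvar=False)   # 3x3 covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)

order = np.argsort(eigenvalues)[::-1]    # sort by explained variance, descending
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

X_projected = X_centered @ eigenvectors[:, :2]  # project onto PC1 and PC2
print("Explained variance per PC:", eigenvalues)
```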
Are PC1 and PC2 always pointing in opposite directions?
They are always orthogonal (at 90 degrees) to each other, which is not the same as pointing in opposite directions (180 degrees).
Are the system’s eigenvectors essentially the primary components?
Yes, they are eigenvectors of the covariance matrix.
What does the term “greatest variation” mean?
Here, variance measures the amount of information carried over from the original features; PCA seeks the directions that capture the maximum amount of variance.
Will you be able to identify PC1 and PC2? What do they represent in the dataset?
PC1 and PC2 will often represent a "significant" percentage of the variability, but never 100%. Each principal component is a linear combination of the original features, and we can plot the first 2 or 3 principal components (using 2D or 3D plots) to visualize the data.
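A short sketch of both steps with scikit-learn (the iris dataset is my illustrative choice):

```python
# Sketch: checking how much variability PC1 and PC2 capture, then
# plotting the 2D projection.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print("Variance explained by PC1, PC2:", pca.explained_variance_ratio_)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
```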
What does the PCA stand for?
PCA stands for Principal Component Analysis. It is used to visualize data in lower dimensions and to reduce the number of dimensions.
Do we know which variables are included in PC1?
Each principal component uses all the features. It is a linear combination of all the features available.
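As an illustration (scikit-learn's iris data, my choice), the loadings in pca.components_ show that every original feature carries a weight in PC1:

```python
# Sketch: inspecting the loadings. Each row of pca.components_ holds the
# weights of one principal component over ALL original features.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

data = load_iris()
pca = PCA(n_components=2).fit(data.data)

for name, weight in zip(data.feature_names, pca.components_[0]):
    print(f"{name}: {weight:+.3f}")  # every feature contributes to PC1
```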
What is normalization?
Normalization is the process of converting the values of numeric columns in a dataset to a similar scale without distorting differences in the ranges of values. Not every dataset needs to be normalized for machine learning; it is only required when features have different ranges.
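A minimal sketch of two common scalers, assuming scikit-learn (the toy matrix is illustrative):

```python
# Sketch: StandardScaler gives each feature zero mean and unit variance;
# MinMaxScaler maps each feature into [0, 1].
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])  # two features with very different ranges

print(StandardScaler().fit_transform(X))
print(MinMaxScaler().fit_transform(X))
```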
Can PCA be used for finding the anomalies in the data?
By examining the available features to identify what constitutes a "normal" class, the PCA-Based Anomaly Detection component solves the problem. The component then uses distance metrics to detect anomalous cases. This approach lets you train a model with data that is inherently imbalanced.
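The sketch below shows the general reconstruction-error idea behind PCA-based anomaly detection; it is a generic scikit-learn sketch on synthetic data, not the exact component mentioned above:

```python
# Sketch: fit PCA on "normal" data, then flag points whose reconstruction
# error (distance between a point and its PCA reconstruction) is large.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_normal = rng.normal(size=(500, 5))

pca = PCA(n_components=2).fit(X_normal)

def reconstruction_error(X):
    X_rec = pca.inverse_transform(pca.transform(X))
    return np.sum((X - X_rec) ** 2, axis=1)

threshold = np.percentile(reconstruction_error(X_normal), 99)
X_new = rng.normal(size=(10, 5)) * 4            # exaggerated points to flag
print(reconstruction_error(X_new) > threshold)  # True marks an anomaly
```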
In PCA, how do you account for variables with a lot of interacting effects?
If N variables are highly correlated, they will all load heavily on the same principal component (eigenvector) rather than on different ones; this is how you identify them as being highly correlated.
Why is the normal distribution referred to as a Gaussian distribution?
Because the probability density graph of the normal distribution resembles a bell, it is commonly referred to as the bell curve. The Gaussian distribution is named after the German mathematician Carl Gauss, who first characterized it.
For Gaussians, what will the mean and standard deviation be?
A normal distribution is the familiar probability bell curve. The mean and standard deviation are the parameters of the normal distribution that define its center and shape.
The "standard normal distribution" has a mean of 0 and a standard deviation of 1, with a kurtosis of 3 and zero skew. All normal distributions are symmetrical, but not all symmetrical distributions are normal.
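For reference, the density these two parameters define is the standard textbook formula

f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²)),

where μ is the mean (the center) and σ is the standard deviation (the spread); setting μ = 0 and σ = 1 gives the standard normal distribution.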
What is the t-SNE algorithm?
The t-SNE algorithm does not preserve distances directly; instead, it estimates probability distributions over pairs of points. In brief, t-SNE maps the input to a two- or three-dimensional map space. Pairwise similarities in the map space are modeled with a Student's t-distribution, while similarities in the input space are modeled with Gaussians. The KL divergence between the two distributions is used as the loss function, which is minimized using gradient descent.
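A minimal usage sketch with scikit-learn's TSNE (the dataset and hyperparameter values are illustrative choices, not from the flashcards):

```python
# Sketch: projecting the digits dataset to 2D with t-SNE.
# Fixing random_state makes successive runs reproducible.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)
X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(X_2d.shape)  # (n_samples, 2) map coordinates
```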
When do we use t-distribution?
You must use the t-distribution table when working problems where the population standard deviation (σ) is not known and the sample size is small (n < 30). General rule: if σ is not known, then using the t-distribution is correct.
Is it possible to restore a previously run t-SNE?
The t-SNE technique is not deterministic: it may not produce similar results on subsequent runs unless the random seed is fixed (e.g., via random_state in scikit-learn, as in the sketch above), and the optimization also depends on additional hyperparameters such as perplexity.
How to deal with categorical variables in PCA?
While it is technically possible to use PCA on discrete variables, or on categorical variables that have been one-hot encoded, you should not. Simply put, if your variables don't belong on a coordinate plane, do not apply PCA to them.