Final Flashcards
Relationship between INCOME (in $) and CONTINENT of birth could be analyzed using an F-test
TRUE
Relationship between HEIGHT (in cm) of respondents and CONTINENT of birth could be analyzed using a Chi-squared test
False
Relationship between HEIGHT of respondents (in cm) and SEX could be observed using a Scatter Diagram
False
Relationship between HEIGHT of respondents (in cm) and WEIGHT (in kg) could be observed using a Box Plot
False
If we BIN two scale variables, HEIGHT of respondents (in cm) and WEIGHT (in kg), we would get ORDINAL versions that could be analyzed using a crosstab
True
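As a rough illustration of the card above, the sketch below bins two scale variables into ordinal categories and counts the combinations in a crosstab. The sample values, the cut points, and the helper names `bin_value` and `crosstab` are all hypothetical, chosen only for the example.

```python
from collections import Counter

# Hypothetical sample: (height in cm, weight in kg) for six respondents.
respondents = [(158, 52), (165, 61), (172, 70), (181, 85), (190, 95), (169, 58)]

# Assumed cut points; any sensible thresholds would do.
HEIGHT_CUTS, HEIGHT_LABELS = [165, 180], ["short", "medium", "tall"]
WEIGHT_CUTS, WEIGHT_LABELS = [60, 80], ["light", "medium", "heavy"]


def bin_value(value, cuts, labels):
    """Return the label of the first bin whose upper cut exceeds value."""
    for cut, label in zip(cuts, labels):
        if value < cut:
            return label
    return labels[-1]  # value is at or above the last cut


def crosstab(pairs):
    """Count the cases in each (height bin, weight bin) cell."""
    return Counter(
        (bin_value(h, HEIGHT_CUTS, HEIGHT_LABELS),
         bin_value(w, WEIGHT_CUTS, WEIGHT_LABELS))
        for h, w in pairs
    )


table = crosstab(respondents)
for (h_bin, w_bin), count in sorted(table.items()):
    print(h_bin, w_bin, count)
```

The resulting cell counts are the kind of table a Chi-squared test of independence would take as input.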
A t-test is usually the right option to explore a bivariate relationship if we have a scale variable and a categorical variable with more than two categories
False
An ANOVA test is sometimes followed by a post-hoc test (e.g., Bonferroni, LSD)
True
The Null Hypothesis for an F-test is that the mean of a scale variable is the same across the different categories of a categorical variable
True
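To make the null hypothesis on the card above concrete, here is a minimal sketch of the one-way ANOVA F statistic computed by hand: the between-group mean square divided by the within-group mean square. The function name `f_statistic` and the income figures are hypothetical; a statistics package would normally be used instead and would also report the p-value.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)  # number of categories, number of cases
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical incomes (in thousands of $) for three continents of birth.
same_means = [[20, 30, 40], [25, 30, 35], [10, 30, 50]]
different_means = [[20, 30, 40], [45, 50, 55], [70, 80, 90]]

print(f_statistic(same_means))
print(f_statistic(different_means))
```

When every group has the same sample mean, the between-group sum of squares is zero and so is F, which is exactly the situation the null hypothesis describes. The further apart the group means are relative to the spread inside each group, the larger F becomes.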
ANOVA is one of the usual inferential tests that complements a SCATTER X/Y graph
False
If we get a p-value of 0.04 in a bivariate INDEPENDENCE test, that means we have evidence of a relationship with a maximum confidence of 96%
True
CLUSTER is a SUPERVISED CLASSIFICATION method
False
Cluster analysis is mainly used to aggregate CASES, not FIELDS
True
Cluster is one of the main technical resources for PREDICTIVE ANALYTICS
False
The evaluation of a cluster solution is NOT MAINLY a technical assessment task
True
We normally STANDARDIZE metric and categorical variables to run a cluster analysis
False
We normally run a HIERARCHICAL cluster only if the number of cases/individuals is relatively small
True
We can use a HIERARCHICAL cluster combining metric and categorical variables
False
The AGGLOMERATION schedule is not a piece of interest in a Two-Step cluster
True
The choice of the distance measure in hierarchical clusters depends, basically, on the type of variables (categorical, scale,…)
True
One of the advantages of Two-Step is its inherent ability to handle outliers
True
Normally, a good cluster solution is shaped with a LARGE number of clustering variables (not less than 15)
False
A field/variable can be very relevant to define/distinguish a SPECIFIC CLUSTER without being of great importance for the solution as a whole
True
The selection of the clustering VARIABLES highly conditions the cluster solution we get
True
Factor analysis is a SUPERVISED analysis technique
False
Factor analysis is normally used when we want to summarize a large group of variables into a smaller number of factors
True
Factor analysis is commonly used as a technique to reduce the number of cases in a dataset
False
We could use a FACTOR analysis to test if different items in a questionnaire are linked with the same latent/underlying concept
True
Factor analysis can be used to create a COMPOSITE INDEX
True
A low specificity for most of the input variables may suggest that Factor analysis is feasible
True
Standard/Classic factor analysis is suitable for metric and categorical variables
False
Using the PCA extraction method, FACTORS will always be orthogonal if no rotation is applied
True
Rotation is used to improve factor interpretation
True
Oblique rotation is usually the realistic choice, since factors are normally correlated
True
Sometimes, we will find it of interest to retain factors with eigenvalues lower than ONE
True
The factor SCORES will all have zero mean
True
Prior to rotation, a variable may exhibit high correlation with more than one factor at the same time
True
Correlation between input variables should always be positive if we want to carry out a good factor analysis
False
In a good factor analysis, we expect to get as few factors as possible accounting for as much variance as possible
True
A low level of communality for a variable means that we will need a specific factor for that variable in our analysis
True
Given that input variables are used in standardized fashion, the sum of all the eigenvalues of all factors equals the number of variables
True
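The reasoning behind the card above can be checked numerically. Each standardized variable has a variance of 1, so the correlation matrix has ones on its diagonal, and the sum of the eigenvalues of any square matrix equals its trace. The sketch below only verifies the trace part, in plain Python; the data and the helper names `standardize` and `correlation_matrix` are hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Convert a column to z-scores (mean 0, standard deviation 1)."""
    m, s = mean(column), pstdev(column)
    return [(x - m) / s for x in column]


def correlation_matrix(columns):
    """Correlation matrix, computed as the covariance of the z-scored columns."""
    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


# Hypothetical data: three variables measured on five cases.
data = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 1.0, 4.0, 3.0, 6.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]

R = correlation_matrix(data)
trace = sum(R[i][i] for i in range(len(R)))
# The trace is the total variance that the eigenvalues share out among the factors.
print(round(trace, 10), len(data))
```

Up to floating-point rounding, the trace matches the number of variables, which is why an eigenvalue above one means a factor carries more variance than a single input variable.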
When there exists a common underlying factor for every variable, we will get a high eigenvalue for the first factor
True
Tree analysis is a SUPERVISED analysis technique
True
Trees are normally used when we want to balance “causation understanding” vs. prediction
True
CHAID uses iterated F-tests to grow the tree
False
One of the drawbacks of TREES is that they are a bit hard to interpret because of the technical complexity of the output
False
Trees are prone to over-fitting and thus require careful evaluation
True
CHAID is able to use scale variables automatically
True
The more we split the nodes, the more we avoid the over-fitting risk
False
A tree is more flexible than regression when causal relationships are not uniform across our whole sample
True
Trees are normally used for scale targets
False
We could use the result of a FACTOR analysis as input for a given CLUSTER analysis
True
We could use the result of a FACTOR analysis as input for a given REGRESSION analysis
True
We could use the result from a CLUSTER analysis as an input in a given TREE Analysis
True
We could use the result from a CLUSTER analysis as an input in a given standard FACTOR analysis
False
We could use the result from a CLUSTER analysis as an input in a given REGRESSION analysis without any transformation
False
We could use STANDARD REGRESSION in order to explain the result of a CLUSTER analysis
False
We could use STANDARD REGRESSION in order to explain a given FACTOR score with a set of explanatory variables
True