12 Quantitative Methods - Cluster Analysis Flashcards
Questions on Ettensperger, Felix and Elina Schleutker (2022): “Identification of Cross-Country
Similarities and Differences in Regulation of Religion Between 2000-2014 with Help of Cluster
Analysis”. Politics and Religion 15: 526-558.
(1) Read the introduction: What is the research question?
“comprehensive classification of both democratic and authoritarian countries in 2000 and
2014 when it comes to regulation of religion”
(2) Read the definition of regulation of religion on pages 527-528. What are the different
types of regulation of religion?
It is common to distinguish between, on the one hand, government regulation of religion
and, on the other hand, social regulation of religion (i.e., regulation imposed on religious
groups or individuals by non-governmental actors).
As for the government regulation, it is further customary to distinguish between two
different dimensions, namely positive endorsement of and negative restrictions on
religion.
(3) Read the methodology-section.
(3.1) Why do the authors find cluster analysis to be particularly suited for their study?
We find cluster analysis particularly suited for our study, as it enables the classification of
a large number of countries and further makes it possible to study if and how the
clustering of the countries changes over time and when a different set of indicators is
used. Thus, a comparison of the results from various cluster analyses makes it possible
to detect robust cluster patterns within the data, find out which countries change cluster
affinity over time and identify both outliers as well as borderline cases between two
clusters.
(3.2) What are dendrograms and how should they be read (see also question 5.1)?
The clustering is visualized in a dendrogram, showing us the closeness and relationship
between country-cases in our sample. The closer the cases are connected via the
branches of the tree diagram, the higher the similarities between these individual cases
are. This allows us to compare our results not only to quantitative, but also to qualitative
studies in the domain of regulation of religion. We can evaluate if previous observations
about the similarities and differences in state–religion relationships are reflected in the
empirical data by evaluating the distance of cases within clusters.
(3.3) Make a list of the different steps the authors take in their cluster analysis. (You will not understand many of the steps for now; the next reading by König and Jäckle will clarify these technicalities.)
Regarding the cluster analysis, the first step of the empirical investigations was to study what the mathematically best number of clusters is. This is important, as the identification of the mathematically optimal number of clusters will minimize the number of countries with low cluster affinity.
The results from each cluster analysis are shown in dendrogram format, which makes it possible to study how the cluster trees are generated, and how the internal structure, the existence of sub-clusters, and the proximity of cases inside of clusters are constituted.
To study the quality of the formed clusters, we employ silhouette analysis (see Rousseeuw 1987). The silhouette width of an individual country can vary between −1 and 1. Values close to 1 indicate a good fit (the country is very similar to the other countries in the cluster), whereas values close to −1 indicate a poor fit (the country is very dissimilar from the other countries in the cluster).
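The silhouette width behind this quality check can be computed directly from pairwise distances. Rousseeuw's formula is s(i) = (b - a) / max(a, b). Here a is the mean distance from observation i to the other members of its own cluster. b is the smallest mean distance from i to the members of any other cluster. The Python sketch below assumes Euclidean distance, toy one-dimensional data and the hypothetical function name `silhouette_widths`. The paper does not say which software or distance it used for this step.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def silhouette_widths(points, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for every observation.

    a: mean distance from i to the other members of its own cluster.
    b: smallest mean distance from i to the members of another cluster.
    Observations that are alone in their cluster get s(i) = 0.
    """
    if len(set(labels)) < 2:
        raise ValueError("silhouette widths need at least two clusters")
    widths = []
    for i, p in enumerate(points):
        # collect distances from i to every other observation, by cluster
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(euclidean(p, q))
        own = by_cluster.get(labels[i], [])
        if not own:
            widths.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d)
                for lab, d in by_cluster.items() if lab != labels[i])
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return widths
```

With the points 0, 1, 10 and 11 split into {0, 1} and {10, 11}, every width is close to 1. If the point 1 is wrongly grouped with 10 and 11, its width becomes strongly negative. That point is, on average, much closer to the other cluster than to its own.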
(3.4) Briefly explain where the data comes from and what the case selection is.
The measurement of regulation of religion is based on the third round of the Religion and State project, RAS3 (Fox 2019). The RAS3 dataset includes altogether 36 variables on discrimination against minority religions; 29 types of regulation of and restrictions on the majority religion and all religions; and 27 types of non-government discrimination, harassment, acts of prejudice and violence against minority religions. All these variables are coded from 0 to 3.
(5) Read section “results for authoritarian regimes”. In essence, the authors find that it is possible to distinguish between three different groups of authoritarian regimes based on the levels of regulation.
(5.1) Which countries belong to the different groups (to get a detailed list of the countries in each group, you can study the dendrograms)?
Cluster 1 consists of 19 countries mostly located in the MENA region. Cluster 2 is the largest cluster (40 countries, mainly located in Sub-Saharan Africa) with low average levels of regulation. Finally, the third cluster contains 18 countries from various geographical locations. With the exception of Myanmar, Syria, and Turkey, all countries in this cluster have experienced communist rule.
(5.3) What are the authors’ main results when it comes to the comparison between years and to previous research?
The clustering of the countries in 2000 changes somewhat depending on which indicators are included in the cluster analysis, whereas in 2014 the clustering is almost the same regardless of the included indicators.
A comparison of our results to the studies listed in Table 1 shows that in general, our results are similar to previous attempts to cluster authoritarian countries, and consequently also compatible with the theoretical frameworks that underlie these classifications. In contrast to these previous studies, however, our classification provides the empirically most rigorous findings. It demonstrates that the clusters (especially in 2014) are relatively stable regardless of which indicators are studied, and it allows us to identify countries that are borderline cases between two clusters and thus difficult to classify.
Questions on König, Pascal D. and Sebastian Jäckle (2017): “Clusteranalyse”. In: Jäckle, Sebastian (ed.): “Neue Trends in den Sozialwissenschaften: Innovative Techniken für qualitative und quantitative Forschung”. Springer VS. (Pages to read: 51-84.)
(1) Read sections 1 and 2. What is cluster analysis and what is it good for?
Cluster analysis is an inductive method for finding and building group structures in data. It shows how observations can be classified into groups on the basis of qualitative and/or quantitative characteristics. The groups are meant to be internally homogeneous and clearly distinguishable from one another. Objects within a group should be similar, while the groups should differ markedly from each other. The groups are therefore not defined beforehand but are derived from the data.
There is no single form of cluster analysis. The main families are hierarchical methods and partitioning methods (such as k-means).
(2.1) What is the difference between agglomerative clustering (bottom-up) and divisive-clustering (top-down)?
Both agglomerative (bottom-up) and divisive (top-down) clustering are hierarchical methods. Agglomerative clustering starts with every observation as its own cluster. It then merges ("fuses") the most similar clusters one after another until all observations form a single cluster. Divisive clustering works the other way round. It starts with one cluster containing all observations and splits it step by step until each object stands alone.
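The bottom-up procedure can be sketched in a few lines of Python. The sketch assumes Euclidean distance and single linkage as the rule for the distance between clusters. Both are choices made for illustration, and the function names are hypothetical. The code records each fusion and the height at which it happens, which is exactly the information a dendrogram draws. Divisive clustering would run the same hierarchy in reverse and is not sketched here.

```python
import math
from itertools import combinations

def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cluster_distance(points, c1, c2):
    """Single linkage: smallest distance between any members of c1 and c2."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

def agglomerate(points):
    """Bottom-up clustering: start with singletons, merge the closest pair.

    Returns the merge history as (cluster_a, cluster_b, height) tuples,
    where clusters are tuples of observation indices and height is the
    linkage distance at which they were fused.
    """
    clusters = [(i,) for i in range(len(points))]  # every object alone
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(points, *pair))
        merges.append((c1, c2, cluster_distance(points, c1, c2)))
        # replace the two fused clusters by their union
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges
```

Take the toy values 0, 1, 5, 6 and 20. The pairs {0, 1} and {5, 6} are fused first at height 1, and the two pairs then join at height 4. The outlier 20 joins last at height 14.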
(2.2) What is the purpose of similarity measures (Ähnlichkeitsmaßen) in clustering?
Similarity measures (Ähnlichkeitsmaße) and distance measures quantify how alike, or how far apart, two objects are with respect to the features of interest. They are computed for every pair of objects. This makes it possible to decide whether two objects are similar enough to be put into the same cluster. Examples of similarity measures for binary variables are the matching coefficient, the phi coefficient and the Rogers-Tanimoto coefficient.
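For binary (0/1) variables, these coefficients are built from a 2×2 agreement table. In that table, a counts features present in both objects, b and c count the mismatches, and d counts features absent in both. The Python sketch below assumes the standard textbook definitions of the three coefficients, and König and Jäckle's notation may differ. It compares two hypothetical binary country profiles.

```python
import math

def contingency(x, y):
    """Count the 2x2 agreement table for two binary (0/1) profiles."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)  # both present
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)  # only in x
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)  # only in y
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)  # both absent
    return a, b, c, d

def matching_coefficient(x, y):
    """Share of features on which the two profiles agree."""
    a, b, c, d = contingency(x, y)
    return (a + d) / (a + b + c + d)

def rogers_tanimoto(x, y):
    """Like the matching coefficient, but mismatches are weighted double."""
    a, b, c, d = contingency(x, y)
    return (a + d) / (a + d + 2 * (b + c))

def phi_coefficient(x, y):
    """Correlation of two binary variables (undefined if a margin is 0)."""
    a, b, c, d = contingency(x, y)
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# two hypothetical countries coded on six yes/no indicators
country_x = [1, 1, 0, 0, 1, 0]
country_y = [1, 0, 0, 0, 1, 1]
```

For these two profiles the table is a = 2, b = 1, c = 1, d = 2. The matching coefficient is 4/6, Rogers-Tanimoto is 0.5 because the two mismatches count double, and phi is 1/3. The same pair of objects can therefore look more or less similar depending on the measure chosen.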
(2.3) The authors describe several clustering methods, which are illustrated in Table 3. Try to make sense of each of these methods, and their purpose.
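Single linkage: the distance between two clusters is the smallest distance between any member of one cluster and any member of the other (nearest neighbour). It tends to chain objects together and is useful for spotting outliers.
Complete linkage: the distance between two clusters is the largest distance between any pair of their members (farthest neighbour). It tends to produce compact clusters.
Average linkage: the distance between two clusters is the average of all pairwise distances between their members. It is a compromise between single and complete linkage.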
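The three rules differ only in how they summarise the distances between the members of two clusters. The Python sketch below assumes Euclidean distance, and the function names are illustrative. It computes all three rules for the same pair of clusters.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def pairwise(c1, c2):
    """All distances between a member of c1 and a member of c2."""
    return [euclidean(p, q) for p in c1 for q in c2]

def single_linkage(c1, c2):
    return min(pairwise(c1, c2))  # nearest pair of members

def complete_linkage(c1, c2):
    return max(pairwise(c1, c2))  # farthest pair of members

def average_linkage(c1, c2):
    d = pairwise(c1, c2)
    return sum(d) / len(d)        # mean over all member pairs
```

Take the clusters {0, 2} and {5, 9} on a line. The member distances are 5, 9, 3 and 7, so single linkage gives 3, complete linkage gives 9 and average linkage gives 6. Which rule is used decides which clusters count as "closest" and are fused next.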
(2.4) What are the strengths and weaknesses of hierarchical clustering?
There are no firmly established rules for hierarchical clustering, so it remains largely exploratory. This makes it flexible and open to interpretation, but also more complex and less clear-cut. In practice it is limited to at most a few hundred objects, so the datasets are relatively small. As a consequence, a few outliers can strongly affect the dendrogram and its structure.
(2.5) What is a dendrogram and what does it show?
Dendrogram: based on distance or similarity measures, it shows the levels at which objects and groups are merged ("fused") and how they are connected. Because each object or group is assigned to exactly one higher-level group, the result is a hierarchical structure. The branch lengths show cluster homogeneity. The longer the branch before a cluster is fused with the next one, the more homogeneous the cluster is relative to its distance from the other clusters. The optimal number of clusters is found by cutting the dendrogram with a horizontal line at a height where all cut branches are long, i.e., where there is a large distance to the next fusion point. In the example in the text, a two-cluster solution would be best, followed by a four-cluster solution.
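The "cut where the branches are long" rule can be expressed as a search for large jumps between successive fusion heights. The Python sketch below assumes this largest-gap heuristic as one simple way of reading the rule. The fusion heights it uses are hypothetical, not taken from the text's example. Cutting between fusion i and fusion i + 1 of n objects leaves n - i clusters, and the length of the cut branches is the gap between those two heights.

```python
def rank_cluster_solutions(heights):
    """Rank numbers of clusters k by the gap to the next fusion.

    heights: fusion heights of an agglomerative run in merge order
    (n - 1 non-decreasing values for n objects). A large jump between
    successive heights means long branches, i.e. a good place to cut.
    """
    n = len(heights) + 1
    gaps = []
    for i in range(1, len(heights)):
        # after i fusions there are n - i clusters; the next fusion is heights[i]
        gaps.append((heights[i] - heights[i - 1], n - i))
    gaps.sort(reverse=True)  # largest gap (longest cut branches) first
    return [k for _, k in gaps]

# hypothetical fusion heights for 8 objects
heights = [1.0, 1.1, 1.3, 1.5, 4.0, 4.5, 9.0]
```

For these heights the ranking starts with 2 and then 4. The biggest jump (4.5 to 9.0) favours a two-cluster cut, and the next biggest (1.5 to 4.0) favours four clusters. This mirrors the kind of reading described above.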
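(3.1) What is k-means clustering and how does it differ from hierarchical clustering?
k-means is a partitioning clustering method. In hierarchical clustering, the number of clusters is determined afterwards on the basis of the fusion process (e.g., by reading the dendrogram). In partitioning clustering, the number of clusters k is fixed in advance, and each cluster is represented by its own mean (centroid). Objects are assigned to the nearest centroid, and the centroids are recomputed until the assignment no longer changes.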
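The assign-and-update loop can be sketched in plain Python. The sketch assumes Euclidean distance and caller-supplied starting centroids, which also fix k. Software packages usually pick starting points randomly or with k-means++, and the function names here are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(members) for coord in zip(*members))

def k_means(points, initial_centroids, max_iter=100):
    """Lloyd-style k-means; k is fixed by the number of initial centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # assignment step: each object joins the cluster with the nearest mean
        new_labels = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # no object changed cluster, so the partition is stable
        labels = new_labels
        # update step: recompute each mean from its current members
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = centroid(members)
    return labels, centroids

# two well-separated groups; both starting centroids deliberately sit in the first
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
```

Calling `k_means(pts, [(0, 0), (1, 0)])` still recovers the two groups, with centroids (1/3, 1/3) and (31/3, 31/3), because the update step pulls the second mean towards the distant points. Unlike in the hierarchical sketch, k had to be chosen before the run.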