Ch 8 Sallis Flashcards
What are outliers?
extreme values that deviate from what is typical for the variable
What are coding errors?
illogical values in the dataset
How are coding errors and outliers detected?
By going through the data. Graphical representations, such as frequency distributions, and various descriptive statistics are helpful in finding them
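The screening described above can be sketched in a few lines of Python. This is a minimal illustration, not a method from the chapter: the survey data, the legal 1-5 answer range, and the 3-standard-deviation rule of thumb are all assumptions chosen for the example.

```python
# Screening one variable for coding errors and outliers (stdlib only).
from collections import Counter
from statistics import mean, stdev

# Hypothetical responses on a 1-5 agreement scale; 9 and 55 are suspect.
responses = [3, 4, 2, 5, 3, 9, 4, 1, 3, 55, 4, 2]

# A frequency distribution makes illogical values easy to spot:
# any value outside the legal 1-5 range must be a coding error.
freq = Counter(responses)
coding_errors = [v for v in responses if not 1 <= v <= 5]

# Descriptive statistics flag extreme values: here, a common rule of
# thumb marks observations more than 3 standard deviations from the
# mean as outliers.
m, s = mean(responses), stdev(responses)
outliers = [v for v in responses if abs(v - m) > 3 * s]

print("frequencies:", sorted(freq.items()))
print("coding errors:", coding_errors)
print("outliers:", outliers)
```

In a real dataset the legal range would come from the questionnaire's codebook, and the outlier threshold is a judgment call, as the next card notes.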
If an outlier is not a coding error, what should you do with it?
The answer depends on what kind of analysis you intend to perform. If you are only showing frequency tables and distributions, you do not need to do anything. Outliers do not influence the analysis; they are simply part of the data presentation. On the other hand, if more advanced estimation techniques are used, you must at least be aware of the adverse effects that outliers can have on the results. If you are in the fortunate situation of having a large dataset and the number of outliers is relatively small, you should probably consider removing the cases where the outliers occur. Here, common sense and argumentation play an important role.
When do missing values occur?
Missing values occur when there is no observation recorded in one or more cells in a dataset. They leave holes that have not been assigned a number value. When data comes from questionnaires, missing values are simply due to the respondent not answering all the questions or an error in data input.
What are three basic possibilities to deal with missing values?
- Listwise deletion, which is to omit all cases with missing values. This works well when there is very little missing data relative to the sample size.
- Pairwise deletion retains more data than listwise deletion by using every case that has data for the particular pair of variables being analyzed. Where listwise deletion removes all cases with any missing data, pairwise deletion removes a case only from the analyses of variable pairs for which that case is missing data.
- Impute (replace) missing values with a neutral value. There are several types of imputation. For example, the neutral value may be the average of the non-missing observations in the variable or it may be based on a pattern present in the data.
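The three possibilities above can be sketched with plain Python lists, assuming missing cells are stored as None. The dataset, the choice of columns for the pairwise analysis, and the use of column means for imputation are all hypothetical choices for this example.

```python
# Listwise deletion, pairwise deletion, and mean imputation (stdlib only).
from statistics import mean

# Each row is one respondent; columns are answers to three questions.
rows = [
    [4, 3, 5],
    [2, None, 4],
    [5, 4, None],
    [3, 3, 3],
]

# Listwise deletion: drop every case that has any missing value.
listwise = [r for r in rows if None not in r]

# Pairwise deletion: for an analysis of columns 0 and 1, keep every case
# where *those two* columns are present, even if another column is missing.
pairwise_01 = [(r[0], r[1]) for r in rows
               if r[0] is not None and r[1] is not None]

# Mean imputation: replace each missing cell with the average of the
# non-missing observations in that column.
def impute_mean(data):
    cols = list(zip(*data))
    col_means = [mean(v for v in col if v is not None) for col in cols]
    return [[v if v is not None else col_means[j]
             for j, v in enumerate(row)]
            for row in data]

imputed = impute_mean(rows)
print(len(listwise), "cases survive listwise deletion")
print(len(pairwise_01), "cases survive pairwise deletion for columns 0 and 1")
```

Note how pairwise deletion keeps the third respondent (missing only column 2) that listwise deletion throws away.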
Which are the two basic methods for estimating how reliable an empirical measurement is?
a) One examines the consistency of a measurement over time. b) One examines the consistency of different measures that are meant to measure the same thing. For questionnaires, this means the internal consistency of different questions intended to measure the same thing at a given time, which is especially relevant when measuring attitudes using multiple questions.
Describe consistency over time
Two variants are usually distinguished when it comes to mapping consistency over time: the test-retest method and the alternative form method. The test-retest method is to conduct a new survey in which the same questions are asked to the same people shortly after the first survey was administered. We then calculate the correlation between the responses to the two questionnaires. The correlation coefficient, which can range from −1 to +1, is a quantitative expression of the reliability of the study (correlation is explained later in this chapter). With perfect reliability, the correlation coefficient will be +1. In this case, the results of the two studies may not be identical, but they will be consistent. In other words, if the relationship between the results of the two surveys is presented in a diagram, the answers will be on a straight line.
Although this method has an intuitive appeal, there are several reasons why it is not often used. It costs money and takes time. When respondents are asked to re-answer the same survey, they may answer based on remembering their previous answers instead of actually thinking about the questions. Another problem, of course, is that their opinion may have changed in the short time between surveys. For example, their attitude towards a job position may have quickly changed if other prospects became available. This means that the correlation can be low without being due to low reliability. In fact, changes in the underlying variables that we wish to identify may simply be due to the respondents taking part in a survey. For example, if we ask employees about how happy they are with a particular employer, this can lead to a thought process where they re-evaluate their perception of that employer.
The alternative form method differs from the test-retest method by using different questions each time the data is collected. The questions used the second time are assumed to cover the same theoretical properties or variables as the questions used the first time. The main advantage of the alternative form method is that it reduces the risk that the respondent’s memory will affect the correlation, and thus the estimate of reliability. The problem with the method is that it can be difficult to construct different questions that express the same underlying variable.
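The test-retest correlation described above can be computed with an ordinary Pearson correlation coefficient. A minimal sketch, with hypothetical answers from five respondents: the second-round answers differ from the first but are perfectly consistent (each shifted up by one), so the reliability estimate is exactly +1 and the points would lie on a straight line.

```python
# Test-retest reliability as a Pearson correlation (stdlib only).
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical answers from the same five respondents at time 1 and time 2.
time1 = [2, 4, 3, 5, 1]
time2 = [3, 5, 4, 6, 2]  # consistent with time1, though not identical

print("test-retest reliability:", round(pearson(time1, time2), 3))
```

Lower values would arise if respondents' answers shifted in inconsistent ways, which, as noted above, can reflect genuine opinion change rather than low reliability.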
What is Cronbach’s Alpha?
Cronbach’s alpha is the most widely used measure of reliability, often referred to simply as alpha (α).
Cronbach’s Alpha is a statistical measure used to assess the reliability or internal consistency of a set of items (e.g., questions in a survey or test) that are meant to measure the same concept or construct.
What it tells us:
- It checks how closely related the items are as a group.
- A high Cronbach’s Alpha means the items work well together to measure the same thing.
The scale:
- Values range from 0 to 1.
- Above 0.7: Generally considered acceptable.
- Above 0.8: Considered good.
- Above 0.9: Excellent but could mean the items are too similar and might need simplification.
- Below 0.7: Indicates the items might not be reliable together.
Example:
If you create a survey with 10 questions to measure customer satisfaction, Cronbach’s Alpha will tell you if those questions consistently reflect satisfaction or if some might not belong in the group.
In simple terms, Cronbach’s Alpha is like a quality check to see if your survey or test items “stick together” to measure what you want them to.
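The quality check above can be computed with the standard formula α = (k/(k−1))·(1 − Σ item variances / variance of total scores). A minimal sketch with a hypothetical three-item survey answered by five respondents; the strong agreement between the items yields a high alpha.

```python
# Cronbach's alpha from per-item score lists (stdlib only).
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, all over the same respondents."""
    k = len(items)
    sum_item_vars = sum(variance(item) for item in items)
    # Each respondent's total score across all items.
    total_scores = [sum(scores) for scores in zip(*items)]
    total_var = variance(total_scores)
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

# Three items intended to measure the same construct, five respondents.
item1 = [4, 3, 5, 2, 4]
item2 = [4, 4, 5, 2, 3]
item3 = [5, 3, 4, 1, 4]

alpha = cronbach_alpha([item1, item2, item3])
print("alpha:", round(alpha, 3))
```

Because the three items rise and fall together across respondents, alpha here lands above 0.9, which the scale above would call excellent, though possibly a sign the items are redundant.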