C14 understanding quant data Flashcards
planning analysis - what is quant analysis? how does it relate to design?
You are likely to go into the analysis stage with fairly solid ideas about what you are looking for. The nature of quantitative research is that you have a clear idea of the concepts you want to measure, the questions you want to address and the hypotheses you want to test. You will have thought about this when deciding on the sampling frame, who and how many to ask, and how to ask them. The quality of the analysis depends on the quality of the earlier stages: problem definition and research design. The outcome of the analysis will be of much better quality if the problem was clearly defined and the research was designed to deliver evidence that helps to address the client's business problem.
planning analysis - reviewing materials
It may be useful to reacquaint yourself with the brief, the sampling plan and the questionnaire. The brief sets out the business problem and the information needed to address it, which you must not lose sight of while analysing. When reviewing the brief you should ask:
Why is the research needed?
How are the findings to be used?
What are the research objectives?
What is the aim of the research: to explore, describe, explain or evaluate?
What are the working hypotheses or ideas?
In tackling the analysis you are looking for information in the data (meaningful insights) that will allow the client to make informed decisions.
The sampling plan tells you who you need to look at: which groups or types of people. The questionnaire is in effect a map or index of the data you have to address the research objectives. Use both in conjunction with the brief to work out what data you need to look at, for which groups, what comparisons are to be made and to what end.
Planning analysis - benefits of it, secondary data
An analysis strategy will help you make the most of your resources and will take you through the mass of data in a systematic and rigorous way. A strategy that meets the requirements set out in the brief will make the task much more efficient. The strategy should not be set in stone: the data may throw up interesting or unexpected findings, and it is acceptable to explore these in relation to the research objectives.
It may be useful to revisit secondary data sources: the initial background or secondary research for the particular study, or the wider body of existing knowledge and literature. These can give you ideas and help develop your thinking and analysis. Well-developed models and theories can also be a source of inspiration and should help, but they should be used critically.
Once an analysis plan is in place, you should get to know the data and start working through it and reorganising it to suit your purposes.
Understanding data - concepts, questions and variables
Measuring in a research context means gathering data on relevant things. This may be straightforward (e.g. age) or conceptual (e.g. sexism). A valid and reliable measure of a concept has to start with an examination of the concept: agreeing a definition of it, deciding which of its dimensions are relevant to the research objectives, and establishing which indicators will be used to measure it. Finally, the question is designed. This process is called operationalising a concept. The response format also has to be decided, e.g. collecting age in four bands, which determines the statistical tests that can be run on those groups. Back at the questionnaire design stage you would have been thinking ahead to the analysis when making these decisions.
At the analysis stage, conventional practice is to refer to the questions as variables and the responses as the values of those variables. The important thing to note at this point is the connection between questions and variables, the link back to the concept you are trying to measure, and the link between the choice of question and response format and its impact on what you can do in the analysis.
Cases, variables and values
A complete individual unit of analysis is called a case: typically one questionnaire, or the record of an interview with one respondent, is one case, so 300 completed questionnaires are 300 cases. To identify each individual case, a unique number (the serial number) is assigned to it. For each case, the individual pieces of information (the questions) are called variables, and the answers respondents give to these questions are the values of those variables. The process of assigning numbers to responses is called coding: an answer to a question is converted into a number value that the analysis programme can read.
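Coding can be pictured as looking each answer up in a codebook. The sketch below is illustrative only: the variable names, answer labels and code numbers in `CODEBOOK` and the helper `code_case` are hypothetical, not taken from any real questionnaire or package.

```python
# Hypothetical codebook: maps each answer a respondent can give
# to the numeric value that will be stored in the data file.
CODEBOOK = {
    "gender": {"Male": 1, "Female": 2, "Prefer not to say": 3},
    "visited_store": {"Yes": 1, "No": 2},
}

def code_case(serial, answers):
    """Convert one case (one completed questionnaire) into coded values.

    serial  -- the unique serial number identifying the case
    answers -- dict of variable name -> answer text as recorded
    """
    coded = {"serial": serial}
    for variable, answer in answers.items():
        # Look up the numeric code for this answer to this question.
        coded[variable] = CODEBOOK[variable][answer]
    return coded

case = code_case(101, {"gender": "Female", "visited_store": "Yes"})
print(case)  # {'serial': 101, 'gender': 2, 'visited_store': 1}
```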
Data entry
When a questionnaire is administered or completed by computer (CAPI, CATI or online self-completion by respondents), the process of data entry (moving responses from the questionnaire to the data file) is done automatically. If you are using paper questionnaires, the responses must be keyed in at a separate data entry stage. For an analysis programme to read the data, they must be in a regular, predictable format. In most datasets the data appear in a grid arrangement, of the sort seen in spreadsheets or in analysis packages such as SPSS: the grid is made up of rows of cases and columns of variables. The number codes are what you or the data entry programme transfer from the questionnaire into the analysis programme, in a process known as data entry or data input. Packages also allow alphanumeric codes; variables stored this way are called string variables.
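The grid layout can be sketched as a small CSV file, one row per case and one column per variable. This is an assumption for illustration: CSV is just a common interchange format (it is not SPSS's own file format), and the variable names and values below are hypothetical. The `town` column stands in for a string variable.

```python
import csv
import io

# Each row is one case; each column is one variable.
# "serial" identifies the case; "town" is a string (alphanumeric) variable.
VARIABLES = ["serial", "gender", "age_band", "town"]
cases = [
    [101, 2, 3, "Leeds"],
    [102, 1, 1, "York"],
    [103, 2, 4, "Hull"],
]

def write_grid(rows, header):
    """Write the case-by-variable grid as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)   # first row: variable names
    writer.writerows(rows)    # one row per case
    return buffer.getvalue()

grid_text = write_grid(cases, VARIABLES)
print(grid_text)
```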
Typically, frequency counts are converted into percentages calculated on the most suitable base for a particular question, e.g. all those answering the question or the total sample. In the DP (data processing) spec, or when you write table specifications, you can ask for both the percentage and the frequency count (raw number) to appear on the tables.
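The choice of base matters because the same count gives different percentages. A minimal sketch with hypothetical figures (300 interviewed, 250 answering the question, 100 saying "Yes"); the function name `percentages` is illustrative.

```python
def percentages(count, answering, total_sample):
    """Return the same frequency count as a % of two different bases."""
    return {
        "count": count,
        "pct_of_answering": round(100 * count / answering, 1),
        "pct_of_total": round(100 * count / total_sample, 1),
    }

# Hypothetical figures: 300 interviewed, 250 answered the question
# (the rest were routed past it), 100 said "Yes".
print(percentages(100, answering=250, total_sample=300))
# {'count': 100, 'pct_of_answering': 40.0, 'pct_of_total': 33.3}
```

Showing the raw count alongside the percentage lets a reader of the table see which base was used and how many people it rests on.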
Levels of measurement - what are they?
Nominal scale numbers - used to classify or label things. Other symbols would be just as suitable, but numbers are used because they are familiar. They have no arithmetic meaning or value.
Ordinal scale numbers - represent a category and indicate that there is a relationship between the numbered items. In other words, there is a ranking, order or sequence to the numbers, e.g. house numbers are ordinal numbers, and position in a race or birth order in a family (first, second, third) are ordinal rankings. Ordinal numbers do not represent a real amount, so arithmetic is not meaningful.
Interval scale numbers - represent measurement numbers or values, so arithmetic is meaningful. Numbers on an interval scale are ordered and the intervals between them are of equal size. Temperature in degrees Celsius or Fahrenheit is measured on an interval scale: there is no absolute zero, and negative amounts mean something, e.g. -5 degrees. Income is often given as an example of an interval level variable, although because it has a true zero it strictly meets the definition of a ratio variable.
Ratio scale numbers - have the same properties as interval scale numbers (a rank order, equal intervals, meaningful arithmetic), but a ratio scale has an absolute zero. Zero on a ratio scale means that there is nothing there, whereas on an interval scale zero is just a point on the scale (0°C does not mean there is no temperature). Examples are time, weight, the number of times an item has been used or the number of children in a household (HH).
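One way to see why the absolute zero matters: statements like "twice as much" survive a change of unit on a ratio scale but not on an interval scale. The sketch below assumes temperatures in Celsius and the approximate factor 2.20462 lb per kg; the helper names are illustrative.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def kg_to_lb(kg):
    return kg * 2.20462  # approximate conversion factor

def ratio_in_two_units(a, b, convert):
    """Compare a/b in the original unit with a/b after converting the unit."""
    return a / b, convert(a) / convert(b)

# Interval scale: zero is arbitrary, so the ratio changes with the unit.
print(ratio_in_two_units(20, 10, celsius_to_fahrenheit))  # (2.0, 1.36)

# Ratio scale: zero means "none", so the ratio is the same in any unit.
print(ratio_in_two_units(20, 10, kg_to_lb))               # (2.0, 2.0)
```

So 20 kg really is twice 10 kg, but 20°C is not "twice as hot" as 10°C.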
Why do levels of measurement matter?
Interval and ratio level variables can be manipulated using a range of mathematical and statistical procedures, because they represent numeric amounts and arithmetic is meaningful with these types of numbers. Nominal and ordinal level variables are not suitable for this. To determine what kind of analysis is appropriate, including which statistical test to use when testing hypotheses, it is important to recognise what kind of number or variable you have: different tests are suitable for different levels of measurement.
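As a small illustration of matching the calculation to the level of measurement, the sketch below picks a measure of "typical value" by level: the mode for nominal data, the median for ordinal data and the mean for interval or ratio data. This follows a common textbook convention and is not an exhaustive guide to tests; the function `typical_value` and the data are hypothetical.

```python
import statistics

def typical_value(values, level):
    """Return a measure of the typical value suited to the level of measurement."""
    if level == "nominal":
        return statistics.mode(values)        # most frequent category only
    if level == "ordinal":
        return statistics.median_low(values)  # middle rank; always an actual category
    if level in ("interval", "ratio"):
        return statistics.mean(values)        # arithmetic is meaningful
    raise ValueError(f"unknown level of measurement: {level}")

region_codes = [1, 3, 3, 2, 3]     # nominal: 1=North, 2=Midlands, 3=South
satisfaction = [1, 2, 2, 4, 5]     # ordinal: 1=very dissatisfied ... 5=very satisfied
children_in_hh = [0, 1, 2, 2, 3]   # ratio: number of children in household

print(typical_value(region_codes, "nominal"))   # 3
print(typical_value(satisfaction, "ordinal"))   # 2
print(typical_value(children_in_hh, "ratio"))   # 1.6
```

Taking the mean of `region_codes` would run without error but produce a meaningless number, which is why the level has to be recognised before choosing the analysis.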
Editing and cleaning dataset - why?
Either while the data are being entered or afterwards, they are edited or cleaned to make sure they are free of errors and inconsistencies, e.g. missing values, out-of-range values and errors due to misrouting.
Missing values - why occur? how to avoid in design?
Missing values are blank responses. They can occur because:
Question may not apply to respondent
Respondent may not know answer
Respondent refuses to answer
Interviewer forgot to record response
Missing values must be dealt with to avoid contaminating the dataset or misleading the researcher or client. Adding "don't know" (DK) or "not applicable" (N/A) codes at the questionnaire design stage can prevent them, as can covering this at interviewer briefing sessions: interviewers should be briefed on how to code these answers. Missing answers can also be avoided by checking with respondents at the end of the interview or in quality control call-backs.
Missing values - how to deal with them?
If missing values remain, a code can be added to the data entry program that allows a missing value to be recorded. Typically, the code chosen is a value outside the range of possible values for that variable. Another, more extreme, option is casewise deletion, in which you remove cases that contain missing values. This reduces the sample size and may lead to bias, as cases with missing values may differ from those without. A less drastic approach is pairwise deletion, in which a case is excluded only from the tables or calculations for the specific questions where its value is missing. You may also replace a missing value with a real one. There are two ways of approaching this:
Calculate the mean of the variable and use that
Calculate an imputed value based on the pattern of responses of respondents whose profiles are similar to that of the respondent with the missing value
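The options above can be sketched on a tiny hypothetical dataset. The missing value code 99 (outside the 1 to 5 range of the rating questions), the variable names and the helper functions are all assumptions for illustration. Pairwise deletion is shown for a single-variable calculation, and only the mean replacement option is shown; imputation from similar respondents' profiles is not.

```python
import statistics

MISSING = 99  # hypothetical out-of-range code meaning "no answer recorded"

# Hypothetical dataset: each case has a serial number and two ratings (1-5).
cases = [
    {"serial": 1, "q1": 4, "q2": 5},
    {"serial": 2, "q1": MISSING, "q2": 3},
    {"serial": 3, "q1": 2, "q2": MISSING},
    {"serial": 4, "q1": 3, "q2": 4},
]
VARIABLES = ["q1", "q2"]

def casewise(data, variables):
    """Casewise deletion: drop any case with a missing value on any variable."""
    return [c for c in data if all(c[v] != MISSING for v in variables)]

def pairwise_values(data, variable):
    """Pairwise deletion: exclude a case only from calculations on the variable it is missing."""
    return [c[variable] for c in data if c[variable] != MISSING]

def mean_impute(data, variable):
    """Replace missing values on one variable with the mean of its valid values."""
    fill = statistics.mean(pairwise_values(data, variable))
    return [c[variable] if c[variable] != MISSING else fill for c in data]

print(len(casewise(cases, VARIABLES)))  # 2 cases left out of 4
print(pairwise_values(cases, "q1"))     # [4, 2, 3]
print(mean_impute(cases, "q1"))         # [4, 3, 2, 3]
```

Casewise deletion halves this small sample, which shows how quickly it can shrink the base.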
inconsistencies, routing errors and out-of-range values
This involves resolving problems of inconsistency: routing instructions that were not followed correctly, extreme answers, and answers that are not valid or are outside the range of possible answers. Incorrect routing should not happen with CAPI, where routing is automatic, and the programme alerts the interviewer to inconsistent answers and refuses answer codes that are out of range. Further checks on the accuracy and consistency of the data can be made at the next stage of the process, when the data are available in the form of a frequency count or holecount.
Once the data have been entered, edited and verified, they are in a form that can be manipulated or analysed.
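Checks of this kind can be sketched as simple rules applied to each case. The questionnaire here is hypothetical: q1 "Do you own a car?" (1=Yes, 2=No) routes only car owners to q2 "Which brand?" (codes 1 to 6). The rule set and the function `check_case` are illustrative, not any CAPI package's behaviour.

```python
# Hypothetical questionnaire rules used for cleaning checks:
#   q1 "Do you own a car?" 1=Yes, 2=No
#   q2 "Which brand?" codes 1-6, asked ONLY if q1 == 1 (routing instruction)
VALID_CODES = {"q1": {1, 2}, "q2": set(range(1, 7))}

def check_case(case):
    """Return a list of problems found in one case (empty list = clean)."""
    problems = []
    # Out-of-range check: every recorded code must be a valid answer code.
    for variable, allowed in VALID_CODES.items():
        value = case.get(variable)
        if value is not None and value not in allowed:
            problems.append(f"{variable}: out-of-range value {value}")
    # Routing check: q2 should be answered if and only if q1 == 1.
    if case.get("q1") == 1 and case.get("q2") is None:
        problems.append("q2: missing, but q1 routed respondent to it")
    if case.get("q1") == 2 and case.get("q2") is not None:
        problems.append("q2: answered, but q1 should have routed respondent past it")
    return problems

print(check_case({"serial": 1, "q1": 1, "q2": 4}))  # []
print(check_case({"serial": 2, "q1": 2, "q2": 3}))  # routing error
print(check_case({"serial": 3, "q1": 1, "q2": 9}))  # out-of-range code
```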
Manipulation of variables
Some variables may not be in a form that is useful for further analysis. It is possible to change variables or values by recoding them, or by manipulating them into new variables. If a variable is at the interval or ratio level of measurement, you can use arithmetic functions to create new values based on the values of the original variable.
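Both kinds of manipulation can be sketched briefly: recoding exact age into bands, and using arithmetic on a ratio-level variable to create a new one. The band boundaries, the 52-week multiplier and the function names are hypothetical examples.

```python
def age_band(age):
    """Recode exact age (ratio level) into four bands (ordinal codes 1-4).
    The band boundaries here are hypothetical examples."""
    if age < 25:
        return 1   # under 25
    if age < 45:
        return 2   # 25-44
    if age < 65:
        return 3   # 45-64
    return 4       # 65 and over

def annual_spend(weekly_spend):
    """Create a new ratio-level variable from an existing one using arithmetic."""
    return weekly_spend * 52

ages = [19, 33, 47, 70]
print([age_band(a) for a in ages])  # [1, 2, 3, 4]
print(annual_spend(12.5))           # 650.0
```

Recoding only works one way: once age is stored in bands, exact ages cannot be recovered, so it is usual to keep the original variable alongside the new one.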
Types of data analysis? what is data analysis?
The purpose of the project has been to answer the questions raised by the client in wanting to explore, describe, count, explain, understand or evaluate an issue or problem relevant to their business problem. You are now at the point of being able to answer these questions, on the assumption that the research questions were relevant to the research problem and that an appropriate research design was used.
Four types of analysis
Univariate descriptive
Bivariate descriptive
Explanatory
Inferential
What sample for inferential analysis?
You may use one or more types of analysis in the course of a project. Whether you can use inferential analysis depends on which type of sampling you used, that is, whether you used probability or non-probability sampling. The reason for using a probability (random) sample is to generalise from the sample to the population: to estimate whether what you see in the sample exists in the population from which the sample was drawn. If you used this kind of sampling, you can use inferential analysis to make these inferences.
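As one example of an inference from a probability sample, the sketch below estimates a 95% confidence interval for a population proportion. It assumes a simple random sample and uses the normal approximation, which is reasonable only for fairly large samples; it would not be valid for a non-probability sample. The figures and the function name are hypothetical.

```python
import math

def proportion_ci_95(successes, n):
    """Approximate 95% confidence interval for a population proportion.

    Uses the normal approximation p +/- 1.96 * sqrt(p(1-p)/n), which assumes
    a simple random (probability) sample and a reasonably large n.
    """
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)   # standard error of the proportion
    return p - 1.96 * se, p + 1.96 * se

# Hypothetical result: 160 of 400 randomly sampled customers say "Yes".
low, high = proportion_ci_95(160, 400)
print(f"{low:.3f} to {high:.3f}")  # 0.352 to 0.448
```

Read as: the sample shows 40%, and under these assumptions the population figure is estimated to lie between about 35% and 45%.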
Univariate descriptive analysis?
Univariate analysis describes one variable. It is a basic but useful and informative type of analysis, the purpose of which is often to help you get to know the data. It involves summarising or describing responses using frequency counts and frequency distributions, and calculations known as descriptive statistics: measures of central tendency (averages) and measures of spread or variability.
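The descriptive statistics named above can be computed with Python's standard `statistics` module. A minimal sketch on hypothetical data (number of store visits last month, one value per case):

```python
import statistics

# Hypothetical responses: number of store visits last month, one value per case.
visits = [0, 1, 1, 2, 2, 2, 3, 4, 5, 10]

summary = {
    # Measures of central tendency (averages)
    "mean": statistics.mean(visits),
    "median": statistics.median(visits),
    "mode": statistics.mode(visits),
    # Measures of spread
    "range": max(visits) - min(visits),
    "std_dev": round(statistics.stdev(visits), 2),  # sample standard deviation
}
print(summary)
```

Here the mean (3) sits above the median and mode (2) because of the one heavy user who made 10 visits.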
Frequency counts and skews
A frequency count is the number of times a value occurs in the dataset, typically the number of respondents who gave a particular answer. It is useful to run frequency counts before detailed analysis or writing the table spec, as they give an overview of each question, allowing you to see the size of sub-groups within the sample, which categories of response might be grouped together, and what weighting may be required. You can then decide whether it is feasible to isolate certain groups to look at how their attitudes, behaviour or opinions differ from those of other groups.
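A frequency count is straightforward to sketch with `collections.Counter`. The coded answers below are hypothetical; the second step shows how a small sub-group spotted in the counts might be combined with a neighbouring category before further analysis.

```python
from collections import Counter

# Hypothetical coded answers to "Which age band are you in?" (1-4), one per case.
answers = [1, 2, 2, 3, 2, 4, 1, 2, 3, 2]

freq = Counter(answers)
for code in sorted(freq):
    print(code, freq[code])   # code, number of respondents giving it

# Band 4 has only one case, too few to analyse on its own,
# so bands 3 and 4 are combined into a single "3+" group.
combined = Counter("3+" if code >= 3 else str(code) for code in answers)
print(combined)
```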
Frequencies can also be displayed graphically on what is known as a frequency distribution chart: plot the range of values on the X axis and the frequency on the Y axis, which lets you see quickly and easily the spread of values for a particular variable. This is a useful way of describing the shape of the distribution of continuous metric variables. If the distribution is symmetrical (a bell curve, as in the normal distribution), there is no skewness in either direction and the mean, median and mode will be the same. When the distribution is skewed (asymmetrical), they will not be the same value. In a positively skewed distribution the long tail extends towards the higher values, which pulls the mean above the median, so typically more than half of the values lie below the mean; a negatively skewed distribution is the opposite.
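A rough text version of a frequency distribution chart, plus a quick skew check, can be sketched as follows. Assumptions: the data are hypothetical, the chart is drawn sideways (values down the side, one `*` per case) rather than with values on the X axis, and comparing mean with median is only a rule-of-thumb indicator of skew, not a formal skewness statistic.

```python
import statistics
from collections import Counter

# Hypothetical metric variable: number of items bought per visit, one value per case.
items = [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 6, 9]

def text_histogram(values):
    """Print a sideways frequency distribution chart: value, then one * per case."""
    freq = Counter(values)
    for value in range(min(values), max(values) + 1):
        print(f"{value:>2} | {'*' * freq[value]}")

def skew_direction(values):
    """Rule-of-thumb check of skew by comparing the mean with the median."""
    mean, median = statistics.mean(values), statistics.median(values)
    if mean > median:
        return "positive (tail to the right)"
    if mean < median:
        return "negative (tail to the left)"
    return "none detected (mean equals median)"

text_histogram(items)
print(skew_direction(items))
```

In this example the mean is 3 and the median is 2, and 7 of the 12 values lie below the mean, as expected for a positively skewed distribution.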