Project Prep Benchtest Flashcards
Research question characteristics
- Focused on a single problem
- Researchable using primary/secondary sources
- Feasible to answer within the timeframe and practical constraints
- Specific enough to answer thoroughly
- Complex enough to develop the answer over the space of a paper or thesis
- Relevant to your field of study and/or society more broadly
Types of Research questions
- Descriptive research
- Comparative research
- Correlational research
- Exploratory research
- Explanatory research
- Evaluation research
- Action research
What is in a problem statement?
- Context
- Specific issue being investigated
- Why this problem? Why now? Currency?
- Set objectives (project goals)
Descriptive research
What are the characteristics of X?
Comparative research
What are the differences and similarities between X and Y?
Correlational research
What is the relationship between variable X and variable Y?
Exploratory research
What are the main factors in X? What is the role of Y in Z?
Explanatory research
Does X have an effect on Y? What is the impact of Y on Z? What are the causes of X?
Evaluation research
What are the advantages and disadvantages of X? How well does Y work? How effective or desirable is Z?
Action research
How can X be achieved? What are the most effective strategies to improve Y?
S.M.A.R.T
Specific, Measurable, Attainable, Realistic, Timely
Inductive vs Deductive research
Developing a theory vs testing a theory
Exploratory vs Explanatory research
Exploring the main aspects of problem vs explaining causes and consequences of a well defined problem
Academic critique
- Deep dive into a single body of work
- Should present a counter-argument - use external evidence to make counter-points
Positivist
- Objective study
- Reductionist (break down complexities into simpler units of study)
- Verifying theories
- Can be studied in isolation
Critical Theorist
- Knowledge used to empower people
- Participatory
- Seeks to bring about change
- Focus on empowering groups
- Studied within that context
Constructivist
- Truth is relative to context
- Theory is open to interpretation
- Generates theories in a given context
- Cannot be studied in isolation
Pragmatist
- All research is biased
- No objective ‘truth’
- Works towards practical solutions to problems
- Multiple answers
- Seek the best one(s)
Reliability
- How consistent are repeated measurements
- How close together are the measurements
Validity
Results correspond to the real thing
Types of reliability assessments
- Test-retest
- Inter-rater
- Internal
Types of validity assessments
- Construct
- Face
- Concurrent
- Predictive
Test-retest
- Determines the reliability of the test and its results over time
- A strong correlation (r > 0.8) between the same test given to the same subjects over time is a good indicator of reliability
- Only works on consistent attributes
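A minimal sketch of how a test-retest correlation might be checked in Python; the scores are invented purely for illustration:

```python
import numpy as np

# Scores for the same subjects on the same test, taken at two points in time (invented)
scores_time1 = np.array([12, 15, 11, 18, 14, 16, 13, 17])
scores_time2 = np.array([13, 15, 10, 17, 14, 17, 12, 18])

# Pearson correlation between the two sittings
r = np.corrcoef(scores_time1, scores_time2)[0, 1]
print(f"test-retest correlation r = {r:.2f}")
print("reliable" if r > 0.8 else "reliability questionable")
```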
Inter-rater
- Determines reliability of test measurements and results gathered by different researchers
- Different people should give strongly correlated results
Internal
- Do you get the same results if you use different tests to measure the same thing?
- Strong correlation supports reliability
Construct (Validity assessment)
Does the test relate to high level theories
Face (Validity assessment)
Does test appear to test what it aims to test
Concurrent (Validity assessment)
- Does the test relate to an existing similar validated test
- Work is built on findings of another test and matches their work
Predictive (Validity assessment)
Does the test predict performance in a later developed test
Research Ethics
Concerns the responsibility of researchers to be honest and respectful to all individuals who are affected by their research studies or their reports of the studies’ results
Research integrity
Conducting research in a way that allows others to have trust and confidence in the methods used and findings that result from this
Bias
Conscious or unconscious influencing of the study and its results
Types of Bias
- Recall bias
- Selection bias
- Observation bias
- Confirmation bias
- Publishing bias
Recall Bias
- Survey respondents asked to recall events
- Some types of events are more likely to be remembered than others
Selection bias
Samples can sometimes under-represent certain people and over-represent others
Observation bias
- Hawthorne Effect
- When participants are aware that they’re being observed they, either consciously or unconsciously, alter the way they act or the answers they give
Confirmation bias
- Occurs during interpretation of study data
- Researchers consciously or unconsciously look for information or patterns that confirm the ideas or opinions that they already hold
Publishing bias
- Studies with negative findings (nothing found) are less likely to be submitted by scientists or published by journals
- Perceived as less interesting
Avoiding bias
- Bias in per-course survey (unbalanced data) - automatic profiling
- Bias in learning about user instead of type of user (stereotyping) - different users in training and test sets
- Bias in future data predicting past - train on past, test on future
- Bias in unbalanced data sample - stratified sampling
Literature review
- A survey of scholarly sources on a specific topic(s)
- Provides an overview of current knowledge, allowing you to identify relevant theories, methods and gaps in existing research
Review article
- Summarises current state of understanding on a topic
- Surveys and summarises previously published studies - rather than report on new facts or analysis
- Gives roadmap on future research
- Can be used to back up the validity of your question
Surveys
Any method focused on asking participants for responses
Purpose of Surveys
- Gather information not available from other sources
- Unbiased representation of the population of interest
- Collect information from many individuals to understand them as a whole
- Allows massive information gathering
Type of data collected by surveys
Mainly quantitative but qualitative methods can be used too
Pros of Surveys
- Can get info from large samples
- Can have different types and numbers of variables
- Gets info that’s hard to observe
- Easy and cheap
- Standardised stimulus - no observer subjectivity
Cons of surveys
- Intentional misreporting to hide inappropriate behaviour
- Poor recall
- Response rates are critical
- Can introduce bias from wording of questions
- Inflexible - can’t be changed during data gathering
- Not ideal for controversial issues
Survey Types by purpose
- Exploratory - form general ideas about research questions
- Descriptive - collect more specific descriptions of the variables of interest
- Explanatory - develop understanding of relationships among variables of interest
How can you validate surveys?
- Need to check for bias in question design
- Ask positive and negative questions - should be given opposite answers
- Validity of survey comes from the representativeness of the sample and the precision of the questions
- Face validity - Do questions appear reasonable and acquire data you want
- Content validity - Are questions all about issue and other subjects related to it
- Internal validity - Do questions imply the outcome you want to achieve
- External validity - Do questions elicit answers that are generalizable
Survey - Research questions
- Correlational questions
- Less technical questions - usability
- Exploratory questions
Types of sampling
- Random sampling- each member has equal chance of being picked
- Stratified sampling- use subsets of the population to sample - lower sampling error
- Systematic sampling- every Nth name is selected
- Quota sampling- researcher chooses necessary number of participants per stratum
- Purposive sampling - researcher selects participants according to criteria
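A minimal sketch contrasting random and stratified sampling with pandas; the DataFrame and the "group" column are invented for illustration:

```python
import pandas as pd

population = pd.DataFrame({
    "id": range(100),
    "group": ["A"] * 70 + ["B"] * 30,  # unbalanced strata
})

# Random sample: each member has an equal chance of being picked
random_sample = population.sample(n=20, random_state=0)

# Stratified sample: sample 20% from each stratum so groups stay proportionate
stratified_sample = (population
                     .groupby("group", group_keys=False)
                     .sample(frac=0.2, random_state=0))

print(random_sample["group"].value_counts())
print(stratified_sample["group"].value_counts())
```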
Purpose of Observation
To understand how people naturally interact with products and people and the challenges they face
Pros of Observation
- Can get more subtle data
- Allows richly detailed description
- Viewing or participating in unscheduled events
- Improves quality of data collection
- Can see things you weren’t expecting
- Useful for formulating hypothesis
- Doesn’t depend on information provided by respondents
- Can be used with infants/animals
Cons of Observation
- Less structured responses
- Get huge amount of data - analysing and not including bias is hard
- Difficult to replicate - lots of variables you don’t have control of
- Different researchers gain different understanding of what they observe
- Male/female researchers have access to different information
- Many events are uncertain in nature - difficult for researcher to determine time and place
- Can’t generalise
- Long and expensive
Observation - Research questions
- Exploratory
- Explanatory
Type of data collected by Observation
Typically qualitative but can be quantitative
Types of Observation
- Complete observer
- Observer as Participant
- Participant as Observer
- Complete Participant
Complete observer
- Detached observer
- Researcher is neither seen nor noticed by participants
- Minimises Hawthorne effect - participants more likely to act natural
- Most likely to raise ethical questions
Observer as participant
- Researcher is known and recognised by participants
- Participants know research goals of the observer
- Some interaction with participants but limited
- Researchers aim is to play a neutral role
Participant as observer
- Researcher is fully engaged with the participants
- More of a friend or colleague than neutral third party
- Full interaction with participants, but they still know the observer is a researcher
Complete participant
- Fully embedded researcher
- Observer fully engages with the participants and partakes in their activities
- Participants aren’t aware that observation and research is being conducted
How do you validate an observational study?
Use multiple independent researchers to observe
Direct Observation
- Quantitative technique
- Explicitly counting the frequency and/or intensity of specific behaviours
- Most direct observation data collection is done by human observers
- Doesn't strictly require a human data collector - audio/video recording can be used
- Ordinal data / purely factual descriptions
- Structured form of data collection
Participant observation
- Process enabling researchers to learn about the activities of the people under study in the natural setting through observing and participating in those activities
- Qualitative, interactive and unstructured
- Information collected is unique to the individual collecting the data
Purpose of Interviews
Explore the views, experiences, beliefs and/or motivations of individuals on specific matters
Purpose of Focus Groups
- Group of respondents are interviewed together
- Obtain data from purposely selected group of individuals rather than representative sample
Pros of Interviews
- Can get qualitative data
- Preferable when researcher wants subjective perspective rather than generalisable understandings
Cons of Interviews
- Time consuming
- Not the best for researching sensitive topics
Pros of Focus Groups
- Better at drawing people out of their shells - increased validity
- Allows for discovery
- Can build on each others comments for richer contextual data
Cons of Focus Groups
- Time consuming
- Anonymity is hard
- Less reliable
- Participants can be influenced by other group members - conformity, social desirability, oppositional behaviours
- Need skilled interviewer to prevent these problems
Interviews and Focus Groups - Research questions
- Exploratory questions
- Theory testing/creation questions
- Confirmatory research questions
Type of data collected by Focus Groups and Interviews
- Almost always qualitative
Structured vs Unstructured questions
Structured:
- Quantitative method
- Closed-ended questions
- List of questions
- Everyone asked same questions in the same order
- Easy to replicate
- Easy to test for reliability
- Quick to conduct
- Not flexible
Unstructured:
- Do not use any set questions
- Guided discussion
- Most useful for qualitative research
- Rarely provide valid basis for generalisation
- More flexible
- Increased validity - can probe for deeper understanding
- Time consuming to conduct and analyse the data
- Employing and training interviewers is expensive
Semi-Structured:
- Set questions but can investigate answers more
- Gets qualitative and quantitative data
- Can explore around answers
- Gathers useful info but respondents can answer more on their own terms
- More flexible
- More time-consuming
Types of Focus Groups
- Dual moderator - Two moderators
- Two-way - Two separate groups have discussions at the same time - the second group listens to the first before having their discussion
- Mini - 4-5 participants instead of 6-10
- Client-involvement - clients ask for focus group and invite those who ask
- Participant-moderated- one or more participants are moderators
- Online
Purpose of experiments
Allows researchers to look at cause-and-effect relationship
Used when:
- There is time priority in a causal relationship
- There is consistency in a causal relationship
- The magnitude of the correlation is great
Pros of experiments
- Allows for reproducibility
- Generalisation is easier
- Can take bias into account using statistics
Cons of experiments
- Equipment might be more expensive
- Highly prone to human error
- Errors can reduce validity
- Eliminating real-life variables can result in inaccurate conclusions
- Time-consuming process
- Researchers can control variables to suit personal preferences
- Results are not descriptive
Experiments - Research questions
Correlational questions
Type of data collected by experiments
Quantitative
True experiment
Researcher manipulates one variable and controls the rest of the variables
Ad hoc analysis
Hypothesis invented after testing is done to try and explain contrary evidence
Independent variable
variable manipulated
Dependent variable
variable measured
Control variables
not changed
Purpose of Secondary data analysis
- Take data from previous research and examine it for new question
- Look for datasets that other people have created
Pros of secondary data analysis
- Discover new things from old data
- Can use data that you wouldn’t have the resources to gather
- Access to historical data
- Ease of Access
- Inexpensive
- Time-saving
Cons of secondary data analysis
- May be issues with the data, e.g. bias
- Might twist yourself to fit the data you’ve got
- If you don't know how the data was collected, you can't judge its validity
- Because data is hugely heterogeneous in many cases - have to make decisions to remove, ignore or add sections - can lead to confirmation bias
- Many critical decisions in processing the data
- Irrelevant Data - have to find the relevant data from the irrelevant data
Secondary data analysis - research questions
- Often explorational
- Every question can be asked
What can go wrong in data cleaning
- Because data is hugely heterogeneous in many cases - have to make decisions to remove, ignore or add sections - can lead to confirmation bias
- Need to know a lot about the data to prove that any changes in adding or ignoring have valid assumptions and rationale
How can you validate secondary data analysis
To validate secondary data, find the:
- Purpose for which the material was collected/created
- Specific methods used to collect it
- Population studied and validity of the sample
- Credibility of the collector
- Limits
- Historic and/or political circumstances
- And consider how the data is coded/categorised
- Consider whether data must be adapted/adjusted
Quantitative data - Research questions
- Correlational
- Causation
- The how questions
Qualitative data - Research questions
- The why questions
Mixed approach
- Mix of qualitative and quantitative data
- Usually use different methods to collect them
- Useful when you have a small sample size - you want quantitative analysis but don't have enough people
- Qualitative used to underpin quantitative
- For exploration
Quantitative data
- Expressed in numbers and graphs
- Used to test or confirm theories and assumptions
- Can be used to establish generalisable facts about a topic
- Methods include experiments, observations recorded as numbers and surveys with closed-ended questions
- At risk of research biases incl. information bias, omitted variable bias, sampling bias or selection bias
Qualitative research
- Expressed in words
- Used to understand concepts through experiences
- Gather in-depth insights on topics
- Methods include interviews with open-ended questions, observations described in words, focus groups, Ethnographies and literature reviews
- At risk of research biases incl. Hawthorne effect, observer bias, recall bias and social desirability bias
Qualitative data limitations
- Don’t draw samples from large-scale data sets due to time and costs involved
- Problem of adequate validity or reliability is major concern due to subjective nature
- Contexts, situations, events, conditions and interactions cannot be replicated
- Generalisations can’t be made to a wider context than the one studied
- Lengthy time required
- Expert knowledge of an area is required to interpret the data
Qualitative data advantages
- Researcher gains an insider’s view of the field - can find issues that are often missed
- Can be important in suggesting possible relationships, causes, effects and dynamic processes
- Allows for ambiguities/contradictions in the data which reflect social reality
- Uses a descriptive, narrative style
Quantitative data limitations
- Does not take place in a natural setting
- Do not allow participants to explain their choices
- Poor knowledge of the application of the statistical analysis may negatively affect analysis and subsequent interpretation
- Large sample sizes needed for more accurate analysis
- Confirmation bias - researcher might miss observing phenomena because of focus on theory or hypothesis testing rather than on theory/hypothesis generation
Quantitative data advantages
- Scientific objectivity - data can be interpreted with statistical analysis
- Useful for testing and validating already constructed theories
- Data analysis and collection can be performed quickly
- Data can be checked by others and replicated
- Hypotheses can be tested
Hypothesis testing
Collect data to determine if a claim about the population is true
Hypothesis
- Testable statement that you want to accept or reject
- You never “prove” a hypothesis
Validity of a hypothesis
- Needs to be testable
- Need to be able to prove it false
- Be specific - don't use ambiguous words e.g. "athlete" or "better"
- Don’t be too specific - overlap with methodology
“If (one variable) ‘is related to’/’is affected by’/’causes’ (other variables) then (comment on relationship)”
Alternative hypothesis tails
- Two tailed test - doesn’t state direction
- One-tailed test - states direction
Type 1 error
Null Hypothesis is true but is rejected - false positive
Type 2 error
Null hypothesis is false but is not rejected - false negative
P-value
- Compare p-value to a threshold value (significance level/alpha) to reject null hypothesis
- P > alpha - fail to reject
- P <=alpha - reject
Critical value
- Some tests return a list of critical values and their associated significance levels and a test statistic
- Test statistic < critical value - fail to reject
- Test statistic >= critical value - reject
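A minimal sketch of both decision rules on an invented one-sample t-test, assuming SciPy is available:

```python
import numpy as np
from scipy import stats

sample = np.array([5.1, 4.8, 5.4, 5.0, 5.3, 4.9, 5.2, 5.5])  # invented measurements
alpha = 0.05

# P-value rule: compare p to the significance level
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)
print("reject H0" if p_value <= alpha else "fail to reject H0")

# Critical-value rule (two-tailed): compare |t| to the critical t value
critical = stats.t.ppf(1 - alpha / 2, df=len(sample) - 1)
print("reject H0" if abs(t_stat) >= critical else "fail to reject H0")
```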
Types of data
- Observational data
- Experimental data
- Simulation data
- Derived/Compiled data
Observational data
Open surveys, observational studies, focus groups, etc.
Experimental data
Collected via experimentation - easier to reproduce
Simulation data
Scenario simulation allows for generation of predictive data
Derived/Compiled data
Utilises existing data to generate new data - secondary data analysis
Descriptive analysis
- Basic analysis of the data giving a general overview
- Only describes what the data is or what it shows
- Allows for simple analyses
- No extrapolation or inference
- Measures of frequency
- Measures of central tendency
- Measures of dispersion or variation
- Measures of position
Measures of frequency
Count, percent, frequency
Measures of central tendency
- Mean, median, mode
- Used to show an average or most commonly indicated response
Measures of dispersion or variation
- Range, variance, standard deviation
- Variance/standard deviation - difference between observed score and mean
- When you want to show how spread out the data is
Measures of position
- Percentile ranks, Quartile ranks
- Describes how scores fall in relation to one another
- Relies on standardised scores
- Use when you need to compare scores to a normalised score
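A minimal sketch computing one example of each group of descriptive measures on an invented sample:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # invented scores

# Frequency
values, counts = np.unique(data, return_counts=True)

# Central tendency
mean, median = data.mean(), np.median(data)
mode = values[np.argmax(counts)]

# Dispersion or variation
data_range = np.ptp(data)                     # max - min
variance, sd = data.var(ddof=1), data.std(ddof=1)

# Position
q1, q3 = np.percentile(data, [25, 75])
print(mean, median, mode, data_range, variance, sd, q1, q3)
```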
Exploratory Analysis
- Examine or explore data and find relationships between variables which were previously unknown
- Does not describe the cause
- Useful for discovering new connections
Inferential Analysis
- Use statistics to look beyond the collected data to identify new conclusions
- Using a small sample of data to infer about a larger population
- Based on laws of probability and confidence intervals
- Central Limit Theorem
- T-test
Central Limit Theorem
- The distribution of sample means approximates a normal distribution as the sample size gets larger, regardless of the population's distribution
- The average of the sample means and of the sample standard deviations will approximate the population mean and standard deviation
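A minimal simulation sketch of the theorem: means of samples drawn from a clearly non-normal (exponential) population still cluster normally around the population mean, with spread close to sigma/sqrt(n). All numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)   # skewed, non-normal population

sample_means = [rng.choice(population, size=50).mean() for _ in range(5_000)]

print("population mean:        ", population.mean())
print("mean of sample means:   ", np.mean(sample_means))            # close to population mean
print("std of sample means:    ", np.std(sample_means))             # close to sigma / sqrt(n)
print("sigma / sqrt(n):        ", population.std() / np.sqrt(50))
```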
T-test
- Tells how likely it is that the difference between two groups is a real difference rather than a sampling artefact
- 'P-value' - probability that the collected data would occur by random chance under the null hypothesis
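A minimal sketch of an independent two-sample t-test with invented group scores, assuming SciPy:

```python
from scipy import stats

group_a = [23, 25, 28, 30, 27, 26, 24]   # invented scores
group_b = [31, 29, 33, 35, 30, 32, 34]

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value <= 0.05:
    print("difference unlikely to be a sampling artefact")
else:
    print("difference could plausibly be due to chance")
```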
Predictive Analysis
- Using historical or current data to find patterns to make predictions about the future
- Simulations can both generate data for prediction as well as using existing data
- Accuracy of predictions depends on input variables/data
- Accuracy depends on types of models - linear model generally works well
- Using one variable to predict another doesn't imply a causal relationship
Causal Analysis
- Step beyond inferential analysis
- Examines the cause-and-effect relationships between variables, focusing on finding the cause of a correlation
- Generally large, complex and expensive studies
- Four important components
1. Correlation
2. Temporal sequence - cause must occur before effect
3. Concomitant variation - variation must be systematic between the two variables
4. Nonspurious association - Any covariation between a cause and an effect must be true and not due to another variable
Mechanistic Analysis
- Similar to predictive but instead of general data driven predictions - utilise highly specific changes in variables that lead to changes in linked variables
- Generally used in high-precision disciplines e.g. engineering and physics
- Often used in high precision computer models
5 characteristics of quality data
- Validity - degree to which data conforms to defined business rules or constraints
- Accuracy - ensure data is close to true values
- E.g. include positive and negative versions of a question in a questionnaire - a person who answers 5 to the positive should answer 1 to the negative
- Completeness - degree to which all required data is known
- Consistency - ensure data is consistent within the same dataset/ across multiple datasets
- Uniformity - degree to which data is specified using the same unit of measure
Qualitative data scales
- Nominal (categories, no ordering) e.g. male, female
- Ordinal (categories, ordered) e.g. small, medium, large
Quantitative data scales
- Discrete (countable, integers)
- Continuous (measurable) e.g. age, temperature - can subdivide it
Paired or matched variables
Two variables in the individuals of a population that are linked together in order to determine the correlation
Choice of statistical test from paired or matched observation
- Nominal variable - McNemar’s Test
- Ordinal (Ordered categories) - Wilcoxon
- Quantitative (Discrete or Non-Normal) - Wilcoxon
- Quantitative (Normal) - Paired t test
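A minimal sketch of the two quantitative options from the table above, run on invented before/after scores for the same subjects, assuming SciPy:

```python
from scipy import stats

# Invented before/after scores for the same eight subjects
before = [72, 68, 75, 80, 66, 74, 70, 78]
after  = [75, 70, 74, 85, 70, 78, 72, 83]

# Quantitative, roughly normal differences -> paired t-test
t_stat, p_t = stats.ttest_rel(before, after)

# Ordinal or non-normal quantitative data -> Wilcoxon signed-rank test
w_stat, p_w = stats.wilcoxon(before, after)

print(f"paired t-test p = {p_t:.3f}, Wilcoxon p = {p_w:.3f}")
```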
Parametric test
- Make assumptions about the parameters of the population distribution from which the sample is drawn
- Often that the population data are normally distributed
- Can only apply parametric tests (e.g. t-test) if you have a sample big enough (relative to the population) to assume that the central limit theorem applies
Non parametric tests
- “distribution-free”
- Can be used for non-Normal variables
Reducing Type 1 and Type 2 errors
- Reducing the chances of a type I error increases the chances of a type II error and vice versa
- In science it is better to miss something than draw incorrect conclusions - reduce type I errors
- Bonferroni correction - Reduces instances of type I errors but increases type II errors
- Type II error reduction is not as simple as Bonferroni:
- Increase sample size
- Change alternative value in the alternate hypothesis
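A minimal sketch of a Bonferroni correction across several tests; the p-values are invented. Dividing alpha by the number of tests lowers the chance of a type I error at the cost of more type II errors:

```python
p_values = [0.01, 0.04, 0.03, 0.20]   # invented p-values from four tests
alpha = 0.05
corrected_alpha = alpha / len(p_values)   # 0.0125

for p in p_values:
    decision = "reject H0" if p <= corrected_alpha else "fail to reject H0"
    print(f"p = {p:.3f} -> {decision}")
```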
ANOVA (analysis of variance)
- Test looking at 3 or more groups
- Reduces type I errors
- Used for comparing the means of three or more groups or variables
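A minimal sketch of a one-way ANOVA across three invented groups, assuming SciPy:

```python
from scipy import stats

group1 = [85, 86, 88, 75, 78, 94]   # invented scores
group2 = [91, 92, 93, 85, 87, 84]
group3 = [79, 78, 88, 94, 92, 85]

f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
# A single ANOVA avoids the inflated type I error risk of running
# multiple pairwise t-tests between the three groups.
```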
Monte Carlo simulation
- In uncertain scenario - allows for exploration of the problem/solution space
- One of the most popular techniques for calculating effect of unpredictable variables on a specific output variable
- Ideal for risk analysis
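A minimal Monte Carlo sketch: estimating the distribution of a total project cost when two inputs are uncertain. All distributions and figures are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n_runs = 10_000

labour_cost = rng.normal(loc=50_000, scale=8_000, size=n_runs)      # uncertain input
material_cost = rng.uniform(low=20_000, high=35_000, size=n_runs)   # uncertain input
total_cost = labour_cost + material_cost

print("expected total cost:", total_cost.mean())
print("95th percentile (risk figure):", np.percentile(total_cost, 95))
```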
Factor analysis
- Large well-structured questionnaire
- Trying to address multiple things
- Many questions may investigate the same ‘factor’
- Method allows for grouping variables into set of underlying factors
- Confirmatory factor analysis - know what the factors are and have set them
- Exploratory Factor analysis - assume there are factors but not setting them
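A minimal sketch of exploratory factor analysis, assuming scikit-learn is available; the questionnaire responses here are random invented data, used only to show the shape of the API (real data would be needed for meaningful factors):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# 100 respondents x 8 Likert-style questions (invented)
responses = rng.integers(1, 6, size=(100, 8)).astype(float)

fa = FactorAnalysis(n_components=2)   # assume two underlying factors
fa.fit(responses)

# Loadings: how strongly each question relates to each factor
print(fa.components_.round(2))
```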
Cohort analysis
- Form of behavioural analytics
- Ideal for examining user behaviour
- Allow for exploration between cohorts
- Group of people who share common characteristics over a given time frame
Cluster Analysis
- Works by organising items into groups or clusters on how associated they are
- K-means clustering - partitions n data points into k clusters
- Setting different number of clusters gives different results
- Works at a data-set level - every point is assessed relative to the others - data must be as complete as possible
- Intracluster distance - distance within a cluster
- Intercluster distance - distance between clusters
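A minimal k-means sketch on invented 2-D points, assuming scikit-learn:

```python
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1, 2], [1, 4], [1, 0],
                   [10, 2], [10, 4], [10, 0]])   # invented data

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)            # cluster assignment of each point
print(kmeans.cluster_centers_)   # centre of each cluster
# Choosing a different n_clusters would give different groupings.
```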
Time series analysis
- Useful to see how variable changes over time
- Forecasting via trends
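A minimal sketch of a simple trend view of a time series using a rolling mean in pandas; the monthly values are invented:

```python
import pandas as pd

sales = pd.Series(
    [100, 110, 105, 120, 130, 125, 140, 150, 145, 160, 170, 165],
    index=pd.date_range("2023-01-01", periods=12, freq="MS"),
)

trend = sales.rolling(window=3).mean()   # smooths noise to expose the trend
print(trend)
```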
Sentiment Analysis
- Natural language processing technique to determine whether data is positive, negative or neutral
- Not terribly refined - can’t figure out sarcasm
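A minimal sketch of rule-based sentiment scoring, assuming NLTK's VADER analyser is installed and its lexicon has been downloaded (nltk.download("vader_lexicon")):

```python
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("I love this product, it works brilliantly"))
print(sia.polarity_scores("This is the worst update ever"))
print(sia.polarity_scores("Oh great, it crashed again"))  # sarcasm is often misread as positive
```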
Basic vs applied research
Research for curiosity vs research to answer a specific question