Measuring The Mind Flashcards

1
Q

Quantitative psychology

A
  • Contrast with qualitative psychology, which records rather than measures.
  • In quantitative psychology we try to quantify behaviour and experiences.
    How do we quantify behaviour and experiences?
2
Q

Subjective measures

A
  • Likert scale
    • Labelled levels
    • Odd number of levels
    • Bipolar (runs between two opposite poles)
  • Anchored scales (numeric scales)
    • Only endpoints are labelled
    • Limited number of responses
      e.g. How much are you enjoying this lecture?
      1 2 3 4 5 6 7 8 9
      Not at all … A lot
  • Visual Analogue Scale (VAS)
    • Only endpoints are labelled
    • Distance to the anchors is measured

3
Q

Getting objective

A
  • Does a behaviour happen or not?
    • e.g. Is unused paper recycled or discarded?
  • Accuracy and the point of subjective equality (PSE)
    • Accuracy of detection can change across a range of stimulus values.

4
Q

2-alternative-forced-choice (2afc)

A
  • Accuracy of the 2AFC decision reveals discrimination ability.
  • Chance level is at 50% performance.
5
Q

Accuracy

A

Accuracy reflects both discrimination and criterion.

Different criteria may be employed in the two situations, but your ability to discriminate remains the same.

6
Q

Every decision can be seen in terms of Signal Detection Theory

A
  • Hearing a tone in a noisy background.
  • Detecting a bomb in luggage x-ray.
  • Deciding whether to release a missile.
  • Deciding whether a PCR Covid test is really an infection or not.
  • Deciding whether to approach an attractive person (reading the signals).

look at diagrams in notes

7
Q

SDT Outcome

A
  • From proportions of Hits and False alarms SDT will give you Discrimination (d’) and Criterion (β).
  • Discrimination is how easy the task is (how overlapping the curves are).
  • Criterion is the level that the participant is using to make the decision (conservative or liberal).
  • You can do the calculation online at:
    • https://elvers.us/perception/sdtCalculator/
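
The conversion from Hit and False-alarm proportions to d′ and β can be sketched in a few lines. A minimal sketch, assuming the standard equal-variance Gaussian SDT model (this is not the calculator's own code; scipy supplies the normal distribution):

```python
# Equal-variance Gaussian SDT: d', criterion location c, and beta
# from hit and false-alarm proportions (example values invented).
from scipy.stats import norm

def sdt(hit_rate, fa_rate):
    z_h = norm.ppf(hit_rate)    # z-transform of hit rate
    z_fa = norm.ppf(fa_rate)    # z-transform of false-alarm rate
    d_prime = z_h - z_fa        # discrimination
    c = -0.5 * (z_h + z_fa)     # criterion location (conservative if > 0)
    beta = norm.pdf(z_h) / norm.pdf(z_fa)  # likelihood-ratio criterion
    return d_prime, c, beta

print(sdt(0.80, 0.20))  # unbiased observer: d' ≈ 1.68, c = 0, beta = 1
```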

It is all about timing…
  • Some tasks take longer than others:
    • 5 + 5 = ?
    • 27 x 13 = ?
    • One word completes all three:
      ________ jack, ________ board, ________ hole.
  • Tasks that take longer involve more stages of processing.

8
Q

Measuring the speed of mental processes

A
  • Hermann von Helmholtz
  • Frog-leg experiment measuring nerve conduction.
  • Neural impulses travel at roughly 100 km/hr.
9
Q

mental processes from reaction times

A

Donders' (1868) method of subtraction

If task B includes all the elements of task A plus some additional element, the time required for that additional element will be the difference between the reaction times of the two tasks.

light → light detection → response → simple RT

light → light detection → discrimination (red/green) → response → Go/No-go RT
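
As a worked illustration of the subtraction (the reaction times below are invented for the example):

```python
# Donders' method of subtraction, with invented example times (seconds).
simple_rt = 0.220    # light detection + response
go_no_go_rt = 0.285  # light detection + red/green discrimination + response

# The extra discrimination stage is estimated by the difference:
discrimination_time = go_no_go_rt - simple_rt
print(f"Discrimination stage ≈ {discrimination_time * 1000:.0f} ms")  # 65 ms
```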

10
Q

Sternberg's (1969) additive factors logic

A

If two different changes to a task affect performance in an additive way, then the two changes affect different stages of processing. If they interact, then they affect the same stage.

graphs in notes

11
Q

visual search and reaction times

A

Reaction times are used to identify visual feature "pop-out": search for single features is unaffected by the size of the search set.

graphs in notes
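
A hedged sketch of the pattern those graphs show (slopes and intercepts invented; the contrast with a slower, e.g. conjunction, search is a standard addition, not from this card):

```python
# Invented illustration of RT as a function of search set size.
set_sizes = [4, 8, 16, 32]

feature_rt = [450 + 0 * n for n in set_sizes]      # pop-out: flat, ~0 ms/item
conjunction_rt = [450 + 25 * n for n in set_sizes] # serial-like: ~25 ms/item

for n, f, c in zip(set_sizes, feature_rt, conjunction_rt):
    print(f"set size {n:2d}: pop-out {f} ms, conjunction {c} ms")
```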

12
Q

Combining accuracy and reaction-time data: the drift diffusion model

A

graph in notes
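
Since the graph is not reproduced here, a minimal random-walk sketch of the model's idea (all parameter values invented): noisy evidence accumulates at a drift rate until it hits one of two boundaries, which jointly yields a choice (accuracy) and a decision time (RT).

```python
# Minimal drift diffusion trial as a discretised random walk.
import random

def ddm_trial(drift=0.5, noise=1.0, threshold=1.0, dt=0.001):
    evidence, t = 0.0, 0.0
    while abs(evidence) < threshold:
        # Drift toward the correct boundary plus Gaussian noise.
        evidence += drift * dt + random.gauss(0.0, noise) * dt ** 0.5
        t += dt
    return ("correct" if evidence > 0 else "error"), t

print(ddm_trial())  # e.g. ('correct', 0.83)
```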

13
Q

Insight into the mind from other measures

A
  • Heart rate – Linked to arousal
  • Heart rate variability – Linked to stress
  • Blood pressure – Linked to stress and anxiety
  • Galvanic skin response – Linked to arousal and recognition
    ○ Fluid increase in sweat ducts increases skin conductance
  • Electromyography (EMG) – Directly records muscle activity
    ○ Electrodes placed over the muscle locations
  • Hormone levels
    ○ Cortisol, Testosterone.
    ○ Measures taken from saliva or urine
  • Eye tracking
    ○ Able to identify regions of interest and measure time spent looking at those regions and number of saccades to those regions.
  • Pupil dilation
    ○ Used to measure arousal.
14
Q

Other methods to investigate brain function

A

  • fMRI
  • MEG/EEG

15
Q

fMRI

A
  • Functional magnetic resonance imaging
    • Measures blood oxygen levels in the brain as an indicator of regional activity (the BOLD signal)
    • High spatial resolution (voxels of less than 1 mm)
    • Records in 3D (i.e., internal brain activity)
  • Disadvantages
    • Indirect measure of brain activity
    • Slow responses (2–5 s) make temporal resolution poor
16
Q

MEG/EEG

A
  • Magnetoencephalography (MEG)
  • Electroencephalography (EEG)
    • Measurement of electrical potentials (EEG) or magnetic fields (MEG) at the scalp to infer synaptic activity.
    • High temporal resolution (better than 1 ms)
  • Disadvantages
    • Poorer spatial localisation
17
Q

Event Related Potentials (ERPs)

A
  • Take a particular event and measure the pattern of potentials following it.
  • Repeat many times.
  • Find the average pattern for a particular type of event.
  • These are many of the tools that we have to try to uncover the inner workings of the mind.
  • They provide us with the data that help build models of how stimuli produce behaviours.
  • There is one last way to find out what is in someone's mind: just ask them.
  • That is what happens in qualitative research.
18
Q

summary

A
  • To understand the mind, we need to code observable responses.
  • These responses come in many formats
    • Subjective
    • Objective binary responses
    • Reaction times
    • Physiological responses
    • Neuropsychological responses
  • Which one is appropriate depends on the question that we are asking.
19
Q

terms

A
  • Likert scale, Anchored scale, VAS
  • PSE, 2afc, JND
  • SDT, Discrimination (d’), and Criterion (β)
  • Reaction times/Response times
    • Donders, Sternberg, Visual search
  • Physiological responses
    • GSR, EMG, Heart Rate…
  • fMRI, MEG/EEG
20
Q

who to measure - sampling procedures

A

Those participants constitute a sample, which may be considered a subset of some general group called the population. The eventual goal of research in psychology is to draw conclusions that apply to some general population.

21
Q

probability sampling

A

Each member of the population has a definable probability of being selected for the sample. Probability sampling is commonly used in survey research, when researchers have good access to the target population from which they want to take a sample. The sample needs to reflect the attributes of the target population as a whole; when this happens, the sample is representative. If it doesn't, the sample is biased. The sample needs to be selected using a clearly defined sampling procedure.

22
Q

random sampling

A

The simplest form of probability sampling is to take a simple random sample. This means each member of the population has an equal chance of being selected as a member of the sample. The procedure usually involves software that uses a random number generator or table. Simple random sampling is often an effective, practical way to create a representative sample.
It is sometimes the method of choice for ethical reasons as well. In situations in which only a small group can receive some benefit or must incur some cost, and there is no other reasonable basis for decision‐making, random sampling is the fairest method to use. There are two problems with simple random sampling. First, there may be systematic features of the population you might like to have reflected in your sample. Second, the procedure may not be practical if the population is extremely large.
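
A minimal sketch of the procedure (the population roster is an invented placeholder):

```python
# Simple random sample: every member has an equal chance of selection.
import random

population = [f"student_{i}" for i in range(1, 1001)]  # invented roster
sample = random.sample(population, k=100)              # sampled without replacement
```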

23
Q

stratified sampling

A

In a stratified sample, the proportions of important subgroups in the population are represented precisely. For example, if the population is 60% female and the goal is a sample of 100, then 60 women would be randomly sampled from the list of female students and 40 men would be randomly selected from the list of male students.
Stratified sampling doesn't solve the problem of trying to sample from a large population, when it is often impossible to acquire a complete list of individuals.
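
A sketch matching the 60/40 example above (the rosters are invented placeholders):

```python
# Stratified sampling: randomly sample each subgroup in proportion.
import random

female_students = [f"F{i}" for i in range(600)]  # invented roster
male_students = [f"M{i}" for i in range(400)]    # invented roster

sample = (random.sample(female_students, 60)
          + random.sample(male_students, 40))    # 60/40, as in the population
```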

24
Q

cluster sampling

A

Cluster sampling, a procedure frequently used by national polling organizations, solves the problem. With this approach, the researcher randomly selects a cluster of people all having some feature in common. A campus survey at a large university might be done this way. If a researcher wanted a cross section of students and stratified sampling was not feasible, an alternative would be to get a list of required ‘core’ classes. Each class would be a cluster and would include students from a variety of majors. If 40 core classes were being offered, the researcher might randomly select 10 of them and then administer the survey to all students in each of the selected classes. If the selected clusters are too large, the researcher can sample a smaller cluster within the larger one.
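
A sketch of the two-stage idea from the campus example (class names and sizes invented):

```python
# Cluster sampling: randomly select whole clusters, then survey everyone in them.
import random

core_classes = {f"core_{i}": [f"student_{i}_{j}" for j in range(30)]
                for i in range(40)}                   # 40 invented classes of 30

selected = random.sample(sorted(core_classes), k=10)  # pick 10 clusters
respondents = [s for c in selected for s in core_classes[c]]
print(len(respondents))  # 10 classes x 30 students = 300
```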

25
Q

non probability sampling

A

A convenience sample is a group of individuals who meet the general requirements of the study and are recruited in a variety of nonrandom ways. Often they come from the "subject pool": typically psychology students asked to participate in research.
Sometimes a specific type of person is recruited for the study, a convenience sampling strategy called purposive sampling.
Two other forms of convenience sampling are quota sampling and snowball sampling. In quota sampling, the researcher attempts to accomplish the same goal as stratified sampling (representing subgroups proportionally) but does so in a nonrandom fashion.
In snowball sampling, once a member of a particular group has been surveyed, the researcher asks that person to help recruit additional subjects through a network of friends. This sometimes occurs when a survey is designed to measure attitudes and beliefs of a relatively small group (e.g., triathlon runners) or a group that generally wishes to remain hidden (e.g., prostitutes). It is also easy to use snowball sampling if your study is conducted online and you provide participants with the opportunity to share the study with others via a weblink or use of social media. Researchers using quota or snowball sampling recognize their results will have a degree of bias, so they will be properly cautious in the conclusions they make from their sample to the population as a whole.

26
Q

what to measure - varieties of behaviour

A

The varieties of behaviour measured by research psychologists are virtually unlimited, ranging from overt behaviour to self-report to recordings of physiological activity. For example:

1. Elkins, Cromwell, and Asarnow (1992) investigated attention-span limitations in patients diagnosed with schizophrenia. The behavior measured was whether or not the participants could accurately name target letters embedded in an array of distracting letters. Compared with control subjects (individuals who did not have schizophrenia), those with the disorder did poorly when asked to identify target letters.

2. Westman and Eden (1997) examined the effects of a vacation on perceived stress and degree of burnout for clerical workers in an electronics firm. On three occasions (before, during, and after a vacation) researchers measured (a) perceptions of job stress with eight items from a survey instrument called the Job Characteristics Questionnaire, and (b) job burnout with a 21-item Burnout Index. Participants also completed a Vacation Satisfaction Scale. Initially high stress and burnout scores dropped precipitously during the vacation, but the effect was short-lived. By 3 weeks after the vacation, stress and burnout levels were back at the pre-vacation level.

3. Diener, Fraser, Beaman, and Kelem (1976) observed the candy- and money-taking behavior of children on Halloween night. The behavior observed (from behind a screen by an experimenter) was whether children took extra amounts of candy, and/or took money from a nearby bowl, when the woman answering the door briefly left the room. When given an opportunity to steal, the children were most likely to succumb to temptation when (a) they were in groups rather than alone, and (b) anonymous (i.e., not asked their name) rather than known.

4. Holmes, McGilley, and Houston (1984) compared people with Type A or Type B personalities on a digit span task (listen to a list of numbers, then repeat them accurately) that varied in difficulty. While performing the task, several physiological measures of arousal were taken, including systolic and diastolic blood pressure. Compared with more laid-back Type B subjects, hard-driving Type A subjects showed elevated blood pressure, especially when the task increased in difficulty.

27
Q

developing measures from constructs

A

Researchers measure behaviour in many ways, but how do they decide what to measure? They know what to measure because they know the literature in their area of expertise, and so they know what measures are used by other investigators. They also develop ideas for new measures by modifying commonly used measures, or perhaps by creatively seeing a new use for an old measure. Finally, they develop measures out of the process of refining the constructs of interest in the study, in hopes of answering their empirical question.

28
Q

evaluating measures

A

Determining whether a measure is any good requires a discussion of two key factors: reliability and validity.

29
Q

reliability

A

A measure of behaviour is said to be reliable if its results are repeatable when the behaviour is remeasured (e.g., reaction time). Reliability is essential in any measure; without it, there is no way to determine what a score on any one particular measure means.
A behavioural measure's reliability is a direct function of the amount of measurement error present: if there is a great deal of error, reliability is low, and vice versa. No behavioural measure is perfectly reliable, so some degree of measurement error occurs with all measurement. Every measure is a combination of a hypothetical true score plus some measurement error. Ideally, measurement error is low enough that the observed score is close to the true score.
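
In classical test theory notation this is usually written as

X = T + E

where X is the observed score, T the hypothetical true score, and E the measurement error; a reliable measure is one in which E is small relative to T.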

There are ways of calculating reliability, but this is seldom done in experimental research. Confidence in the reliability of a measure develops over time, a benefit of replication.

Reliability is assessed more formally in research that evaluates the adequacy of any type of psychological test. These are instruments designed to measure such constructs as personality factors (e.g., extroversion), abilities (e.g., intelligence), and attitudes (e.g., political beliefs). They are usually paper‐and‐pencil tests in which a person responds to questions or statements.

30
Q

validity

A

A behavioural measure is said to be valid if it measures what it is designed to measure.
The simplest level of validity is called content validity. This type of validity concerns whether or not the actual content of the items on a test makes sense in terms of the construct being measured. It comes into play at the start of the process of creating a test, because it concerns the precise wording of the test items.

Content validity is sometimes confused with face validity, which is not actually a "valid" form of validity at all (Anastasi & Urbina, 1997). Face validity concerns whether the measure seems valid to those who are taking it, and it is important only in the sense that we want those taking our tests and filling out our surveys to treat the task seriously. A test can make sense to those taking it and still not be a valid test.

31
Q

criterion validity

A

A more critical test of validity: whether the measure is related to some behavioural outcome or criterion that has been established by prior research. Criterion validity is further subdivided into two additional forms: predictive validity and concurrent validity.

32
Q

predictive validity

A

Predictive validity is whether the measure can accurately forecast some future behaviour.

33
Q

concurrent validity

A

Concurrent validity is whether the measure is meaningfully related to some other measure of behaviour taken at around the same time.

34
Q

construct validity

A

Construct validity concerns whether a test adequately measures some construct, and it connects directly with the operational definition. A construct is a hypothetical factor developed as part of a theory to help explain a phenomenon, or created as a shorthand term for a cluster of related behaviours. Constructs are never observed directly, so we develop operational definitions for them as a way of investigating them empirically, and then develop measures for them.
Construct validity relates to whether a particular measurement truly measures the construct as a whole. Confidence in construct validity accumulates gradually and inductively as research produces supportive results.

Research establishing criterion validity helps establish construct validity, but construct validity research includes two additional procedures: convergent and discriminant validity. Scores on a test measuring some construct should relate to scores on other tests that are theoretically related to the construct (convergent validity), but not to scores on other tests that are theoretically unrelated to the construct (discriminant validity).

35
Q

reliability and validity

A

For a measure to be of value in psychological research, it must be sufficiently reliable and valid. Reliability is important because it enables one to have confidence that the measure taken is close to the true measure. Validity is important because it tells you whether the measure actually measures what it is supposed to measure, and not something else. Validity assumes reliability, but the reverse is not true: measures can be reliable but not valid, while valid measures must be reliable.

The issues of reliability and validity have ethical implications, especially when measures are used to make decisions affecting the lives of others.

Strictly in the context of measurement, validity concerns whether the tool being used measures what it is supposed to measure. Over an entire research project, validity concerns whether the study has been properly conducted and whether the hypothesis in question has been properly tested.

36
Q

scales of measurement

A

There are four different measurement scales: nominal, ordinal, interval, and ratio.

37
Q

nominal scales

A

Sometimes the numbers we assign to events serve only to classify them into one group or another. When this happens, we are using what is called a nominal scale of measurement. Studies using these scales typically assign people to named categories and count the number of people falling into each category.

38
Q

ordinal scales

A

Sets of rankings showing the relative standing of objects or individuals. Often, rank-order data are derived from other data. Ordinal scales have no fixed gap between adjacent ranks (the difference between ranks 1 and 2 need not equal the difference between ranks 2 and 3).

39
Q

interval scales

A

Most research in psychology uses interval or ratio scales of measurement. Interval scales extend the idea of rank order to include the concept of equal intervals between the ordered events. Research using psychological tests of personality, attitude, and ability provides the most common examples of studies typically considered to involve interval scales.
It is important to note that with interval scales, a score of zero is simply another point on the scale—it does not mean the absence of the quantity being measured. The standard example is temperature. Zero degrees Celsius or Fahrenheit does not mean an absence of heat; it is simply a point on the scale that means, “put your sweater on.” Likewise, if a test of anxiety has scores ranging from 0 to 20, a score of 0 is simply the lowest point on the scale and does not mean the complete absence of anxiety, just a very low score.

40
Q

ratio scales

A

With a ratio scale, the concepts of order and equal interval are carried over from ordinal and interval scales, but, in addition, the ratio scale has a true zero point—that is, for ratio scores, a score of zero means the complete absence of the attribute being measured. For instance, an error score of zero attained by a rat running a maze means the absence of wrong turns. Ratio scales are typically found in studies using physical measures such as height, weight, and time. The research examples described earlier on habituation (measured by time spent looking at arrays) and reaction time both illustrate the use of a ratio scale (Kim & Spelke, 1992; Shepard & Metzler, 1971). Ratio scales are also used when one wants to measure a countable quantity of some sort – such as the number of errors or the number of items correctly recalled.

41
Q

statistical analysis - descriptive and inferential statistics

A

Descriptive statistics summarise the data collected from the sample of participants in your study.
Inferential statistics allow you to draw conclusions from your data that can be applied to the wider population.

42
Q

descriptive statistics

A

Includes measures of central tendency, variability, and association, presented both numerically and visually (graphs).

Central tendency measures: mean, median, mode.

Scores that are far removed from other scores in a data set are known as outliers.

Measures of variability: range, interquartile range, variance.

Variance represents how spread out the scores are, relative to the mean.

The standard deviation for a set of sample scores is an estimate of the average amount by which scores in the sample deviate from the mean score.

Normal distribution: a hypothetical frequency distribution of what all the scores in the population would be if everyone were tested on a particular measure.
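
A minimal sketch of these statistics using Python's standard library (the scores are invented; 41 is the outlier):

```python
# Descriptive statistics on an invented data set.
import statistics

scores = [4, 5, 5, 6, 7, 7, 7, 9, 41]

print(statistics.mean(scores))      # mean (pulled up by the outlier)
print(statistics.median(scores))    # median (resistant to the outlier)
print(statistics.mode(scores))      # mode
print(max(scores) - min(scores))    # range
print(statistics.variance(scores))  # sample variance
print(statistics.stdev(scores))     # sample standard deviation
```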

43
Q

inferential statistics

A

They allow the researcher to make inferences about the population based on the sample data, and to do so with a certain degree of confidence.

44
Q

null hypothesis significance testing

A

We wish to discover relationships between psychological constructs and ultimately to determine the causes of behavior. Because research tends to test samples selected from a target population, we can never be 100% certain about conclusions drawn from those samples, but we can estimate the likelihood, or probability, that our results are not due to chance. Thus, we use statistical determinism to establish laws about behavior and make predictions with probabilities greater than chance. Null hypothesis significance testing (NHST) yields these probabilities.

The first step in significance testing is to assume there is no difference in performance between the conditions you are studying, in this case between immediate and delayed rewards. This assumption is called the null hypothesis (null = nothing), symbolized H0. The outcome you as a researcher hope to find (fewer learning trials for rats receiving immediate reward) is called the alternative hypothesis (or sometimes research hypothesis), H1.

The logic of NHST is that we can only test the null hypothesis. Why? Because the only "given" we know is that all participants begin the study relatively equal (assuming good sampling procedures), regardless of what they later do as participants in the study. It may seem strange that you are testing the null hypothesis when you have developed a research hypothesis you want to test, but statistical determinism begins with what is known or given (you don't yet know what will happen with your participants), and the only given is the null hypothesis. Thus, in your study, you test the null hypothesis, and you hope to be able to reject H0, thereby supporting (but not proving) H1, the hypothesis closer to your heart, with a degree of confidence.
An inferential analysis with NHST can have only two outcomes: reject H0 or fail to reject H0. If you reject H0, you are saying that you reject the idea that there is no difference (or no relationship) between conditions. In grammatical terms this is a double negative, which translates into "there is a statistically significant difference between conditions."

If you fail to reject H0, you are saying that you fail to reject the idea that there is no difference between conditions. Translated, this means "there is no statistically significant difference between conditions", and that the results you discovered were due to chance. The important point is to think about which numbers are being compared when determining the probability that there is a difference between conditions.
The researcher's hypothesis (H1) is never proven true in an absolute sense. As in a courtroom, where guilt is said to be proven only beyond a reasonable doubt, H0 can only be rejected with some degree of confidence, which is set by what is called the alpha (α) level. Technically, alpha refers to the probability of obtaining your particular results if H0 (no difference) is really true. By convention, alpha is set at .05 (α = .05), but it can be set at other, more stringent, levels as well (e.g., α = .01).

If H0 is rejected when alpha equals .05, it means you believe the probability is very low (5 out of 100) that your research outcome is the result of chance factors. Another way to think of the alpha level is that you want to be confident your results are NOT due to chance if your study is replicated, say, 100 times. With an alpha of .05, you are hoping that only 5% of the time (or less) your results are due to chance; that is, you want to be at least 95% confident that if your study were replicated 100 times, you would get the same results 95 times out of 100.

When running statistical tests, you can calculate (or let software calculate) the probability, based on your sample, that your results are due to chance. This calculated probability is called a p-value. You compare the calculated p-value to alpha: if p is less than α, you can reject the null hypothesis (which predicts no difference, i.e., that your results are due to chance) and conclude there is a statistically significant difference between your conditions. If your result is not due to chance, then it must be due to something else, namely (you hope) the phenomenon you are studying, in this case the immediacy of reinforcement.
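
A hedged end-to-end sketch of the logic (the scores are invented; scipy's independent-samples t-test supplies the p-value):

```python
# NHST sketch: learning trials under immediate vs. delayed reward.
from scipy import stats

immediate = [8, 7, 9, 6, 8, 7]     # invented trials-to-criterion scores
delayed = [12, 11, 10, 13, 9, 12]

alpha = 0.05
t_stat, p_value = stats.ttest_ind(immediate, delayed)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: statistically significant difference between conditions.")
else:
    print("Fail to reject H0: no statistically significant difference.")
```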

45
Q

type I error

A

Rejecting the null hypothesis when it is in fact true. With α = .05, the chance of this is 5%.

46
Q

type II error

A

Happens when you fail to reject the null hypothesis but you are wrong: you don't find a significant effect, but in error. Type II errors sometimes occur when the measures used are not reliable or are not sensitive enough to detect true differences between groups, or when the sample size is small.

47
Q

interpreting failures to reject H0

A

There might indeed be no difference to be found, or there might be one that you have failed to find in your study (a Type II error). However, consistent failures to find differences are themselves important.
Another implication of the failure-to-reject issue is that often (but not always) nonsignificant findings do not get published. The notion that only "statistically significant" results get published is referred to as publication bias, and is a topic of much current discussion in psychological science.
Studies finding no differences are less likely to be published and wind up stored away in someone's files, a phenomenon called the file drawer effect (Rosenthal, 1979). A big problem occurs, however, if only a few published studies show an effect, no published studies show the opposite or no effect, but many studies languish in file drawers because no differences were found and the studies couldn't get published.
There are also ethical issues related to the failure-to-reject issue. Social psychologist Brian Nosek suggests that questionable research practices, like those described in Chapter 3, may occur in part because nonsignificant results are rarely published in psychological journals (Carpenter, 2012). As noted in Chapter 3, there is a strong movement in psychological science to replicate prior results and also to share failures to replicate. Online repositories like PsychFileDrawer.org allow users to attempt to replicate listed studies, and the site reports successful and unsuccessful replication attempts. Both exact and conceptual replications are distinguished on this website. As a new scientist in psychological science, you may learn quite a lot from being part of a replication project.

48
Q

beyond null hypothesis significance testing

A

A major point of contention has been the all‐or‐none character of null hypothesis decision making, with the alpha level of .05 taking on seemingly magical properties (Cohen, 1994).
Defenders of NHST argue that providing a strong test before drawing a conclusion is not necessarily a bad thing; they also ask, if not .05 as a cutoff, then what should it be? .10? .15? A beneficial effect of the argument has been an elaboration of the kinds of statistical analyses being used by research psychologists today. Based on recommendations from a special task force created by the APA (Wilkinson et al., 1999), researchers have begun to include several new features in their descriptions of statistical analyses, including calculations of effect sizes, confidence intervals, and estimates of power. In most journals that publish research articles that include NHST, the authors are often also required to include reports of effect sizes and confidence intervals.

49
Q

effect size

A

The result of a null hypothesis significance test could be that a statistically significant difference between two groups or among multiple groups exists. But this outcome does not inform the researcher about the size of the difference(s). An effect size index is designed to do just that. Effect size provides an estimate of the magnitude of the difference among sets of scores, while taking into account the amount of variability in the scores. Different types of effect size calculations are used for different kinds of research designs. All yield a statistic that enables the researcher to decide if the study produced a small, medium, or large effect. One common measure of effect size between two conditions is Cohen’s d.
One major advantage of calculating effect sizes is that it enables researchers to arrive at a common metric for evaluating a diverse array of experiments.
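
A hedged sketch of the standard pooled-SD formula for Cohen's d between two groups (data invented):

```python
# Cohen's d for two independent groups, using the pooled standard deviation.
import statistics

def cohens_d(group_a, group_b):
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * statistics.variance(group_a)
                  + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_var ** 0.5

print(cohens_d([8, 7, 9, 6, 8, 7], [12, 11, 10, 13, 9, 12]))  # ≈ -2.9, a large effect
```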

50
Q

meta - analysis

A

A meta‐analysis uses effect‐size analyses to combine the results from several (often, many) experiments that use the same variables, even though these variables are likely to have different operational definitions.
The outcome of a meta-analysis relates to the concept of converging operations. Meta-analyses are also related to replication of results, and specifically to the generality of a phenomenon based on mostly conceptual replications. However, meta-analysis can also be used with direct replications, and is used in many of the ongoing replication projects occurring in the science of psychology.

51
Q

confidence intervals

A

Calculating confidence intervals for specific means adds to the quality of the information provided in a description of results. In particular, a confidence interval is an inferential statistic that enables the researcher to draw a conclusion about the population as a whole based on the sample data.
A confidence interval is a range of values expected to include a population value with a certain degree of confidence. What a confidence interval tells us is that, based on the data for a sample, we can be (say) 95% confident the calculated interval captures the population mean.

Whether or not NHST shows a significant effect, the combination of effect size and confidence intervals can better explain your results. Effect size allows you to answer the question of how big the effect is, and confidence intervals allow you to answer whether the population mean would be included in the range of scores around the sample mean, taking error variance into account. While NHST may show a "significant effect", say in the maze example above, a closer examination of the effect size and confidence interval lets you better explain the result, in that the effect is large (Cohen's d = 2.20) and there is a good chance the population means for the two conditions would not overlap. Researchers are encouraged to report both effect sizes and confidence intervals by various journals, including all of the journals published by the American Psychological Association, which include some of the top journals in our field.
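
A hedged sketch of a t-based 95% confidence interval for a sample mean (scores invented):

```python
# 95% confidence interval for a sample mean, using the t distribution.
import statistics
from scipy import stats

scores = [12, 15, 11, 14, 13, 16, 12, 14]   # invented sample
n = len(scores)
mean = statistics.mean(scores)
sem = statistics.stdev(scores) / n ** 0.5   # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)       # two-tailed 95% critical value

print(f"95% CI: [{mean - t_crit * sem:.2f}, {mean + t_crit * sem:.2f}]")
```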

52
Q

power

A

Power matters when one hopes to be able to reject the null hypothesis. A test is said to have high power if there is a high probability that a real difference will be detected in a particular study. As power increases, the chance of a Type II error decreases, and vice versa.
The probability of a Type II error occurring is sometimes referred to as β (beta). Power, then, is 1−β. Power is affected by the alpha level (e.g., α = .05), by the effect size, and, especially, by the size of the sample. This latter attribute is directly under the experimenter’s control, and researchers sometimes perform a power analysis at the outset of a study to help them choose the best sample size for their study. G*Power is a free online software tool used for calculating power, or you can consult a statistics text for more details on completing a power analysis.
Increasing the sample size in a well-designed study is usually the best way to increase power. However, the other side of the power coin is that a huge sample size might produce a result that is statistically significant but meaningless in a practical sense: a small effect size between groups might have little importance in a study with huge numbers of participants.
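
A hedged sketch of an a priori power analysis using statsmodels' TTestIndPower (the planned effect size is an invented assumption):

```python
# Sample size per group for 80% power, d = 0.5, two-tailed alpha = .05.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5,
                                          alpha=0.05,
                                          power=0.80)
print(f"Required sample size per group: {n_per_group:.0f}")  # ≈ 64
```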