Research Methods - Techniques Of Data Handling And Analysis Flashcards

1
Q

Quantitative data

A

Quantitative data involves numbers and can be measured objectively. It is immediately quantifiable.

2
Q

Quantitative data includes

A

• The dependent variable in an experiment.
• Closed questions in questionnaires.
• Structured interviews.
• A tally of how many times a behavioural category is seen in an observation.

3
Q

Qualitative data

A

Qualitative data involves words and the data is based on the subjective interpretation of language. It is only quantifiable if the data is put into categories and the frequency is counted.
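
As a minimal sketch of how qualitative data becomes quantifiable, the Python snippet below puts each response into a category and counts the frequency of each category. The responses, the category names and the keyword rule are all invented for illustration; in practice, coding qualitative data depends on the researcher's judgement rather than a simple keyword match.

```python
from collections import Counter

# Hypothetical open-question responses (invented for illustration).
responses = [
    "I felt relaxed and calm",
    "It was stressful at first",
    "Very calm, quite enjoyable",
    "Stressful and tiring",
    "No strong feelings",
]

# Hypothetical coding scheme: keyword -> category.
coding_scheme = {"calm": "positive", "relaxed": "positive", "stressful": "negative"}


def categorise(response):
    """Return the category of the first matching keyword, or 'other'."""
    text = response.lower()
    for keyword, category in coding_scheme.items():
        if keyword in text:
            return category
    return "other"


# Frequency of each category: this is the quantified form of the data.
frequencies = Counter(categorise(r) for r in responses)
print(frequencies)
```

The frequency counts could then be shown in a bar chart, since the categories are discrete.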

4
Q

Qualitative data includes

A

• Open questions in questionnaires.
• A transcript from an unstructured interview.
• Researchers describing what they see in an observation.

5
Q

Problems with qualitative data

A

Qualitative data is challenging to analyse because it relies on interpretation by the researcher, which could be inaccurate, subjective or even biased. Furthermore, qualitative data may not be easy to categorise or collate into a sensible number of answer types. The researcher could be left with lots of individual responses that cannot be summarised.

6
Q

Primary data

A

Primary data is collected directly by the researcher for the purpose of the investigation.

7
Q

Secondary data

A

Secondary data is information that was collected for a purpose other than the current use. The researcher could use data collected by them but for a different study, or collected by a different researcher. A researcher might make use of government statistics, such as mental health statistics collected by the NHS.

However, there is substantial variation in the quality and accuracy of secondary data and it can be hard for researchers to know how reliable secondary data is.

8
Q

Meta-analysis

A

A meta-analysis refers to the process of combining results from a number of studies on a particular topic (secondary data) to provide an overall view. Meta-analysis allows us to view data with much more confidence, and results can be generalised across much larger populations. However, meta-analysis may be prone to publication bias: studies with negative or non-significant results are less likely to be published, and the researcher may also choose to leave such studies out.

9
Q

Tables

A

When tables appear in the results section of a research report they are not raw scores but have been converted to descriptive statistics (measures of central tendency or measures of dispersion). There should be a paragraph beneath the table explaining the data.

10
Q

Scattergraph

A

A scattergraph is a graphical display that shows the correlation or relationship between two sets of data (or co-variables) by plotting dots to represent each pair of scores. A scattergraph indicates the strength and direction of the correlation between the co-variables.

11
Q

Bar chart

A

A bar chart is used to show frequency data for discrete (separate) variables. The height of each bar represents the frequency of each item. In a bar chart a space is left between each bar to indicate the lack of continuity. The frequency of each category is plotted on the vertical y-axis.

12
Q

Distributions

A

With many data sets, the frequency of the measurements forms a bell-shaped curve. This is called a normal distribution, which is symmetrical. Within a normal distribution most people are located in the middle area of the curve and very few people are at the extreme ends. The mean, mode and median all occupy the same mid-point of the curve.

13
Q

Skews

A

A positive skew is where most of the data is concentrated to the left of the graph. In this case the mode remains at the highest point of the peak, the median comes next but the mean has been dragged across to the right. The opposite occurs in a negative skew.
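
The ordering of the three averages in a positive skew can be checked with a small Python sketch. The scores below are invented so that most values are low and a few high values stretch the tail to the right; any similarly shaped data set should show the same pattern.

```python
import statistics

# Hypothetical positively skewed scores: most are low, a few are much higher.
scores = [2, 2, 2, 3, 3, 4, 5, 9, 15]

mode = statistics.mode(scores)      # most frequent value (the peak)
median = statistics.median(scores)  # middle value of the ordered scores
mean = statistics.mean(scores)      # pulled towards the high scores

print(mode, median, mean)

# In a positive skew the mean is dragged furthest to the right.
print(mode < median < mean)
```

Reversing the shape of the data (most scores high, a few very low) would be expected to reverse the ordering, as in a negative skew.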

14
Q

Measures of central tendency + mean

A

Measures of central tendency inform us about central values for a set of data. They are ‘averages’: ways of calculating a typical value for a set of data. The average can be calculated in different ways, each one appropriate for a different situation.

The mean is calculated by adding all the scores and dividing by the number of scores. The advantage of this method is that it is representative of all the data collected, as it is calculated using all the individual values. The mean is the most sensitive measure of central tendency because it uses all the values in the set of data. However, the disadvantage is that it can be distorted by a single extreme value in the set, and the mean score may not be one of the actual scores in the set.

15
Q

The median

A

The median is calculated by arranging the scores in order and then choosing the numerical midpoint. The advantage is that it is unaffected by extreme scores, unlike the mean. The disadvantage is that extreme values (outliers) do not contribute to the average, so it is less sensitive than the mean and does not represent all the findings.

16
Q

The mode

A

The mode is the most frequent value in a set. The advantage is that it is unaffected by extreme scores. The disadvantage is that it tells us nothing about other scores in the data set.
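
A short Python sketch can show how the three measures respond to a single extreme score. The scores are invented for illustration, and the standard-library `statistics` module is used for the calculations.

```python
import statistics

# Hypothetical test scores; 95 is a single extreme value.
scores = [4, 5, 5, 6, 7, 8, 95]
without_extreme = scores[:-1]

mean = statistics.mean(scores)      # sum of the scores / number of scores
median = statistics.median(scores)  # midpoint once the scores are ordered
mode = statistics.mode(scores)      # most frequent value

mean_without = statistics.mean(without_extreme)
median_without = statistics.median(without_extreme)
mode_without = statistics.mode(without_extreme)

print("with extreme score:   ", round(mean, 2), median, mode)
print("without extreme score:", round(mean_without, 2), median_without, mode_without)
```

Removing the extreme score changes the mean far more than the median, and leaves the mode unchanged.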

17
Q

Measures of dispersion

A

A set of data can also be described in terms of how dispersed or spread out the data items are.

The range is calculated by subtracting the lowest score from the highest. An advantage of this is that it is quick and easy to calculate. A disadvantage is that it can be easily distorted by extreme values.

The standard deviation is the average amount that each score differs from the mean. An advantage is that it takes account of all the scores. A disadvantage is that it is more difficult to calculate than the range.
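
The two measures of dispersion can be compared with a minimal Python sketch. Both data sets are invented and share the same mean, so only their spread differs. `statistics.stdev` gives the sample standard deviation (dividing by n - 1) and `statistics.pstdev` the population version (dividing by n); which one a course expects is an assumption to check.

```python
import statistics

# Two hypothetical data sets with the same mean (50) but different spread.
scores_a = [48, 49, 50, 51, 52]
scores_b = [10, 30, 50, 70, 90]


def score_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)


range_a = score_range(scores_a)
range_b = score_range(scores_b)

# Sample standard deviation (n - 1 in the denominator).
sd_a = statistics.stdev(scores_a)
sd_b = statistics.stdev(scores_b)

# Population standard deviation (n in the denominator).
psd_a = statistics.pstdev(scores_a)

print(range_a, range_b)
print(round(sd_a, 2), round(sd_b, 2), round(psd_a, 2))
```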

18
Q

What are inferential statistics designed to do?

A

Inferential statistics have been designed to work out the probability (p) that a particular set of data has occurred by chance, and not because of the independent variable (IV). In other words these statistics tell us the chance that our sample of women are better than men at map reading because of luck rather than because women are genuinely better than men at map reading.

19
Q

Accepted level of probability that a data set has occurred due to chance

A

The accepted level of probability that a data set has occurred due to chance in Psychology is p<0.05 (less than 5%). This is the level at which the researcher decides to accept the alternative hypothesis.

20
Q

When to use a sign test

A

The sign test can only be used when there is one group of participants (e.g. a repeated measures design) and when the data is numerical (quantitative).

21
Q

How to carry out a sign test

A

Step 1: State the hypothesis – This is our hypothesis: ‘people are happier after going on holiday than they were beforehand’. This is a directional hypothesis and therefore requires a one-tailed test. If the hypothesis was a non-directional hypothesis then a two-tailed test would be used.

Step 2: Record the data and work out the sign - Record the difference between each pair of data (subtract the ‘happiness before’ score from the ‘happiness after’ score, because we predict they will be happier afterwards). Next, record a (+) for happier after the holiday and a (–) for happier before the holiday. Where there is no difference, record a zero.

Step 3: Find calculated value – S is the symbol for the statistic we are calculating. It is found by counting the plus signs and the minus signs separately and selecting the SMALLER count. In this case there are 10 pluses, 3 minuses and one zero. The less frequent sign is minus, so S=3. This is called the calculated value because we calculated it.

Step 4: Find the critical value – N is the total number of scores (ignoring any zero values). In our case N=13. The hypothesis is a directional hypothesis and therefore a one-tailed test is used. Now we use the table of critical values (below) and locate the 0.05 column for a one-tailed test and the row that begins with our N value (13). For a one-tailed test at 0.05 the critical value is S=3. The calculated value of S must be EQUAL TO or LESS THAN the critical value for significance to be shown. Our calculated value is equal to the critical value, so the result is significant.

If the hypothesis is a directional hypothesis we also have to check that the results are in the expected direction. In this case we expect people to be happier afterwards and should therefore have more pluses than minuses. This was the case, and therefore we can accept the alternative hypothesis and reject the null hypothesis.
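
The four steps can be sketched in Python. The before and after scores are invented so that they reproduce the counts in the worked example (10 pluses, 3 minuses, 1 zero), and the critical value of 3 is the one quoted above for N=13, one-tailed, at the 0.05 level. The final lines are an optional cross-check that is not part of the method described: the exact probability of getting S or fewer of the less frequent sign if pluses and minuses were equally likely.

```python
from math import comb

# Hypothetical happiness scores for 14 participants (invented to match the
# worked example: 10 pluses, 3 minuses, 1 zero).
before = [4, 5, 3, 6, 5, 4, 7, 5, 6, 4, 6, 7, 5, 5]
after = [6, 7, 5, 8, 6, 6, 8, 7, 7, 5, 5, 6, 4, 5]

# Step 2: difference = after - before, then count the signs.
differences = [a - b for a, b in zip(after, before)]
pluses = sum(1 for d in differences if d > 0)
minuses = sum(1 for d in differences if d < 0)

# Step 3: S is the count of the less frequent sign.
s = min(pluses, minuses)

# Step 4: N ignores zero differences; compare S with the critical value.
n = pluses + minuses
critical_value = 3  # from the text: N = 13, one-tailed test, 0.05 level
significant = s <= critical_value

# Directional hypothesis: results must also be in the predicted direction.
in_expected_direction = pluses > minuses

# Cross-check (assumption, not part of the steps above): exact one-tailed
# probability of S or fewer minuses when each sign is equally likely.
p_one_tailed = sum(comb(n, k) for k in range(s + 1)) / 2 ** n

print(pluses, minuses, n, s)
print(significant, in_expected_direction, round(p_one_tailed, 4))
```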

22
Q

Table to decide what statistical test to use

A
23
Q

Parametric vs non parametric tests

A
24
Q

Nominal data

A

Nominal data can be referred to as categorical data. For example, if a researcher was interested to know if more students doing A‐level psychology went to a school or a college, the data would be categorised as either ‘school’ or ‘college’: two distinct categories. If the data is nominal, then each participant will only appear in one category. This is called discrete data.

25
Q

Ordinal data

A

Data is ordinal if it is ordered in some way but the intervals between the values are not necessarily equal. Typically, this is used to simply rank data, where the values assigned have no meaning beyond stating where one score appeared in relation to others. For example, if people were asked to rate their preference of local restaurants, with 1 being their least favourite and 10 being their favourite, a researcher would be able to generate a ranked list of restaurants from this data based upon the average rating for each.

26
Q

Interval data

A

Interval data is like ordinal data in that it also refers to data that is ordered in some way. However, with interval data we are confident that the intervals between each value are equal in measurement. This type of data is much more objective and scientific in nature as a result. Examples of interval data include temperature and time. The difference between 3 and 4 degrees Celsius is the same as the difference between 35 and 36 degrees Celsius. Similarly, heart rate, blood pressure, ruler measures in m, cm or mm would be classed as interval level data.

27
Q

Evaluations of levels of measurement

A
28
Q

Type 1 errors

A

A Type 1 error would occur where we reject the null hypothesis and accept the experimental/alternative hypothesis instead, when the results of the study are really due to chance. So, we have made a mistake! We should have accepted the null hypothesis instead. A Type 1 error is also known as a false positive.

29
Q

Type II error

A

A Type II error would occur where a real difference in the data is overlooked as it is wrongly accepted as being not significant, accepting the null hypothesis in error (a false negative).
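
The two error types can be summarised as a small decision table. The Python sketch below is illustrative only: the function name and its arguments are hypothetical, and in real research we never know for certain whether the null hypothesis is true.

```python
def classify_decision(null_is_true, null_rejected):
    """Label the outcome of a decision about the null hypothesis."""
    if null_is_true and null_rejected:
        # We claimed an effect that is really due to chance.
        return "Type I error (false positive)"
    if not null_is_true and not null_rejected:
        # We overlooked a real difference.
        return "Type II error (false negative)"
    return "correct decision"


for null_is_true in (True, False):
    for null_rejected in (True, False):
        print(null_is_true, null_rejected, classify_decision(null_is_true, null_rejected))
```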

30
Q

Level of statistical significance defined

A

“The level at which the decision is made to reject the null hypothesis in favour of the experimental hypothesis. It states how sure we can be that the IV is having an effect on the DV and this is not due to chance.”

31
Q

Calculated value symbol for each test

A
32
Q

How to calculate Mann-Whitney U test *

A
33
Q

How to calculate Wilcoxon *

A
34
Q

How to calculate Chi-squared test *

A
35
Q

How to calculate Spearman’s rho *

A
36
Q

Related t, unrelated t, Pearson’s r: how to know when to use which one

A
37
Q

Related t test

A

• This is used when we wish to test a difference
• The design is repeated measures or matched pairs
• The data should be interval
• Related t-test is a parametric test

38
Q

Unrelated t-test

A

• This test is used when we are looking for a difference
• The design is independent groups
• The level of measurement for the data is interval
• The unrelated t-test is a parametric test

39
Q

Pearson’s r

A

• This test is used for investigating correlations or relationships between variables
• The level of measurement for the data is interval
• Pearson’s r is a parametric test
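
The criteria on the last three cards can be sketched as a small Python lookup. The function name and the argument strings are hypothetical, and the sketch assumes interval data throughout, since it covers only these three parametric tests.

```python
def choose_parametric_test(purpose, design=None):
    """Choose between the related t-test, unrelated t-test and Pearson's r.

    Assumes the data is interval level. `purpose` is "difference" or
    "correlation"; `design` is only needed for a test of difference.
    """
    if purpose == "correlation":
        return "Pearson's r"
    if purpose == "difference":
        if design in ("repeated measures", "matched pairs"):
            return "related t-test"
        if design == "independent groups":
            return "unrelated t-test"
    raise ValueError(f"No test listed for purpose={purpose!r}, design={design!r}")


print(choose_parametric_test("difference", "repeated measures"))
print(choose_parametric_test("difference", "independent groups"))
print(choose_parametric_test("correlation"))
```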