Terms Flashcards

1
Q

validity

A

extent to which a concept/measurement is well-founded and likely corresponds accurately to the real world

internal vs external validity

2
Q

internal validity

A

the obtained effect of x on y for your sample is the correct effect for the sample

-> generalization of causal findings to all cases WITHIN the sample

how to obtain:
-empirical model is correctly specified, estimators are unbiased
-> changes in the dependent variable can be attributed to the independent variable and not to other factors (the challenge is to rule those other factors out)

3
Q

external validity

A

the obtained effect of x on y in the sample is the correct effect of x on y in the population P

-> generalization of causal findings to other cases not included in the sample -> the overall population

how to obtain:
-enough cases
-sample represents the population in all relevant characteristics

4
Q

why is validity important

A

-theory and findings need to show a causal effect for the research to be relevant
-stakeholders need to know whether it also holds for other cases
-in practice: experiments usually have high internal but low external validity; often neither internal nor external validity is perfect

5
Q

validity vs reliability

A

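reliability is the degree of precision with which a specific aspect is measured
validity is whether the measurement captures the intended concept at all -> a measure can be reliable without being valid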

6
Q

advantages of scientific observation

A

systematic approach of observing and generating information
-objectivity as opposed to a selective set of observations
-avoidance of “filling in” information
-verifiability

7
Q

population

A

all observational units to which the theory is assumed to apply

8
Q

sample

A

a subset of the theoretically-defined population for which data is assessed
for reasons of validity, we want this subset to be representative of the population

9
Q

descriptive statistics

A
10
Q

inferential statistics

A
11
Q

what is data

A

quantified information
information for one single case: data point

manifest variables: directly observable variables (e.g. body height)
latent variables: abstract concepts only observable through manifest indicators (e.g. democracy)

12
Q

data types by source

A

source:
observable world -> observational data
field or lab experiment -> experimental data
an algorithm -> simulated data

13
Q

data processing

A

-> to eliminate sources of error

processing includes:
-reduction of measurement error
-addressing inter-coder reliability
-elimination of missing data points
-identification of outliers

14
Q

how to measure data

A

measurements require
-measurement scale
-measurement unit
-measurement instrument

also includes
-counts
-quantifications

15
Q

types of variables

A

can be described by three elements: instrument, measurement unit, scale

16
Q

variables by scale
categorical variables

A

how are observations arranged?

nominal variables
-numerical values are used as a label or type of attributes
-no intrinsic order between categories
-e.g. gender, party affiliation: SPÖ=1, ÖVP=2

ordinal variables:
-variables with two or more categories which can be ranked
-the numeric values and the gaps between them are not interpretable
-e.g. smartness (a higher rank does not mean "twice as smart")

17
Q

variables by scale
metric variables

A

interval variables
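-variables have a zero value, but it is arbitrary (usually without a clear meaning), e.g. temperature in °C
-distances between values have the same meaning everywhere on the scale

ratio variables
-zero means that none of the measured quantity is present, so ratios are interpretable, e.g. income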

18
Q

use the dataset thedata.dta

A

use thedata.dta

19
Q

delete all variables and data

A

clear

20
Q

summarize a dataset

A

describe, short
describe, simple

summarize
sum, detail

tabulate

list

codebook
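
A minimal sketch of how these commands might be used together, assuming Stata's bundled example dataset auto.dta (loaded with sysuse); the variable names price, mpg, foreign, make and rep78 belong to that dataset and are not part of the original card.

```stata
* load the example dataset that ships with Stata
sysuse auto, clear

* compact overview: number of observations and variables
describe, short

* n, mean, standard deviation, min and max
summarize price mpg

* additionally shows percentiles, skewness and kurtosis
summarize price, detail

* frequency table of a categorical variable
tabulate foreign

* print the first five rows of two variables
list make price in 1/5

* type, range and missing values of one variable
codebook rep78
```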

21
Q

import an Excel dataset

A

import excel "…"

22
Q

remove var1 and var2

A

drop var1 var2

23
Q

remove all variables except var3 and var4

A

keep var3 var4

24
Q

distribution table of a variable

A

tabulate …

missing values are not shown by default; to include them:
tabulate …, missing

25
Q

create a variable

A

generate

26
Q

give variable another name

A

rename

27
Q

change the value of a variable to another value

A

replace
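
As an illustration of generate, rename and replace in sequence, here is a short sketch on Stata's bundled auto.dta; the variable names price_k and price_thousand are hypothetical and chosen only for this example.

```stata
sysuse auto, clear

* create a new variable: price in thousands
generate price_k = price / 1000

* give the variable another name
rename price_k price_thousand

* overwrite its values: round to one decimal place
replace price_thousand = round(price_thousand, 0.1)

summarize price price_thousand
```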

28
Q

add a description to a variable

A

label variable

29
Q

add a label to a variable value

A

2 steps needed:
label define
label values
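
The two steps might look like this in practice. This is a sketch on Stata's bundled auto.dta; the variable expensive, the cut-off of 6000 and the label name yesno are hypothetical.

```stata
sysuse auto, clear

* hypothetical 0/1 indicator: price above 6,000
generate byte expensive = price > 6000

* description for the variable itself
label variable expensive "Price above 6,000"

* step 1: define a value label (a mapping from numbers to text)
label define yesno 0 "no" 1 "yes"

* step 2: attach that value label to the variable
label values expensive yesno

* the table now shows "no"/"yes" instead of 0/1
tabulate expensive
```

Because the value label is defined separately from the variable, the same label (here yesno) can be attached to several variables.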

30
Q

change order of variables in dataset

A

order

31
Q

measuring unobservables

A

conceptualization and operationalization are needed

-> theoretical definitions
-> clarify how the concept is measured by specifying indicators and how information is gathered -> systematized

good operationalization is linked to your theory

e.g. concept: study success

-> attributes: academic achievement // acquired abilities

-> components: received prizes, amount of prize money // ability to solve problems etc.

32
Q

issues of conceptualization

A

problem of conflation
-sub-components should be conceptually in line with attributes at the corresponding upper level -> sub-components should not relate to conceptually different attributes

problem of redundancy:
-components at the same level should be mutually exclusive

33
Q

minimalist definition of attributes

A

+ availability of data may be enhanced
+ no redundancy with other attributes

-every case may count as an instance, so there is no variation
-measure might not reflect the concept well (invalidity)
-measure may only be applicable for one situation

34
Q

maximalist definition of attributes

A

= including too many (irrelevant) attributes

potential drawbacks of overburdening:
-lower usefulness as the concept has no empirical referents
-tautological and of little analytical use if the main dependent variable is already included as an attribute

35
Q

Median

A

50% -> middle value,
value located exactly in the middle of the sorted data

to find it: sum var1, detail -> value at 50%

36
Q

not normally distributed data

A

sum var1, detail

skewness: a positive value indicates that a variable is skewed to the right (outliers with high values)

-> if highly skewed to the right, the median might be more representative than the mean, because the mean is affected by outliers
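
A small sketch of how mean, median and skewness could be compared after summarize, detail, assuming Stata's bundled auto.dta; r(mean), r(p50) and r(skewness) are results that summarize, detail stores. In this dataset price is skewed to the right, so the mean lies above the median.

```stata
sysuse auto, clear

* detail is needed for percentiles and skewness
summarize price, detail

* pull the stored results and compare them
display "mean     = " r(mean)
display "median   = " r(p50)
display "skewness = " r(skewness)
```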

37
Q

interpret a boxplot

A

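well suited for ordinal and metric data

lower whisker: runs from the smallest non-outlying value up to the lower quartile (one quarter of the sample lies below the lower quartile)

box: spans the lower to the upper quartile (the middle 50% of the sample), with the median marked by a line inside the box

upper whisker: runs from the upper quartile up to the largest non-outlying value

whiskers exclude potential outliers, which are plotted as individual points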
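
To see these elements on screen, a boxplot can be drawn with graph box. The sketch below assumes Stata's bundled auto.dta and its variables price and foreign.

```stata
sysuse auto, clear

* one box for the whole sample
graph box price

* one box per group, useful for comparing distributions
graph box price, over(foreign)
```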

38
Q

mode

A

mode (modal value) = most common value of a variable

39
Q

mean

A

arithmetic mean
average value of a variable

40
Q

bivariate descriptive statistics

A

shows the relationship between two variables

options:
-crosstables
-comparison of key measures, e.g. the mean
-graphical comparison
-correlational measures

41
Q

correlation vs causality

A

correlation:
var A and var B are correlated if higher/lower values of variable A systematically coincide with higher/lower values of variable B
-> you don't know whether var A influences var B or vice versa
negative correlation: if values of var A are higher, values of var B tend to be lower (and vice versa)

positive correlation: if values of var A are higher, values of var B tend to be higher as well

causality: direct relationship between var A and var B
-> a change in var A leads to a change in var B
-> more difficult to determine, needs an appropriate research design
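
A correlation can be checked numerically and visually. This is a sketch assuming Stata's bundled auto.dta, where mpg and weight are negatively correlated (heavier cars tend to have lower mileage). The coefficient describes association only and says nothing about which variable causes which.

```stata
sysuse auto, clear

* visual check of the x-y relationship
scatter mpg weight

* Pearson correlation coefficient; with two variables it is stored in r(rho)
correlate mpg weight
display "r = " r(rho)
```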

42
Q

how can data be visualized

A

amounts
e.g. bar charts, dots, grouped bars

distributions
e.g. histograms, boxplots

proportions
e.g. pie charts, bars

x-y relationships
e.g. scatterplots

uncertainty
e.g. error bars

geospatial data
e.g. maps

43
Q

sort bars in Stata in descending order

A

graph hbar, over(var1, sort(1) descending)

44
Q

adjust bin width of histogram

A

hist var1, width(5)
vs hist var1, width(10) -> more values fall into each bar of the histogram

45
Q

increase x and y title size

A

xtitle(, size(large))

ytitle(, size(large))

46
Q

commands for tables

A

tabulate

fre (user-written command, install with: ssc install fre)

47
Q

color schemes

A

…, scheme(schemename)

assign individual colors

bar(1, color(black))

48
Q

different types of inferences

A

descriptive inference:
-historical accuracy of scientific information
-simply observing sample data

statistical inference:
-use sample properties to infer properties of a population
-understand development over time or relationships between variables
-focus on understanding how uncertain findings are -> t-tests

causal inferences:
-infer the existence of a causal effect from data analysis

49
Q

hypothesis testing in stata

A

goal: infer from the sample to the population

problem: population is usually unknown and only one sample (not infinitely many) is available

need: an estimation of the uncertainty resulting from the use of random sampling

solution: use the mean/standard deviation or proportion in the sample as an estimate
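
A sketch of what this could look like in Stata, assuming the bundled auto.dta; the hypothesized value of 20 is arbitrary and only for illustration. mean reports the sample estimate with its standard error and confidence interval, and ttest compares the estimate against a hypothesized value or between two groups.

```stata
sysuse auto, clear

* sample mean with standard error and 95% confidence interval
mean mpg

* one-sample t-test: H0 is that the population mean of mpg equals 20
ttest mpg == 20

* two-sample t-test: H0 is that both groups have the same population mean
ttest mpg, by(foreign)

* two-sided p-value of the last test
display "p = " r(p)
```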

50
Q

stratum

A

a subset of elements from the population that share a characteristic (usually sociodemographic, e.g. age, gender)

51
Q

sampling frame

A

a list of elements in a population that can be identified

52
Q

convenience sample

A

-use of information from participants who are convenient to access
-sampling method does not need to select participants based on any set of criteria
-only use this method if representativeness is not of importance for research

53
Q

quota sample

A

is primarily used when information is to be collected on a specific, definable target population
-if it worked well, a quota sample provides a structurally identical representation of the population
-volunteers could still bias the picture

54
Q

stratified sample

A

stratified sampling involves random selection within predefined groups (e.g. gender, age)
-> people within a stratum are randomly selected
-strata are supposed to ensure that the make-up of the population is adequately mirrored
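
Random selection within strata can be sketched with Stata's sample command and its by() option. The example treats the bundled auto.dta as if it were the population and uses foreign as the stratum; the seed and the 50% sampling fraction are arbitrary choices.

```stata
sysuse auto, clear

* fix the random-number seed so the draw is reproducible
set seed 12345

* keep a random 50% of the observations within each stratum
sample 50, by(foreign)

* the shares of the two strata match those in the full data
tabulate foreign
```

Note that sample drops the unselected observations from memory, so the full data should be saved beforehand if still needed.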

55
Q

simple random sample

A

selection process takes place randomly
-each element of the population has the same chance of being selected

56
Q

survey weights

A

-used when the sample deviates from the actual population
-survey weights are estimated variables -> even out the differences between sample and population
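
How a weight changes an estimate can be sketched as follows, assuming the bundled auto.dta. The weight variable w is entirely hypothetical: it pretends that foreign cars are under-represented in the sample by a factor of 2. Real survey weights come with the survey data.

```stata
sysuse auto, clear

* hypothetical weight: foreign cars count double
generate w = cond(foreign == 1, 2, 1)

* unweighted estimate of the mean price
mean price

* weighted estimate, using w as a probability (sampling) weight
mean price [pweight = w]
```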