data exploration Flashcards

1
Q

data does not

A
  • speak for itself
  • it can be biased and is not objective (depending on how it was selected)
  • the people behind it interpret the data
2
Q

answers/results depend on...

A

the question being solved and the perspective taken

3
Q

types of data sets

A
  • cross-sectional
  • time-series
  • panel
4
Q

cross-sectional

A
  • many subjects/variables, one point in time
  • e.g. sales, expenses, profit of many companies for one period
5
Q

time-series

A
  • one subject/variable, many points in time
  • e.g. sales over time
6
Q

panel

A
  • many subjects/variables, many points in time
  • e.g. sales, expenses, profit over time
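A minimal Python sketch of how the three data set layouts differ. The company names and figures are hypothetical, invented only to show the shape of each data set:

```python
# Hypothetical companies and figures, for illustration only.

# Cross-sectional: many subjects, one point in time (a single year).
cross_sectional = {
    "CompanyA": {"sales": 120, "expenses": 80, "profit": 40},
    "CompanyB": {"sales": 95, "expenses": 70, "profit": 25},
}

# Time-series: one subject/variable (CompanyA sales), many points in time.
time_series = {2021: 100, 2022: 110, 2023: 120}

# Panel: many subjects/variables, many points in time, keyed by (subject, year).
panel = {
    ("CompanyA", 2022): {"sales": 110, "expenses": 75, "profit": 35},
    ("CompanyA", 2023): {"sales": 120, "expenses": 80, "profit": 40},
    ("CompanyB", 2022): {"sales": 90, "expenses": 68, "profit": 22},
    ("CompanyB", 2023): {"sales": 95, "expenses": 70, "profit": 25},
}

# A panel can be sliced either way: by subject or by point in time.
subjects = sorted({company for company, _ in panel})
years = sorted({year for _, year in panel})
print(subjects, years)  # ['CompanyA', 'CompanyB'] [2022, 2023]
```

Fixing the year in the panel gives a cross-section; fixing the company gives a time series.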
7
Q

dimensions of data quality

A
  • completeness
  • consistency
  • conformity
  • accuracy
  • integrity
  • timeliness
8
Q

completeness

A

data is comprehensive and meets expectations (required values are present)
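A minimal Python sketch of a completeness check, assuming "missing" means None or an empty string and measuring completeness as the share of records that have a value in each required field. The field names and records are hypothetical:

```python
def completeness(records, required_fields):
    """Share of records with a non-missing value, per required field.

    Assumption: None and "" both count as missing.
    """
    result = {}
    for field in required_fields:
        present = sum(1 for r in records if r.get(field) not in (None, ""))
        result[field] = present / len(records) if records else 0.0
    return result

# Hypothetical customer records.
customers = [
    {"id": 1, "email": "a@example.com", "phone": None},
    {"id": 2, "email": "", "phone": "0400 000 000"},
    {"id": 3, "email": "c@example.com", "phone": "0411 111 111"},
    {"id": 4, "email": "d@example.com", "phone": None},
]
print(completeness(customers, ["id", "email", "phone"]))
# {'id': 1.0, 'email': 0.75, 'phone': 0.5}
```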

9
Q

consistency

A

data sourced from different places/systems reflects the same information across all of them

10
Q

conformity

A

data follows a set of standard data definitions, such as data type, size, and format
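A minimal Python sketch of a conformity check. The data standard used here (a 4-digit string postcode and an ISO YYYY-MM-DD order date) is a hypothetical example, not a standard from the source; each rule checks type, and the regex checks both size and format:

```python
import re

# Hypothetical data standard: field -> (expected type, pattern the value must fully match)
STANDARD = {
    "postcode": (str, re.compile(r"\d{4}")),                # exactly 4 digits
    "order_date": (str, re.compile(r"\d{4}-\d{2}-\d{2}")),  # YYYY-MM-DD
}

def nonconforming(records, standard):
    """Return (row index, field) pairs that break the type or format rule."""
    problems = []
    for i, row in enumerate(records):
        for field, (expected_type, pattern) in standard.items():
            value = row.get(field)
            # Type is checked first, so fullmatch is only called on strings.
            if not isinstance(value, expected_type) or not pattern.fullmatch(value):
                problems.append((i, field))
    return problems

rows = [
    {"postcode": "2000", "order_date": "2024-03-01"},
    {"postcode": "200", "order_date": "01/03/2024"},
    {"postcode": 3000, "order_date": "2024-03-02"},
]
print(nonconforming(rows, STANDARD))
# [(1, 'postcode'), (1, 'order_date'), (2, 'postcode')]
```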

11
Q

accuracy

A

data correctly reflects the real-world object or event being described

12
Q

integrity

A

all data in a database can be traced and connected to other data

13
Q

timeliness

A

information is available when it is expected and needed

14
Q

first two steps of data cleansing/processing

A
  • sourcing raw data
  • technically correct data
15
Q

sourcing raw data

A

What do we want and need to achieve?
What data will support this outcome?
How can we source it and ensure it is of a high quality?

16
Q

technically correct data

A
  • each value can be directly recognised as belonging to a certain variable
  • each value is stored in a data type that represents the value domain of the real-world variable
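A minimal Python sketch of turning raw text into technically correct data: each column is coerced to the type that matches its real-world domain, and values that cannot be coerced become None (missing) instead of staying as text. The column names and the ISO date format are assumptions:

```python
from datetime import date

def to_int(raw):
    """Coerce a raw string to int; return None if it is not a valid integer."""
    try:
        return int(raw.strip())
    except (ValueError, AttributeError):  # bad text, or raw is None
        return None

def to_date(raw):
    """Coerce an ISO 'YYYY-MM-DD' string to a date; return None if invalid."""
    try:
        return date.fromisoformat(raw.strip())
    except (ValueError, AttributeError):
        return None

# Hypothetical raw rows as read from a text file.
raw_rows = [
    {"age": " 34", "joined": "2023-05-01"},
    {"age": "thirty", "joined": "2023-13-01"},  # not a number; month 13 is invalid
]
clean = [{"age": to_int(r["age"]), "joined": to_date(r["joined"])} for r in raw_rows]
print(clean)
```

Invalid entries end up as missing values, which are then handled in a later cleaning step rather than silently kept as text.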
17
Q

data issues

A
  • formatting/data type
  • missing values
  • outliers
18
Q

formatting/data type

A
  • sex: Male, M, Boy (same category, different formats)
  • month: January, 1-Jan, 1 (same month, different formats)
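A minimal Python sketch of normalising the inconsistent formats above to one canonical value per variable. The mapping tables (M/F codes, month numbers 1 to 12) are hypothetical choices:

```python
# Hypothetical canonical codes: sex -> "M"/"F", month -> 1..12.
SEX_CODES = {"male": "M", "m": "M", "boy": "M",
             "female": "F", "f": "F", "girl": "F"}
MONTH_ABBR = ["jan", "feb", "mar", "apr", "may", "jun",
              "jul", "aug", "sep", "oct", "nov", "dec"]

def normalise_sex(value):
    """Map any known spelling to one code; None if unrecognised."""
    return SEX_CODES.get(value.strip().lower())

def normalise_month(value):
    """Map 'January', '1-Jan' or '1' to the month number 1..12; None if unrecognised."""
    v = value.strip().lower()
    if v.isdigit():
        n = int(v)
        return n if 1 <= n <= 12 else None
    if "-" in v:  # spreadsheet style 'day-Mon': keep the month part
        v = v.split("-")[-1]
    prefix = v[:3]
    return MONTH_ABBR.index(prefix) + 1 if prefix in MONTH_ABBR else None

print([normalise_sex(s) for s in ["Male", "M", "Boy"]])         # ['M', 'M', 'M']
print([normalise_month(m) for m in ["January", "1-Jan", "1"]])  # [1, 1, 1]
```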
19
Q

missing values - listwise deletion

A

remove records with missing values in any variable
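A minimal Python sketch of listwise deletion, assuming missing values are stored as None (the records are hypothetical):

```python
# Hypothetical survey records; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "region": "NSW"},
    {"age": None, "income": 61000, "region": "VIC"},
    {"age": 45, "income": None, "region": "QLD"},
    {"age": 29, "income": 48000, "region": "WA"},
]

def listwise_delete(records):
    """Keep only records with no missing (None) value in any variable."""
    return [r for r in records if all(v is not None for v in r.values())]

complete = listwise_delete(rows)
print(len(rows), "->", len(complete))  # 4 -> 2
```

Listwise deletion is simple, but it can shrink the sample considerably and bias results when values are not missing at random.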

20
Q

missing values - mode/median/mean imputation

A
  • mean for continuous variables
  • median for skewed continuous variables
  • mode for categorical variables
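A minimal Python sketch of mean/median/mode imputation using the standard `statistics` module; the `impute` helper and the example series are hypothetical:

```python
from statistics import mean, median, mode

def impute(values, strategy):
    """Replace None entries with the mean, median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

height_cm = [170, 165, None, 180, 175]          # roughly symmetric -> mean
income_k = [40, 45, 50, None, 400]              # right-skewed -> median
colour = ["red", "blue", None, "red", "green"]  # categorical -> mode

print(impute(height_cm, "mean"))   # [170, 165, 172.5, 180, 175]
print(impute(income_k, "median"))  # [40, 45, 50, 47.5, 400]
print(impute(colour, "mode"))      # ['red', 'blue', 'red', 'red', 'green']
```

The median suits skewed data because one extreme value (400 above) pulls the mean up to 133.75, while the median stays at 47.5.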
21
Q

missing values - model imputation

A
  • interpolate/extrapolate
  • use a regression model to predict the missing value
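A minimal Python sketch of model-based imputation: straight-line interpolation for gaps inside a series, and an ordinary least-squares line that predicts a missing value from a related variable. Both helpers and the data are hypothetical; leading or trailing gaps would need extrapolation, which `interpolate` does not attempt:

```python
def interpolate(series):
    """Fill interior None gaps by straight-line interpolation between known neighbours.

    Leading/trailing None values are left as-is (that would be extrapolation).
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled

def fit_line(xs, ys):
    """Ordinary least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

monthly_sales = [100, 110, None, 130, None, 150]
print(interpolate(monthly_sales))  # [100, 110, 120.0, 130, 140.0, 150]

# Hypothetical complete rows used to predict a missing 'expenses' value from 'sales'.
sales = [100, 120, 140, 160]
expenses = [60, 70, 80, 90]
a, b = fit_line(sales, expenses)
print(a + b * 150)  # 85.0
```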
22
Q

outliers - drop outlier record

A

completely remove the record to avoid severe skewness
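A minimal Python sketch of dropping outlier records. The card does not say how outliers are identified; this assumes Tukey's fences (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR), and the order records are hypothetical:

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR (assumed outlier rule)."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outlier_records(records, field, k=1.5):
    """Keep only records whose `field` lies inside the fences."""
    low, high = tukey_fences([r[field] for r in records], k)
    return [r for r in records if low <= r[field] <= high]

# Hypothetical order records; amount 95 is the suspicious value.
orders = [{"id": i, "amount": a} for i, a in enumerate([12, 14, 15, 15, 16, 17, 18, 95])]
kept = drop_outlier_records(orders, "amount")
print([r["amount"] for r in kept])  # [12, 14, 15, 15, 16, 17, 18]
```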

23
Q

outliers - winsorisation

A
  • cap outlier values at set limits
  • limit extreme values in statistical data to reduce the effect of possibly spurious (false) outliers
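A minimal Python sketch of winsorisation. The cap points (the 10th and 90th percentiles, computed with `statistics.quantiles(..., method="inclusive")`) are an assumed choice; 5th/95th is also common. The amounts are hypothetical:

```python
from statistics import quantiles

def winsorise(values):
    """Cap values below the 10th / above the 90th percentile at those percentiles.

    Assumption: 10th/90th percentile caps; both tails are capped.
    """
    deciles = quantiles(values, n=10, method="inclusive")
    lower, upper = deciles[0], deciles[-1]
    return [min(max(v, lower), upper) for v in values]

amounts = [12, 14, 15, 15, 16, 17, 18, 95]
print(winsorise(amounts))  # [13.4, 14, 15, 15, 16, 17, 18, 41.1]
```

Unlike dropping the record, winsorisation keeps every observation; both tails are capped, so the smallest value is raised as well as the largest lowered.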
24
Q

outliers - imputation

A
  • assign a new value to the outlier
  • e.g. the mean or a regression prediction
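A minimal Python sketch of imputing outliers with the mean of the remaining values. The valid range (0 to 50) is a hypothetical business rule standing in for whatever outlier test is used; a regression prediction could replace the mean:

```python
from statistics import mean

def impute_outliers(values, lower, upper):
    """Replace values outside [lower, upper] with the mean of the in-range values."""
    inliers = [v for v in values if lower <= v <= upper]
    fill = mean(inliers)
    return [v if lower <= v <= upper else fill for v in values]

# Hypothetical amounts; the valid range 0..50 is an assumed business rule.
amounts = [12, 14, 15, 15, 16, 17, 16, 95]
print(impute_outliers(amounts, lower=0, upper=50))  # [12, 14, 15, 15, 16, 17, 16, 15]
```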
25
Q

data privacy

A

the claim of individuals, groups, and institutions to determine for themselves when, how, and to what extent information about them is communicated to others

26
Q

data privacy principles

A
  • notice
  • choice and consent
  • use and retention
  • access
  • protection
  • enforcement and redress
27
Q

notice

A

inform users about privacy policy/protection procedures

28
Q

choice and consent

A

consent from individuals about collection, use, disclosure, and retention of information

29
Q

use and retention

A

data is retained/protected as required by law or business practice

30
Q

access

A

give individuals access to review, update, and modify their personal information

31
Q

protection

A

data is used only for the purpose stated

32
Q

enforcement and redress

A

provide channels for individuals to report, provide feedback, or complain

33
Q

ethics of data security

A
  • managing quality personnel to address ethical issues
  • a perceived potential conflict of interest can also exist between ethical behaviour and technical knowledge
34
Q

Australian Privacy Principles (APP 11, security of personal information) protect personal information against

A
  • misuse
  • interference
  • loss
  • unauthorised access, modification, disclosure