Exploring Data - Topic 2: Data and Graphical Summaries Flashcards
What is Data?
Data is information about the set of subjects being studied (e.g. road fatalities). Most commonly, data refers to a sample rather than the whole population (unless it is a census).
What are some examples of the different types / formats of data?
Survey data
Spreadsheet type data
MRI image data
What is the Initial Data Analysis (IDA)?
It is a first general look at the data, without formally answering the research questions.
The purposes of IDA are to ensure that later statistical analysis can be performed efficiently and to minimise the risk of incorrect or misleading results.
What could an IDA assist with?
It could assist with:
Seeing whether the data can answer your research questions
Posing other research questions
Identifying the data’s main qualities and suggesting the population from which the sample derives
What steps does the IDA involve?
Commonly involves:
Data background: checking the quality and integrity of the data
Data structure: what info has been collected?
Data wrangling: scraping, cleaning, tidying, reshaping, splitting, combining
Data summaries: graphical and numerical
NOTE: Every step of the IDA must be documented so that the analysis can be reproduced.
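The four steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library; the column names (crash_id, age, road_type) and all values are invented for the example and are not from any real data set.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical raw data, invented purely for illustration.
RAW = """crash_id,age,road_type
1,23,highway
2,,urban
3,41,urban
4,35,rural
"""

# Data structure: what information has been collected?
rows = list(csv.DictReader(io.StringIO(RAW)))
columns = list(rows[0].keys())

# Data background: check quality by counting missing values per column.
missing = {col: sum(1 for r in rows if r[col] == "") for col in columns}

# Data wrangling: drop rows with a missing age and convert age to an integer.
clean = [{**r, "age": int(r["age"])} for r in rows if r["age"] != ""]

# Data summaries: one numerical summary and one frequency table.
mean_age = statistics.mean(r["age"] for r in clean)
road_counts = Counter(r["road_type"] for r in clean)

print(columns)
print(missing)
print(mean_age)
print(road_counts)
```

Keeping the steps in a script like this is one way to document them, since anyone can rerun it on the same raw data.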
What is a variable?
A variable measures or describes some attribute of the subject. Data with ‘p’ variables is said to have dimension p.
What is it called when there is only 1 variable involved?
Univariate
What is it called when there are 2 variables involved?
Bivariate
What is it called when there are more than 2 variables involved?
Multivariate
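As a quick check of the three terms, here is a small sketch that names the data by its dimension p. The function name describe_dimension and the variable names are hypothetical, chosen only for this example.

```python
def describe_dimension(variables):
    """Return the term for data with p = len(variables) variables."""
    p = len(variables)
    if p == 0:
        raise ValueError("data needs at least one variable")
    if p == 1:
        return "univariate"
    if p == 2:
        return "bivariate"
    return "multivariate"


print(describe_dimension(["age"]))
print(describe_dimension(["age", "income"]))
print(describe_dimension(["age", "income", "road_type"]))
```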
Would an anonymous identifier such as CRASH ID count as a variable?
No, because it adds no useful information to the data; it only allows each record to be identified.
Is recording raw quantitative or qualitative data preferable?
Raw quantitative data, if possible: it can easily be summarised into qualitative data, whereas it is hard to convert qualitative data into quantitative data.
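A small sketch of why this direction is the easy one: raw ages can be summarised into groups, but the groups cannot be turned back into ages. The cut-offs (18 and 65) and the group labels are assumptions made up for this example.

```python
def age_group(age):
    """Summarise a raw age (quantitative) into a group label (qualitative)."""
    if age < 18:
        return "under 18"
    if age < 65:
        return "18 to 64"
    return "65 and over"


ages = [12, 17, 18, 40, 64, 65, 80]
groups = [age_group(a) for a in ages]
print(groups)

# Ages 18 and 40 end up with the same label, so the original
# numbers cannot be recovered from the labels alone.
print(age_group(18) == age_group(40))
```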
What are the two types of variables?
Qualitative / Categorical or Quantitative / Numerical
What are qualitative / categorical variables/data?
Qualitative data are non-numeric and include information such as verbal responses to open-ended questions, which cannot be valued numerically.
Categorical data is a form of qualitative data that can be grouped into categories instead of measured numerically
The answers are typically in words. If the answer is in words –> categorical
What is an example of categorical data?
What is your gender? –> male or female
What are quantitative / numerical variables?
Its value will always be in number form.
The answers are typically in numbers
Data expressed in numbers
What are examples of numerical data?
age and income
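The "words vs numbers" rule of thumb can be sketched as a simple check on recorded answers. The helper variable_type is hypothetical, and the rule is only a first guess: numeric codes such as postcodes are written in numbers but are really categorical.

```python
def variable_type(values):
    """Guess the type of a variable from its recorded answers (strings)."""
    try:
        for v in values:
            float(v)  # succeeds only if the answer is written as a number
    except ValueError:
        return "categorical"
    return "numerical"


print(variable_type(["23", "41", "35"]))      # age
print(variable_type(["52000.50", "48000"]))   # income
print(variable_type(["male", "female"]))      # gender
```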
What are the two types of numerical data?
Discrete and Continuous