Term Glossary (Topic 2.1) Flashcards
State the six common data types supported in R
1) Lists
2) Vectors
3) Arrays
4) Trees
5) Data Frames
6) Collections
Define ‘Lists’ when referring to common data types in R
Groups of data that can have different formats.
Define ‘Vectors’ when referring to common data types in R
Groups of data with the same data type.
Define ‘Arrays’ when referring to common data types in R
Like vectors, but can span multiple dimensions.
Define ‘Trees’ when referring to common data types in R
Hierarchical groups of data.
Define ‘Data Frames’ when referring to common data types in R
Tables of data (similar to classic database tables).
Define ‘Collections’ when referring to common data types in R
Used to store key-value pairs.
State the four main data handling methods
1) Searching and sorting
2) Grouping
3) Filtering
4) Modelling
Define ‘Searching and Sorting’ when referring to the main data handling methods
Data sorting involves reordering the data according to given column(s). Searching involves identifying a value of interest. This value may be a primary key that matches with a foreign key from another table.
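As a minimal sketch of the definition above, the following Python snippet sorts a small table by one column and then searches it for the row whose primary key matches a foreign key in another table. The table contents and the names `customers`, `orders` and `find_customer` are hypothetical, chosen only for illustration.

```python
# Hypothetical sample tables; names and values are illustrative only.
customers = [
    {"customer_id": 3, "name": "Cara"},
    {"customer_id": 1, "name": "Ali"},
    {"customer_id": 2, "name": "Ben"},
]
orders = [
    {"order_id": 10, "customer_id": 2, "total": 40.0},
    {"order_id": 11, "customer_id": 3, "total": 15.5},
]

# Sorting: reorder the rows according to a given column (here, "name").
customers_sorted = sorted(customers, key=lambda row: row["name"])
sorted_names = [row["name"] for row in customers_sorted]


def find_customer(customer_id):
    """Searching: return the row whose primary key equals customer_id."""
    for row in customers:
        if row["customer_id"] == customer_id:
            return row
    return None  # no matching row


# The foreign key on the first order is used as the value of interest.
match = find_customer(orders[0]["customer_id"])

print(sorted_names)
print(match)
```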
Define ‘Grouping’ when referring to the main data handling methods
This is another term for data aggregation, in which numerous values are categorised together according to a common value in one or more other columns. Each group is then summarised into a single value such as a count, average, minimum or maximum.
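One way to picture grouping is the short Python sketch below, which collects amounts by a shared column value and then summarises each group. The `sales` rows and the column names `region` and `amount` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sales rows; "region" is the common column used for grouping.
sales = [
    {"region": "North", "amount": 100},
    {"region": "South", "amount": 50},
    {"region": "North", "amount": 300},
]

# Step 1: categorise the values together by their shared region.
groups = defaultdict(list)
for row in sales:
    groups[row["region"]].append(row["amount"])

# Step 2: summarise each group into single values.
summary = {
    region: {
        "count": len(amounts),
        "average": sum(amounts) / len(amounts),
        "min": min(amounts),
        "max": max(amounts),
    }
    for region, amounts in groups.items()
}

print(summary)
```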
Define ‘Filtering’ when referring to the main data handling methods
The process of identifying a subset of data that satisfies a given condition.
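Filtering can be sketched in Python as keeping only the rows that pass a condition. The `readings` data and the threshold of 20 are hypothetical values used purely to illustrate the idea.

```python
# Hypothetical temperature readings.
readings = [
    {"sensor": "A", "temp": 18.5},
    {"sensor": "B", "temp": 22.0},
    {"sensor": "C", "temp": 25.3},
]

# The given condition: keep rows whose temperature is above 20.
warm = [row for row in readings if row["temp"] > 20]
warm_sensors = [row["sensor"] for row in warm]

print(warm_sensors)
```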
Define ‘Modelling’ when referring to the main data handling methods
Placing your data into a database model that best enables analysis. This may involve denormalisation (joining relevant data together prior to analysis).
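The joining step mentioned above might look like the following Python sketch, which copies a course name onto each enrolment row so the data can be analysed without a further lookup. The tables `courses` and `enrolments` and their fields are hypothetical.

```python
# Hypothetical normalised tables: course names are stored once, keyed by id.
courses = {101: "Maths", 102: "Physics"}
enrolments = [
    {"student": "Dana", "course_id": 101},
    {"student": "Eli", "course_id": 102},
    {"student": "Fay", "course_id": 101},
]

# Join the course name onto each enrolment row ahead of analysis.
flat = [
    {**row, "course_name": courses[row["course_id"]]}
    for row in enrolments
]

for row in flat:
    print(row)
```

The trade-off is that the course name is now repeated on every row, which is acceptable for analysis but would be redundant in a transactional database.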
State six different ways in which errors can occur within datasets
1) Missing data
2) Inconsistent data
3) Redundant data
4) Invalid data
5) Data out of range
6) Outliers
Describe ‘Missing data’ when referring to the ways errors can occur in datasets
Values or records that are absent, such as nulls or missing rows, which leaves the dataset incomplete.
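A simple check for both kinds of gap could be sketched in Python as follows. The sample `rows`, and the assumption that ids should run from 1 to 4 with no breaks, are hypothetical.

```python
# Hypothetical records; None stands in for a null value.
rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 4, "age": 29},
]

# Nulls: ids of rows where any field has no value.
null_row_ids = [
    row["id"] for row in rows
    if any(value is None for value in row.values())
]

# Missing rows: ids expected in the dataset (assumed here to be 1 to 4)
# that do not appear at all.
expected_ids = set(range(1, 5))
missing_ids = sorted(expected_ids - {row["id"] for row in rows})

print(null_row_ids)
print(missing_ids)
```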
Describe ‘Inconsistent data’ when referring to the ways errors can occur in datasets
Equivalent data stored in different locations does not match, or is stored in different formats, so the dataset as a whole is inconsistent.
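The two cases can be told apart with a sketch like the one below, which compares the same field held in two hypothetical systems (`crm` and `billing`). It assumes that only the two date formats shown are in use; the helper name `parse_date` is also an assumption.

```python
from datetime import datetime

# Hypothetical copies of the same "signup date" held in two locations.
crm = {"C1": "2024-03-05", "C2": "2024-07-01"}
billing = {"C1": "05/03/2024", "C2": "02/07/2024"}


def parse_date(text):
    """Parse a date written as YYYY-MM-DD or DD/MM/YYYY (assumed formats)."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {text!r}")


format_only = []   # same date, stored in different formats
conflicting = []   # the underlying dates disagree
for key in sorted(crm):
    if parse_date(crm[key]) != parse_date(billing[key]):
        conflicting.append(key)
    elif crm[key] != billing[key]:
        format_only.append(key)

print(format_only)
print(conflicting)
```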