Data Analysis Concepts Term Glossary Flashcards
Describe ‘Structured data’
Data that is coded in a manner that makes it easily converted into a form usable for data analysis
Describe ‘Semi-Structured’ data
Data with a single formatting scheme that also describes the data (such as XML). It can be parsed, but the data may need to be wrangled (re-formatted) before analysis.
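Example (a minimal sketch, not part of the original card): the snippet below parses a small, made-up XML fragment with Python's standard `xml.etree.ElementTree` module. The tag names (`people`, `person`, `name`, `height_cm`) and the helper `parse_people` are hypothetical, chosen only to show that the tags describe the data while the values still need a small wrangling step.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment: the tags describe the data they contain.
xml_text = """
<people>
  <person><name>Ada</name><height_cm>170.5</height_cm></person>
  <person><name>Grace</name><height_cm>165</height_cm></person>
</people>
"""

def parse_people(text):
    """Parse the XML and return a list of dicts, one per <person>."""
    root = ET.fromstring(text.strip())
    rows = []
    for person in root.findall("person"):
        rows.append({
            "name": person.findtext("name"),
            # Wrangling step: XML stores everything as text,
            # so the height is converted to a number here.
            "height_cm": float(person.findtext("height_cm")),
        })
    return rows

rows = parse_people(xml_text)
```

Parsing is straightforward because of the single formatting scheme; the type conversion is the "wrangling" part.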
Describe ‘quasi-structured data’
Data with a little structure that may include multiple formats. It can be formatted for analysis, but only with considerable effort.
Describe ‘unstructured data’
Data that is more complex (contains various formats and data types) and possibly stored in a format that is not easily decoded.
State the three categories that ‘Qualitative’ data can be
Binomial/Binary - Two mutually exclusive groups, e.g. Yes/No, Pass/Fail, True/False
Nominal - Multiple groups with no distinct ordering (such as regions, hair colour, blood groups)
Ordinal - Similar to nominal but with an intrinsic order (e.g. satisfaction levels, salary bands)
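Example (an illustrative sketch with made-up values): the practical difference between the three categories is which operations make sense. Ordinal values can be sorted by their intrinsic order, nominal values can only be counted or grouped, and binary values split into exactly two groups. The variable names and the pass mark of 50 are assumptions for illustration.

```python
from collections import Counter

# Hypothetical survey responses (illustrative values only).
satisfaction = ["High", "Low", "Medium", "Low"]   # ordinal
regions = ["North", "South", "East", "South"]     # nominal
scores = [72, 41, 50]                             # used to derive binary data

# Ordinal: an explicit ranking makes sorting meaningful.
SATISFACTION_ORDER = {"Low": 0, "Medium": 1, "High": 2}
sorted_satisfaction = sorted(satisfaction, key=SATISFACTION_ORDER.get)

# Nominal: no order exists, so counting per group is the natural summary.
region_counts = Counter(regions)

# Binary: each value falls into one of two exclusive groups (pass/fail).
passed = [score >= 50 for score in scores]
```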
State the two categories that ‘Quantitative’ data can be
Continuous - Decimal numbers measured to a high precision, e.g. heights, speeds, distances, time
Discrete - Normally whole numbers, such as counts, ranks, indexes
Define ‘Open’ data
Available in a machine-readable format without restrictions over the ability to use, consume, or share the information.
Define ‘public’ data
Available to the public to collect or look at, but it’s not easily redistributed (or machine readable) and sometimes not easily obtained.
Define ‘proprietary’ data
Data whose ownership is claimed by a specific entity or company. It may be protected under copyright, patent, or trade secret laws.
Define ‘Operational’ data
Data used in day-to-day business operations. Examples: information on direct competitors, information on suppliers, accounting data, and projections of needed resources.
Define ‘Administrative’ data
Data collected to produce management information. It is used to guide future actions but is not strictly necessary for the immediate operation of a business.
What does ‘RTF’ stand for, when referring to file types?
Rich Text Format
What does ‘XML’ stand for, when referring to file types?
Extensible Markup Language
What does ‘JSON’ stand for, when referring to file types?
JavaScript Object Notation
Describe the structure of a JSON file
Uses key:value pairs that are stored in records (objects) and sub-records (nested objects), which can be grouped into lists (arrays).
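Example (a minimal sketch using Python's standard `json` module): the made-up document below contains a record with key:value pairs, a list, and a sub-record. The field names and values are hypothetical.

```python
import json

# Hypothetical JSON document: one record (object) containing
# a list (array) and a sub-record (nested object).
json_text = """
{
  "name": "Ada",
  "skills": ["maths", "programming"],
  "address": {"city": "London", "country": "UK"}
}
"""

record = json.loads(json_text)     # record -> Python dict
skills = record["skills"]          # list -> Python list
city = record["address"]["city"]   # sub-record -> nested dict lookup
```

After loading, records become dictionaries and lists become Python lists, so nested values are reached by chaining keys.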
State and describe the three ‘V’s when referring to Big Data
High Volume - Can reach petabytes or exabytes and be distributed across multiple computers/servers
Large Variety - Contains many different data types
Fast Velocity - Data ingestion and data creation is rapid
State the five stages of the data life cycle in order
1) Creation
2) Initial Storage
3) Archiving
4) Obsolete
5) Deleted
Describe the ‘Creation’ stage in the data life cycle
The data is created, either by measurement or collection.
Describe the ‘Initial storage’ stage in the data life cycle
The data is organised and stored to make analysis easier
Describe the ‘Archiving’ stage in the data life cycle
The data is archived with summary data to help future research