Chapter 2, Understanding Data Flashcards
Which common encoding standard (character set) enables use of non-Latin characters?
UTF-8
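A minimal Python sketch of what this means in practice: the same text round-trips through UTF-8 whatever script it uses, though non-Latin characters take more than one byte each. The helper name utf8_byte_length is made up for illustration.

```python
def utf8_byte_length(text: str) -> int:
    """Hypothetical helper: number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


samples = ["data", "données", "данные", "データ"]

for text in samples:
    encoded = text.encode("utf-8")      # str -> bytes
    decoded = encoded.decode("utf-8")   # bytes -> str, recovers the original text
    print(text, "-", len(text), "characters,", utf8_byte_length(text), "bytes,",
          "round-trips" if decoded == text else "corrupted")
```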
Which data type is used to store status of a flag?
The Bit data type
The author states there are two data types relating to numbers. What are they, what is the difference between them, and what does the author get wrong?
Integer - used for whole numbers only
Numeric - used for numbers with a decimal point (rational numbers)
The author uses ‘Numeric’ to describe numbers with a decimal point, which is ambiguous. It should be Floating Point/Decimal, not ‘Numeric’.
Name the 3 rational number data types named in the book.
Which ones aren’t used by Microsoft SQL Server and MySQL?
Decimal, shortdecimal and number
shortdecimal and number are not used by Microsoft SQL Server or MySQL
Which database discussed in the book is the only one that had datatypes specifically for storing currency?
Microsoft SQL Server
For what reason does the currency datatype not really get used?
In Microsoft SQL Server, the money and smallmoney data types are fixed-point values accurate to four decimal places; they are not floating-point. Because results are kept to only four decimal places, multiplication and division can lose precision.
The book gives the 4 decimal place limitation as the reason, and that is the real cause: there are many examples in financial calculations where you need more than four places. Binary rounding errors are a separate problem that affects floating-point types such as float and real, which is why those are also a poor choice for currency.
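Both effects discussed in this card, binary floating-point rounding and a four-decimal-place limit, can be seen in a short Python sketch. It uses Python’s decimal module as a stand-in and is not how any database implements its currency type; round_to_four_places is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so the sum is not exactly 0.3.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)

# Decimal arithmetic on the same values is exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))

FOUR_PLACES = Decimal("0.0001")


def round_to_four_places(value: Decimal) -> Decimal:
    """Hypothetical helper: keep only four decimal places, as a four-place type would."""
    return value.quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)


# Dividing 1 by 3 and multiplying back by 3 does not return 1
# when every intermediate result is cut to four places.
third = round_to_four_places(Decimal(1) / Decimal(3))
back = round_to_four_places(third * 3)
print(third, back)
```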
Which data type would you pick if you had to store unstructured data in the form of Office files?
binary datatype
Aside from being character/alphanumeric data types, what data structure category would the large text data types LONGTEXT, varchar(max) and CLOB fall into?
Unstructured - because all of these data-types could hold many paragraphs worth of unstructured text
A ____ _______ is an attribute about a person, place or thing.
data element
What describes the characteristics of activities?
Data elements
Review book. This question needs more context.
What kind of data is organized into a table made up of rows and columns?
Tabular data
What kind of database is an extension of the tabular model and organizes data across multiple tables instead of having it all in just one table?
An RDBMS, or Relational Database Management System
What makes structured data structured?
1) It is tabular in nature
2) The columns have consistent data types, e.g. only seeing numbers in the “Weight” column.
What is the most common data-type used to store Characters in a cell?
Alphanumeric
Describe strong typing and weak typing.
Strong typing means the database strictly enforces the data type configured for a column: a cell simply won’t accept a value of any other type. Weak typing is what spreadsheets use: the data type for a cell is only loosely enforced, so the software will still allow a different data type to be entered.
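The contrast can be sketched with two toy Python classes. StrictColumn and LooseColumn are hypothetical stand-ins for a database column and a spreadsheet column; neither is a real database or spreadsheet API.

```python
class StrictColumn:
    """Hypothetical stand-in for a strongly typed database column."""

    def __init__(self, name: str, expected_type: type):
        self.name = name
        self.expected_type = expected_type
        self.values = []

    def insert(self, value):
        # Reject anything that is not the configured type.
        if not isinstance(value, self.expected_type):
            raise TypeError(f"{self.name} only accepts {self.expected_type.__name__}")
        self.values.append(value)


class LooseColumn:
    """Hypothetical stand-in for a weakly typed spreadsheet column."""

    def __init__(self, name: str):
        self.name = name
        self.values = []

    def insert(self, value):
        # Accept whatever is entered, regardless of type.
        self.values.append(value)


weight_db = StrictColumn("Weight", float)
weight_db.insert(72.5)
try:
    weight_db.insert("heavy")   # wrong type: the strict column refuses it
    rejected = False
except TypeError:
    rejected = True

weight_sheet = LooseColumn("Weight")
weight_sheet.insert(72.5)
weight_sheet.insert("heavy")    # wrong type: the loose column stores it anyway

print(rejected, weight_db.values, weight_sheet.values)
```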
Regardless of structure, data is either __________ or ___________
Qualitative, Quantitative
General numeric data comes in two forms. What are they called, what data-types are associated with each and what are each used for typically?
1) Discrete numbers. Use the Integer datatype. Used when counting something.
and
2) Continuous numbers. Usually use the Decimal/Floating Point datatypes. Used when measuring.
Discrete numbers are always integers and can’t be subdivided into fractions.
Two examples of data are below. Which is discrete and which is continuous? Why?
Number of Dogs - 5
Average Miles walked - 505.5
Number of dogs is discrete because you can’t subdivide one dog.
Average miles walked is continuous because you can subdivide a mile.
Why can qualitative data only be discrete but quantitative data can be both discrete and continuous?
Qualitative data can only be discrete because qualities like hair colour, name and sex can’t be subdivided. Quantitative data can be either: discrete when counting, continuous when measuring.
What are fact tables and what are they used for?
A fact table can be any table that stores measurable facts or data specific to a business area or process. Fact tables are used with dimension tables to help businesses answer questions.
What are dimension tables and what are they used for?
Dimension tables are used in conjunction with fact tables. They hold the descriptive attributes (such as who, what, where and when) that give context to the facts, so businesses can answer questions about the measurements in the fact table.
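A minimal sketch of a fact table and a dimension table working together, using Python’s built-in sqlite3 module. The table names, column names and sample rows are invented for illustration and do not come from the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive attributes of each product.
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        category     TEXT
    );
    -- Fact table: one row per sale, holding the measurable values.
    CREATE TABLE fact_sales (
        sale_id      INTEGER PRIMARY KEY,
        product_id   INTEGER REFERENCES dim_product(product_id),
        quantity     INTEGER,
        amount_cents INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'Leash', 'Accessories'), (2, 'Kibble', 'Food');
    INSERT INTO fact_sales VALUES (1, 1, 2, 3000), (2, 2, 1, 4500), (3, 2, 3, 13500);
""")

# Business question: what is the total sales amount per product category?
rows = conn.execute("""
    SELECT d.category, SUM(f.amount_cents)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
print(rows)
conn.close()
```

The fact table alone can total the amounts, but only the dimension table knows which category each product belongs to.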
What kind of data describes a data element that divides data into distinct groups?
Categorical data.
When is data in a column said to be CONSISTENT?
When the data is all of the same type of value
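One way to picture consistency is a check that every value in a column has the same type. The sketch below treats a column as a plain Python list; is_consistent is a hypothetical helper, not a function from any library.

```python
def is_consistent(column: list) -> bool:
    """Return True when every value in the column has the same Python type."""
    return len({type(value) for value in column}) <= 1


weights = [72.5, 80.1, 65.0]          # all floats
mixed_weights = [72.5, "heavy", 65.0]  # a string among the floats

print(is_consistent(weights))
print(is_consistent(mixed_weights))
```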
Data that has structure but is not tabular is what kind of data?
semi-structured data
When dealing with data, what influences your choice of data type, and why is that choice important?
The data values influence your choice of data type.
Your choice of data type is important because it helps boost data quality.
The first step in preparing your STRUCTURED data for analysis is doing what?
Getting it into tabular format of unique rows and columns.
What are the exam essentials to remember from Chapter 2?
1) Consider the values of what you will store before selecting data types
2) Know that you can format data after storing it
3) Consider the ABSOLUTE LIMITS of values that you will use before selecting data types
4) Explain the differences between structured and unstructured data
Check the book about the values bit.
Using categories or categorical data has the added benefit of what?
enforcing data quality by restricting input to a finite selection
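Restricting input to a finite selection can be sketched with a Python Enum. The CoatColour categories and the parse_coat_colour helper are invented for illustration.

```python
from enum import Enum


class CoatColour(Enum):
    """Hypothetical category list: the only values the field may hold."""
    BLACK = "black"
    BROWN = "brown"
    WHITE = "white"


def parse_coat_colour(raw: str) -> CoatColour:
    """Accept only values from the fixed list; anything else raises ValueError."""
    return CoatColour(raw.strip().lower())


print(parse_coat_colour(" Brown "))

try:
    parse_coat_colour("purple")
except ValueError as error:
    print("rejected:", error)
```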
What two things should you consider BEFORE selecting the data-type?
1) what the nature of the data is (text, number, binary, etc.)
2) the size of the data and expected range
Machine data is a common source of structured/unstructured data?
Unstructured
What type of data storage architecture is used to store unstructured data?
Object Storage
What are the largest character data-types in the below databases?
1) Oracle
2) Microsoft SQL Server
3) MySQL
1) Oracle - CLOB - 128 TB
2) Microsoft SQL Server - varchar(max) - 2 GB
3) MySQL - LONGTEXT - 4 GB
Which Integer data-types store numbers up to about 2.14 billion (2,147,483,647)?
int and integer
What are the names of the Integer data-types used to store numbers larger than about 2.14 billion?
longinteger and bigint
smallint and shortinteger store numbers up to what value?
32,767 (the signed range is -32,768 to 32,767)
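The limits of signed integer types follow from the number of bits used to store them. The sketch below assumes the usual sizes in SQL Server and MySQL (smallint 16 bits, int 32 bits, bigint 64 bits); signed_range is a hypothetical helper.

```python
def signed_range(bits: int) -> tuple[int, int]:
    """Smallest and largest values of a signed two's-complement integer of the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Assumed widths: smallint = 16 bits, int = 32 bits, bigint = 64 bits.
for name, bits in [("smallint", 16), ("int", 32), ("bigint", 64)]:
    low, high = signed_range(bits)
    print(f"{name:8} {low:,} to {high:,}")
```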
The data-type named Timestamp stores what data in Oracle?
Date and Time
If you want to store just the time in a column, which of the databases below support that?
1) Oracle
2) Microsoft SQL Server
3) MySQL
2) Microsoft SQL Server
3) MySQL
List, in order from largest to smallest, the top 3 binary datatypes and their databases.
1) BLOB - 128 TB - Oracle
2) longblob - 4 GB - MySQL
3) varbinary(max) - 2 GB - Microsoft SQL Server
What activity involves ensuring that consistent naming conventions are applied to individual fields? What issue can inconsistent field names cause with visualization and reporting?
Field standardization
It can cause odd results
What is another name for field?
Observation