Chapter 2, Understanding Data Flashcards

1
Q

Which common encoding standard (character set) enables use of non-Latin characters?

A

UTF-8, an encoding of the Unicode character set
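As a quick illustration (not from the book), Python's built-in `str.encode` and `bytes.decode` show that non-Latin text survives a round trip through UTF-8, and that non-Latin characters take more than one byte each. The helper name `utf8_round_trip` is hypothetical.

```python
def utf8_round_trip(text: str) -> bytes:
    """Hypothetical helper: encode text as UTF-8 and check it decodes back unchanged."""
    encoded = text.encode("utf-8")          # str -> bytes
    assert encoded.decode("utf-8") == text  # bytes -> str, nothing is lost
    return encoded


# Latin, Greek, Cyrillic, Japanese and emoji text all fit in UTF-8.
samples = ["Data", "Δεδομένα", "Данные", "データ", "📊"]

for sample in samples:
    encoded = utf8_round_trip(sample)
    # ascii() escapes non-Latin characters so the printout works on any console
    print(ascii(sample), len(sample), "characters ->", len(encoded), "bytes")
```

Latin letters take 1 byte each in UTF-8, while Greek, Cyrillic, Japanese and emoji characters take 2 to 4 bytes each.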

2
Q

Which data type is used to store the status of a flag?

A

The Bit data type

3
Q

The author states there are two data types relating to numbers. What are they, what is the difference between them, and what does the author get wrong?

A

Integer - used for whole numbers only
Numeric - used for numbers with a decimal point (rational numbers)
The author uses ‘Numeric’ to describe numbers with a decimal point, which is ambiguous because integers are numeric too. It should be called Floating Point/Decimal rather than ‘Numeric’.

4
Q

Name the 3 rational number data types named in the book.
Which ones aren’t used by Microsoft SQL Server and MySQL?

A

Decimal, Shortdecimal and Number
Shortdecimal and Number are not used by Microsoft SQL Server or MySQL.

5
Q

Which database discussed in the book is the only one that has data types specifically for storing currency?

A

Microsoft SQL Server (money and smallmoney)

6
Q

Why is the currency data type rarely used?

A

The book gives the 4 decimal place limitation as the reason, and that limit is the real problem. Microsoft SQL Server’s money and smallmoney types are exact (fixed-point) numeric types, accurate to a ten-thousandth of a unit. They are not floating-point types, so they do not suffer the binary rounding errors that float and real do.

However, every value they hold is limited to 4 decimal places, so the results of calculations such as division or applying an interest rate are rounded to 4 places, and the error can build up across further calculations. Many financial calculations need more precision than this, which is why decimal with an explicitly chosen precision and scale is usually preferred.
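To make the difference between binary rounding errors and a fixed 4 decimal place limit concrete, here is a small Python sketch (not from the book). It uses the standard `decimal` module and only simulates a type limited to 4 decimal places by rounding with `quantize`; it is not how any particular database implements its currency type.

```python
from decimal import Decimal

# 1) Binary floating point: 0.1 and 0.2 have no exact binary representation.
float_total = 0.1 + 0.2
print(float_total)          # 0.30000000000000004
print(float_total == 0.3)   # False

# 2) Decimal arithmetic works in base-10 digits, so the same sum is exact.
decimal_total = Decimal("0.1") + Decimal("0.2")
print(decimal_total == Decimal("0.3"))  # True

# 3) A type limited to 4 decimal places (simulated here with quantize) loses
#    precision in intermediate results, even though it is not floating point.
FOUR_PLACES = Decimal("0.0001")
one_third = (Decimal(1) / Decimal(3)).quantize(FOUR_PLACES)
print(one_third * 3)        # 0.9999
```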

7
Q

Which data type would you pick if you had to store unstructured data in the form of Office files?

A

A binary data type (e.g. BLOB, LONGBLOB or varbinary(max))

8
Q

Aside from being character/alphanumeric data types, what category of data structure do large text data types such as LONGTEXT, varchar(max) and CLOB fall into?

A

Unstructured, because all of these data types can hold many paragraphs of unstructured text

9
Q

A ____ _______ is an attribute about a person, place or thing.

A

data element

10
Q

What describes the characteristics of activities?

A

Data elements

Review book. This question needs more context.

11
Q

What kind of data is organized into a table made up of rows and columns?

A

Tabular data

12
Q

What kind of database is an extension of the tabular model and organizes data across multiple tables instead of having it all in just one table?

A

A relational database, managed by an RDBMS (Relational Database Management System)

13
Q

What makes structured data structured?

A

1) It is tabular in nature.
2) The columns have consistent data types, e.g. only numbers appear in the “Weight” column.

14
Q

What is the most common data type used to store characters in a cell?

A

Alphanumeric

15
Q

Describe strong typing and weak typing.

A

Strong typing means the data type allowed in a database cell is strictly enforced: the cell simply won’t accept a value of a data type other than the one configured. Weak typing, as used in spreadsheets, enforces the data type of a cell only loosely: the software still allows a value of a different data type to be entered.
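A minimal Python sketch of the difference, assuming a hypothetical `TypedColumn` class standing in for a strongly typed database column and a plain list standing in for a weakly typed spreadsheet column. Real databases enforce types in their storage engine, not like this.

```python
class TypedColumn:
    """Hypothetical strongly typed column: rejects values of any other type."""

    def __init__(self, name: str, data_type: type):
        self.name = name
        self.data_type = data_type
        self.values = []

    def insert(self, value) -> None:
        # Exact type match, so e.g. True is not accepted in an int column.
        if type(value) is not self.data_type:
            raise TypeError(
                f"{self.name} expects {self.data_type.__name__}, "
                f"got {type(value).__name__}"
            )
        self.values.append(value)


# Weak typing, spreadsheet style: a plain list accepts values of any type.
spreadsheet_column = [72, "seventy", 68.5]

weight = TypedColumn("weight_kg", int)
weight.insert(72)
try:
    weight.insert("seventy")
except TypeError as err:
    print(err)  # weight_kg expects int, got str
```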

16
Q

Regardless of structure, data is either __________ or ___________

A

Qualitative, Quantitative

17
Q

General numeric data comes in two forms. What are they called, what data-types are associated with each and what are each used for typically?

A

1) Discrete: uses the Integer data type. Used when counting something.
2) Continuous: usually uses the Decimal/Floating Point data type. Used when measuring.

Discrete numbers are always integers and can’t be subdivided into fractions.

18
Q

Below are two examples of data. Which is discrete and which is continuous? Why?

Number of Dogs - 5
Average Miles walked - 505.5

A

Number of dogs is discrete because you can’t subdivide one dog.
Average miles walked is continuous because you can subdivide a mile.

19
Q

Why can qualitative data only be discrete but quantitative data can be both discrete and continuous?

A

Qualitative data can only be discrete because qualities like hair colour, name or sex can’t be subdivided. Quantitative data can be discrete (when counting) or continuous (when measuring).

20
Q

What are fact tables and what are they used for?

A

A fact table is any table that stores measurable facts or data specific to a business area or process. Fact tables are used with dimension tables to help businesses answer questions.

21
Q

What are dimension tables and what are they used for?

A

Dimension tables are used in conjunction with fact tables. They hold descriptive attributes about the facts (for example, which product, which customer, which date), which lets businesses answer questions about the data in the fact table.
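A toy star-schema sketch in Python with made-up data: a hypothetical sales fact table references a product dimension table by key, and joining the two answers a business question such as revenue per product category. Amounts are whole currency units to keep the arithmetic exact.

```python
from collections import defaultdict

# Dimension table (made-up data): descriptive attributes, one row per product.
product_dim = {
    1: {"name": "Laptop", "category": "Computers"},
    2: {"name": "Mouse", "category": "Accessories"},
    3: {"name": "Monitor", "category": "Computers"},
}

# Fact table (made-up data): measurable sales events, keyed to the dimension.
sales_fact = [
    {"product_id": 1, "quantity": 2, "amount": 1800},
    {"product_id": 2, "quantity": 5, "amount": 100},
    {"product_id": 3, "quantity": 1, "amount": 250},
    {"product_id": 1, "quantity": 1, "amount": 900},
]


def revenue_by_category(facts, products):
    """Hypothetical helper: join facts to the dimension and sum amounts per category."""
    totals = defaultdict(int)
    for row in facts:
        category = products[row["product_id"]]["category"]
        totals[category] += row["amount"]
    return dict(totals)


print(revenue_by_category(sales_fact, product_dim))
# {'Computers': 2950, 'Accessories': 100}
```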

22
Q

What kind of data describes a data element that divides data into distinct groups?

A

Categorical data.

23
Q

When is data in a column said to be CONSISTENT?

A

When all of the values in the column are of the same type.
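A small sketch of checking a column for consistency in Python. The function `is_consistent` is hypothetical, and it assumes missing values (`None`) should not count against consistency.

```python
def is_consistent(column: list) -> bool:
    """Hypothetical check: True when every non-missing value has the same type."""
    types = {type(value) for value in column if value is not None}
    return len(types) <= 1


weights = [72.5, 68.0, 80.2]
mixed = [72.5, "heavy", 80.2]

print(is_consistent(weights))  # True
print(is_consistent(mixed))    # False
```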

24
Q

Data that has structure but is not tabular is what kind of data?

A

semi-structured data
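JSON is a common example of semi-structured data. This hypothetical customer record has named, nested fields but no single tabular shape; flattening it with Python's standard `json` module shows why it does not fit one table directly.

```python
import json

# Hypothetical record: it has structure (named keys, nesting), but the nested
# list of orders does not fit into a single flat table row.
record_text = """
{
  "customer_id": 42,
  "name": "Ana",
  "orders": [
    {"order_id": 1, "total": 120},
    {"order_id": 2, "total": 75}
  ]
}
"""

record = json.loads(record_text)

# Flattening into tabular rows repeats the customer fields once per order.
rows = [
    {"customer_id": record["customer_id"], "name": record["name"], **order}
    for order in record["orders"]
]
for row in rows:
    print(row)
```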

25
Q

When dealing with data, what influences your choice of data type, and why is that choice important?

A

The data values you will store influence your choice of data type.
Your choice of data type is important because it helps improve data quality.

26
Q

The first step in preparing your STRUCTURED data for analysis is doing what?

A

Getting it into a tabular format of unique rows and columns.

27
Q

What are the exam essentials to remember from Chapter 2?

A

1) Consider the values of what you will store before selecting data types
2) Know that you can format data after storing it
3) Consider the ABSOLUTE LIMITS of values that you will use before selecting data types
4) Explain the differences between structured and unstructured data

Check the book about the values bit.

28
Q

Using categories or categorical data has what added benefit?

A

Enforcing data quality by restricting input to a finite set of values
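One way to sketch this in Python is with the standard `enum` module: the allowed categories form a finite set, and anything outside it is rejected instead of being stored. The `HairColour` categories and the `parse_hair_colour` helper are hypothetical.

```python
from enum import Enum


class HairColour(Enum):
    """Hypothetical category list: a fixed, finite set of allowed values."""
    BLACK = "black"
    BROWN = "brown"
    BLONDE = "blonde"
    RED = "red"
    GREY = "grey"


def parse_hair_colour(raw: str) -> HairColour:
    # Normalise the input, then accept it only if it matches a known category.
    # Looking up an Enum by value raises ValueError for anything outside the set.
    return HairColour(raw.strip().lower())


print(parse_hair_colour(" Brown "))  # HairColour.BROWN
try:
    parse_hair_colour("brwn")        # a typo is rejected instead of stored
except ValueError as err:
    print(err)
```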

29
Q

What two things should you consider BEFORE selecting the data-type?

A

1) The nature of the data (text, number, binary, etc.)
2) The size of the data and its expected range

30
Q

Is machine data a common source of structured or unstructured data?

A

Unstructured

31
Q

What type of data storage architecture is used to store unstructured data?

A

Object Storage

32
Q

What are the largest character data types in the databases below?
1) Oracle
2) Microsoft SQL Server
3) MySQL

A

1) Oracle: CLOB (128 TB)
2) Microsoft SQL Server: varchar(max) (2 GB)
3) MySQL: LONGTEXT (4 GB)

33
Q

Which integer data types store numbers up to about 2.14 billion (2,147,483,647)?

A

int and integer

34
Q

What are the names of the integer data types used to store numbers larger than about 2.14 billion?

A

longinteger and bigint

35
Q

What is the largest number that smallint and shortinteger can store?

A

32,767 (they are 16-bit signed integers with a range of -32,768 to 32,767)
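The integer limits on these cards follow from the storage size: a signed integer with n bits ranges from -2^(n-1) to 2^(n-1) - 1. This sketch assumes the usual 16/32/64-bit sizes of smallint, int and bigint, and also shows picking the smallest type that covers an expected range; the helper names are hypothetical.

```python
def signed_range(bits: int) -> tuple:
    """Minimum and maximum values of a two's-complement signed integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Assumed storage sizes of common SQL integer types (as in SQL Server and MySQL).
INTEGER_TYPES = [("smallint", 16), ("int", 32), ("bigint", 64)]


def smallest_type_for(low: int, high: int) -> str:
    """Hypothetical helper: smallest listed type whose range covers low..high."""
    for name, bits in INTEGER_TYPES:
        min_value, max_value = signed_range(bits)
        if min_value <= low and high <= max_value:
            return name
    raise ValueError("range does not fit in a 64-bit signed integer")


for name, bits in INTEGER_TYPES:
    print(name, signed_range(bits))
# smallint (-32768, 32767)
# int (-2147483648, 2147483647)
# bigint (-9223372036854775808, 9223372036854775807)

print(smallest_type_for(0, 3_000_000_000))  # bigint
```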

36
Q

What data does the Timestamp data type store in Oracle?

A

Date and Time

37
Q

If you want to store just the time in a column, which of the databases below support that?
1) Oracle
2) Microsoft SQL Server
3) MySQL

A

2) Microsoft SQL Server
3) MySQL

38
Q

List, in order from largest to smallest, the top 3 binary data types and their databases.

A

1) BLOB: 128 TB (Oracle)
2) LONGBLOB: 4 GB (MySQL)
3) varbinary(max): 2 GB (Microsoft SQL Server)

39
Q

What activity involves ensuring that consistent naming conventions are applied to individual fields? What issue can skipping it cause with visualization and reporting?

A

Field standardization

Without it, visualizations and reports can show odd results.

40
Q

What is another name for a field?

A

observation