Course 3: Prepare Data for Exploration Flashcards

1
Q

Prepare Phase

A

1) Understanding the different types of data and data structures.
2) What type of data is suitable for the question you are answering?
3) Practical skills in extracting, organising and protecting your data.
4) How data is generated and collected.
5) Different formats, types, and structures of data.

2
Q

How is data collected?

A

1) Interviews
2) Observations
3) Forms
4) Questionnaires
5) Surveys
6) Cookies

3
Q

Data collection considerations

A

1) How will the data be collected
2) Choose data sources
3) Decide what data to use
4) How much data to collect
5) Select the correct data type
6) Determine the time frame

4
Q

First-party data

A

Data collected by an individual or group using their own resources.

5
Q

Second-party data

A

Data collected by a group directly from its audience and then sold

6
Q

Third-party data

A

Data collected from outside sources who did not collect it directly.

7
Q

Population

A

All possible data values in a certain dataset.

8
Q

Sample

A

A part of a population that is representative of the population.

9
Q

Discrete data

A

Data that is counted and has a limited number of values.

10
Q

Continuous data

A

Data that is measured and can have almost any numeric value.

11
Q

Nominal data

A

A type of qualitative data that is categorized without a set order.

12
Q

Ordinal data

A

A type of qualitative data with a set order or scale.

13
Q

Internal data

A

Data that lives within a company’s own systems

14
Q

External data

A

Data that lives and is generated outside of an organisation.

15
Q

Structured data

A

Data organised in a specific format, such as rows and columns.

16
Q

Examples of software that store structured data.

A

Spreadsheets, Relational databases

17
Q

Unstructured data

A

Data that is not organised in any easily identifiable manner

18
Q

Examples of unstructured data

A

Audio files, Video files

19
Q

Primary data

A

Collected by a researcher from first-hand sources.

20
Q

Example of primary data

A

Data from an interview you conducted

21
Q

Secondary data

A

Gathered by other people or from further research.

22
Q

Example of secondary data

A

Demographic data collected by a university

23
Q

Internal data

A

Data that lives inside a company’s own systems

24
Q

Example of internal data

A

Sales data by store location.

25
Q

External data

A

Data that lives outside of a company or organisation

26
Q

Example of external data

A

National average wages for the various positions throughout your organisation.

27
Q

Continuous data

A

Data that is measured and can have almost any numeric value

28
Q

Continuous data example

A

1) Temperature
2) Runtime markers in a video

29
Q

Discrete data

A

Data that is counted and has a limited number of values.

30
Q

Example of discrete data

A

Number of people who visit a hospital on a daily basis (10, 20, 200)

31
Q

Qualitative data

A

Subjective and explanatory measures of qualities and characteristics.

32
Q

Qualitative data example

A

Exercise activity most enjoyed

33
Q

Quantitative data

A

Specific and objective measures of numerical facts

34
Q

Quantitative data example

A

Population of elephants in Africa

35
Q

Nominal data

A

A type of qualitative data that isn’t categorized with a set order.

36
Q

Nominal data example

A

New listing, reduced price listing, foreclosure.

37
Q

Ordinal data

A

A type of qualitative data with a set order or scale.

38
Q

Ordinal data example

A

Income level (low income, middle income, high income)
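
A minimal sketch of why the set order matters, assuming hypothetical survey responses: ordinal values sort by their position on the scale rather than alphabetically.

```python
# The scale comes from the card above; the responses are made up for illustration.
INCOME_ORDER = ["low income", "middle income", "high income"]
responses = ["high income", "low income", "middle income", "low income"]

# Sort by position on the scale, not by alphabetical order.
ranked = sorted(responses, key=INCOME_ORDER.index)

# A plain alphabetical sort puts "high income" first and loses the scale.
alphabetical = sorted(responses)
```

Nominal categories have no such scale, so there is no meaningful key to sort them by.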

39
Q

Structured data

A

Data organised in a specific format, like rows and columns.

40
Q

Structured data example

A

Expense reports

41
Q

Unstructured data

A

Data that isn’t organised in any easily identifiable manner.

42
Q

Unstructured data example

A
  • Social media posts
  • Emails
  • Videos
43
Q

Data Model

A

A model that is used for organising data elements and how they relate to one another.

44
Q

Data elements

A

Pieces of information, such as people’s names, account numbers, and addresses.

45
Q

Sources of structured data

A

1) Spreadsheets
2) Databases that store datasets

46
Q

Data modelling

A

Data modelling is creating diagrams visually representing how data is organised and structured.

47
Q

Levels of data modelling

A

1) Conceptual (business concepts)
2) Logical (data entities)
3) Physical (physical tables)

48
Q

Conceptual data modelling

A

Conceptual data modelling gives a high-level view of the data structure, such as how data interacts across an organisation.

49
Q

Conceptual data modelling example

A

A conceptual data model may be used to define the business requirement for a new database. A conceptual data model doesn’t contain technical details.

50
Q

Logical data modelling

A

Logical data modelling focuses on the technical details of a database, such as relationships, attributes, and entities.

51
Q

Logical data modelling example

A

For example, a logical data model defines how individual records are uniquely identified in a database. But it doesn’t spell out the actual names of database tables. That’s the job of a physical data model.

52
Q

Physical data modelling

A

Physical data modelling depicts how a database operates. A physical data model defines all entities and attributes used.

53
Q

Physical data modelling example

A

For example, it includes the database’s table names, column names, and data types.

54
Q

Data Type

A

A specific kind of data attribute that tells what kind of value the data is

55
Q

Data types in spreadsheets

A

1) Number
2) Text or string
3) Boolean

56
Q

Text or string data type

A

A sequence of characters and punctuation that contains textual information.

57
Q

Boolean data type

A

A data type with only two possible values, such as TRUE or FALSE.

58
Q

Table definitions

A

Rows- Records
Columns- Fields

59
Q

Wide data

A

Data in which every data subject has a single row with multiple columns to hold the values of various attributes of the subject.

60
Q

Long data

A

Data in which each row is one time point per subject, so each subject will have data in multiple rows.
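
As a rough illustration of the two layouts, the sketch below reshapes hypothetical wide rows (one row per country, one column per year) into long rows (one row per country per year). The field names, the helper `wide_to_long`, and the figures are all made up for the example.

```python
# Hypothetical wide-format records: one row per subject, one column per year.
wide_rows = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]

def wide_to_long(rows, id_field, variable_name, value_name):
    """Turn one row per subject into one row per subject per variable."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue  # the identifier is repeated on every long row instead
            long_rows.append(
                {id_field: row[id_field], variable_name: column, value_name: value}
            )
    return long_rows

long_rows = wide_to_long(wide_rows, "country", "year", "population")
```

Each subject now appears on several rows, which is the defining trait of long data.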

61
Q

Data transformation

A

Data transformation is the process of changing the data’s format, structure, or values.

62
Q

Data transformation example

A

-Adding, copying, or replicating data
-Deleting fields or records

63
Q

Goals for Data Transformation

A
  • Data organisation: better-organised data is easier to use.
  • Data compatibility: different applications or systems can use the same data.
  • Data migration: data with the same formats can be moved from one system to another.
  • Data merging: data with the same organisation can be merged.
  • Data enhancement: data can be displayed with more detailed fields.
  • Data comparison: apples-to-apples comparisons of the data can then be made.
65
Q

Wide data is preferred when

A
  • Creating tables and charts with a few variables about each subject.
  • Comparing straightforward line graphs.
66
Q

Long data is preferred when

A
67
Q

Bias

A

A preference in favour of or against a person, group of people, or thing.

68
Q

Data Bias

A

A type of error that systematically skews results in a certain direction.

69
Q

Sampling bias

A

When a sample isn’t representative of the population as a whole.

70
Q

Unbiased sampling

A

When a sample is representative of the population being measured.

71
Q

More types of data bias

A

1) Observer bias
2) Interpretation bias
3) Confirmation bias

72
Q

Observer bias (experimenter bias / research bias)

A

The tendency for different people to observe things differently.

73
Q

Interpretation bias

A

The tendency to always interpret ambiguous situations in a positive or negative way.

74
Q

Confirmation bias

A

The tendency to search for or interpret information in a way that confirms pre-existing beliefs.

75
Q

Way to find good data sources

A

ROCCC- Reliable, Original, Comprehensive, Current, Cited

76
Q

Ethics

A

Well-founded standards of right and wrong that prescribe what humans ought to do, usually regarding rights, obligations, benefits to society, fairness, or specific virtues.

77
Q

Data ethics

A

Well-founded standards of right and wrong that dictate how data is collected, shared, and used.

78
Q

GDPR

A

General Data Protection Regulation of the European Union

79
Q

Aspects of data ethics

A

1) Ownership
2) Transaction transparency
3) Consent
4) Currency
5) Privacy
6) Openness

80
Q

Ownership

A

Individuals own the raw data they provide and they have primary control over its usage, how it’s processed, and how it’s shared.

81
Q

Transaction transparency

A

All data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data.

82
Q

Consent

A

An individual’s right to know explicit details about how and why their data will be used before agreeing to provide it.

83
Q

Currency

A

Individuals should be aware of financial transactions resulting from the use of their personal data and the scale of these transactions.

84
Q

Privacy

A

Preserving a data subject’s information and activity any time a data transaction occurs.

85
Q

Data Protection examples

A
  • Protection from unauthorised access to our private data.
  • Freedom from inappropriate use of our data.
  • The right to inspect, update or correct our data.
  • Ability to give consent to use our data.
  • Legal right to access the data.
86
Q

Openness

A

Free access, usage, and sharing of data

87
Q

Data interoperability

A

The ability of data systems and services to openly connect and share data.

88
Q

Pixel

A

In digital imaging, a small area of illumination on a display screen that, when combined with other adjacent areas, forms a digital image.

89
Q

Database

A

A collection of data stored in a computer system

90
Q

Metadata

A

Data about data

91
Q

Relational database

A

A database that contains a series of related tables that can be connected via their relationships

92
Q

Primary key

A

An identifier that references a column in which each value is unique.

93
Q

Foreign Key

A

A field within a table that is a primary key in another table.

94
Q

Primary key-2

A
  • Used to ensure data in a specific column is unique.
  • Uniquely identifies a record in a relational database table.
  • Only one primary key is allowed in a table.
  • Cannot contain null or blank values.
95
Q

Foreign key-2

A
  • A column or group of columns in a relational database table that provides a link between the data in two tables.
  • Refers to the field in a table that’s the primary key of another table.
  • More than one foreign key is allowed to exist in a table.
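
The key rules above can be sketched with Python's built-in `sqlite3` module. The `customers` and `orders` tables, their columns, and the helper `try_insert` are hypothetical, chosen only to show one primary key per table and a foreign key linking the two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount REAL
    )"""
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")

def try_insert(sql):
    """Return True if the insert succeeds, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# A second customer with the same primary key value is rejected.
duplicate_key_ok = try_insert("INSERT INTO customers VALUES (1, 'Grace')")
# An order pointing at a customer that does not exist is rejected.
orphan_order_ok = try_insert("INSERT INTO orders VALUES (101, 99, 5.0)")

# The foreign key is what lets the two tables be joined.
joined = conn.execute(
    "SELECT c.name, o.amount FROM orders o "
    "JOIN customers c ON c.customer_id = o.customer_id"
).fetchall()
```

Other database systems express the same constraints with slightly different syntax, so treat this as one possible illustration rather than a general recipe.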
96
Q

Metadata benefits

A

Metadata is stored in a single, central location, and gives the company standardised information about all of its data.

97
Q

3 common types of metadata

A

-Descriptive
-Structural
-Administrative

98
Q

Descriptive Metadata

A

Metadata that describes a piece of data and can be used to identify it at a later point in time.

99
Q

Structural metadata

A

Metadata that indicates how a piece of data is organised and whether it is part of one, or more than one, data collection.

100
Q

Administrative metadata

A

Metadata that indicates the technical source of a digital asset.

101
Q

Metadata repository

A

A database specifically created to store metadata.

102
Q

Metadata repository Benefits

A
  • Metadata repositories make it easier and faster to bring together multiple sources for data analysis.
103
Q

Metadata repositories functions

A
  • Describe the state and location of the metadata.
  • Describe the structures of the tables inside.
  • Describe how the data flows through the repository.
  • Keep track of who accesses the metadata and when.
104
Q

Data governance

A

A process to ensure the formal management of a company’s data assets.

105
Q

CSV: Comma-separated values

A

A CSV file saves data in a table format
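
A small sketch of that table format, using Python's standard `csv` module on made-up text: the first line holds the column names and each later line is one record.

```python
import csv
import io

# Hypothetical CSV content: a header row followed by two records.
csv_text = "store,sales\nNorth,120\nSouth,80\n"

# DictReader uses the header row as field names for every record.
records = list(csv.DictReader(io.StringIO(csv_text)))

# Values are read as text, so numbers need an explicit conversion.
total_sales = sum(int(record["sales"]) for record in records)
```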

106
Q

Sorting data

A

Arranging data into a meaningful order to make it easier to understand, analyse and visualise.

107
Q

Filtering

A

Showing only the data that meets specific criteria while hiding the rest.
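
Sorting and filtering can both be sketched in a few lines of Python. The store names, sales figures, and the threshold of 100 are assumptions made up for the example.

```python
# Hypothetical records, one per store.
rows = [
    {"store": "North", "sales": 120},
    {"store": "South", "sales": 80},
    {"store": "East", "sales": 200},
]

# Sorting: arrange every row in a meaningful order (highest sales first).
sorted_rows = sorted(rows, key=lambda row: row["sales"], reverse=True)

# Filtering: keep only the rows that meet a criterion; the rest stay in `rows`.
filtered_rows = [row for row in rows if row["sales"] >= 100]
```

Sorting keeps all rows and changes their order, while filtering keeps the order and changes which rows are shown.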

108
Q

2 types of BigQuery accounts

A
  • Sandbox
  • Free Trial
109
Q

Sandbox

A
  • 12 Projects at a time
  • Cannot insert new records into a database
  • Cannot update field values of existing records
110
Q

Free Trial

A
  • $300 in credit during the first 90 days
  • Select a paid account
  • You will never be automatically charged
111
Q

Fill handle

A

A box in the lower-right corner of a selected spreadsheet cell that can be dragged through neighbouring cells to continue an instruction.

112
Q

Benefits of organising data

A
  • Makes it easier to find and use
  • Helps you avoid making mistakes during your analysis
  • Helps to protect your data
113
Q

Best practices when organising data

A
  • Naming conventions
  • Foldering
  • Archiving older files
  • Align your naming and storage practices with your team
  • Develop metadata practices
114
Q

Naming conventions

A

- Consistent guidelines that describe the content, date, or version of a file in its name.
- Use logical and descriptive names for your files to make them easier to find and use.

115
Q

Foldering

A

Organise your files into folders

116
Q

Subfolders

A

Breaking folders down into sub-sections

117
Q

Benefits of foldering

A

Can move old projects to a separate location to create an archive and cut down on clutter.

118
Q

File naming DOs

A
  • Work out your conventions early
  • Align file naming with your team
  • Make sure file names are meaningful
  • Keep file names short and sweet
  • Format dates yyyymmdd: SalesReport20201125
  • Lead revision numbers with 0: SalesReport20201125v02
  • Use hyphens, underscores, or capitalised letters: SalesReport_2020_11_25_v02
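
One possible way to apply these conventions in code is sketched below. The helper `build_file_name` and its underscore layout are assumptions, not a prescribed format; it combines a yyyymmdd date with a zero-padded revision number.

```python
from datetime import date

def build_file_name(label, day, version):
    """Combine a label, a yyyymmdd date, and a zero-led revision number."""
    return f"{label}_{day:%Y%m%d}_v{version:02d}"

name = build_file_name("SalesReport", date(2020, 11, 25), 2)
# name is "SalesReport_20201125_v02"
```

Generating names from one helper keeps a team's files consistent, and the yyyymmdd format makes them sort chronologically.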
119
Q

Data security

A

Protecting data from unauthorised access or corruption by adopting safety measures.

120
Q

Encryption

A

Encryption uses a unique algorithm to alter data and make it unusable by users and applications that don’t know the algorithm.

121
Q

Tokenization

A

- Tokenization replaces the data elements you want to protect with randomly generated data referred to as a token.
- The original data is stored in a separate location and mapped to the tokens.
- To access the complete original data, the user or application needs permission to use the tokenized data and the token mapping.
- This means that even if the tokenized data is hacked, the original data is still safe and secure in a separate location.
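
The idea can be sketched as a toy in-memory example. The `TokenVault` class, its method names, and the card number are hypothetical, and a real system would keep the mapping in a separately secured store rather than a Python dictionary.

```python
import secrets

class TokenVault:
    """Toy stand-in for the separate location that maps tokens to originals."""

    def __init__(self):
        self._token_to_value = {}

    def tokenize(self, value):
        # The token is random, so it reveals nothing about the original value.
        token = secrets.token_hex(8)
        self._token_to_value[token] = value
        return token

    def detokenize(self, token, authorised):
        # Recovering the original needs both the token and access to the mapping.
        if not authorised:
            raise PermissionError("token mapping access denied")
        return self._token_to_value[token]

vault = TokenVault()
card_number = "4111-1111-1111-1111"  # made-up value for illustration
token = vault.tokenize(card_number)

# Only the token is stored alongside the rest of the record.
stored_record = {"customer": "Ada", "card": token}
```

If `stored_record` leaked, an attacker would hold only the random token, which is useless without the vault's mapping.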

122
Q

Data interoperability

A

The ability to integrate data from multiple sources and a key factor in the successful use of open data among companies and governments.

123
Q

A professional online presence can

A
  • Help potential employers find you
  • Make connections with other analysts
  • Learn and share data findings
  • Participate in community events
124
Q

Networking

A

Professional relationship building

125
Q

Mentor

A

A professional who shares their knowledge, skills, and experience to help you develop and grow.

126
Q

Sponsor

A

A professional advocate who’s committed to moving a sponsee’s career forward within an organisation.

127
Q

End of Course-3

A
  • Data types and data structures
  • Bias and Credibility
  • Databases
  • Organising and protecting data
  • The data community