Course-3 Prepare data for explorations Flashcards

1
Q

Prepare Phase

A

1) Understanding the different types of data and data structures.
2) What type of data is suitable for the question you are answering?
3) Practical skills in extracting, organising and protecting your data.
4) How data is generated and collected.
5) Different formats, types, and structures of data.

2
Q

How is data collected?

A

1) Interviews
2) Observations
3) Forms
4) Questionnaires
5) Surveys
6) Cookies

3
Q

Data collection considerations

A

1) How will the data be collected
2) Choose data sources
3) Decide what data to use
4) How much data to collect
5) Select the correct data type
6) Determine the time frame

4
Q

First-party data

A

Data collected by an individual or group using their own resources.

5
Q

Second-party data

A

Data collected by a group directly from its audience and then sold

6
Q

Third-party data

A

Data collected from outside sources that did not collect it directly.

7
Q

Population

A

All possible data values in a certain dataset.

8
Q

Sample

A

A part of a population that is representative of the population.

9
Q

Discrete data

A

Data that is counted and has a limited number of values.

10
Q

Continuous data

A

Data that is measured and can have almost any numeric value.

11
Q

Nominal data

A

A type of qualitative data that is categorized without a set order.

12
Q

Ordinal data

A

A type of qualitative data with a set order or scale.

13
Q

Internal data

A

Data that lives within a company’s own systems

14
Q

External data

A

Data that lives and is generated outside of an organisation.

15
Q

Structured data

A

Data is organised in a specific format, such as rows and columns.

16
Q

Examples of software that store structured data.

A

Spreadsheets, Relational databases

17
Q

Unstructured data

A

Data that is not organised in any easily identifiable manner

18
Q

Examples of unstructured data

A

Audio files, Video files

19
Q

Primary data

A

Collected by a researcher from first-hand sources.

20
Q

Example of primary data

A

Data from an interview you conducted

21
Q

Secondary data

A

Gathered by other people or from other research.

22
Q

Example of secondary data

A

Demographic data collected by a university

23
Q

Internal data

A

Data that lives inside a company’s own systems

24
Q

Example of internal data

A

Sales data by store location.

25
External data
Data that lives outside of a company or organisation
26
Example of external data
National average wages for the various positions throughout your organisation.
27
Continuous data
Data that is measured and can have almost any numeric value
28
Continuous data example
1) Temperature 2) Runtime markers in a video
29
Discrete data
Data that is counted and has a limited number of values.
30
Example of discrete data
Number of people who visit a hospital on a daily basis (10, 20, 200)
31
Qualitative data
Subjective and explanatory measures of qualities and characteristics.
32
Example Qualitative data
Exercise activity most enjoyed
33
Quantitative data
Specific and objective measures of numerical facts
34
Quantitative data example
Population of elephants in Africa
35
Nominal data
A type of qualitative data that isn't categorized with a set order.
36
Nominal data example
New listing, reduced price listing, foreclosure.
37
Ordinal data
A type of qualitative data with a set order or scale.
38
Ordinal data example
Income level (low income, middle income, high income)
39
Structured data
Data is organised in a specific format, like rows and columns.
40
Structured data example
Expense reports
41
Unstructured data
Data that isn't organised in any easily identifiable manner.
42
Unstructured data example
- Social media posts - Emails - Videos
43
Data Model
A model that is used for organising data elements and how they relate to one another.
44
Data elements
Pieces of information, such as people's names, account numbers, and addresses.
45
Sources of structured data
1) Spreadsheets 2) Databases that store datasets
46
Data modelling
Data modelling is creating diagrams visually representing how data is organised and structured.
47
Levels of data modelling
1) Conceptual (business concepts) 2) Logical (data entities) 3) Physical (physical tables)
48
Conceptual data modelling
Conceptual data modelling gives a high-level view of the data structure, such as how data interacts across an organisation.
49
Example Conceptual data modelling
A conceptual data model may be used to define the business requirement for a new database. A conceptual data model doesn't contain technical details.
50
Logical data modelling
Logical data modelling focuses on the technical details of a database, such as relationships, attributes, and entities.
51
Logical data modelling example
For example, a logical data model defines how individual records are uniquely identified in a database. But it doesn't spell out the actual names of database tables. That's the job of a physical data model.
52
Physical data modelling
Physical data modelling depicts how a database operates. A physical data model defines all entities and attributes used.
53
Physical data modelling example
For example, it includes the database's table names, column names, and data types.
54
Data Type
A specific kind of data attribute that tells what kind of value the data is
55
Data types in spreadsheets
1) Number 2) Text or string 3) Boolean
56
Text or string data type
A sequence of characters and punctuation that contains textual information.
57
Boolean data type
A data type with only two possible values, such as TRUE or FALSE.
58
Table definitions
Rows are records; columns are fields.
59
Wide data
Data in which every data subject has a single row with multiple columns to hold the values of various attributes of the subject.
60
Long data
Data in which each row is one time point per subject, so each subject will have data in multiple rows.
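As a rough illustration of the two layouts, the sketch below reshapes a small wide table into long form using only the Python standard library. The country labels, years, and figures are made-up placeholders, and `wide_to_long` is a hypothetical helper, not something taken from the course.

```python
def wide_to_long(rows, id_field, variable_name="variable", value_name="value"):
    """Turn one row per subject into one row per subject per variable."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue  # the identifier is repeated on every long row instead
            long_rows.append(
                {id_field: row[id_field], variable_name: column, value_name: value}
            )
    return long_rows


# Wide: one row per subject, one column per year (made-up figures).
wide = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]

# Long: each subject now appears on several rows, one per year.
long_rows = wide_to_long(wide, "country", variable_name="year", value_name="population")
for record in long_rows:
    print(record)
```

Two wide rows with two year columns each become four long rows, which is the "data in multiple rows" property the definition describes.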
61
Data transformation
Data transformation is the process of changing the data's format, structure, or values.
62
Data transformation example
- Adding, copying, or replicating data
- Deleting fields or records
63
Goals for Data Transformation
- Data organisation: better-organised data is easier to use.
- Data compatibility: different applications or systems can use the same data.
- Data migration: data with the same formats can be moved from one system to another.
- Data merging: data with the same organisation can be merged.
- Data enhancement: data can be displayed with more detailed fields.
- Data comparison: apples-to-apples comparisons of the data can then be made.
65
Wide data is preferred when
- Creating tables and charts with a few variables about each subject. - Comparing straightforward line graphs.
66
Long data is preferred when
67
Bias
A preference in favour of or against a person, group of people, or thing.
68
Data Bias
A type of error that systematically skews results in a certain direction.
69
Sampling bias
When a sample isn't representative of the population as a whole.
70
Unbiased sampling
When a sample is representative of the population being measured.
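One common way to aim for a representative sample is simple random sampling, where every member of the population has the same chance of being picked. This reduces the risk of sampling bias but does not guarantee a perfectly representative sample. A minimal sketch, assuming a made-up population of 1,000 visitor IDs and a fixed seed so the run is repeatable:

```python
import random

population = list(range(1, 1001))  # hypothetical visitor IDs 1..1000

rng = random.Random(42)  # fixed seed so the example is repeatable
sample = rng.sample(population, k=50)  # 50 distinct members, each equally likely

print(len(sample), "of", len(population), "selected")
```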
71
More types of data bias
1) Observer bias 2) Interpretation bias 3) Confirmation bias
72
Observer bias (Experiment bias/ research bias)
The tendency for different people to observe things differently.
73
Interpretation bias
The tendency to always interpret ambiguous situations in a positive or negative way.
74
Confirmation bias
The tendency to search for or interpret information in a way that confirms pre-existing beliefs.
75
Way to find good data sources
ROCCC: Reliable, Original, Comprehensive, Current, Cited
76
Ethics
Well-founded standards of right and wrong that prescribe what humans ought to do, usually regarding rights, obligations, benefits to society, fairness, or specific virtues.
77
Data ethics
Well-founded standards of right and wrong that dictate how data is collected, shared and used.
78
GDPR
General Data Protection Regulation of the European Union
79
Aspects of data ethics
1) Ownership 2) Transaction transparency 3) Consent 4) Currency 5) Privacy 6) Openness
80
Ownership
Individuals own the raw data they provide and they have primary control over its usage, how it's processed, and how it's shared.
81
Transaction transparency
All data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data.
82
Consent
An individual's right to know explicit details about how and why their data will be used before agreeing to provide it.
83
Currency
Individuals should be aware of financial transactions resulting from the use of their personal data and the scale of these transactions.
84
Privacy
Preserving a data subject's information and activity any time a data transaction occurs.
85
Data Protection examples
- Protection from unauthorised access to our private data.
- Freedom from inappropriate use of our data.
- The right to inspect, update or correct our data.
- Ability to give consent to use our data.
- Legal right to access the data.
86
Openness
Free access, usage, and sharing of data
87
Data interoperability
The ability of data systems and services to openly connect and share data.
88
Pixel
In digital imaging, a small area of illumination on a display screen that, when combined with other adjacent areas, forms a digital image.
89
Database
A collection of data stored in a computer system
90
Metadata
Data about data
91
Relational database
A database that contains a series of related tables that can be connected via their relationships
92
Primary key
An identifier that references a column in which each value is unique.
93
Foreign Key
A field within a table that is a primary key in another table.
94
Primary key-2
- Used to ensure data in a specific column is unique.
- Uniquely identifies a record in a relational database table.
- Only one primary key is allowed in a table.
- Cannot contain null or blank values.
95
Foreign key-2
- A column or group of columns in a relational database table that provides a link between the data in two tables.
- Refers to the field in a table that's the primary key of another table.
- More than one foreign key is allowed to exist in a table.
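The primary key and foreign key cards can be made concrete with a small sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables, their columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- primary key: unique per record
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- foreign key
        amount      REAL
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")

# Connect the two tables through the key relationship.
rows = conn.execute("""
    SELECT o.order_id, c.name, o.amount
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
""").fetchall()
print(rows)

# An order that points at a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 999, 5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("orphan order rejected:", rejected)
```

The foreign key in `orders` refers to the primary key of `customers`, which is what lets the join line up each order with its customer.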
96
Metadata benefits
Metadata is stored in a single, central location, and gives the company standardised information about all of its data.
97
3 common types of metadata
- Descriptive
- Structural
- Administrative
98
Descriptive Metadata
Metadata that describes a piece of data and can be used to identify it at a later point in time.
99
Structural metadata
Metadata that indicates how a piece of data is organised and whether it is part of one, or more than one, data collection.
100
Administrative metadata
Metadata that indicates the technical source of a digital asset.
101
Metadata repository
A database specifically created to store metadata.
102
Metadata repository Benefits
- Metadata repositories make it easier and faster to bring together multiple sources for data analysis.
103
Metadata repositories functions
- Describe the state and location of the metadata.
- Describe the structures of the tables inside.
- Describe how the data flows through the repository.
- Keep track of who accesses the metadata and when.
104
Data governance
A process to ensure the formal management of a company's data assets.
105
CSV: comma-separated values
A CSV file saves data in a table format.
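To show what "table format" means in practice, here is a minimal sketch that reads CSV text with Python's built-in `csv` module. The store data is made up, and the text is held in a string rather than a file so the example is self-contained. Note that every value is read as text unless it is converted explicitly.

```python
import csv
import io

# Made-up CSV text; in practice this would usually come from a file on disk.
csv_text = "store,city,sales\n1,Leeds,1200\n2,York,950\n"

reader = csv.DictReader(io.StringIO(csv_text))  # the first line supplies the field names
records = list(reader)  # each following line becomes one record (row)

for record in records:
    print(record["store"], record["city"], record["sales"])
```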
106
Sorting data
Arranging data into a meaningful order to make it easier to understand, analyse and visualise.
107
Filtering
Showing only the data that meets specific criteria while hiding the rest.
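Sorting and filtering can be sketched on a small list of records in plain Python. The store names, amounts, and the threshold of 900 are made-up values for illustration.

```python
# Made-up sales records for illustration.
sales = [
    {"store": "York", "amount": 950},
    {"store": "Leeds", "amount": 1200},
    {"store": "Hull", "amount": 400},
]

# Sorting: keep every record, arranged from highest to lowest amount.
sorted_sales = sorted(sales, key=lambda row: row["amount"], reverse=True)

# Filtering: keep only the records that meet a criterion (amount of at least 900).
filtered_sales = [row for row in sales if row["amount"] >= 900]

print([row["store"] for row in sorted_sales])
print([row["store"] for row in filtered_sales])
```

Sorting changes the order but not the number of records, while filtering leaves the order alone and drops the records that fail the test.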
108
2 types of BigQuery accounts
- Sandbox - Free Trial
109
Sandbox
- 12 Projects at a time - Cannot insert new records into a database - Cannot update field values of existing records
110
Free Trial
- $300 in credit during the first 90 days - Select a paid account - You will never be automatically charged
111
Fill handle
A box in the lower-right corner of a selected spreadsheet cell that can be dragged through neighbouring cells to continue an instruction.
112
Benefits of organising data
- Makes it easier to find and use - Helps you avoid making mistakes during your analysis - Helps to protect your data
113
Best practices when organising data
- Naming conventions - Foldering - Archiving older files - Align your naming and storage practices with your team - Develop metadata practices
114
Naming conventions
- Consistent guidelines that describe the content, date, or version of a file in its name.
- Use logical and descriptive names for your files to make them easier to find and use.
115
Foldering
Organise your files into folders
116
Subfolders
Breaking folders down into sub-sections
117
Benefits of foldering
Can move old projects to a separate location to create an archive and cut down on clutter.
118
File naming DOs
- Work out your conventions early
- Align file naming with your team
- Make sure file names are meaningful
- Keep file names short and sweet
- Format dates yyyymmdd: SalesReport20201125
- Lead revision numbers with 0: SalesReport20201125v02
- Use hyphens, underscores, or capitalised letters: SalesReport_2020_11_25_v02
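The date and revision-number rules above could be applied consistently with a small helper like the one below. `report_file_name` is a hypothetical function, and joining the parts with underscores is just one of the separator options the card mentions.

```python
from datetime import date


def report_file_name(base, report_date, version):
    """Build a name such as SalesReport_20201125_v02 (hypothetical helper)."""
    date_part = report_date.strftime("%Y%m%d")  # yyyymmdd sorts chronologically
    version_part = f"v{version:02d}"  # leading zero, so v02 sorts before v10
    return f"{base}_{date_part}_{version_part}"


name = report_file_name("SalesReport", date(2020, 11, 25), 2)
print(name)
```

Generating names from one function keeps the team's convention in a single place instead of relying on everyone typing it the same way.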
119
Data security
Protecting data from unauthorised access or corruption by adopting safety measures.
120
Encryption
Encryption uses a unique algorithm to alter data and make it unusable by users and applications that don't know the algorithm.
121
Tokenization
- Tokenization replaces the data elements you want to protect with randomly generated data referred to as a token.
- The original data is stored in a separate location and mapped to the tokens.
- To access the complete original data, the user or application needs permission to use both the tokenized data and the token mapping.
- This means that even if the tokenized data is hacked, the original data is still safe in a separate, secure location.
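The tokenization idea can be sketched as a toy in-memory "vault". The `TokenVault` class is hypothetical and for illustration only; a real system would keep the mapping in a separately secured store with access controls.

```python
import secrets


class TokenVault:
    """Toy in-memory mapping from random tokens to original values."""

    def __init__(self):
        self._token_to_value = {}

    def tokenize(self, value):
        token = secrets.token_hex(8)  # random, reveals nothing about the value
        self._token_to_value[token] = value  # mapping kept apart from the tokenized data
        return token

    def detokenize(self, token):
        return self._token_to_value[token]  # only possible with access to the vault


vault = TokenVault()
card_number = "4111-1111-1111-1111"  # placeholder test value, not real data
token = vault.tokenize(card_number)

print(token)  # this is what would be stored alongside other data
print(vault.detokenize(token))  # original recovered only through the vault
```

Someone who obtains only the token has a random string; recovering the original value also requires the mapping held in the vault.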
122
Data interoperability
The ability to integrate data from multiple sources and a key factor in the successful use of open data among companies and governments.
123
A professional online presence can
- Help potential employers find you
- Make connections with other analysts
- Learn and share data findings
- Participate in community events
124
Networking
Professional relationship building
125
Mentor
A professional who shares their knowledge, skills, and experience to help you develop and grow.
126
Sponsor
A professional advocate who's committed to moving a sponsee's career forward within an organisation.
127
End of Course-3
- Data types and data structures - Bias and Credibility - Databases - Organising and protecting data - The data community