Data Flashcards

1
Q

First Party Data

A

Data collected by an individual or group using their own resources.

2
Q

Second Party Data

A

Data collected by a group directly from its audience and then sold.

3
Q

Third Party Data

A

Data collected from outside sources who did not collect it directly.

4
Q

Population

A

All possible data values in a certain dataset.

5
Q

Sample

A

A part of a population that is representative of the population. This is useful when looking to analyse data on an entire population as collecting such a massive amount of data would be challenging.
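
As a rough illustration of drawing a sample, the sketch below takes a simple random sample from a made-up population of customer IDs. The population, the seed and the sample size are all assumptions chosen for the example; random selection makes a representative sample likely but does not guarantee it.

```python
import random

# Hypothetical population: customer IDs 1..1000 (invented for this example).
population = list(range(1, 1001))

rng = random.Random(42)  # fixed seed so the draw is repeatable
sample = rng.sample(population, k=50)  # simple random sample, no repeats

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(f"population mean: {population_mean:.1f}, sample mean: {sample_mean:.1f}")
```

Comparing the two means is one quick check of how closely the sample tracks the population.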

6
Q

Time Series Data

A

This is data that includes dates and is useful when looking to analyse trends over time.

7
Q

Qualitative Data

A

This is data that can’t be counted, measured or easily expressed using numbers. Examples include names, categories and descriptions.

8
Q

Quantitative Data

A

This is data which can be measured, counted and then expressed as a number. This is data with a certain quantity, amount or range.

9
Q

Discrete Data [Quantitative Data]

A

This is data that’s counted and has a limited number of values. It’s not fractional: it’s composed of whole numbers or points such as 10, 50 and 365.

10
Q

Continuous Data [Quantitative Data]

A

Data that is measured and can have almost any numerical value. An example is 110.0356 minutes.

11
Q

Nominal Data [Qualitative Data]

A

A type of qualitative data that’s categorized without a set order. This type of data doesn’t have a sequence.

12
Q

Ordinal Data [Qualitative Data]

A

This is a type of qualitative data with a set order or scale.
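
A small sketch of what "a set order" means in practice: the labels below are text, but they can still be sorted by their position on a scale. The scale and the responses are invented for illustration.

```python
# Hypothetical ordinal scale, listed from lowest to highest.
SCALE = ["low", "medium", "high"]

# Map each label to its position on the scale.
rank = {label: position for position, label in enumerate(SCALE)}

responses = ["high", "low", "medium", "low"]

# Sort by scale position rather than alphabetically.
ordered = sorted(responses, key=lambda label: rank[label])
print(ordered)
```

Nominal data has no such scale, so no equivalent ordering would be meaningful for it.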

13
Q

Internal Data

A

Data that lives within a company’s own systems. This is usually more reliable and easier to collect.

14
Q

External Data

A

Data that lives and is generated outside of an organisation.

15
Q

Structured Data

A

Data that’s organised in a certain format such as rows and columns. Spreadsheets and relational databases can store data in a structured way.

16
Q

Unstructured Data

A

This is data that is not organised in any easily identifiable manner. Examples include audio and video files.

17
Q

Data Model

A

A model that is used for organising data elements and how they relate to one another.

18
Q

Data Elements

A

Pieces of information, such as people’s names, account numbers, and addresses.

19
Q

Data Modelling

A

Data modelling is the process of creating diagrams that visually represent how data is organised and structured. These visual representations are called data models.

20
Q

3 Most Common Types of Data Modelling

A
  1. Conceptual data modelling
  2. Logical data modelling
  3. Physical data modelling
21
Q

Conceptual Data Modelling

A

Conceptual data modelling gives a high-level view of the data structure, such as how data interacts across an organisation. A conceptual data model doesn’t contain technical details.

22
Q

Logical Data Modelling

A

Logical data modelling focuses on the technical details of a database such as relationships, attributes, and entities. It doesn’t spell out the actual names of database tables. That’s the job of a physical data model.

23
Q

Physical Data Modelling

A

Physical data modelling depicts how a database operates. A physical data model defines all entities and attributes used; for example, it includes table names, column names, and data types for the database.

24
Q

Spreadsheet Data Types

A
  1. Number
  2. Text or string
  3. Boolean [a result that can only have one of two possible values: true or false.]
25
Q

Record

A

This is data contained in a row.

26
Q

Field

A

This is data contained in a column.
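
To make the row/column distinction concrete, here is a minimal sketch that treats a list of dictionaries as a small table. The table contents are invented for illustration.

```python
# Hypothetical table: each dictionary is one row.
table = [
    {"name": "Ada", "city": "London", "age": 36},
    {"name": "Grace", "city": "New York", "age": 45},
]

# A record is everything in one row.
record = table[0]

# A field is one column's values across all rows.
field = [row["city"] for row in table]

print(record)
print(field)
```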

27
Q

Wide Data

A

Data in which every data subject has a single row with multiple columns to hold the values of various attributes of the subject.

28
Q

Long Data

A

Long data is data in which each row is one time point per subject, so each subject will have data in multiple rows.
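
The sketch below reshapes a tiny wide table into long form, to show how one row per subject becomes one row per subject and time point. The helper `wide_to_long`, the column names and the scores are all hypothetical.

```python
# Hypothetical wide data: one row per subject, one column per year.
wide = [
    {"subject": "A", "2021": 10, "2022": 12},
    {"subject": "B", "2021": 7, "2022": 9},
]

def wide_to_long(rows, id_column, value_name):
    """Return one row per (subject, year) pair."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the ID is repeated on every long row instead
            long_rows.append(
                {id_column: row[id_column], "year": column, value_name: value}
            )
    return long_rows

long_data = wide_to_long(wide, "subject", "score")
for row in long_data:
    print(row)
```

Each subject appears in two rows of `long_data`, one for each year column in the wide table.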

29
Q

Data Transformation

A

This is the process of changing the data’s format, structure, or values.

30
Q

Data Transformation Involves:

A

Adding, copying, or replicating data
Deleting fields or records
Standardising the names of variables
Renaming, moving or combining columns in a database
Joining one set of data with another
Saving a file in a different format. For example, saving a spreadsheet as a comma-separated values (CSV) file.
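
A few of the operations listed above can be sketched in one short pass: standardising column names, deleting a field, combining two columns, and writing the result out as CSV text. The rows, the `temp` column and the `transform` helper are assumptions made up for this example.

```python
import csv
import io

# Hypothetical input rows with inconsistent column names.
rows = [
    {"First Name": "Ada", "Last Name": "Lovelace", "temp": "x"},
    {"First Name": "Grace", "Last Name": "Hopper", "temp": "y"},
]

def transform(row):
    # Standardise names: lower case, underscores instead of spaces.
    clean = {key.strip().lower().replace(" ", "_"): value for key, value in row.items()}
    # Delete a field that is not needed.
    clean.pop("temp", None)
    # Combine two columns into one.
    clean["full_name"] = f"{clean.pop('first_name')} {clean.pop('last_name')}"
    return clean

transformed = [transform(row) for row in rows]

# Save in a different format: CSV text, written to an in-memory buffer here.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["full_name"], lineterminator="\n")
writer.writeheader()
writer.writerows(transformed)
csv_text = buffer.getvalue()
print(csv_text)
```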

31
Q

Goals For Data Transformation:

A

Data organisation: better organised data is easier to use
Data compatibility: different applications or systems can then use the same data
Data migration: data with matching formats can be moved from one system to another
Data merging: data with the same organisation can be merged together
Data enhancement: data can be displayed with more detailed fields
Data comparison: apples-to-apples comparisons of the data can be made

32
Q

Kaggle

A

This is an online community of people passionate about data.

33
Q

Bias

A

A preference in favour of or against a person, group of people, or thing.

34
Q

Data Bias

A

Type of error that systematically skews results in a certain direction.

35
Q

Sampling Bias

A

When a sample isn’t representative of the population as a whole.

36
Q

Observer Bias (experimenter or research bias)

A

The tendency for different people to observe things differently.

37
Q

Interpretation Bias

A

The tendency to always interpret ambiguous situations in a positive or negative way.

38
Q

Confirmation Bias

A

The tendency to search for or interpret information in a way that confirms pre-existing beliefs.

39
Q

R.O.C.C.C Process

A

This process can be used to identify good data sources.

Reliable - Good data sources are reliable.
Original - Be sure to validate data with the original source.
Comprehensive - The best data sources contain all critical information needed to answer the question or find a solution.
Current - The usefulness of data decreases as time passes. The best data sources are current and relevant to the task at hand.
Cited - Who created the dataset? Is it part of a credible organisation? When was the data last refreshed? Your source has to be cited and vetted.

40
Q

Ethics

A

Well-founded standards of right and wrong that prescribe what humans ought to do, usually in terms of rights, obligations, benefits to society, fairness, or specific virtues.

41
Q

Data Ethics

A

Well-founded standards of right and wrong that dictate how data is collected, shared, and used.

42
Q

Aspects of data ethics

A

Ownership
Transaction transparency
Consent
Currency
Privacy
Openness

43
Q

Ownership

A

Individuals own the raw data they provide and have primary control over its usage, how it’s processed, and how it’s shared.

44
Q

Transaction Transparency

A

All data-processing activities should be completely explainable and understood by the individual who provides their data.

45
Q

Consent

A

An individual’s right to know explicit details about how and why their data will be used before agreeing to provide it.

46
Q

Currency

A

Individuals should be aware of financial transactions resulting from the use of their personal data and the scale of these transactions.

47
Q

Privacy

A

Preserving a data subject’s information and activity anytime a data transaction occurs.

48
Q

Data Anonymisation

A

Data anonymisation is the process of protecting people’s private or sensitive data by eliminating that kind of information. Typically, data anonymisation involves blanking, hashing, or masking personal information, often by using fixed-length codes to represent data columns, or hiding data with altered values.
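
As a minimal sketch of blanking, hashing and masking, the example below applies one technique to each field of an invented customer record. The helper names, the salt and the record are assumptions; hashing alone may not be enough for values that are easy to guess, so treat this as an illustration rather than a complete anonymisation scheme.

```python
import hashlib

def hash_value(value, salt):
    """Replace a value with a fixed-length SHA-256 code (64 hex characters)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

def mask_value(value, visible=4):
    """Hide all but the last `visible` characters."""
    return "*" * (len(value) - visible) + value[-visible:]

# Hypothetical record containing personal information.
record = {
    "name": "Ada Lovelace",
    "email": "ada@example.com",
    "phone": "555-0123",
    "plan": "premium",
}

anonymised = {
    "name": "",                                             # blanking
    "email": hash_value(record["email"], salt="demo-salt"),  # hashing
    "phone": mask_value(record["phone"]),                   # masking
    "plan": record["plan"],                                 # not personal, kept as is
}
print(anonymised)
```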

49
Q

De-identification

A

This is the process used to wipe data clean of all personally identifiable information (PII). This is commonly done in the healthcare and financial industries.

50
Q

Data that’s often anonymised includes:

A

Telephone numbers
Names
License plates and license numbers
Social security numbers
IP addresses
Medical records
Email addresses
Photographs
Account numbers

51
Q

Openness (or open data)

A

Free access, usage and sharing of data

52
Q

CSV

A

Comma-separated values (CSV) files save data in a table format. They use plain text and are delimited by characters, such as a comma.
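
To show what that plain-text table looks like, the sketch below parses a small CSV string with Python's standard `csv` module. The sample text is invented; a real file would be opened with `open(...)` instead of `io.StringIO`.

```python
import csv
import io

# Hypothetical CSV content: a header line, then one line per row.
csv_text = "name,city\nAda,London\nGrace,New York\n"

# DictReader uses the first line as column names.
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

for row in rows:
    print(row["name"], "lives in", row["city"])
```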

53
Q

Sorting Data

A

Arranging data into a meaningful order to make it easier to understand, analyse, and visualise.

54
Q

Data Governance

A

A process to ensure the formal management of a company’s data assets.

55
Q

Filtering

A

Showing only the data that meets specific criteria while hiding the rest.
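
A short sketch of filtering, using an invented list of orders: only rows that meet both conditions are shown, and the original data is left unchanged.

```python
# Hypothetical order data.
orders = [
    {"id": 1, "region": "North", "total": 120.0},
    {"id": 2, "region": "South", "total": 80.0},
    {"id": 3, "region": "North", "total": 45.0},
    {"id": 4, "region": "North", "total": 300.0},
]

# Keep only orders from the North region worth at least 100.
filtered = [
    order for order in orders
    if order["region"] == "North" and order["total"] >= 100
]

print([order["id"] for order in filtered])
```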

56
Q

Multiple Criteria Sorting

A

This allows you to sort by more than one column at the same time.
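
One common reading of multiple criteria sorting is ordering rows by a first column and breaking ties with a second. The sketch below sorts invented sales rows by region (A to Z) and then by total (largest first).

```python
# Hypothetical sales rows.
sales = [
    {"region": "South", "total": 50},
    {"region": "North", "total": 80},
    {"region": "North", "total": 120},
    {"region": "South", "total": 200},
]

# The key is a tuple: region ascending first, then total descending
# (negating the number reverses its order).
ordered = sorted(sales, key=lambda row: (row["region"], -row["total"]))

for row in ordered:
    print(row["region"], row["total"])
```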

57
Q

Best Practices for Organising Data

A
  1. Naming conventions
  2. Foldering
  3. Archiving older files
  4. Aligning your naming and storage practices with your team
  5. Developing metadata practices
58
Q

Naming Conventions

A

Consistent guidelines that describe the content, date, or version of the file in its name.
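
As an illustration, the sketch below builds file names that carry the content, date and version in a fixed pattern. The pattern itself (`Project_Content_YYYYMMDD_vNN.ext`) and the helper `build_filename` are assumptions; a team would agree on its own convention.

```python
from datetime import date

def build_filename(project, content, created, version, extension):
    """Build a name like Project_Content_YYYYMMDD_vNN.ext (hypothetical pattern)."""
    return f"{project}_{content}_{created:%Y%m%d}_v{version:02d}.{extension}"

filename = build_filename("SalesReport", "Q1Summary", date(2024, 3, 31), 2, "csv")
print(filename)
```

Putting the date in year-month-day order means an alphabetical file listing is also chronological.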

59
Q

Data Security

A

Protecting data from unauthorised access or corruption by adopting safety measures.

60
Q

Encryption

A

Encryption uses a unique algorithm to alter data and make it unusable by users and applications that don’t know the algorithm. This algorithm is saved as a key which can be used to reverse the encryption.

61
Q

Tokenisation

A

This process replaces the data elements you want to protect with randomly generated data referred to as a token. The original data is stored in a separate location and mapped to the tokens.
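
The idea can be sketched with a small in-memory "vault" that hands out random tokens and keeps the mapping back to the original values. The `TokenVault` class is hypothetical and only illustrates the mapping; a real system would keep the vault in separate, secured storage.

```python
import secrets

class TokenVault:
    """Toy token store: maps random tokens back to the original values."""

    def __init__(self):
        self._token_to_value = {}

    def tokenise(self, value):
        # The token is random, so it reveals nothing about the value.
        token = secrets.token_hex(8)
        while token in self._token_to_value:  # extremely unlikely collision
            token = secrets.token_hex(8)
        self._token_to_value[token] = value
        return token

    def detokenise(self, token):
        # Only code with access to the vault can recover the original.
        return self._token_to_value[token]

vault = TokenVault()
account_number = "4111-1111-1111-1111"  # made-up example value
token = vault.tokenise(account_number)
print(token, "->", vault.detokenise(token))
```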

62
Q

Mentor

A

A professional who shares their knowledge, skills and experience to help you develop and grow. A mentor helps you to skill up.

63
Q

Sponsor

A

A professional advocate who’s committed to moving a sponsee’s career forward within an organisation. A sponsor helps you to move up.