Course 3: Prepare Data for Exploration Flashcards
Prepare Phase
1) Understanding the different types of data and data structures.
2) What type of data is suitable for the question you are answering?
3) Practical skills in extracting, organising and protecting your data.
4) How data is generated and collected.
5) Different formats, types, and structures of data.
How is data collected?
1) Interviews
2) Observations
3) Forms
4) Questionnaires
5) Surveys
6) Cookies
Data collection considerations
1) How will the data be collected
2) Choose data sources
3) Decide what data to use
4) How much data to collect
5) Select the correct data type
6) Determine the time frame
First-party data
Data collected by an individual or group using their own resources.
Second-party data
Data collected by a group directly from its audience and then sold
Third-party data
Data collected from outside sources that did not collect it directly.
Population
All possible data values in a certain dataset.
Sample
A part of a population that is representative of the population.
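A minimal Python sketch of drawing a sample from a population, assuming simple random sampling with the standard library. The population (ages of club members), the seed, and the sample size are made up for illustration.

```python
import random

# Hypothetical population: ages of all 1,000 members of a club.
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.randint(18, 80) for _ in range(1000)]

# Simple random sample: every member has the same chance of being picked.
sample = rng.sample(population, k=100)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(f"population mean: {population_mean:.1f}")
print(f"sample mean:     {sample_mean:.1f}")
```

With a random sample of this kind the two means usually land close together, which is what "representative" is meant to capture.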
Discrete data
Data that is counted and has a limited number of values.
Continuous data
Data that is measured and can have almost any numeric value.
Nominal data
A type of qualitative data that is categorized without a set order.
Ordinal data
A type of qualitative data with a set order or scale.
Internal data
Data that lives within a company’s own systems
External data
Data that lives and is generated outside of an organisation.
Structured data
Data is organised in a specific format, such as rows and columns.
Examples of software that stores structured data
Spreadsheets, relational databases
Unstructured data
Data that is not organised in any easily identifiable manner
Examples of unstructured data
Audio files, Video files
Primary data
Collected by a researcher from first-hand sources.
Example of primary data
Data from an interview you conducted
Secondary data
Gathered by other people or from other research.
Example of secondary data
Demographic data collected by a university
Internal data
Data that lives inside a company’s own systems
Example of internal data
Sales data by store location.
External data
Data that lives outside of a company or organisation
Example of external data
National average wages for the various positions throughout your organisation.
Continuous data
Data that is measured and can have almost any numeric value
Continuous data example
1) Temperature
2) Runtime markers in a video
Discrete data
Data that is counted and has a limited number of values.
Example of discrete data
Number of people who visit a hospital on a daily basis (10, 20, 200)
Qualitative data
Subjective and explanatory measures of qualities and characteristics.
Qualitative data example
Exercise activity most enjoyed
Quantitative data
Specific and objective measures of numerical facts.
Quantitative data example
Population of elephants in Africa
Nominal data
A type of qualitative data that isn’t categorized with a set order.
Nominal data example
New listing, reduced price listing, foreclosure.
Ordinal data
A type of qualitative data with a set order or scale.
Ordinal data example
Income level (low income, middle income, high income)
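A small Python sketch of the difference between nominal and ordinal data, reusing the listing and income categories from the cards above. The rank mapping `INCOME_ORDER` is an assumption introduced here to make the set order explicit.

```python
# Nominal: categories with no set order; counting is meaningful, ranking is not.
listing_types = ["new listing", "foreclosure", "reduced price listing", "new listing"]
counts = {}
for kind in listing_types:
    counts[kind] = counts.get(kind, 0) + 1

# Ordinal: categories with a set order, made explicit here as a rank.
INCOME_ORDER = {"low income": 0, "middle income": 1, "high income": 2}
incomes = ["high income", "low income", "middle income", "low income"]
sorted_incomes = sorted(incomes, key=INCOME_ORDER.get)

print(counts)
print(sorted_incomes)
```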
Structured data
Data is organised in a specific format, like rows and columns.
Structured data example
Expense reports
Unstructured data
Data that isn’t organised in any easily identifiable manner.
Unstructured data example
- Social media posts
- Emails
- Videos
Data Model
A model that is used for organising data elements and how they relate to one another.
Data elements
Pieces of information, such as people’s names, account numbers, and addresses.
Sources of structured data
1) Spreadsheets
2) Databases that store datasets
Data modelling
Data modelling is the process of creating diagrams that visually represent how data is organised and structured.
Levels of data modelling
1) Conceptual (business concepts)
2) Logical (data entities)
3) Physical (physical tables)
Conceptual data modelling
Conceptual data modelling gives a high-level view of the data structure, such as how data interacts across an organisation.
Conceptual data modelling example
A conceptual data model may be used to define the business requirements for a new database. A conceptual data model doesn’t contain technical details.
Logical data modelling
Logical data modelling focuses on the technical details of a database, such as relationships, attributes, and entities.
Logical data modelling example
For example, a logical data model defines how individual records are uniquely identified in a database. But it doesn’t spell out the actual names of database tables. That’s the job of a physical data model.
Physical data modelling
Physical data modelling depicts how a database operates. A physical data model defines all entities and attributes used.
Physical data modelling example
For example, it includes the database’s table names, column names, and data types.
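A physical data model can be sketched with Python's built-in sqlite3 module. The table `customer` and its columns are hypothetical and chosen only to show table names, column names, and data types being fixed at the physical level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# The physical model spells out the table name, column names, and data types.
conn.execute(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        full_name   TEXT NOT NULL,
        signup_date TEXT NOT NULL
    )
    """
)

# PRAGMA table_info returns one row per column: (cid, name, type, ...).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]
print(columns)
conn.close()
```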
Data Type
A specific kind of data attribute that tells what kind of value the data is
Data types in spreadsheets
1) Number
2) Text or string
3) Boolean
Text or string data type
A sequence of characters and punctuation that contains textual information.
Boolean data type
A data type with only two possible values, such as TRUE or FALSE.
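As a rough illustration, the three spreadsheet data types have close analogues in Python. The variable names and values are made up, and the mapping is only an analogy, not a statement about how any spreadsheet stores values.

```python
# Python analogues of the three spreadsheet data types (illustrative mapping).
units_sold = 42          # number
product_name = "Widget"  # text or string
in_stock = True          # Boolean

for value in (units_sold, product_name, in_stock):
    print(repr(value), type(value).__name__)

# The text "42" is not the number 42 until it is converted.
text_number = "42"
is_same = text_number == units_sold
as_number = int(text_number)
print(is_same, as_number == units_sold)
```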
Table definitions
Rows- Records
Columns- Fields
Wide data
Data in which every data subject has a single row with multiple columns to hold the values of various attributes of the subject.
Long data
Data in which each row is one time point per subject, so each subject will have data in multiple rows.
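A minimal sketch of reshaping wide data into long data in plain Python. The countries, years, and values are invented for illustration.

```python
# Wide format: one row per subject, one column per year (values are made up).
wide = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]

# Long format: one row per subject per time point.
long_rows = [
    {"country": row["country"], "year": int(year), "value": value}
    for row in wide
    for year, value in row.items()
    if year != "country"
]

for r in long_rows:
    print(r)
```

Two wide rows with two year columns each become four long rows.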
Data transformation
Data transformation is the process of changing the data’s format, structure, or values.
Data transformation example
- Adding, copying, or replicating data
- Deleting fields or records
Goals for Data Transformation
- Data organisation: better-organised data is easier to use.
- Data compatibility: different applications or systems can use the same data.
- Data migration: data with the same formats can be moved from one system to another.
- Data merging: data with the same organisation can be merged.
- Data enhancement: data can be displayed with more detailed fields.
- Data comparison: apples-to-apples comparisons of the data can then be made.
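One small example of a transformation that serves compatibility and comparison is standardising a date format. This sketch assumes the source stores dates as day/month/year text; the sample values are made up.

```python
from datetime import datetime

# Hypothetical export that stores dates as day/month/year text.
raw_dates = ["25/11/2020", "03/01/2021"]

# Change the format, not the meaning: ISO 8601 dates also sort correctly as text.
iso_dates = [datetime.strptime(d, "%d/%m/%Y").date().isoformat() for d in raw_dates]
print(iso_dates)
```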
Wide data is preferred when
- Creating tables and charts with a few variables about each subject.
- Comparing straightforward line graphs.
Long data is preferred when
Bias
A preference in favour of or against a person, group of people, or thing.
Data Bias
A type of error that systematically skews results in a certain direction.
Sampling bias
When a sample isn’t representative of the population as a whole.
Unbiased sampling
When a sample is representative of the population being measured.
More types of data bias
1) Observer bias
2) Interpretation bias
3) Confirmation bias
Observer bias (experimenter bias / research bias)
The tendency for different people to observe things differently.
Interpretation bias
The tendency to always interpret ambiguous situations in a positive or negative way.
Confirmation bias
The tendency to search for or interpret information in a way that confirms pre-existing beliefs.
Way to find good data sources
ROCCC: Reliable, Original, Comprehensive, Current, Cited
Ethics
Well-founded standards of right and wrong that prescribe what humans ought to do, usually regarding rights, obligations, benefits to society, fairness, or specific virtues.
Data ethics
Well-founded standards of right and wrong that dictate how data is collected, shared and used.
GDPR
General Data Protection Regulation of the European Union
Aspects of data ethics
1) Ownership
2) Transaction transparency
3) Consent
4) Currency
5) Privacy
6) Openness
Ownership
Individuals own the raw data they provide and they have primary control over its usage, how it’s processed, and how it’s shared.
Transaction transparency
All data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data.
Consent
An individual’s right to know explicit details about how and why their data will be used before agreeing to provide it.
Currency
Individuals should be aware of financial transactions resulting from the use of their personal data and the scale of these transactions.
Privacy
Preserving a data subject’s information and activity any time a data transaction occurs.
Data Protection examples
- Protection from unauthorised access to our private data.
- Freedom from inappropriate use of our data.
- The right to inspect, update or correct our data.
- Ability to give consent to use our data.
- Legal right to access the data.
Openness
Free access, usage, and sharing of data
Data interoperability
The ability of data systems and services to openly connect and share data.
Pixel
In digital imaging, a small area of illumination on a display screen that, when combined with other adjacent areas, forms a digital image.
Database
A collection of data stored in a computer system
Metadata
Data about data
Relational database
A database that contains a series of related tables that can be connected via their relationships
Primary key
An identifier that references a column in which each value is unique.
Foreign key
A field within a table that is a primary key in another table.
Primary key-2
- Used to ensure data in a specific column is unique.
- Uniquely identifies a record in a relational database table.
- Only one primary key is allowed in a table.
- Cannot contain null or blank values.
Foreign key-2
- A column or group of columns in a relational database table that provides a link between the data in two tables.
- Refers to the field in a table that’s the primary key of another table.
- More than one foreign key is allowed to exist in a table.
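The relationship between a primary key and a foreign key can be sketched with Python's built-in sqlite3 module. The tables `customer` and `purchase` and their rows are hypothetical. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# customer_id is the primary key of customer and a foreign key in purchase.
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    )
    """
)
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO purchase VALUES (100, 1, 19.99)")

# The foreign key provides the link used to join the two tables.
joined = conn.execute(
    "SELECT c.name, p.amount FROM purchase p JOIN customer c USING (customer_id)"
).fetchall()

# A purchase that points at a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO purchase VALUES (101, 999, 5.00)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(joined, rejected)
conn.close()
```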
Metadata benefits
Metadata is stored in a single, central location, and gives the company standardised information about all of its data.
3 common types of metadata
-Descriptive
-Structural
-Administrative
Descriptive Metadata
Metadata that describes a piece of data and can be used to identify it at a later point in time.
Structural metadata
Metadata that indicates how a piece of data is organised and whether it is part of one, or more than one, data collection.
Administrative metadata
Metadata that indicates the technical source of a digital asset.
Metadata repository
A database specifically created to store metadata.
Metadata repository Benefits
- Metadata repositories make it easier and faster to bring together multiple sources for data analysis.
Metadata repositories functions
- Describe the state and location of the metadata.
- Describe the structures of the tables inside.
- Describe how the data flows through the repository
- Keep track of who accesses the metadata and when.
Data governance
A process to ensure the formal management of a company’s data assets.
CSV (comma-separated values)
A CSV file saves data in a table format.
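A short sketch of reading CSV data with Python's built-in csv module. The column names and rows are made up, and the text is held in memory here; in practice it would usually come from a file.

```python
import csv
import io

# Hypothetical CSV text: a header row followed by one line per record.
csv_text = "store,city,sales\n1,Leeds,1200\n2,York,950\n"

# DictReader uses the first row as field names; every value is read as text.
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_sales = sum(int(row["sales"]) for row in rows)

print(rows[0])
print(total_sales)
```

Because every value arrives as text, numeric columns need converting before any arithmetic.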
Sorting data
Arranging data into a meaningful order to make it easier to understand, analyse and visualise.
Filtering
Showing only the data that meets specific criteria while hiding the rest.
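Sorting and filtering can be illustrated on a small list of records in Python. The store names and amounts are invented, and the 200 threshold is an arbitrary criterion chosen for the example.

```python
# Hypothetical sales records.
sales = [
    {"store": "North", "amount": 320},
    {"store": "South", "amount": 150},
    {"store": "East", "amount": 410},
]

# Sorting: arrange every record in a meaningful order (largest amount first).
by_amount = sorted(sales, key=lambda r: r["amount"], reverse=True)

# Filtering: keep only the records that meet a criterion; the source list is unchanged.
over_200 = [r for r in sales if r["amount"] > 200]

print([r["store"] for r in by_amount])
print([r["store"] for r in over_200])
```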
2 types of BigQuery accounts
- Sandbox
- Free Trial
Sandbox
- 12 projects at a time
- Cannot insert new records into a database
- Cannot update field values of existing records
Free Trial
- $300 in credit during the first 90 days
- Select a paid account
- You will never be automatically charged
Fill handle
A box in the lower-right corner of a selected spreadsheet cell that can be dragged through neighbouring cells to continue an instruction.
Benefits of organising data
- Makes it easier to find and use
- Helps you avoid making mistakes during your analysis
- Helps to protect your data
Best practices when organising data
- Naming conventions
- Foldering
- Archiving older files
- Align your naming and storage practices with your team
- Develop metadata practices
Naming conventions
- Consistent guidelines that describe the content, date, or version of a file in its name.
- Use logical and descriptive names for your files to make them easier to find and use.
Foldering
Organise your files into folders
Subfolders
Breaking folders down into sub-sections
Benefits of foldering
Can move old projects to a separate location to create an archive and cut down on clutter.
File naming DOs
- Work out your conventions early
- Align file naming with your team
- Make sure file names are meaningful
- Keep file names short and sweet
- Format dates yyyymmdd: SalesReport20201125
- Lead revision numbers with 0: SalesReport20201125v02
- Use hyphens, underscores, or capitalised letters: SalesReport_2020_11_25_v02
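The naming rules above can be sketched as a small Python helper. The function `build_file_name` is hypothetical, and it assumes underscores as separators and a two-digit revision number.

```python
from datetime import date


def build_file_name(prefix: str, day: date, version: int) -> str:
    """Hypothetical helper: prefix, yyyymmdd date, zero-padded revision."""
    return f"{prefix}_{day:%Y%m%d}_v{version:02d}"


name = build_file_name("SalesReport", date(2020, 11, 25), 2)
print(name)  # SalesReport_20201125_v02
```

Dates written as yyyymmdd also sort chronologically when files are listed alphabetically.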
Data security
Protecting data from unauthorised access or corruption by adopting safety measures.
Encryption
Encryption uses a unique algorithm to alter data and make it unusable by users and applications that don’t know the algorithm.
Tokenization
- Tokenization replaces the data elements you want to protect with randomly generated data referred to as a token.
- The original data is stored in a separate location and mapped to tokens.
- To access the complete original data, the user or application needs to have permission to use the tokenized data and the token mapping.
- This means that even if the tokenized data is hacked, the original data is still safe in a separate, secure location.
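The idea behind tokenization can be sketched as a toy in-memory token vault in Python. The `TokenVault` class and the card number are hypothetical, and this is an illustration of the mapping only, not a production security design.

```python
import secrets


class TokenVault:
    """Toy in-memory token vault (hypothetical; not production security)."""

    def __init__(self):
        # The token mapping; a real system would keep this in a separate location.
        self._token_to_value = {}

    def tokenize(self, value: str) -> str:
        token = secrets.token_hex(8)  # random token, unrelated to the value
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]


vault = TokenVault()
card_number = "4111 1111 1111 1111"  # made-up example value
token = vault.tokenize(card_number)

record = {"customer": "Ada", "card": token}  # what the application stores
print(record)
print(vault.detokenize(token) == card_number)
```

Anyone who obtains only `record` sees the token, not the card number; recovering the original requires access to the vault's mapping.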
Data interoperability
The ability to integrate data from multiple sources and a key factor in the successful use of open data among companies and governments.
A professional online presence can
- Help potential employers find you
- Make connections with other analysts
- Learn and share data findings
- Participate in community events
Networking
Professional relationship building
Mentor
A professional who shares their knowledge, skills, and experience to help you develop and grow.
Sponsor
A professional advocate who’s committed to moving a sponsee’s career forward within an organisation.
End of Course-3
- Data types and data structures
- Bias and Credibility
- Databases
- Organising and protecting data
- The data community