All Things Data Flashcards

1
Q

What is data?

A

Data is the raw and unprocessed facts that we capture according to some agreed-upon standards. Data could be a number, an image, an audio clip, a transcription, or similar.

2
Q

What is information?

A

Information is data that has been processed, aggregated, and organized into a more human-friendly format. Data visualizations, reports and dashboards are common ways to present information. In short, information is the facts revealed by data, placed in context.
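A minimal Python sketch of the step from data to information, assuming a hypothetical list of raw sales records: aggregating the individual facts per region produces a summary that a person can read at a glance. The record fields and values are invented for illustration.

```python
from collections import defaultdict

# Raw data: individual sales records (hypothetical example values).
raw_sales = [
    {"region": "North", "amount": 120.0},
    {"region": "South", "amount": 80.0},
    {"region": "North", "amount": 30.0},
]

def summarize_by_region(records):
    """Aggregate raw records into total sales per region."""
    totals = defaultdict(float)
    for record in records:
        totals[record["region"]] += record["amount"]
    return dict(totals)

# Information: the aggregated, organized view of the raw facts.
summary = summarize_by_region(raw_sales)
for region, total in sorted(summary.items()):
    print(f"{region}: {total:.2f}")
```

A report or dashboard would present the same summary visually; the aggregation step is what turns the records into information.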

3
Q

What is insight?

A

Insight is gained by analyzing data and information to understand the context of a particular situation and draw conclusions. Those conclusions lead to actions you can apply to your business.

4
Q

What is the goal of data management?

A

To enable an organization to get more value from its data. Being able to share, store, protect and retrieve data successfully can be a competitive advantage. Data management also helps mitigate risks and supports decision-making in organizations.

5
Q

COSTS OF POOR DATA MANAGEMENT:

A
  • Misinterpretation of data
  • Lost data
  • Inaccessible data
  • Wasted time and money
  • Missed deadlines
6
Q

Which data management activities are there?

A
  • Governance activities
  • Lifecycle activities
  • Foundational activities
7
Q

GOVERNANCE ACTIVITIES

A

= Activities that help control data development and reduce the risks associated with data use, while at the same time enabling an organization to leverage data strategically. The purpose of data governance is to ensure that data is managed properly, according to policies and best practices.

8
Q

What do you need to define a data strategy?

A
  • Setting data policies
  • Data stewardship
  • Data ownership
  • Data valuation
  • Data maturity assessment
  • Data classification
  • Instilling a cultural change
  • Principles & ethics
9
Q

Break down data strategy development into 4 key stages

A



  1. Identify strategic business goals and align planned data initiatives with them
  2. Assess the current state and maturity of your data management environment
  3. Propose new capabilities, processes and technologies to meet business needs
  4. Plot out an implementation roadmap and an internal communication plan
10
Q

What is data stewardship?

A

Data stewardship refers to the management and oversight of an organization’s data. This includes ensuring the quality, accuracy, and security of the data, as well as ensuring that policies and procedures are in place to protect the data. Data stewards are responsible for overseeing the data and ensuring that it is being used appropriately, but they don’t own it.

11
Q

What is data ownership?

A

Data ownership refers to the individual or group within an organization that is responsible for the data and its use. Data owners are responsible for ensuring that the data is accurate, complete, and protected, and that it is being used in compliance with legal and regulatory requirements. They also have decision-making power over how the data is used and shared.

12
Q

What are the data classification categories?

A



Public:
Data that may be freely disclosed to the public.
Marketing materials, contact information, price lists

Internal Only:
Internal data not meant for public disclosure.
Battlecards, sales playbooks, organizational charts

Confidential:
Sensitive data that, if compromised, could negatively affect operations.
Contracts with vendors, employee reviews

Restricted:
Highly sensitive corporate data that, if compromised, could put the organization at financial or legal risk.
IP, credit card information, social security numbers, PHI
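The four levels form an ordered scale, which can be sketched in Python as an `IntEnum`. The asset labels are hypothetical, taken from the examples on this card. `can_disclose_publicly` follows directly from the definition of Public, while `may_access` (clearance must be at least the asset's level) is only an assumed access rule for illustration, not part of the classification scheme itself.

```python
from enum import IntEnum

class Classification(IntEnum):
    """Ordered classification levels; a higher value means more sensitive."""
    PUBLIC = 1
    INTERNAL_ONLY = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4

# Hypothetical labelling of example assets from the card.
ASSET_LABELS = {
    "price list": Classification.PUBLIC,
    "sales playbook": Classification.INTERNAL_ONLY,
    "employee review": Classification.CONFIDENTIAL,
    "credit card number": Classification.RESTRICTED,
}

def can_disclose_publicly(asset: str) -> bool:
    """Only Public data may be freely disclosed."""
    return ASSET_LABELS[asset] == Classification.PUBLIC

def may_access(clearance: Classification, asset: str) -> bool:
    """Assumed rule: a user's clearance must be at least the asset's level."""
    return clearance >= ASSET_LABELS[asset]
```

Using an ordered enum keeps comparisons such as "at least Confidential" readable and avoids comparing free-text labels.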

13
Q

What are lifecycle activities?

A

Lifecycle activities refer to the various stages that data goes through from its creation to its disposal. These stages include data collection, data processing, data storage, data analysis, data visualization, data archiving, and data deletion.

14
Q

What are “Plan & design”, “Enable & maintain”, and “Use & enhance”?

A

“Plan & design” involves determining the specific data requirements and goals for a project and creating a plan to achieve those goals, including data governance policies and technical infrastructure.

“Enable & maintain” ensures that the data is accurate, accessible, and protected, and manages the day-to-day operations of the data management system.

“Use & enhance” leverages the data for its intended purpose and continuously monitors and evaluates its effectiveness, identifying opportunities to improve or enhance data-driven processes.

15
Q

What are foundational activities?

A

Foundational activities refer to the basic tasks and processes that organizations must undertake in order to establish a solid foundation for data management. These activities are essential for ensuring the quality, accuracy, and security of the data and are typically the starting point for any data management initiative.

15
Q

For a good foundation, we need:

A
  • Data quality
  • Data protection & security
  • Risk management
  • Data privacy
16
Q

Explain data quality and GIGO.

A

GIGO is an acronym for “garbage in, garbage out.” It is the principle that if the input data to a system is inaccurate or of poor quality, the output of that system will also be inaccurate or of poor quality. This applies to a wide range of systems, including computer systems, data analysis, and decision-making processes.

Data quality = the degree to which data meets the expectations and needs of data consumers

17
Q

Dimensions of data quality (data quality framework)

A

Is there enough data?
  • Data completeness: the proportion of data stored against the potential of 100%
Is the data correct?
  • Data accuracy: the degree to which the data correctly describes the ‘real world’ object or event
  • Data validity: data is valid if it conforms to the syntax (format, type, range) of its definition
How well does the data fit together?
  • Data consistency: the absence of differences when two or more representations of the same thing are compared
  • Data duplication / uniqueness: data is not unintentionally duplicated within or across systems
Is the data up-to-date?
  • Data timeliness: the relevance of the data in relation to the time it was collected or the time it is used. Timeliness matters because outdated data may no longer be relevant or useful for the intended purpose.
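Three of these dimensions can be expressed as simple ratios. The Python sketch below assumes hypothetical customer records (with `None` marking a missing value) and an assumed `YYYY-MM-DD` definition for birth dates; completeness, validity and uniqueness are each scored between 0 and 1.

```python
from datetime import datetime

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "email": "ann@example.com", "birth_date": "1990-04-12"},
    {"id": 2, "email": None, "birth_date": "1985-13-01"},
    {"id": 3, "email": "bob@example.com", "birth_date": "2001-07-30"},
    {"id": 3, "email": "bob@example.com", "birth_date": "2001-07-30"},
]

def completeness(rows, field):
    """Completeness: share of rows in which `field` is filled in."""
    return sum(row[field] is not None for row in rows) / len(rows)

def is_iso_date(value):
    """True if value follows YYYY-MM-DD and is a real calendar date."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

def validity(rows, field, check):
    """Validity: share of filled-in values that pass the syntax check."""
    values = [row[field] for row in rows if row[field] is not None]
    return sum(check(value) for value in values) / len(values)

def uniqueness(rows, key):
    """Uniqueness: distinct key values divided by rows (1.0 = no duplicates)."""
    return len({row[key] for row in rows}) / len(rows)

print(completeness(records, "email"))
print(validity(records, "birth_date", is_iso_date))
print(uniqueness(records, "id"))
```

Accuracy and consistency are harder to score automatically, because they require comparing the data with the real world or with another representation of it.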

18
Q

Impact of poor data quality linked to dimensions

A

Completeness: Poor data quality in terms of completeness can lead to missing or incomplete information, resulting in inaccurate or unreliable analysis and decision making.

Accuracy: Poor data quality in terms of accuracy can lead to errors and inconsistencies in the data, resulting in incorrect conclusions and decisions.

Timeliness: Poor data quality in terms of timeliness can lead to decisions being based on outdated information, resulting in missed opportunities or wasted resources.

Consistency: Poor data quality in terms of consistency can lead to confusion and misinterpretation of the data, resulting in inconsistent conclusions and decisions.

Validity: Poor data quality in terms of validity can lead to invalid conclusions and decisions, resulting in wasted resources and potential legal and regulatory issues.

Uniqueness: Poor data quality in terms of uniqueness can lead to data duplication, resulting in inconsistent and unreliable analysis, and also inefficiency in the data management process.

19
Q

How can you manage data quality for data entry?

A

Establish clear guidelines for data entry:
- Use of capital letters, special symbols, numbers
- Define required fields
- Make sure the syntax is followed e.g. dates
- Automate data entry / calculations
- Give options to change data
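Several of these guidelines can be enforced when an entry is saved. The Python sketch below assumes a hypothetical order form whose values arrive as strings; which fields are required and the `YYYY-MM-DD` date syntax are assumptions. It checks required fields, enforces the date syntax, normalizes capitalization, and calculates the total automatically instead of letting someone type it.

```python
from datetime import datetime

# Hypothetical form fields; which fields are required is an assumption.
REQUIRED_FIELDS = ("first_name", "last_name", "order_date", "quantity", "unit_price")

def clean_entry(entry):
    """Validate and normalize one form entry; returns (record, errors)."""
    # Required fields must be present and non-empty
    missing = [f for f in REQUIRED_FIELDS if entry.get(f) in (None, "")]
    if missing:
        return None, [f"missing required field: {f}" for f in missing]
    # Enforce one date syntax instead of accepting free text
    try:
        order_date = datetime.strptime(entry["order_date"], "%Y-%m-%d").date()
    except ValueError:
        return None, ["order_date must use the YYYY-MM-DD format"]
    try:
        quantity = int(entry["quantity"])
        unit_price = float(entry["unit_price"])
    except ValueError:
        return None, ["quantity and unit_price must be numbers"]
    record = {
        # Consistent capitalization of names
        "first_name": entry["first_name"].strip().title(),
        "last_name": entry["last_name"].strip().title(),
        "order_date": order_date,
        "quantity": quantity,
        "unit_price": unit_price,
        # Automated calculation instead of a manually typed total
        "total": round(quantity * unit_price, 2),
    }
    return record, []
```

Returning the list of errors (rather than silently fixing or dropping the entry) leaves room for the user to change the data, as the last guideline suggests.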

20
Q

How can you manage data quality with validation techniques?

A

Data validation = the process of checking data for accuracy and completeness
  • Data profiling: define which data is critical to be complete and accurate
  • Data cleansing: remove any errors and discrepancies
  • Data matching and deduplication: compare data and look for similar records
Examples:
* Check whether the data has the correct data type / format
* Check whether it is a value from a list of accepted values
* Check whether a date is within a specified range
* Check for consistent expressions, e.g. a begin date that comes before the end date
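The four example checks can be combined into one validation function. This Python sketch assumes a hypothetical contract record; the accepted status values and the allowed date range are invented for illustration. It returns a list of errors instead of raising, so that all problems with a record are reported at once.

```python
from datetime import date

# Hypothetical rules for a contract record.
ACCEPTED_STATUSES = {"active", "paused", "closed"}
EARLIEST, LATEST = date(2000, 1, 1), date(2030, 12, 31)

def validate_contract(contract):
    """Return a list of validation errors; an empty list means the record passed."""
    errors = []
    amount = contract.get("amount")
    # Correct data type / format (bool is excluded because it subclasses int)
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        errors.append("amount must be a number")
    # Value from a list of accepted values
    if contract.get("status") not in ACCEPTED_STATUSES:
        errors.append("status is not an accepted value")
    begin, end = contract.get("begin"), contract.get("end")
    if not (isinstance(begin, date) and isinstance(end, date)):
        errors.append("begin and end must be dates")
        return errors
    # Dates within a specified range
    for name, value in (("begin", begin), ("end", end)):
        if not EARLIEST <= value <= LATEST:
            errors.append(f"{name} date is outside the allowed range")
    # Consistent expressions: the begin date may not come after the end date
    if begin > end:
        errors.append("begin date is after end date")
    return errors
```

For example, a record with `amount="100"`, `status="open"`, and a begin date after its end date fails three checks.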

21
Q

How can you avoid data silos to prevent unnecessary data duplication?

A

* Centralize data, e.g. in a data warehouse or data lake
* Create a data transfer strategy, focusing on people, processes and tools
* Develop a unified view of all your data (data dictionaries, data models, …)
* Focus on a cloud strategy
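A unified view usually means matching records that describe the same entity across silos and merging them into one. The Python sketch below assumes two hypothetical customer lists (a CRM and a webshop) and uses a normalized e-mail address as the matching key; that key choice, and letting the first source win on conflicts, are assumptions.

```python
def normalize_email(email):
    """Matching key: trim whitespace and ignore letter case."""
    return email.strip().lower()

def merge_sources(*sources):
    """Merge records from several systems into one record per customer."""
    unified = {}
    for source in sources:
        for record in source:
            key = normalize_email(record["email"])
            merged = unified.setdefault(key, {"email": key})
            for field, value in record.items():
                # Keep the first known value; later sources only fill gaps
                if field != "email" and value is not None:
                    merged.setdefault(field, value)
    return list(unified.values())

# Hypothetical silos holding overlapping customer data
crm = [{"email": "Ann@Example.com", "name": "Ann Smith", "phone": None}]
webshop = [
    {"email": "ann@example.com ", "name": "A. Smith", "phone": "555-0101"},
    {"email": "bob@example.com", "name": "Bob Jones", "phone": None},
]
customers = merge_sources(crm, webshop)
print(customers)
```

Ann appears in both systems but ends up as a single record that combines the CRM name with the webshop phone number.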

22
Q

How can you keep data up to date?

A

  • Centralize your data: storing all data in one location makes it easier to manage and update.
  • Train employees on how to properly enter, manage, and update data.
  • Incentivize customers and employees to provide accurate and up-to-date data.
  • Integrate open-source data: it can provide additional information and context that help improve the accuracy and completeness of your data.
  • Keep time stamps of data adjustments: they help track data changes over time and make it easier to identify issues or errors in the data.
  • Work with multiple opt-ins: this helps ensure that your data is accurate, up to date, and reliable, and can also give you different perspectives and insights on the data.
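Keeping time stamps of data adjustments can be sketched as a record that logs every change with its time, old value, and new value. `TrackedRecord` is a hypothetical helper, and the one-year staleness threshold is an assumed example. Fields set at creation have no recorded change, so this sketch treats them as stale until they are updated.

```python
from datetime import datetime, timedelta, timezone

class TrackedRecord:
    """A record that timestamps every adjustment (hypothetical helper)."""

    def __init__(self, **fields):
        self.fields = dict(fields)
        self.history = []  # entries: (timestamp, field, old_value, new_value)

    def update(self, field, new_value, when=None):
        if when is None:
            when = datetime.now(timezone.utc)
        self.history.append((when, field, self.fields.get(field), new_value))
        self.fields[field] = new_value

    def last_changed(self, field):
        times = [ts for ts, name, _, _ in self.history if name == field]
        return max(times) if times else None

    def is_stale(self, field, max_age, now=None):
        """True if the field was never updated or not within max_age."""
        if now is None:
            now = datetime.now(timezone.utc)
        changed = self.last_changed(field)
        return changed is None or now - changed > max_age

ONE_YEAR = timedelta(days=365)  # assumed staleness threshold
```

The history list doubles as an audit trail: it shows who-changed-what questions can be answered later, and stale fields can be flagged for re-confirmation.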

23
Q

What can effective data protection policies and procedures do, and what are their goals?

A

= Effective data protection policies and procedures allow the right people to use and update data in the right way, and restrict all inappropriate access and updates.
Goals:
* Access control
* Compliance
* Ensuring that stakeholder requirements for privacy and confidentiality are met

24
Q

What are the essential elements of a data security policy?

A



  1. Data privacy
  2. Password management
  3. Internet usage
  4. Email usage
  5. Company-owned devices
  6. Employee-owned mobile devices
  7. Social media
  8. Software copyright and licensing
  9. Security incident reporting
25
Q

What is antivirus software?

A
  • A computer program used to prevent, detect, and remove malware.
26
Q

Authentication?

A

A process that confirms a user’s identity. Two-factor authentication (2FA) verifies the user’s identity in two different ways before granting access.
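One common second factor is a time-based one-time password (TOTP, RFC 6238), the six-digit code shown by authenticator apps. The Python sketch below implements the standard HOTP/TOTP calculation with the standard library; it assumes the password (first factor) is checked elsewhere and that the shared secret was exchanged when the user enrolled. It is one possible second factor, not the only one.

```python
import hashlib
import hmac
import struct
import time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the current time step."""
    at = time.time() if at is None else at
    return hotp(secret, int(at // step), digits)

def verify_second_factor(secret: bytes, submitted_code: str, at=None) -> bool:
    """Compare in constant time to avoid leaking information through timing."""
    return hmac.compare_digest(totp(secret, at), submitted_code)
```

Because the code depends on both the shared secret and the current time, a stolen password alone is not enough to log in.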

27
Q

Backup?

A

To make a copy of data stored on a computer or server to reduce the potential impact of failure or loss.
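A backup only reduces the impact of loss if the copy is actually usable, which is why "backups are not checked for effectiveness" appears among the generic risk scenarios. This Python sketch copies a file into a backup folder and verifies the copy with a SHA-256 checksum; the file and folder names in the demo are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def back_up(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir and verify the copy matches byte for byte."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copies contents plus metadata such as modification time
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"backup of {source} failed verification")
    return target

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "customers.csv"
        original.write_text("id,name\n1,Ann\n")
        copy = back_up(original, Path(tmp) / "backups")
        print(copy.name, sha256_of(copy) == sha256_of(original))
```

In practice the backup would go to separate media or another location, so that a single failure cannot destroy both the original and the copy.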

28
Q

Firewall?

A

A firewall is a tool that helps to protect a computer or network from unauthorized access by blocking certain incoming and outgoing connections. It acts as a barrier between a trusted internal network and untrusted external network, such as the internet. It can be thought of as a virtual gatekeeper that controls which data packets are allowed to enter or leave the network.
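The gatekeeper idea can be sketched as an ordered rule list where the first matching rule decides and anything unmatched is blocked. The Python sketch below uses a hypothetical rule set and only looks at direction, remote address, and port; real firewalls inspect far more (protocols, connection state, applications).

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional

@dataclass(frozen=True)
class Rule:
    action: str           # "allow" or "block"
    direction: str        # "in" or "out"
    network: str          # remote address range, e.g. "0.0.0.0/0"
    port: Optional[int]   # None matches any port

# Hypothetical rule set
RULES = [
    Rule("allow", "in", "0.0.0.0/0", 443),    # HTTPS from anywhere
    Rule("allow", "in", "10.0.0.0/8", 22),    # SSH only from the internal network
    Rule("allow", "out", "0.0.0.0/0", None),  # all outgoing traffic
]

def decide(direction, remote_ip, port, rules=RULES, default="block"):
    """First matching rule wins; anything unmatched gets the default action."""
    for rule in rules:
        if (rule.direction == direction
                and ip_address(remote_ip) in ip_network(rule.network)
                and (rule.port is None or rule.port == port)):
            return rule.action
    return default
```

A default of "block" means that only traffic someone deliberately allowed can pass, which matches the trusted-internal versus untrusted-external split described above.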

29
Q

Honeypot?

A

Honeypot – A decoy system or network that serves to attract potential attackers.

30
Q

Explain data security lingo.

A



Ransomware = disables the victim’s access to data until a ransom is paid

Fileless Malware = makes changes to files that are native to the OS

Spyware = collects user activity data without the user’s knowledge

Adware = serves unwanted advertisements

Trojans = disguise themselves as desirable code

Worms = spread through a network by replicating themselves

Rootkits = give hackers remote control of a victim’s device

Keyloggers = monitor users’ keystrokes

Bots = launch a broad flood of attacks

Mobile Malware = infects mobile devices

Wiper Malware = malware with a single purpose: to erase user data beyond recovery

Penetration testing (also called pen testing) = the practice of testing a computer system, network or web application to find vulnerabilities that an attacker could exploit.

31
Q

How can malware enter your organisation?

A

  • Phishing emails
  • File attachments
  • USB sticks
  • Compromised websites
  • RDP (Remote Desktop Protocol)
  • Stolen credentials and compromised accounts

32
Q

What is risk management?

A

= the process of identifying, assessing and controlling threats to an organization’s capital and earnings.
Prevention is better than cure, which is why risk scenarios are used.
A risk scenario is a description of a possible event that, if it occurs, will have an uncertain impact on the achievement of the enterprise’s objectives. The impact can be positive or negative.

33
Q

What are generic risk scenarios for information?

A

  • Backup media is lost or backups are not checked for effectiveness
  • Sensitive information is accidentally disclosed
  • Sensitive information is disclosed through e-mail or social media
  • Sensitive data is lost / disclosed through logical attacks
  • Data is modified intentionally
  • IP is lost and / or competitive information is leaked because key team members leave the enterprise
  • The enterprise has an overflow of data and cannot deduce the business-relevant information from it (e.g., the big data problem)

34
Q

Which main categories of data are out there?

A

Reporting: data organized for the purpose of reporting and business intelligence. Reporting data is created from transactional data, master data, and master reference data.
Transactional: describes business events. It is the largest volume of data in the enterprise.
Master: key business information that supports the transactions, e.g. the data on the products and customers involved in a transaction.
Reference: a subset of master data that defines the set of permissible values to be used by other data fields.
Metadata: data that describes other data; it is the underlying definition or description of data.

35
Q

What is master data?

A

Master data is a type of data that is considered the “single source of truth” and is used to identify and describe core business entities. It is data that is used consistently across an organization and is considered to be critical to the organization’s operations. Examples of master data include:

Customer data: This includes information about customers such as their name, address, and contact information.
Product data: This includes information about products such as their name, description, and price.
Employee data: This includes information about employees such as their name, job title, and salary.

36
Q

What is reference data?

A

Reference data is a type of data that is used as a reference for other data. It typically includes a set of codes or values that are used to classify or categorize the data. Reference data is usually used in conjunction with transactional data, which is data that is related to a specific transaction or activity.

Examples of reference data include:

A list of valid product codes for a retail organization
A list of valid country codes for an international organization
A list of valid codes for classifying financial transactions
A list of valid codes for classifying medical diagnoses
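The relationship between the categories can be sketched in Python: reference data defines permissible values, master data describes the core entities, and each transaction points to both. All identifiers below are hypothetical, and the country list is a small subset of ISO 3166-1 alpha-2 codes.

```python
# Reference data: permissible values (small subset of ISO 3166-1 alpha-2 codes)
COUNTRY_CODES = {"BE": "Belgium", "NL": "Netherlands", "FR": "France"}

# Master data: core business entities shared across the organization (hypothetical)
CUSTOMERS = {"C001": {"name": "Ann Smith", "country": "BE"}}
PRODUCTS = {"P100": {"name": "Desk lamp", "price": 39.95}}

def check_order(order):
    """Transactional data must refer to known master and reference data."""
    problems = []
    if order["customer_id"] not in CUSTOMERS:
        problems.append("unknown customer")
    if order["product_id"] not in PRODUCTS:
        problems.append("unknown product")
    if order["ship_to_country"] not in COUNTRY_CODES:
        problems.append("invalid country code")
    return problems

print(check_order({"customer_id": "C001", "product_id": "P100", "ship_to_country": "NL"}))
```

Because the permissible values live in one place, every system that validates against them classifies data the same way.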

37
Q

What is a data dictionary?

A

= A reference guide to a dataset, describing its transactional, master, and reference data. The primary goal of a data dictionary is to help data teams understand and trust data assets.
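A data dictionary can be kept as structured metadata that both people and programs read. The Python sketch below uses a hypothetical dictionary with the three basic data types (text, number, timestamp) and a helper that checks a raw value against the documented type; field names, descriptions and the EUR currency are invented for illustration.

```python
from datetime import datetime

# Hypothetical data dictionary: one entry per field
DATA_DICTIONARY = {
    "customer_id": {"type": "text", "category": "master",
                    "description": "Unique customer identifier"},
    "country_code": {"type": "text", "category": "reference",
                     "description": "ISO 3166-1 alpha-2 country code"},
    "order_total": {"type": "number", "category": "transactional",
                    "description": "Order amount in EUR"},
    "ordered_at": {"type": "timestamp", "category": "transactional",
                   "description": "Moment the order was placed (ISO 8601)"},
}

def conforms(field, value):
    """Check a raw string value against the type the dictionary documents."""
    expected = DATA_DICTIONARY[field]["type"]
    try:
        if expected == "number":
            float(value)
        elif expected == "timestamp":
            datetime.fromisoformat(value)
        return True  # text accepts any string
    except ValueError:
        return False
```

Keeping the description next to the type means the same entry serves as documentation for analysts and as a rule for validation code.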

38
Q

Data types

A

  • Text
  • Number
  • Date/timestamp

39
Q

What is data storage?

A

= refers to the magnetic, optical, or solid-state media that record and preserve digital information for ongoing or future operations.

40
Q

Who manages data storage?

A

DBA = Database administrator
* Defining storage requirements
* Defining access requirements
* Developing database instances
* Managing the physical storage environment
* Loading data
* Replicating data
* Tracking usage patterns
* Managing backup and recovery
* Database performance and availability
* Data migration
* Enabling data audits and validation

41
Q

What is data integration & interoperability?

A

Data integration refers to the process of combining data from multiple sources into a single, unified view. This can be accomplished through a variety of techniques such as data warehousing, ETL (extract, transform, load) processes, and data federation. The goal of data integration is to make it easier for users to access and analyze the data they need, regardless of where it is stored.

Interoperability, on the other hand, refers to the ability of different systems and applications to work together seamlessly. In the context of data, interoperability means that different systems are able to exchange and make use of data in a consistent and meaningful way. This can be achieved through the use of common data formats, protocols, and standards.
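A minimal ETL sketch in Python, assuming two hypothetical sources that describe customers in different formats (CSV and JSON) with different field names. The transform step maps both onto one common schema, and the load step writes the result into an in-memory SQLite table that stands in for a data warehouse.

```python
import csv
import io
import json
import sqlite3

# Extract: two hypothetical sources with different formats and field names
CRM_CSV = "customer_id,full_name,country\nC001,Ann Smith,be\nC002,Bob Jones,nl\n"
SHOP_JSON = '[{"id": "C003", "name": "Cara Diaz", "countryCode": "FR"}]'

def extract():
    crm_rows = list(csv.DictReader(io.StringIO(CRM_CSV)))
    shop_rows = json.loads(SHOP_JSON)
    return crm_rows, shop_rows

def transform(crm_rows, shop_rows):
    """Map both sources onto one common schema: (id, name, country)."""
    unified = [(r["customer_id"], r["full_name"], r["country"].upper()) for r in crm_rows]
    unified += [(r["id"], r["name"], r["countryCode"].upper()) for r in shop_rows]
    return unified

def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS customers (id TEXT PRIMARY KEY, name TEXT, country TEXT)")
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(*extract()), conn)
print(conn.execute("SELECT id, name, country FROM customers ORDER BY id").fetchall())
```

The common formats used here (CSV, JSON, SQL) are also what make the systems interoperable: each side can read what the other produces.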

42
Q

What is master & reference data management?

A

= Ensuring the uniformity, accuracy, stewardship, and semantic consistency of master and reference data