Data and IS Flashcards

1
Q

Data and Information

A

Data are facts, events, and transactions that have been recorded. They are
the raw inputs that are processed to become information.

When facts are filtered through one or more processes (human or system) and
are ready to convey specific details, they become information.

Information is processed data presented in a useful and meaningful form.

2
Q

Definitions of Data, Information

A

Data
Raw facts such as an employee’s name and number of hours worked in a
week, inventory part numbers or sales orders.
Information
A collection of facts organized in such a way that they have additional
value beyond the value of the facts themselves.
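
As a rough illustration of the distinction, the sketch below organizes raw hours-worked records into a weekly total per employee. The record layout, the employee names, and the numbers are all made up for the example.

```python
from collections import defaultdict

# Raw data: individual facts with no summary (hypothetical sample values).
raw_records = [
    {"employee": "Ana", "day": "Mon", "hours": 8},
    {"employee": "Ana", "day": "Tue", "hours": 7},
    {"employee": "Ben", "day": "Mon", "hours": 6},
    {"employee": "Ben", "day": "Tue", "hours": 9},
]

def weekly_hours(records):
    """Organize raw facts into information: total hours per employee."""
    totals = defaultdict(int)
    for record in records:
        totals[record["employee"]] += record["hours"]
    return dict(totals)

information = weekly_hours(raw_records)
print(information)
```

The individual rows are data; the per-employee totals carry value beyond any single row, which is what the definition calls information.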

3
Q

Data Pyramid

A

Data (Raw)
Information (Meaning)
Knowledge (Context)
Wisdom (Applied)

4
Q

What is good information?

A

Accurate – entering incorrect sales data creates false information.

Timely – knowing that production doesn’t have enough raw materials for
next week’s schedule won’t be useful information three weeks from now.

Relevant – if your boss needs to know how many shipments were late
last month, you shouldn’t give him a list of all items that shipped.

Worth its cost – is it cost-effective to map out the entire U.S. if you only
need one state?

5
Q

Nature of data

A

Structured data
Unstructured/textual data
Semi-structured data
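
The three categories can be sketched with one tiny sample of each. The CSV columns, the JSON keys, and the sentence below are invented for illustration, and the mapping of CSV to structured and JSON to semi-structured is a common convention rather than something the card states.

```python
import csv
import io
import json

# Structured: fixed columns, every row follows the same schema.
structured = list(
    csv.DictReader(io.StringIO("order_id,amount\n1001,25.50\n1002,13.00\n"))
)

# Semi-structured: self-describing keys, but fields can vary per record.
semi_structured = json.loads('{"order_id": 1003, "notes": {"gift": true}}')

# Unstructured: free text with no predefined fields.
unstructured = "Customer called to say order 1002 arrived late."

print(structured[0]["amount"])
print(semi_structured["notes"]["gift"])
print(unstructured)
```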

6
Q

DBMS (Database Management Systems)

A

Software for creating, storing, organizing, and accessing data from a database
Separates the logical and physical views of the data
Logical view: how end users view data
Physical view: how data are actually structured and organized
Examples: Microsoft Access, DB2, Oracle Database, Microsoft SQL Server, MySQL
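
A minimal sketch of the logical/physical split, assuming SQLite (through Python's built-in `sqlite3` module) as a stand-in DBMS; it is not one of the products listed on the card. The table, view, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, hours REAL)"
)
conn.executemany(
    "INSERT INTO employee (name, hours) VALUES (?, ?)",
    [("Ana", 38.5), ("Ben", 41.0)],
)

# A view gives end users a logical picture without exposing the base table layout.
conn.execute("CREATE VIEW overtime AS SELECT name FROM employee WHERE hours > 40")

overtime_names = [row[0] for row in conn.execute("SELECT name FROM overtime")]
print(overtime_names)
conn.close()
```

The query names only tables and columns (the logical view); how the rows are laid out in storage (the physical view) is left entirely to the DBMS.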

7
Q

Business Intelligence

A

Business Intelligence is a collection of software and tools designed to help
understand and interpret the vast quantities of data that an organisation accumulates over time.

Business Intelligence tools, which can use AI, process vast amounts of data and break it down into individual insights. These insights can then be analysed and potentially turned into business decisions.

It allows companies to see patterns, trends, areas of growth and also areas of weakness and vulnerability.

Part of digital transformation

8
Q

Data Warehouses

A

Database that stores current and historical data that may be of interest to decision makers

Consolidates and standardizes data from many systems, operational and transactional databases

Data can be accessed but not altered

A data warehouse is essentially a collection of data from various heterogeneous sources.

It is the main component of a business intelligence system, where data are managed and analysis is carried out to improve decision making.

It involves the process of extraction, transformation, and loading (ETL) to provide the data for analysis.

9
Q

ETL (Extract, Transform, and Load)

A

ETL stands for extract, transform, load: three database functions that are
combined into one tool to pull data out of one database and place it into another database.

1. Extraction
Collecting data from a variety of sources
Converting data into a format that can be used in transformation processing

2. Transformation processing
Making sure data meets the data warehouse’s needs

3. Loading
Transferring data to the data warehouse
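
The three steps can be sketched as three small functions. This is only an illustration under stated assumptions: the two source layouts, the `sales` table, and the sample values are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

def extract():
    """Collect rows from two hypothetical sources with different layouts."""
    source_a = [{"sku": "A-1", "qty": "3"}, {"sku": "A-2", "qty": "5"}]
    source_b = [("a-1", 2)]  # tuples: (product code, quantity)
    return source_a, source_b

def transform(source_a, source_b):
    """Standardize both sources into (product_id, quantity) rows."""
    rows = [(r["sku"].upper(), int(r["qty"])) for r in source_a]
    rows += [(code.upper(), int(qty)) for code, qty in source_b]
    return rows

def load(conn, rows):
    """Insert the standardized rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (product_id TEXT, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(*extract()))
totals = dict(
    warehouse.execute(
        "SELECT product_id, SUM(quantity) FROM sales "
        "GROUP BY product_id ORDER BY product_id"
    )
)
print(totals)
```

Keeping each step in its own function mirrors the card: extraction only gathers, transformation only standardizes, and loading only writes.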

10
Q

Data Warehouse Characteristics

A

Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, “sales” can be a particular subject.

Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.

Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even older data from a data warehouse. This contrasts with a transaction system, where often only the most recent data is kept. For example, a transaction system may hold the most recent address of a customer, whereas a data warehouse can hold all addresses associated with a customer.

Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.
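
The time-variant and non-volatile properties can be illustrated with the customer-address example from the card. The sketch below is an assumption-laden toy: the `AddressHistory` class, the customer ID, and the addresses are all hypothetical, and a real warehouse would enforce this at the database level.

```python
from datetime import date

class AddressHistory:
    """Append-only store: new addresses are added, old ones are never changed."""

    def __init__(self):
        self._rows = []  # (customer_id, address, valid_from)

    def record(self, customer_id, address, valid_from):
        self._rows.append((customer_id, address, valid_from))

    def current(self, customer_id):
        """What a transaction system would keep: only the latest address."""
        rows = [r for r in self._rows if r[0] == customer_id]
        return max(rows, key=lambda r: r[2])[1]

    def all_addresses(self, customer_id):
        """What the warehouse keeps: every address, oldest first."""
        rows = sorted(
            (r for r in self._rows if r[0] == customer_id), key=lambda r: r[2]
        )
        return [r[1] for r in rows]

history = AddressHistory()
history.record(42, "1 Old Road", date(2020, 1, 1))
history.record(42, "9 New Street", date(2023, 6, 1))
print(history.current(42))
print(history.all_addresses(42))
```

The class offers no update or delete method, which is the non-volatile property in miniature.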

11
Q

Data Warehouse-Multidimensionality

A

Multidimensional presentation
1. Dimensions: products, salespeople, market segments, business units, geographical locations, distribution channels, country, or industry
2. Measures: money, sales volume, head count, inventory profit, actual versus forecast
3. Time: daily, weekly, monthly, quarterly, or yearly
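
A small sketch of how a measure can be summed along different dimensions and time periods. The fact rows, the column positions, and the `roll_up` helper are hypothetical and only meant to show the idea.

```python
from collections import defaultdict

# Each fact row: dimensions (product, region), time (quarter), measure (sales).
facts = [
    ("Widget", "North", "Q1", 100),
    ("Widget", "South", "Q1", 150),
    ("Gadget", "North", "Q1", 80),
    ("Widget", "North", "Q2", 120),
]

def roll_up(rows, *keys):
    """Sum the sales measure grouped by the chosen column positions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[3]
    return dict(totals)

PRODUCT, REGION, QUARTER = 0, 1, 2
by_product = roll_up(facts, PRODUCT)
by_region_quarter = roll_up(facts, REGION, QUARTER)
print(by_product)
print(by_region_quarter)
```

The same facts answer different questions depending on which dimensions are chosen for grouping, which is the point of a multidimensional presentation.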

12
Q

Data Mart

A

Subset of a data warehouse that is highly focused and isolated for a
specific population of users

Data marts are often built and controlled by a single department within an
organization.

Data mart
Smaller version of data warehouse
Used by single department or function

Advantages over data warehouses
More limited scope than data warehouses
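
One way to picture a data mart is as a department-specific slice of the warehouse. The rows, department names, and the `build_data_mart` helper below are hypothetical; real data marts are usually separate databases or schemas rather than in-memory lists.

```python
# Hypothetical warehouse rows spanning several departments.
warehouse = [
    {"dept": "sales", "metric": "orders", "value": 120},
    {"dept": "sales", "metric": "returns", "value": 4},
    {"dept": "hr", "metric": "headcount", "value": 35},
    {"dept": "finance", "metric": "invoices", "value": 210},
]

def build_data_mart(rows, department):
    """Copy only one department's rows, dropping the now-redundant dept field."""
    return [
        {"metric": r["metric"], "value": r["value"]}
        for r in rows
        if r["dept"] == department
    ]

sales_mart = build_data_mart(warehouse, "sales")
print(sales_mart)
```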

13
Q

Differences Between a Data Warehouse and a Data Mart

A

1. Scope:
Corporate vs line of business
2. Subject:
Multiple subjects vs single subject
3. Data sources:
Many vs few
4. Size:
100+ GB vs less than 100 GB
5. Implementation time:
Months to years vs months

14
Q

Cloud Data Warehouse

A

• Eliminates the need to purchase any in-house hardware for data warehousing
• Offers lower upfront costs compared to traditional warehouses
• Offers higher scalability as the volume of available data increases
• Frees up capacity on in-house systems
• Frees up cash flow
• Makes powerful solutions affordable

15
Q

Additional DW Considerations: Cloud Data Warehouses

A

Usability:
Moving a document into the cloud storage folder permanently moves the
document from its original folder to the cloud storage location.
Bandwidth:
Several cloud storage services have a specific bandwidth allowance.
Accessibility:
If you have no internet connection, you have no access to your data.
Data Security:
There are concerns with the safety and privacy of important data stored
remotely. The possibility of private data commingling with that of other organizations makes
some businesses uneasy.
Software:
If you want to be able to manipulate your files locally across multiple
devices, you’ll need to download the service on all of them.

16
Q

Data Lake

A

The data lake is a powerful data architecture that leverages the economics of big data.
The data lake has the potential to transform the business by providing a single repository of all the
organization’s data (structured and unstructured; internal and external). This enables
business analysts and data science teams to mine organizational data that today is scattered
across a multitude of operational systems, data warehouses, data marts, and “spreadmarts”.

17
Q

Hadoop

A

Open-source software framework for Big Data

Breaks a data task into sub-problems and distributes the processing across many inexpensive computer processing nodes

Combines the results into a smaller data set that is easier to analyze

MapReduce is a programming model or pattern within the Hadoop framework that is used to access big data stored in the Hadoop Distributed File System (HDFS).

MapReduce facilitates concurrent processing by splitting petabytes of data into smaller chunks and processing them in parallel on Hadoop commodity servers.
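
The split, process-in-parallel, and combine pattern can be imitated in plain Python with a word count. This is not Hadoop's API: threads stand in for cluster nodes, the three text chunks are made-up sample data, and the function names are chosen for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_chunk(chunk):
    """Map step: emit (word, 1) pairs for one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def reduce_pairs(pairs):
    """Reduce step: sum the counts for each word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

chunks = ["big data big", "data lake", "big lake"]

# Threads stand in for cluster nodes; each chunk is mapped independently.
with ThreadPoolExecutor(max_workers=3) as pool:
    mapped = list(pool.map(map_chunk, chunks))

word_counts = reduce_pairs(
    pair for chunk_pairs in mapped for pair in chunk_pairs
)
print(word_counts)
```

Because each chunk is mapped without reference to the others, the map step can run on as many workers as there are chunks; only the reduce step needs to see all the intermediate pairs.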