Module 2 stars Flashcards

1
Q

What is data wrangling?*

A

+The process of retrieving, cleansing, integrating, transforming and enriching data to support subsequent analysis

+It transforms raw data into a format that is more appropriate and easier to analyse

2
Q

What are the objectives of data wrangling?*

A

+Improve data quality

+Reduce time/effort required to perform analysis

+Reveal the true intelligence in the data

3
Q

What is data modelling?*

A

+The process of defining the structure of a database

+Relational databases are modelled in a way to offer flexibility and ease of data retrieval.

4
Q

What is an ERD?*

A

+Entity relationship diagrams are graphical representations used to illustrate the structure of the data

5
Q

What is an entity?*

A

+A person, place, thing, event, etc.

6
Q

What is an instance?*

A

+A single occurrence of an entity

+Represented as a record in a database

7
Q

What are the 2 types of keys in an ERD?*

A

1.Primary key

2.Foreign key

8
Q

What are a primary key and a composite primary key?*

A

1.Primary key:+Attribute that uniquely identifies each instance of the entity
+Used for fast retrieval and searches

2.Composite primary key:+A primary key that consists of more than one attribute
+Used when none of the individual attributes alone can uniquely identify each instance of the entity
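
As a minimal sketch of the two key types, the snippet below uses Python's built-in sqlite3 module. The student and enrolment tables and their columns are hypothetical examples, not taken from the course material. In enrolment, neither student_id nor course_code is unique on its own, so the pair is declared as a composite primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Primary key: student_id alone uniquely identifies each instance (hypothetical table).
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")

# Composite primary key: only the pair (student_id, course_code) is unique.
conn.execute(
    "CREATE TABLE enrolment ("
    " student_id INTEGER, course_code TEXT, grade TEXT,"
    " PRIMARY KEY (student_id, course_code))"
)

conn.execute("INSERT INTO enrolment VALUES (1, 'STAT101', 'A')")
# Same student, different course: allowed, because the pair is still unique.
conn.execute("INSERT INTO enrolment VALUES (1, 'MATH201', 'B')")


def insert_is_rejected(connection, row):
    """Return True if inserting row into enrolment violates the key."""
    try:
        connection.execute("INSERT INTO enrolment VALUES (?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


# Repeating an existing (student_id, course_code) pair breaks uniqueness.
duplicate_rejected = insert_is_rejected(conn, (1, "STAT101", "C"))
row_count = conn.execute("SELECT COUNT(*) FROM enrolment").fetchone()[0]
```

Here the database itself refuses the duplicate pair, which is what "uniquely identifies each instance" means in practice.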

9
Q

What is a foreign key?*

A

+A primary key from another entity that the focal entity contains.
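
A short illustration, assuming hypothetical customer and purchase tables in Python's built-in sqlite3 module: purchase is the focal entity, and it contains customer_id, the primary key of customer. Note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
# purchase.customer_id is a foreign key: it holds the primary key of customer.
conn.execute(
    "CREATE TABLE purchase ("
    " purchase_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(customer_id),"
    " amount REAL)"
)

conn.execute("INSERT INTO customer VALUES (1, 'Thandi')")
conn.execute("INSERT INTO purchase VALUES (100, 1, 250.0)")

# A purchase that points at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO purchase VALUES (101, 99, 80.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key is what lets the two entities be joined.
joined = conn.execute(
    "SELECT c.name, p.amount FROM purchase p "
    "JOIN customer c ON c.customer_id = p.customer_id"
).fetchall()
```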

10
Q

What are the 3 main aspects of an attribute?*

A

1.Name:+A unique name for the attribute

2.Values/meanings:+List of acceptable values for the attribute and what these values mean

3.Description:+The definition of the attribute in relation to the solution.

11
Q

Further discuss the relationships between entities in an ERD.*

A

+The relationships between entities provide structure for the data model.

+They indicate which entities relate to other entities and how

+Specifications show the minimum and maximum number of occurrences allowed on each side of the relationship

+Relationships may be read in either direction

12
Q

What are the different relationship indicators for an ERD?*

A

+Read the slides for the 6 indicators (Slide 15, chap 2)

13
Q

What is an (enterprise) data warehouse?*

A

+A central repository of data from multiple departments within an organisation

14
Q

What are 5 reasons for an enterprise data warehouse system?*

A

+It is integrated and accurate

+Supports managerial decision making

+It helps keep enterprise-wide data clean and organised around subjects such as sales, customers or products

+Gives a historical and comprehensive view of the entire organisation.

+Note: The volume of data can become very large very quickly

15
Q

How is data integrated from different databases in different departments?

A

+Using the ETL (extraction, transformation and load) process

+Data must be universally retrieved, reconciled and transformed into a consistent format

+After this, the final data must be loaded into a data warehouse
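
The three ETL steps can be sketched roughly as follows. The two department extracts, their field names and the fact_revenue table are all hypothetical. The point is only that differently formatted sources are reconciled into one consistent format before loading. An in-memory SQLite database stands in for the warehouse.

```python
import sqlite3
from datetime import datetime

# Extract: hypothetical rows pulled from two departmental databases.
sales_rows = [{"cust": "A001", "date": "2024-03-01", "amount": "1200.50"}]
finance_rows = [{"customer_id": "a001", "invoice_date": "05/03/2024", "total": 300}]


# Transform: reconcile field names, ID casing, date format and number types.
def transform_sales(row):
    return (row["cust"].upper(), row["date"], float(row["amount"]))


def transform_finance(row):
    iso_date = datetime.strptime(row["invoice_date"], "%d/%m/%Y").date().isoformat()
    return (row["customer_id"].upper(), iso_date, float(row["total"]))


# Load: write the consistent rows into the (stand-in) data warehouse.
def run_etl(sales, finance):
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE fact_revenue (customer_id TEXT, txn_date TEXT, amount REAL)"
    )
    cleaned = [transform_sales(r) for r in sales]
    cleaned += [transform_finance(r) for r in finance]
    warehouse.executemany("INSERT INTO fact_revenue VALUES (?, ?, ?)", cleaned)
    warehouse.commit()
    return warehouse


warehouse = run_etl(sales_rows, finance_rows)
loaded = warehouse.execute(
    "SELECT customer_id, txn_date, amount FROM fact_revenue ORDER BY txn_date"
).fetchall()
```

After the run, both departments' records for the same customer sit in one table with one ID format and one date format.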

16
Q

Discuss what a data mart is.*

A

+A small-scale warehouse

+A subset of the data warehouse

+Focuses on one particular subject or decision area

+Conforms to a multidimensional data model called a star schema
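
As an illustrative sketch of a star schema, the hypothetical sales data mart below has one central fact table (fact_sales) surrounded by dimension tables (dim_product, dim_date). All table names, columns and values are made up for the example.

```python
import sqlite3

mart = sqlite3.connect(":memory:")
mart.executescript(
    """
    -- Dimension tables describe the "who/what/when" of each fact.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- The fact table sits at the centre of the star and references each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        units INTEGER,
        revenue REAL
    );

    INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
    INSERT INTO dim_date VALUES (10, 2024, 1), (11, 2024, 2);
    INSERT INTO fact_sales VALUES (1, 10, 3, 30.0), (1, 11, 2, 20.0), (2, 10, 1, 60.0);
    """
)

# A typical data mart question: revenue summarised along one dimension.
revenue_by_category = dict(
    mart.execute(
        "SELECT p.category, SUM(f.revenue) FROM fact_sales f "
        "JOIN dim_product p ON p.product_id = f.product_id "
        "GROUP BY p.category"
    ).fetchall()
)
```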

17
Q

What 6 purposes does an ERD serve?*

A

1.Database design:+ERDs assist in outlining the logical structure and business rules of a database during the creation of a logical model
+Aid in the technical implementation of the database in a physical data model

2.Database troubleshooting:+ERDs help diagnose and resolve problems arising from the logic or deployment of existing databases
+Give a visual aid for identifying weak points in the database structure

3.Business information systems:+Business processes benefit from the clarity provided by ERDs
+This results in streamlined processes, easier access to information and improved business operation results

4.Business process re-engineering:+ERDs greatly aid in rethinking or altering current business processes
+Help in analysing current databases
+Assist in modelling the new databases being designed

5.Education:+ERDs help plan data structures for databases that store relational information for educational purposes

6.Research:+Most research today is centred around structured data
+ERDs help set up databases using this data to be analysed or researched later
+ERDs promote more robust and insightful research outcomes

18
Q

What is data transformation?*

A

+A data conversion process from one format or structure to another

19
Q

Why is data transformation used?*

A

+To meet the requirements of statistical and data mining techniques used for the analysis

+Note: Sometimes it makes sense to group a vast range of numerical values into a small number of bins
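
Binning can be sketched as below, assuming hypothetical age bands. The cut points and labels are illustrative choices, not values from the course. The helper uses bisect_right from the standard library, so a value equal to a cut point falls into the higher bin.

```python
from bisect import bisect_right


def bin_value(value, edges, labels):
    """Return the label of the bin containing value.

    edges are the ascending cut points between bins, so there must be
    exactly one more label than there are edges.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly len(edges) + 1 labels")
    return labels[bisect_right(edges, value)]


# Hypothetical age bands: a wide numeric range reduced to four bins.
age_edges = [18, 35, 65]
age_labels = ["under 18", "18-34", "35-64", "65+"]

ages = [12, 18, 34, 35, 70]
binned = [bin_value(age, age_edges, age_labels) for age in ages]
```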

20
Q

What are some things to note when entering categorical data into analytical models?*

A

+Categorical data must sometimes be converted into numerical values

21
Q

How can categorical data be converted for analytical models?*

A

+Making use of dummy variables (0/1 indicator values for each category)

+Also through category scores (think of rating something from 1 to 5 stars: each number of stars then represents the category score, e.g. 3 stars is entered into the model as a score of 3)
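
Both conversions can be sketched in plain Python. The helper names dummy_encode and score_encode and the example data are hypothetical. The dummy approach creates one 0/1 column per category, while the score approach maps each ordered category to its number.

```python
def dummy_encode(values):
    """Turn each categorical value into 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def score_encode(values, scores):
    """Replace each ordered category with its numeric score."""
    return [scores[v] for v in values]


# Unordered categories: dummy variables.
colours = ["red", "blue", "red"]
dummies = dummy_encode(colours)

# Ordered categories: category scores (a star rating becomes its number of stars).
star_scores = {"1 star": 1, "2 stars": 2, "3 stars": 3, "4 stars": 4, "5 stars": 5}
ratings = ["3 stars", "5 stars", "1 star"]
rating_scores = score_encode(ratings, star_scores)
```

Many regression-style models drop one of the dummy columns and treat that category as the reference level. The sketch keeps all of them for clarity.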