Visual Analytics Flashcards

1
Q

What is Business Intelligence?

A

Business intelligence (BI) encompasses the technologies and processes used for gathering, storing, accessing, and analyzing data for decision making.

2
Q

What is Visual Analytics?

A

Visual analytics is a BI method: the process of combining interactive visualizations with analytical reasoning.

3
Q

What is ETL?

A

Extract, Transform, and Load: a process for organizing, cleaning, and combining data from multiple sources and loading it into, e.g., a data warehouse or data lake.

4
Q

What is Data?

A

Data are facts that are recorded and can be accessed.

5
Q

What is Information?

A

Information refers to data that is accessed by a user for some particular purpose.

6
Q

What is a Database Management System?

A

Database management system (DBMS) software is used for:
- Creation of databases.
- Insertion, storage, retrieval, update, and deletion of the data in the database.
- Maintenance of databases.
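
Example (a minimal sketch, assuming Python's built-in sqlite3 module as the DBMS and a hypothetical customer table; neither comes from the course material). It maps each task in the list to one statement:

```python
import sqlite3

# In-memory SQLite database; the customer table is a hypothetical example.
conn = sqlite3.connect(":memory:")

# Creation of a database structure.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Insertion and storage.
conn.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob")],
)

# Update and deletion.
conn.execute("UPDATE customer SET name = ? WHERE id = ?", ("Anna", 1))
conn.execute("DELETE FROM customer WHERE id = ?", (2,))

# Retrieval.
rows = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
print(rows)
```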

7
Q

What is Data Visualization Software?

A

Data visualization software is software specifically designed for:

  • Creating visualizations and dashboards
  • Exploratory data analysis
  • Reporting
8
Q

What is a database?

A

A database is a structured collection of related data stored on a computer medium. It organizes the data in a way that facilitates efficient access to the information captured in the data.

9
Q

What is a database system?

A

A database system is a computer-based system whose purpose is to enable an efficient interaction between the users and the information captured in a database.

10
Q

What is Metadata?

A

Metadata is data that describes the structure and the properties of the data. Metadata is essential for the proper understanding and use of the data.

11
Q

What is database metadata?

A

Database metadata represents the structure of the database. It is the database content that is not the data itself (data about data). It contains:

  • Names of data structures
  • Data types
  • Data descriptions
  • Other information describing the characteristics of the data
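
Example (a minimal sketch, assuming SQLite via Python's sqlite3 module and a hypothetical store table). SQLite exposes its metadata through the sqlite_master catalog and the table_info pragma; other DBMSs offer comparable catalogs under different names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only to have some metadata to inspect.
conn.execute("CREATE TABLE store (store_id INTEGER PRIMARY KEY, city TEXT, opened DATE)")

# Names of data structures: table names live in the sqlite_master catalog.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Column names and data types: PRAGMA table_info returns one row per column,
# shaped as (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(store)").fetchall()
metadata = [(name, col_type) for _, name, col_type, *_ in columns]

print(tables)
print(metadata)
```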
12
Q

What is the difference between transactional and analytic databases?

A

Operational information (transactional information) is the information collected and used in support of day-to-day operational needs in businesses and other organizations.

An operational database collects and presents operational information in support of daily operational procedures and processes.

Analytical information is the information collected and used in support of analytical tasks. Analytical information is based on operational (transactional) information.

An analytical database collects and presents analytical information in support of analytical tasks.

13
Q

What is ER Modeling?

A

ER (entity-relationship) modeling: a conceptual database modeling technique

  • Enables the structuring and organizing of the requirements collection process
  • Provides a way to graphically represent the requirements
14
Q

What is an Entity?

A

Entities: constructs that represent what the database keeps track of. They are the building blocks of an ER diagram (ERD). Within each ERD, each entity needs to have a different name. E.g., customer, store.

15
Q

What is an Attribute?

A

Attributes: the depiction of a characteristic of an entity

  • Represents the details that will be recorded for each entity instance
  • Within one entity, each attribute must have a different name
16
Q

What are cardinality constraints?

A

Cardinality constraints depict how many instances of one entity can be associated with instances of another entity

17
Q

What are the different maximum cardinalities? How do we represent them?

A
  • One: represented by a straight bar: |
  • Many: represented by a crow's foot symbol
18
Q

What are the different minimum cardinalities? How do we represent them?

A
  • Optional: represented by a circular symbol: 0
  • Mandatory: represented by a straight bar: |
19
Q

What is a relational database model?

A

The relational database model is a logical database model that represents a database as a collection of related tables.

20
Q

What is a relational schema?

A

A relational schema is a visual depiction of the relational database model.

21
Q

What is a relation?

A

A relation is a table in a relational database. In order for a table to be a relation, the following conditions must hold:

  • Within one table, each column must have a unique name.
  • Within one table, each row must be unique.
  • All values in each column must be from the same (predefined) domain.
  • Within each row, each value in each column must be single valued (one value from a predefined domain, within each row in each column)

The order of the rows and columns is irrelevant.

22
Q

What are a primary key and a composite key?

A

A primary key is a column (or set of columns) whose value is unique for each row.

A composite primary key is a primary key that is composed of multiple columns.
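
Example (a minimal sketch, assuming SQLite via Python's sqlite3 module and a hypothetical enrollment table). Neither column is unique on its own, but the pair (student_id, course_id) must be:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per student per course.
conn.execute("""
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL,
        course_id  TEXT    NOT NULL,
        grade      TEXT,
        PRIMARY KEY (student_id, course_id)
    )
""")
conn.execute("INSERT INTO enrollment VALUES (1, 'BI101', 'A')")
# Same student, different course: the combined key is still unique.
conn.execute("INSERT INTO enrollment VALUES (1, 'SQL201', 'B')")

try:
    # Repeats the full (student_id, course_id) pair, so it is rejected.
    conn.execute("INSERT INTO enrollment VALUES (1, 'BI101', 'C')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
print(duplicate_rejected, row_count)
```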

23
Q

What is the Entity Integrity Constraint?

A

The entity integrity constraint means that in a relational table, no primary key column can have null (empty) values. In other words, no primary key can be empty.
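
Example (a minimal sketch, assuming SQLite via Python's sqlite3 module and a hypothetical product table). NOT NULL is written out explicitly because SQLite, unlike most DBMSs, otherwise tolerates NULL in a non-INTEGER primary key column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; NOT NULL makes SQLite enforce entity integrity on sku.
conn.execute("CREATE TABLE product (sku TEXT NOT NULL PRIMARY KEY, name TEXT)")

try:
    # A row whose primary key is NULL violates the entity integrity constraint.
    conn.execute("INSERT INTO product (sku, name) VALUES (NULL, 'Widget')")
    null_key_rejected = False
except sqlite3.IntegrityError:
    null_key_rejected = True

print(null_key_rejected)
```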

24
Q

What are the two focus areas of SQL in this class?

A

  • Data Definition Language (DDL)
  • Data Manipulation Language (DML)

25
Q

What is an SQL Injection?

A

SQL injection is an attack in which an attacker inserts malicious SQL through an input that is passed to a database. The threat primarily exists with web interfaces, that is, when a webpage input communicates with a database, but it also exists in any application connected to a database inside an organization. SQL injections are dangerous because they can be used to view or edit data (or the whole database) that the attacker should not have access to.
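
Example (a minimal sketch, assuming SQLite via Python's sqlite3 module; the users table and the input string are made up). It contrasts a query built by string concatenation with a parameterized query, which is one common defense:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

user_input = "nobody' OR '1'='1"  # attacker-controlled text

# Vulnerable: the input is pasted into the SQL string and changes the query logic.
unsafe_sql = f"SELECT secret FROM users WHERE name = '{user_input}'"
leaked = conn.execute(unsafe_sql).fetchall()

# Safer: a parameterized query treats the input purely as a value.
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(leaked), len(safe))  # 2 0
```

The concatenated query returns every user's secret, while the parameterized one returns nothing because no user has that literal name.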

26
Q

What is indexing, what are the advantages/disadvantages and when do we use it?

A

Indexes can be added to columns within the tables of a database. They make SELECT queries that filter on the indexed column (for example in a WHERE clause) significantly faster.

Why don’t we always use indexing?

  • Indexes make modifying the table (adding, updating, or deleting rows) slower
  • Indexes take up disk space

Thus, indexes work best on columns that are queried often, in tables that are not frequently updated.
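
Example (a minimal sketch, assuming SQLite via Python's sqlite3 module and a hypothetical orders table). EXPLAIN QUERY PLAN shows how SQLite intends to run the same query before and after the index exists; the exact wording of the plan varies between SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT total FROM orders WHERE customer_id = 42"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)  # full table scan: every row is inspected
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)   # lookup through idx_orders_customer

print(before)
print(after)
```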

27
Q

What is a scatter plot used for?

A

Scatter plot: ideal for spotting clusters in the data and useful for inspecting the relationship between pairs of variables (bivariate data).

28
Q

What is a histogram used for?

A

Histograms: good for understanding how the data is distributed.
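
Example (a minimal sketch in plain Python with made-up integer values and an arbitrary bin width of 2). A histogram groups values into bins and counts how many fall into each one:

```python
from collections import Counter

values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # made-up integer measurements
bin_width = 2

# Assign each value to the bin that starts at a multiple of bin_width.
counts = Counter((v // bin_width) * bin_width for v in values)

# Text rendering: one row per non-empty bin, one '#' per value in it.
for start in sorted(counts):
    print(f"{start:>2}-{start + bin_width - 1:<2} {'#' * counts[start]}")
```

The long row for the 2-3 bin and the isolated value in the 8-9 bin show the shape of the distribution at a glance.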

29
Q

What is a line chart used for?

A

Line charts: good for visualizing time series data.

30
Q

What is a bar chart used for?

A

Bar charts: Useful to present categorical data where one axis is the category and the other is the measured value.

31
Q

What is a box plot used for?

A

Box plots: Useful for comparing distributions in categorical data.

32
Q

What is a Heatmap used for?

A

Heatmaps: Easy way to see relationships between two variables and their different values.

33
Q

What is imputing values? What’re the advantages and disadvantages?

A

Imputing Values: Imputing means replacing missing data with substituted values. Whether we should remove or impute missing data depends on the situation. Removing missing values is in general less risky than imputing, but it could make the dataset too small or imbalanced. It is good to consider removing variables with lots of missing values rather than removing rows.
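
Example (a minimal sketch in plain Python with made-up values; filling with the mean is only one possible choice of substitute). It shows the two options side by side:

```python
from statistics import mean

ages = [34, None, 29, 41, None, 38]  # None marks a missing value (made-up data)

observed = [a for a in ages if a is not None]

# Option 1: remove the rows with missing values (the dataset shrinks).
removed = observed

# Option 2: impute, here with the mean of the observed values.
fill = mean(observed)
imputed = [a if a is not None else fill for a in ages]

print(len(removed), imputed)
```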

34
Q

What are the types of imputation?

A

Univariate, bivariate, nearest neighbor

35
Q

What is data cleaning?

A

Data cleaning: preparing the raw data by renaming columns, formatting values correctly (dates, for example), and handling missing values.

36
Q

What is data transformation?

A

Data Transformation: adding or modifying variables, for example by applying log transformations.
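
Example (a minimal sketch in plain Python with made-up, heavily skewed values; base 10 is an arbitrary choice). A log transformation adds a new variable on a compressed scale:

```python
import math

revenue = [10.0, 100.0, 1000.0, 10000.0]  # made-up, heavily skewed values

# New variable: the base-10 logarithm of revenue.
log_revenue = [math.log10(x) for x in revenue]

# The original values span three orders of magnitude; the logs are evenly spaced.
print(log_revenue)
```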

37
Q

What is Data Manipulation?

A

Data manipulation: a broad area covering a wide range of transformations, such as aggregation and filtering.
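
Example (a minimal sketch in plain Python; the sales records and the threshold of 50 are made up). It applies a filter and then an aggregation:

```python
from collections import defaultdict

sales = [
    {"store": "North", "amount": 120},
    {"store": "South", "amount": 80},
    {"store": "North", "amount": 45},
    {"store": "South", "amount": 200},
]

# Filtering: keep only sales of at least 50.
large = [s for s in sales if s["amount"] >= 50]

# Aggregation: total amount per store.
totals = defaultdict(int)
for s in large:
    totals[s["store"]] += s["amount"]

print(dict(totals))  # {'North': 120, 'South': 280}
```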

38
Q

What is data modeling?

A

Data Modeling: finding the relations between tables, i.e. modeling how they fit together (the data engineer's view), or applying a mathematical model such as linear regression (the data scientist's view).

39
Q

What is Alteryx?

A

Alteryx is a low-code/no-code data blending and transformation tool. It allows less technical users to do data prep, blending, and advanced analytics.

40
Q

What are the different types of files in Alteryx?

A
  • Alteryx workflow
  • Alteryx database
  • Alteryx packaged workflow
41
Q

What is selective attention?

A

There is a lot going on around us, and only a portion of what our eyes see becomes an object of focus. Moreover, a fraction of that becomes the focus of our attention, a portion of that is processed as a conscious thought, and even less of that gets stored in memory.

42
Q

What are charts and graphs?

A

The terms chart and graph often refer to the same thing, but we define a chart as any visual representation of data and graphs as a subcategory of charts.

43
Q

What is a dashboard?

A

Dashboard: a visual display of the most important information needed to achieve one or more objectives, consolidated on a single computer screen so it can be monitored at a glance.

44
Q

Why is data an asset?

A

Data is widely considered an asset in organizations because it is necessary for business operations, and can provide insights into customers, products and services.

45
Q

What is data management?

A

The development, execution, and supervision of policies, programs and practices that deliver, control, protect and enhance the value of data and information assets through their life cycle.

46
Q

What is data governance?

A

Data governance aims to help control data development, reduce the risk associated with data use, and enable organizations to leverage data strategically.

47
Q

What are examples of preattentive attributes?

A

Size, color, shape, position.

48
Q

What are the steps for dashboard design?

A
  1. Understand the context.
  2. Choose appropriate graphs.
  3. Sketch it.
  4. Design.
  5. Feedback and iterate.
49
Q

What is a data pipeline, and what are the steps?

A

A data pipeline is a set of processes that move data from one place to another and typically includes the following steps:
1. Extract: Collect data from multiple sources.
2. Transform: Clean, normalize and transform data.
3. Load: Store the transformed data.
4. Monitor: Keep track of the pipeline’s performance and monitor for errors.
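
Example (a minimal sketch of the four steps in plain Python; the two inline CSV sources, the person table, and all function names are hypothetical). Monitoring is reduced here to logging row counts:

```python
import csv
import io
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

# Two hypothetical sources, inlined as CSV text so the sketch is self-contained.
SOURCE_A = "name,city\n ann ,Oslo\nbob,Bergen\n"
SOURCE_B = "name,city\nCARL,oslo\n,Oslo\n"

def extract(sources):
    """Extract: collect rows from every source."""
    rows = []
    for text in sources:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows

def transform(rows):
    """Transform: trim whitespace, normalize casing, drop rows without a name."""
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()
        if not name:
            continue
        cleaned.append((name, row["city"].strip().title()))
    return cleaned

def load(rows, conn):
    """Load: store the transformed rows in the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS person (name TEXT, city TEXT)")
    conn.executemany("INSERT INTO person VALUES (?, ?)", rows)

def run_pipeline(conn):
    raw = extract([SOURCE_A, SOURCE_B])
    clean = transform(raw)
    load(clean, conn)
    # Monitor: report row counts so dropped records are visible.
    log.info("extracted=%d loaded=%d dropped=%d",
             len(raw), len(clean), len(raw) - len(clean))
    return clean

conn = sqlite3.connect(":memory:")
result = run_pipeline(conn)
print(result)
```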

50
Q

What is a data mesh?

A

It is a way of structuring and governing data access and usage in large, complex organizations. It is an architecture pattern that aims to move away from centralized, monolithic data systems, such as a single data lake or data warehouse, towards a decentralized, product-centric approach in which data is organized by teams.

51
Q

What are a data lake and a data warehouse?

A

Data lake: an unstructured way of storing data; it can store structured, semi-structured, and unstructured data.

Data warehouse: used to store structured and organized data.

52
Q

What is data architecture?

A

Data architecture is the process of designing, creating, deploying and managing the data structures and systems that support an organization’s business goals and objectives. It encompasses the design and organization of data, including the data model, data storage, and data flow, as well as the management and maintenance of data.