Big BomBastic Brain Flashcards

1
Q

What is BI?

A

IT tools that help you make data-driven decisions:

“Timely, accurate, high-value, and actionable business insights and the work processes and technologies used to obtain them” p. 12

2
Q

The big 4

A

Accurate, valuable, timely and actionable

3
Q

The BI proposition (supply chain)

A

Raw operational data → experts, tools, best practices → insights → conclusions → actions → operational results → better next time! (wisdom)

4
Q

Where does BI come from?

A

ERP systems (SAP was one of the first)

5
Q

When does BI fail?

A
  • When there is not enough planning; focus on scope and expectations
  • When no action is taken on the information (noise between information and action)
  • When it is introduced into an immature culture
6
Q

What is the value of BI?

A

Being able to answer the questions it was designed to answer

7
Q

What does querying mean?

A

To pull information from a database

8
Q

The BI delivery triangle (you may pick any two)

A

“Fast, cheap, reliable: you may pick any two” p. 56

9
Q

What is drill-down?

A

Makes it possible to find the information you are looking for with much greater precision. This addresses one of the common problems: ‘getting the right information to the right people’.

10
Q

What is ETL, and what does it do?

A

Extract, transform, load
Brings data together in a single format
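A minimal sketch of the three ETL steps using only the Python standard library. The two source systems, their column names and the `sales` target table are hypothetical; an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical sources: two systems export sales in different formats
CRM_CSV = "customer,amount\nAlice,100.50\nBob,20\n"
WEBSHOP_ROWS = [{"Customer Name": "carol", "Total (DKK)": "75,25"}]

def extract():
    """Extract: read the raw records from each source system."""
    crm = list(csv.DictReader(io.StringIO(CRM_CSV)))
    return crm, WEBSHOP_ROWS

def transform(crm, webshop):
    """Transform: map both sources onto one common format (name, amount as float)."""
    rows = [(r["customer"].strip().title(), float(r["amount"])) for r in crm]
    for r in webshop:
        amount = float(r["Total (DKK)"].replace(",", "."))  # decimal comma -> decimal point
        rows.append((r["Customer Name"].strip().title(), amount))
    return rows

def load(rows, conn):
    """Load: write the unified rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(*extract()), conn)
print(conn.execute("SELECT customer, amount FROM sales ORDER BY customer").fetchall())
# [('Alice', 100.5), ('Bob', 20.0), ('Carol', 75.25)]
```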

11
Q

What can OLAP do?

A

Online Analytical Processing
- Consolidation: making data more summarized (rolling up). The opposite of drilling down
- Drilling into: going deeper into the data
- Computation: the ability to perform calculations
- Pivoting: the ability to view your data from different perspectives
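The four OLAP operations can be illustrated on a tiny in-memory "cube" in plain Python. This is only a sketch: the fact rows and dimension names are hypothetical, and real OLAP tools run these operations on pre-built cubes rather than Python dictionaries.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, city, product, sales)
FACTS = [
    ("North", "Aalborg", "Hammer", 10),
    ("North", "Aalborg", "Nails", 5),
    ("North", "Skagen", "Hammer", 3),
    ("South", "Odense", "Hammer", 7),
    ("South", "Odense", "Nails", 8),
]
DIMS = ("region", "city", "product")

def aggregate(facts, by):
    """Sum the sales measure, grouped by the named dimensions."""
    positions = [DIMS.index(d) for d in by]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in positions)] += row[-1]
    return dict(totals)

def pivot(facts, rows, cols):
    """Pivoting: show one dimension as rows and another as columns."""
    table = defaultdict(dict)
    for (r, c), value in aggregate(facts, [rows, cols]).items():
        table[r][c] = value
    return dict(table)

# Consolidation (roll-up): fewer dimensions, more summarized totals
by_region = aggregate(FACTS, ["region"])
# Drilling into: add the finer city level beneath region
by_region_city = aggregate(FACTS, ["region", "city"])
# Computation: derive a new value, each region's share of total sales
total = sum(by_region.values())
share = {region: value / total for (region,), value in by_region.items()}

print(by_region)                          # {('North',): 18, ('South',): 15}
print(by_region_city)                     # {('North', 'Aalborg'): 15, ('North', 'Skagen'): 3, ('South', 'Odense'): 15}
print(pivot(FACTS, "product", "region"))  # {'Hammer': {'North': 13, 'South': 7}, 'Nails': {'North': 5, 'South': 8}}
```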

12
Q

OLTP

A

Online transaction processing

Advantages of relational databases,
e.g. in transactional systems:

  • Fast
  • Handle large volumes of data
  • Work in real time
13
Q

Attribute
Cell
Measure

A

Attribute: A descriptive detail; a categorization.

Cell: A location that holds a value, e.g. B2 contains $5439

Measure: Almost a fact; a value used in calculations, e.g. sales, price

14
Q

Dashboard types (mirroring the organizational levels)

A

Tactical, operational and strategic dashboards

15
Q

Dashboard layout (aesthetic appeal and interactivity)

A
  • Navigable
  • Appealing and varied graphics
  • Interactivity
  • Customizable interface
  • Embedded content
  • Browser-based capabilities
16
Q

Research categories:
- Types of BI reports (tasks)

A
  • Visualization: Makes critical business information clearer and more meaningful
    “What the eye sees, the mind knows” p. 102
  • Guided analysis: Interactive, goal-oriented BI analysis
  • Data mining: Finding the needle in the haystack
17
Q

Ad hoc query

A

People use SQL to make ad hoc queries to a database when the need arises. This is the opposite of predefined queries, which are performed routinely and known ahead of time. Tools for ad hoc querying can help you manipulate data for analysis and report creation. Most business people, however, do not really need ad hoc querying; they do fine with interactive reporting and data discovery.
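A small sketch of the difference, using Python's built-in `sqlite3` module and a hypothetical `orders` table: the predefined query is fixed in code and rerun routinely, while the ad hoc query is written on the spot when a new question arises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL, year INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer, amount, year) VALUES (?, ?, ?)",
    [("Alice", 120.0, 2023), ("Bob", 80.0, 2023), ("Alice", 200.0, 2024)],
)

# Predefined query: known ahead of time and run routinely (e.g. a yearly report)
YEARLY_REVENUE = "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"

def yearly_revenue(conn):
    return conn.execute(YEARLY_REVENUE).fetchall()

# Ad hoc query: written when a new question comes up,
# e.g. "which customers placed a single order above 100?"
adhoc = conn.execute(
    "SELECT DISTINCT customer FROM orders WHERE amount > ? ORDER BY customer", (100,)
).fetchall()

print(yearly_revenue(conn))  # [(2023, 200.0), (2024, 200.0)]
print(adhoc)                 # [('Alice',)]
```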

18
Q

What are dashboards?

A

This BI tool displays numeric and graphical information on a single display, making it easy for a business person to get information from different sources and customize the appearance. This is often a mashup of other BI styles.

19
Q

Data mart

A

A subset of a data warehouse that’s usually oriented to a business group or process rather than enterprise-wide views. Data marts have value as part of the overall enterprise data architecture, but can cause problems when they sprout uncontrolled as data silos with their own data definitions, creating data shadow systems.

20
Q

Data quality (5 C’s)

A

Achieved when data embodies the “five Cs”: clean, consistent, conformed, current, and comprehensive.

21
Q

Data mining

A

This process analyzes large quantities of data to find patterns such as groups of records, unusual records, and dependencies. Data mining helps businesses sift through data to find patterns and relationships they do not yet know, such as “what is the likelihood that a customer who buys our hammer will also buy our nails?”
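The hammer-and-nails question is a classic association rule (market basket analysis). A minimal sketch with hypothetical basket data computes the rule's support and confidence; real data-mining tools search many such rules automatically (e.g. with the Apriori algorithm).

```python
# Hypothetical transactions: each set is one customer's basket
BASKETS = [
    {"hammer", "nails"},
    {"hammer", "nails", "glue"},
    {"hammer"},
    {"nails", "screws"},
    {"hammer", "nails"},
]

def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)

def support(itemset, baskets):
    """Share of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)

print(support({"hammer", "nails"}, BASKETS))       # 0.6
print(confidence({"hammer"}, {"nails"}, BASKETS))  # 0.75
```

Read the result as: in this toy data, 75% of the customers who bought a hammer also bought nails.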

22
Q

Data profiling

A

An essential part of the data quality process; this involves examining source system data for anomalies in values, ranges, frequency, relationships, and other characteristics that could hobble future efforts to analyze it. It enables early detection of problems.
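A minimal profiling sketch for a single column, assuming a hypothetical `ages` column extracted as text and an assumed valid range of 0 to 120. It reports the kinds of anomalies the card mentions: missing or malformed values, out-of-range values and value frequencies.

```python
from collections import Counter

# Hypothetical source column, extracted as text from an operational system
ages = ["34", "29", "", "41", "-3", "29", "abc", "130"]

def profile(values, low, high):
    """Summarize a column: missing, non-numeric, range and frequency anomalies."""
    missing = sum(1 for v in values if v.strip() == "")
    numeric, non_numeric = [], []
    for v in values:
        if v.strip() == "":
            continue
        try:
            numeric.append(int(v))
        except ValueError:
            non_numeric.append(v)
    return {
        "count": len(values),
        "missing": missing,
        "non_numeric": non_numeric,
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
        "out_of_range": [n for n in numeric if not low <= n <= high],
        "most_common": Counter(numeric).most_common(1),
    }

report = profile(ages, low=0, high=120)
print(report)
```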

23
Q

MDM (Master data management)

A

The set of processes used to create and maintain a consistent view, also referred to as a master list, of key enterprise reference data. This data includes such entities as customers, prospects, suppliers, employees, products, services, assets, and accounts. It also includes the groupings and hierarchies associated with these entities.
Ensures there is one authoritative truth, with processes that make it possible to validate it. Maintains a standardized truth across the entire company through process management.

24
Q

Self service BI

A

Intuitive tools that allow BI consumers to obtain the information they need without the help of the IT group. People still need the IT group for the hard work of making the data clean, correct, consistent, current, and comprehensive.

25
Q

Data star schema

A

A way of designing databases. One or more fact tables sit at the center and link to the surrounding dimension tables. Several fact tables can be connected to the same dimension, e.g. when the dimension is date (time). From there, hand-written load code makes sure the data is inserted into the various dimensions.
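A minimal star schema sketch in SQLite via Python's `sqlite3`. All table and column names are hypothetical: the fact table holds foreign keys and numeric measures, and the dimension tables hold descriptive attributes. A second fact table (e.g. inventory) could reference the same `dim_date` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per member
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- Fact table at the center: foreign keys to each dimension plus measures
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    revenue     REAL
);

INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO dim_product VALUES (1, 'Hammer', 'Tools'), (2, 'Nails', 'Supplies');
INSERT INTO fact_sales VALUES
    (20240101, 1, 2, 200.0),
    (20240101, 2, 10, 50.0),
    (20240201, 1, 1, 100.0);
""")

# Star join: slice the fact measures by attributes from the dimensions
rows = conn.execute("""
    SELECT d.month, p.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()
print(rows)  # [(1, 'Supplies', 50.0), (1, 'Tools', 200.0), (2, 'Tools', 100.0)]
```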

26
Q

Levels of Data

A
  • Business view / conceptual data model (top)
  • Architect view / logical data model (mid)
  • Developer view / physical data model (bottom, DW/mart)
27
Q

Explain:
- Entities
- Attributes
- Cardinality
- Granularity

A

Entities: a thing that information is stored about, e.g. SalesOrder = x

Attributes: properties of an entity, e.g. if the entity describes a department, the attribute could be called Dept.

Cardinality: how entities are related to each other, e.g. many-to-many, one-to-one, etc.

Granularity: level of detail

28
Q

Keys:
- Primary
- Alternate
- Surrogate
- Foreign
- Candidate

A

Primary: a key that is unique across the table; a table has only one. It lets you look up the rest of the row in that table.

Alt key: not used as the primary key. When there are two keys that could serve as the primary key, one is chosen, e.g. CPR number or AU ID; the other becomes the alternate key.

Surrogate: often an int (number) that refers to one place and one place only. It gives the row a separate number, which makes things easier to keep track of.

Foreign key: shows where to look; a key that refers to a key in another table.

Candidate key: a candidate for becoming the primary key. Used during design.
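A sketch of how these key types can appear in table definitions, using SQLite and hypothetical `student`/`enrollment` tables with made-up CPR and AU ID values. `student_key` is a surrogate primary key; `cpr` and `au_id` are candidate keys that were not chosen, declared `UNIQUE` as alternate keys; `enrollment.student_key` is a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is enabled
conn.executescript("""
CREATE TABLE student (
    student_key INTEGER PRIMARY KEY,   -- surrogate key, chosen as the primary key
    cpr         TEXT NOT NULL UNIQUE,  -- candidate key not chosen: alternate key
    au_id       TEXT NOT NULL UNIQUE,  -- another alternate key
    name        TEXT
);
CREATE TABLE enrollment (
    student_key INTEGER NOT NULL REFERENCES student(student_key),  -- foreign key
    course      TEXT NOT NULL
);
""")
conn.execute("INSERT INTO student (cpr, au_id, name) VALUES ('0000000000', 'au000001', 'Alice')")

def enroll(student_key, course):
    """Return True if the row was accepted, False if a key constraint rejected it."""
    try:
        conn.execute("INSERT INTO enrollment VALUES (?, ?)", (student_key, course))
        return True
    except sqlite3.IntegrityError:
        return False

print(enroll(1, "BI"))   # True: SQLite assigned student_key 1 to Alice
print(enroll(99, "BI"))  # False: no student has student_key 99
```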

29
Q

What does normalization (normal forms, 3NF) do, and what are the pros/cons?

A

Reduces data redundancy

  • Pros: gives clean data; easier to define; clear dependencies
  • Cons: less overview and lower efficiency (especially when extracting data, since queries need more joins)
30
Q

Parent (relationship)

A

Uses a foreign key that refers to where the parent data is stored, in order to build a hierarchy.
Like in MitHR, where you can see who reports to whom in a hierarchy.

31
Q

Event table

A

Yes/no or 0/1 classification (Boolean)

32
Q

Aggregating

A

Aggregate data is high-level data which is acquired by combining individual-level data. For instance, the output of an industry is an aggregate of the firms’ individual outputs within that industry.

33
Q

Variable-depth hierarchies

A

Ragged and unbalanced
Ragged hierarchies: when different drill paths do not contain the same metadata categorization, so the hierarchy has ‘missing’ sublevels; in these cases use NULL.

Unbalanced: has at least one branch that does not reach down to the lowest level.

34
Q

Which elements help carry out a preliminary analysis?

A

• Holistic: avoid costly overlaps and inconsistencies.
• Incremental: more manageable and practical.
• Iterative: discover and learn from each individual project.
• Reusable: ensure consistency.
• Documented: identify data for reuse, and create leverage for future projects.
• Auditable: necessary for government regulations and industry standards.

At the integration stage it is important to keep in mind what the data will be used for, i.e. to have a preliminary analysis.

It is important to have a plan. Plans are often overrun. Why? Probably because every IT solution has to be tailor-made: there are few standards, and organizations often prefer a low setup cost, which ends up costing more in the long run (operations, maintenance). p. 282

35
Q

What is cardinality?

A
  • 1-1 One-to-one: a record can appear only once in each of the two tables (e.g. there can be only one country)
  • 1-x One-to-many: there can be one country, but many different people in that country
  • x-x Many-to-many: e.g. a student takes many courses, and each course has many students
36
Q

Natural key

A

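Uniquely identifies a product using a business value that already exists in the source data (e.g. a product number). It can serve as a key, but it is not a surrogate_key: in a dimensional model a generated surrogate key is usually added alongside it.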

37
Q

Atomic granularity

A
  • Down to a single row: as far down as you can slice
  • If something is deleted, it is the current one. Ask: what does a single row represent?
38
Q

Bus Matrix

A

A matrix of the dimensions and facts (business processes) that are related, i.e. showing which dimensions you can slice each fact by.

39
Q

Storytelling

A

Guiding the reader in the desired direction, and making sure the visual elements support that goal.

40
Q

What is profiling?

A

Data profiling is the process of examining the data available from an existing information source and collecting statistics or informative summaries about that data. The purpose of these statistics can be to find out whether existing data can easily be used for other purposes.

41
Q

What are relational databases?

A

They are databases that store data in tables; new information can be added to a table without having to reorganize the table itself.

A table can have multiple parents

42
Q

How can businesses get the most out of their data?

A

Unlock data through accurate storytelling

43
Q

What is Data Analysis? How does this work?

A
  • Processing - Data analysis is the process of identifying, cleaning, transforming, and modelling data to discover meaningful and useful information.
  • Selling Story - The data is then crafted into a story through reports for analysis to support the critical decision-making process.
44
Q

Diagnostic analytics? What is the process?

A
  • Diagnostic analytics answer questions about why events happened
  • Diagnostic techniques supplement findings from descriptive statistics to uncover the cause of events (e.g. why these events became better or worse)
    (1) Identify anomaly
    (2) Collect data related to anomaly
    (3) Use statistical techniques to discover relationships in these patterns
45
Q

Predictive analytics?

A
  • Predictive analytics techniques use historical data to identify trends and determine if they are likely to occur again in the future
  • Usually one outcome
  • Includes statistical and machine learning techniques
46
Q

Prescriptive analytics?

A
  • Prescriptive analytics help answer questions about which actions should be taken to achieve a goal or target.
  • Analyses past data to estimate the likelihood of different outcomes (multiple outcomes)
  • Uses machine learning techniques
47
Q

Visualizations? What is the goal of a visualisation?

A
  • A visualization (sometimes also referred to as a visual) is a visual representation of data, like a chart, a color-coded map, or other interesting things you can create to represent your data visually.
  • Ultimate goal - to present data in a way that provides context and insights, both of which would probably be difficult to discern from a raw table of numbers or text.
48
Q

Benefits of a good data model?

A

Data exploration is faster
Aggregations are simpler to build

Power BI Reports
Reports are more accurate
Writing reports takes less time
Reports are easier to maintain in the future

49
Q

What are the differences between fact and dimension tables?

A

Fact table
- Observational/event data
- Contains measures and numbers
- Values (e.g. dimension keys) can repeat across many rows

Dimension table
- Contains details about the fact table
- Unique values appear in one row

50
Q

What are hierarchies?

A

Natural segments in data that are capable of being decomposed
Systemic layers such as parent-child relationships or tree structures

51
Q

What is flattening the parent-child hierarchy?

A

The process of viewing multiple child levels based on a top-level parent is known as flattening the hierarchy.
This uses multiple columns to indicate multiple levels, so you can see each individual level.
In this process, you create multiple columns in a table to show the hierarchical path from the parent to the child in the same record.
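A sketch of flattening in plain Python, assuming a hypothetical employee-to-manager table: each member's path from the top becomes one column per level, with `None` (NULL) padding the levels a shorter branch lacks. In Power BI, the DAX functions `PATH` and `PATHITEM` are commonly used for the same purpose.

```python
# Hypothetical parent-child table: member -> parent (None = top of the hierarchy)
PARENT = {
    "CEO": None,
    "CFO": "CEO",
    "CTO": "CEO",
    "Controller": "CFO",
    "Dev Lead": "CTO",
    "Developer": "Dev Lead",
}

def path(node):
    """Walk from a member up to the root, then reverse to get the top-down path."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return list(reversed(chain))

def flatten(parent):
    """One record per member with one column per level (Level1 = top), padded with None."""
    paths = {member: path(member) for member in parent}
    depth = max(len(p) for p in paths.values())
    return {member: p + [None] * (depth - len(p)) for member, p in paths.items()}

for member, levels in flatten(PARENT).items():
    print(member, levels)
```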

52
Q

What is a role-playing dimension?

A

Role-playing dimensions have multiple valid relationships with fact tables, meaning that the same dimension can be used to filter multiple columns or tables of data.
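A sketch with a hypothetical `fact_orders` table that references one `dim_date` table twice, once as order date and once as ship date; joining through a different column lets the same dimension play a different role. In Power BI only one of these relationships can be active at a time, and the other is typically used via `USERELATIONSHIP` in a DAX measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month_name TEXT);
CREATE TABLE fact_orders (
    order_id       INTEGER PRIMARY KEY,
    order_date_key INTEGER REFERENCES dim_date(date_key),  -- role 1: order date
    ship_date_key  INTEGER REFERENCES dim_date(date_key),  -- role 2: ship date
    amount         REAL
);
INSERT INTO dim_date VALUES (20240131, 'January'), (20240201, 'February');
INSERT INTO fact_orders VALUES (1, 20240131, 20240201, 100.0),
                               (2, 20240131, 20240131, 50.0);
""")

ROLES = ("order_date_key", "ship_date_key")

def revenue_by(role_column):
    """Join the same date dimension through a different fact column (role)."""
    if role_column not in ROLES:  # column names cannot be bound as SQL parameters
        raise ValueError(f"unknown role: {role_column}")
    sql = f"""
        SELECT d.month_name, SUM(f.amount)
        FROM fact_orders f JOIN dim_date d ON f.{role_column} = d.date_key
        GROUP BY d.month_name ORDER BY d.month_name
    """
    return conn.execute(sql).fetchall()

print(revenue_by("order_date_key"))  # [('January', 150.0)]
print(revenue_by("ship_date_key"))   # [('February', 100.0), ('January', 50.0)]
```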

53
Q

Why are role-playing dimensions important to understand?

A

Role playing: A table with multiple valid relationships between itself and another table.
As a result, you can filter data differently depending on what information you need to retrieve.

54
Q

What is cardinality best practice?

A

Avoid one-to-one: it is not recommended because this relationship stores redundant information and suggests that the model is not designed correctly. It is better practice to combine the tables.
Avoid many-to-many: a lack of unique values introduces ambiguity and your users might not know which column of values is referring to what.

55
Q

What is best practice for relationships and cardinality?

A

A word of caution regarding bi-directional cross-filtering: You should not enable bi-directional cross-filtering relationships unless you fully understand the ramifications of doing so. Enabling it can lead to ambiguity, over-sampling, unexpected results, and potential performance degradation.

Arrows should point to fact tables

Many-to-many relationships and/or bi-directional relationships are complicated. Unless you are certain what your data looks like when aggregated, these types of open-ended relationships with multiple filtering directions can introduce multiple paths through the data.

56
Q

The cons of the shadow system

A
  • Inconsistent data across the enterprise
  • Lost productivity due to “analyst time sink”
  • Lost productivity due to reconciliation
  • Data error #1 (import)
  • Data error #2 (calculations)
  • Data error #3 (data sources change)
  • Data error #4 (stale data)
  • Limited (or no) scalability
  • Increased risk
  • Lack of discipline
  • No audit trail
  • No documentation
57
Q

The pros of the shadow system

A
  • Business knowledge: give business people a short-term fix by giving them at least some of the data that they need to make a more informed decision.
  • Responsive
  • Fast and flexible
  • Fills in IT gaps
  • Fills in tool gaps
  • Accessible and inexpensive
  • Familiar: business users want tools that they know and understand
  • Effective: business requirements may have been lost in translation or new business requirements may have missed the window to be incorporated at all
58
Q

Kimball's four-step dimensional design process

A
  1. Choose the business process
  2. Declare the grain
  3. Identify the Dimensions
  4. Identify the facts
59
Q

ETL subsystems (Kimball): 5

A

Data Profiling
Data Extraction
Data Cleaning / Conforming
Data Transformation and Loading
ETL System Maintenance

60
Q

Project timeline (Kimball)

A

1st quarter: PoC; 2nd quarter: finished product; 3rd quarter: debugging; 4th quarter: documentation

61
Q

Separation of concerns

A

A software architecture design pattern/principle for separating an application into distinct sections, so that each section addresses a separate concern.