Session 2 Flashcards

1
Q

what are the four steps of business performance management?

A
  1. strategize
  2. plan
  3. monitor/analyse
  4. act/adjust
2
Q

what happens in the strategize phase?

A

finding out about mission, value, goals, objectives, incentives and strategy maps

3
Q

what happens in the plan phase?

A

we need to conduct budgeting, forecasting and modelling, introduce initiatives and set targets

4
Q

what happens in the monitor/analyse phase?

A

we need to check performance dashboards, reports and analytical tools

5
Q

what happens in the act/adjust phase?

A

we execute the strategy: we need to interpret, collaborate, assess, decide, act, adjust and track what is happening to deal with changing circumstances

6
Q

What is data?

A

collection of facts usually obtained as the result of experiences, observations and experiments
- may consist of numbers, words or images
- lowest level of abstraction from which information and knowledge are derived
- data becomes information and knowledge once we analyze it

7
Q

what are the two types of data?

A

structured
unstructured

8
Q

what are the data categories for structured data?

A

categorical (nominal, ordinal)
numerical (interval, ratio)

9
Q

what is categorical data?

A

data that can be put into groups or categories using names or labels

10
Q

what is nominal data?

A

type of categorical data: 1. nominal data: data classified without an ordering or rank (e.g. female/male)

11
Q

what is ordinal data?

A

type of categorical data: 2. ordinal data: data that can be ranked, but without measurable intervals in between (e.g. lower, middle, upper class)

12
Q

what is numerical data? (and the two sub categories)

A

data referring to numbers (can be categorised and ranked, and has equal intervals in between); the two subcategories are interval data and ratio data

13
Q

what is interval data?

A

type of numerical data:
without a true/natural zero (e.g. Celsius: 0 degrees is not the absence of temperature, and 40 degrees is not twice as warm as 20 degrees)
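
The Celsius example can be checked with a little arithmetic. This is a minimal sketch, assuming we convert to Kelvin (a scale with a true zero) to see what the "real" ratio is; the helper name `celsius_to_kelvin` is only illustrative.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to Kelvin, which has a true zero."""
    return celsius + 273.15

# Dividing the Celsius readings suggests "twice as warm" ...
ratio_celsius = 40 / 20

# ... but on a scale with a true zero the ratio is far smaller.
ratio_kelvin = celsius_to_kelvin(40) / celsius_to_kelvin(20)

print(ratio_celsius)           # 2.0
print(round(ratio_kelvin, 3))  # 1.068
```

Differences stay meaningful on an interval scale (40 - 20 is 20 degrees on both scales); only ratios break down.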

14
Q

what is ratio data?

A

type of numerical data:
with a true/natural zero that indicates the complete absence of the quantity (e.g. weight, income)

15
Q

what three operations can we use to measure centrality?

A
  1. arithmetic mean (sum / number of observations)
  2. median (middle value of an ordered dataset)
  3. mode (value that occurs most often)
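
The three measures can be sketched with Python's standard `statistics` module. The sample values are made up for illustration.

```python
import statistics

# Hypothetical sample of seven observations
values = [1, 2, 2, 3, 4, 7, 9]

mean = statistics.mean(values)      # sum of values / number of observations
median = statistics.median(values)  # middle value of the ordered data
mode = statistics.mode(values)      # value that occurs most often

print(mean, median, mode)  # 4 3 2
```

The large values 7 and 9 pull the mean above the median, which is why the median is often preferred for skewed data.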
16
Q

what operations can we use to measure dispersion?

A
  1. range
  2. variance
  3. standard deviation
  4. mean absolute deviation
  5. quartile and interquartile range
  6. box-and-whisker plot
  7. shape and distribution (skewness, kurtosis)
17
Q

what is the range?

A

difference between the highest and lowest value

18
Q

what is a variance?

A

measure of how much the data spreads around the average (the mean of the squared deviations from the mean)

19
Q

what is a standard deviation?

A

square root of the variance
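
Range, variance and standard deviation can be sketched as follows. The sample is made up, and the population versions (`pvariance`, `pstdev`, dividing by n) are an assumption; `statistics.variance` and `statistics.stdev` divide by n - 1 instead and give the sample versions.

```python
import statistics

# Hypothetical sample of seven observations (mean is 4)
values = [1, 2, 2, 3, 4, 7, 9]

value_range = max(values) - min(values)   # highest minus lowest value
variance = statistics.pvariance(values)   # mean squared deviation from the mean
std_dev = statistics.pstdev(values)       # square root of the variance

print(value_range)         # 8
print(round(variance, 3))  # 7.429
print(round(std_dev, 3))   # 2.726
```

The standard deviation is in the same unit as the data, while the variance is in squared units.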

20
Q

what is the mean absolute deviation?

A

average distance between each data point and the mean
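
A minimal sketch of this definition in plain Python, using a made-up sample:

```python
# Hypothetical sample of seven observations
values = [1, 2, 2, 3, 4, 7, 9]

mean = sum(values) / len(values)

# Average of the absolute distances between each point and the mean
mean_abs_deviation = sum(abs(x - mean) for x in values) / len(values)

print(round(mean_abs_deviation, 3))  # 2.286
```

Unlike the variance, the distances are not squared, so single extreme values weigh less heavily.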

21
Q

what is quartile and interquartile range?

A

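quartiles split the ordered data into four equal parts (Q1, Q2 = median, Q3); the interquartile range is Q3 - Q1, the spread of the middle 50% of the data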
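
Quartiles and the interquartile range can be sketched with `statistics.quantiles` (Python 3.8+). The sample is made up, and the default "exclusive" method is an assumption: other quartile conventions give slightly different cut points.

```python
import statistics

# Hypothetical sample of seven observations
values = [1, 2, 2, 3, 4, 7, 9]

# n=4 returns the three cut points that split the data into quarters
q1, q2, q3 = statistics.quantiles(values, n=4)

iqr = q3 - q1  # spread of the middle 50% of the data

print(q1, q2, q3)  # 2.0 3.0 7.0
print(iqr)         # 5.0
```

These are the same values a box-and-whisker plot draws as the edges of its box (Q1, Q3) and the line inside it (the median).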

22
Q

what is the box-and-whisker plot?

A

graphical representation of the data dispersion (shows the median, the quartiles and the extremes)

23
Q

what is the shape and distribution (skewness, kurtosis)?

A

describes where the data has more points than elsewhere, i.e. how the values are distributed

24
Q

what is kurtosis?

A

describes the peakedness of a probability distribution
the more peaked –> positive kurtosis

25
Q

what is skewness?

A

quantifies the extent and direction of departure from horizontal symmetry in a data set
–> positive skew: data concentrated on the left, long tail to the right
–> negative skew: data concentrated on the right, long tail to the left
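
Skewness and kurtosis can both be computed from central moments. This is a minimal sketch using the moment-based population formulas, with kurtosis reported as excess kurtosis (normal distribution = 0); statistical packages often apply additional bias corrections, so their numbers may differ slightly. The samples are made up.

```python
def central_moment(values, k):
    """k-th central moment: mean of (x - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """Third central moment scaled by the standard deviation cubed."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth central moment scaled by the variance squared, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

right_tailed = [1, 2, 2, 3, 4, 7, 9]  # bulk on the left, tail to the right
symmetric = [1, 2, 3, 4, 5]           # evenly spread, no tail

print(skewness(right_tailed) > 0)      # True: positive skew
print(skewness(symmetric))             # 0.0: no skew
print(excess_kurtosis(symmetric) < 0)  # True: flatter than a normal curve
```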

26
Q

what is a dashboard?

A

dashboards provide visual displays of important information that is consolidated and arranged on a single screen, so that it can be digested at a glance and easily drilled into and explored further

27
Q

what should a dashboard have?

A
  1. visual components that highlight data and exceptions requiring action
  2. transparency to the user, so that it requires minimal training and is easy to use (especially for execs)
  3. data combined from a variety of systems into a single, summarized and unified view of the business
  4. drill-down (e.g. go from monthly to weekly to daily) or drill-through (go deeper into a data point) to underlying data sources and reports
  5. a dynamic, real-world view with timely data
  6. little coding required to implement / deploy / maintain
28
Q

what are best practices for dashboard design?

A
  1. benchmark KPIs against industry standards to see where the company stands
  2. wrap the metrics with contextual metadata: state which data is included and when the last update was made
  3. have the design validated by a usability specialist
  4. prioritize and rank alerts and exceptions, but take care that only important warnings are shown
  5. pick the right visual constructs: histograms for distributions, pie charts for market shares
  6. provide guided analytics
29
Q

what is included in text analysis?

A

information retrieval, natural language processing, text mining, web mining and data mining

30
Q

why is text difficult to analyse?

A

it is unstructured data:
- it has linguistic structure intended for human consumption, not for computers
- it is relatively dirty: spelling errors, emojis, abbreviations, grammatical errors or sarcasm
- context is important

31
Q

how can we represent data for analysis?

A

by turning it into feature-vector form (a vector of values; we derive the variables from the text)

32
Q

what is a token / term ?

A

small individual elements that compose a document; may be words, sentences or paragraphs, depending on our definition

33
Q

what is a document?

A

one piece of text regardless of how large or small

34
Q

what is a corpus?

A

a collection of documents (all of Wikipedia, all tweets etc.)

35
Q

what are 5 representation techniques for textual data?

A
  1. bag of words
  2. term frequency (TF)
  3. inverse document frequency (IDF)
  4. TFIDF
  5. N-grams
36
Q

what is the bag of words method?

A
  • treat every document as just a collection of individual words
  • ignore grammar, word order, sentence structure and punctuation
  • inexpensive and straightforward
  • we can record whether a word is present (1) or absent (0)
    example: used in spam filters
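
A minimal sketch of the present/absent (1 or 0) representation. The two documents, the regular expression used for tokenizing and the function names are assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

# Hypothetical two-document corpus
documents = [
    "Win a FREE prize now!",
    "Meeting moved to Monday, see agenda.",
]

# The vocabulary is every distinct word in the corpus, in a fixed order
vocabulary = sorted({word for doc in documents for word in tokenize(doc)})

def binary_vector(text):
    """Return 1 for each vocabulary word present in the text, else 0."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]

print(vocabulary)
print(binary_vector("free prize"))
```

Because word order is ignored, "free prize" and "Prize FREE" map to the same vector.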
37
Q

what is the term frequency method?

A

count how often each word occurs in a document, after three preprocessing steps:

  1. normalization: every term becomes lowercase
  2. stemming: suffixes are removed and plurals are turned into singulars (only the stem is left)
  3. removal of stop words: very common words in the respective language (the, and, of etc.), which carry no useful meaning, are removed
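
The three steps can be sketched as a small pipeline. The suffix-stripping function is a deliberately crude stand-in for a real stemmer (such as the Porter stemmer), and the stop-word list is a tiny illustrative one; both are assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real lists are much longer
STOP_WORDS = {"the", "and", "of", "a", "an", "is", "in", "to"}

def crude_stem(word):
    """Strip a few common suffixes, keeping at least three letters."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def term_frequencies(text):
    tokens = re.findall(r"[a-z]+", text.lower())       # 1. normalization
    stems = [crude_stem(token) for token in tokens]    # 2. stemming
    terms = [s for s in stems if s not in STOP_WORDS]  # 3. stop-word removal
    return Counter(terms)

tf = term_frequencies("The dogs and the dog walked; the dog is walking.")
print(tf)  # Counter({'dog': 3, 'walk': 2})
```

After preprocessing, "dogs"/"dog" and "walked"/"walking" are counted as the same term.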
38
Q

what is the inverse document frequency method?

A
  • in addition to TF, also look at the distribution of a term over the corpus: a term should be neither too rare nor too common
  • IDF: the boost a term gets for being rare; the measure increases the rarer the term is
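
A minimal sketch of IDF, assuming the common variant IDF(t) = 1 + log(number of documents / number of documents containing t); other variants (for example with smoothing) exist. The corpus is made up, with each document reduced to its set of terms.

```python
import math

# Hypothetical corpus: each document is represented by its set of terms
corpus = [
    {"dog", "walk", "park"},
    {"dog", "bark"},
    {"cat", "sleep"},
    {"dog", "cat", "play"},
]

def idf(term, corpus):
    """1 + log(total documents / documents containing the term)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        raise ValueError(f"term {term!r} does not occur in the corpus")
    return 1 + math.log(len(corpus) / containing)

for term in ("dog", "cat", "bark"):
    print(term, round(idf(term, corpus), 3))
# dog 1.288   (in 3 of 4 documents)
# cat 1.693   (in 2 of 4 documents)
# bark 2.386  (in 1 of 4 documents)
```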
39
Q

what is the TFIDF method?

A

the product of TF and IDF
-> the TFIDF value is specific to a single document, whereas IDF depends on the entire corpus
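
A minimal sketch of TFIDF as the product of the two values, assuming raw counts for TF and the variant IDF(t) = 1 + log(N / n_t). The corpus and the function names are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical corpus of already tokenized documents
corpus = [
    ["dog", "walk", "dog", "park"],
    ["dog", "bark"],
    ["cat", "sleep", "cat"],
]

def idf(term):
    """Corpus-level weight: 1 + log(documents / documents containing term)."""
    containing = sum(1 for doc in corpus if term in doc)
    return 1 + math.log(len(corpus) / containing)

def tfidf(doc):
    """Document-level score: term count in this document times the term's IDF."""
    counts = Counter(doc)
    return {term: count * idf(term) for term, count in counts.items()}

scores = tfidf(corpus[0])
for term, score in scores.items():
    print(term, round(score, 3))
# dog 2.811
# walk 2.099
# park 2.099
```

"dog" scores highest in the first document because it occurs twice there, even though its IDF is lower than that of "walk" or "park".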

40
Q

what is the n-gram method?

A
  • in some cases, word order is important and we want to preserve some information about it
  • so we include sequences of adjacent words, called n-grams, as terms (a document of 1,000 words yields close to 1,000 n-grams for each n)

example: "the quick brown fox" yields the 2-grams "the quick", "quick brown" and "brown fox"
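
Generating n-grams takes only a few lines. This is a minimal sketch; the sentence and the function name `ngrams` are made up for illustration.

```python
def ngrams(tokens, n):
    """Return every run of n adjacent tokens, joined into one term."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the quick brown fox jumps".split()

bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams)   # ['the quick', 'quick brown', 'brown fox', 'fox jumps']
print(trigrams)  # ['the quick brown', 'quick brown fox', 'brown fox jumps']
```

A document of N tokens yields N - n + 1 n-grams for each n, which is why the feature set grows quickly when several values of n are combined.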

41
Q

what is an advantage and what is a disadvantage of n-grams?

A

+: easy to generate, requires no linguistic knowledge
-: greatly increases the size of the feature set, so it needs special consideration for dealing with massive numbers of features and the storage space they need

42
Q

what is sentiment analysis?

A

the goal is to answer the question of how people feel about a certain topic
(LIWC as dictionary for sentiments and words)
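
A dictionary-based approach can be sketched as follows. The mini-lexicon below is entirely made up for illustration and does not reflect LIWC's actual content or format; real dictionaries are far larger and organized into many categories.

```python
import re

# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words
SENTIMENT_LEXICON = {
    "good": 1, "great": 1, "love": 1,
    "bad": -1, "terrible": -1, "hate": -1,
}

def sentiment_score(text):
    """Sum the lexicon values of all words; words not in the lexicon count as 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(SENTIMENT_LEXICON.get(token, 0) for token in tokens)

print(sentiment_score("I love this great phone"))       # 2
print(sentiment_score("Terrible battery, bad screen"))  # -2
print(sentiment_score("It arrived on Monday"))          # 0
```

A simple word count like this ignores negation, context and sarcasm ("not good" still scores +1), which is one reason text is hard to analyse.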