Session 2 Flashcards
what are the four steps of business performance management?
- strategize
- plan
- monitor/analyse
- act/adjust
what happens in the strategize phase?
defining the mission, values, goals, objectives, incentives and strategy maps
what happens in the plan phase?
we need to conduct budgeting, forecasting and modelling, introduce initiatives and set targets
what happens in the monitor/analyse phase?
we need to check performance dashboards, reports and analytical tools
what happens in the act/adjust phase?
we execute the strategy; we need to interpret, collaborate, assess, decide, act, adjust and track what is happening to deal with changed circumstances
What is data?
collection of facts usually obtained as the result of experiences, observations and experiments
- may consist of numbers, words or images
- lowest level of abstraction from which information and knowledge are derived
- data becomes information and knowledge once we analyze it
what are the two types of data?
structured
unstructured
what are the data categories for structured data?
- categorical
- numerical
what is categorical data?
can be put into groups and categories using names and labels
what is nominal data?
type of categorical data: 1. nominal data: data classified without an ordering or a rank (female/male)
what is ordinal data?
type of categorical data: 2. ordinal data: data that can be ranked w/o measurable intervals in-between (lower/middle/upper class)
what is numerical data? (and the two sub categories)
data referring to numbers (can be categorised, ranked and has equal intervals in-between)
what is interval data?
type of numerical data:
without a true/natural zero (e.g. Celsius: 0 degrees is not the absence of temperature, and 40 degrees is not 2x 20 degrees)
what is ratio data?
type of numerical data:
with a true/natural zero that indicates complete absence of quantity (weight, income)
what three operations can we make to measure centrality?
- arithmetic mean (sum / number of observations)
- median (middle value of an ordered dataset)
- mode (value that occurs most often)
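The three centrality measures above map directly onto Python's standard `statistics` module; a minimal sketch with made-up sample data:

```python
import statistics

data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # arithmetic mean: sum / number of observations -> 5
median = statistics.median(data)  # middle value of the ordered dataset -> 4.0
mode = statistics.mode(data)      # value that occurs most often -> 3
```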
what operations can we make to measure dispersion?
- range
- variance
- standard deviation
- mean absolute deviation
- quartile and interquartile range
- box-whiskers plot
- shape and distribution (skewness, kurtosis)
what is the range?
(difference between highest and lowest value)
what is a variance?
(measure of how much data spreads around the average)
what is a standard deviation?
(square root of variance)
what is the mean absolute deviation?
(average distance between each data point and the mean)
what is quartile and interquartile range?
range of each quartile of data
what is the box-whiskers plot?
graphical representation of the data dispersion
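Most of the dispersion measures above can be computed with the standard `statistics` module; a minimal sketch with made-up data (population variance is used here, sample variance would divide by n-1):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

rng = max(data) - min(data)           # range: highest minus lowest -> 7
var = statistics.pvariance(data)      # variance: spread around the average -> 4
std = statistics.pstdev(data)         # standard deviation: sqrt(variance) -> 2.0

m = statistics.mean(data)
mad = sum(abs(x - m) for x in data) / len(data)  # mean absolute deviation -> 1.5

q1, q2, q3 = statistics.quantiles(data, n=4)     # quartile cut points
iqr = q3 - q1                          # interquartile range
```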
what is the shape and distribution (skewness, kurtosis)?
describes where the data has more points than elsewhere (via skewness and kurtosis)
what is kurtosis?
describes the peakedness of the probability distribution
the more peaked –> positive kurtosis
what is skewness?
quantifies the extent and direction of departure from horizontal symmetry in a data set
–> positive skew: longer tail to the right
–> negative skew: longer tail to the left
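Skewness and kurtosis can be sketched from their moment-based definitions with only the standard library (the sample data is made up; libraries like SciPy offer ready-made versions):

```python
import statistics

def skewness(data):
    """Third standardized moment: > 0 -> tail to the right, < 0 -> tail to the left."""
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    n = len(data)
    return sum((x - m) ** 3 for x in data) / (n * s ** 3)

def excess_kurtosis(data):
    """Fourth standardized moment minus 3: > 0 means more peaked than a normal distribution."""
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    n = len(data)
    return sum((x - m) ** 4 for x in data) / (n * s ** 4) - 3

right_skewed = [1, 1, 2, 2, 3, 10]   # one large value pulls the tail to the right
print(skewness(right_skewed) > 0)    # True
```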
what is a dashboard?
dashboards provide visual displays of important information, consolidated and arranged on a single screen so that the information can be digested at a glance and drilled into and further explored
what should a dashboard have?
- visual components to highlight data and exceptions that require an action
- transparency to the user, so that it requires minimal training and is easy to use (especially for execs)
- combining data from a variety of systems into a single, summarized and unified view of the business
- enabling drill-down (go from monthly to weekly to daily) or drill-through (go deeper into a data point) for underlying data sources and reports
- presenting a dynamic, real-world view with timely data
- requiring little coding to implement /deploy / maintain
what are best practices for dashboards designs?
- benchmark KPIs with industry standards to see where the company stands
- wrap the metrics with contextual metadata: include which data is included and when the last update was made
- validate the design by usability specialist
- prioritize and rank alerts and exceptions, but be careful that only important warnings are shown
- pick the right visual constructs: histograms for distributions, pie charts for market shares
- provide guided analytics
what is included in text analysis?
information retrieval, natural language processing, text mining, web mining and data mining
why is text difficult to analyse?
it is unstructured data:
- it has linguistic structure intended for human consumption, not for computers
- it is relatively dirty: spelling errors, emojis, abbreviations, grammatical errors, or sarcasm
- context is important
how can we represent data for analysis?
by turning it into a feature-vector form (a vector with values; we get variables from the text)
what is a token / term ?
small individual elements that compose a document; may be words, sentences or paragraphs depending on our definition
what is a document?
one piece of text regardless of how large or small
what is a corpus?
a collection of documents (all of Wikipedia, all tweets etc.)
what are 5 representation techniques for textual data?
- bag of words
- term frequency (TF)
- inverse document frequency (IDF)
- TFIDF
- N-grams
what is the bag of words method?
- treat every document as just a collection of individual words
- ignore grammar, word order, sentence structure and punctuation
- inexpensive and straightforward
- we can check if word exists (0 or 1)
example: used in spam filters
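The binary bag-of-words idea above can be sketched in a few lines of Python (vocabulary and document are made up):

```python
def bag_of_words(document, vocabulary):
    """Binary bag-of-words: 1 if the term occurs in the document, else 0.
    Grammar, word order and sentence structure are ignored."""
    words = set(document.lower().split())
    return [1 if term in words else 0 for term in vocabulary]

vocab = ["free", "money", "meeting", "tomorrow"]
print(bag_of_words("FREE money money FREE", vocab))  # [1, 1, 0, 0]
```

A spam filter could feed such vectors into a classifier, since repeated words count the same as a single occurrence here.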
what is the term frequency method?
examine word count in three steps:
- normalization: every term becomes lowercase
- stemming: suffixes are removed and plurals are turned into singulars (only leave stem)
- removal of stop words: very common words in the respective language that do not carry any useful meaning (the, and, of etc.) are removed
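The three term-frequency steps can be sketched as a small pipeline; the stemmer below is deliberately naive (it only strips a plural "s" — real stemmers such as Porter's do far more), and the stop-word list is a made-up sample:

```python
from collections import Counter

STOP_WORDS = {"the", "and", "of", "a", "is"}

def naive_stem(word):
    # simplified stemming: turn plurals into singulars by stripping a trailing "s"
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def term_frequencies(document):
    tokens = document.lower().split()                     # normalization: lowercase
    tokens = [naive_stem(t) for t in tokens]              # stemming: keep only the stem
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop-word removal
    return Counter(tokens)

print(term_frequencies("The cats and the cat"))  # Counter({'cat': 2})
```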
what is the inverse document frequency method?
- next to TF also look at the distribution of a term over a corpus: the term should neither be too rare nor too common
- IDF: the boost a term gets for being rare; the measure increases the rarer the term is
what is the TFIDF method?
we multiply TF and IDF
-> TF is specific to a single document, whereas IDF depends on an entire corpus
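A minimal TF-IDF sketch, using the common logarithmic IDF variant (the tiny corpus is made up, and the function assumes the term occurs in at least one document):

```python
import math
from collections import Counter

def tfidf(term, document, corpus):
    """TF counts the term in one document; IDF boosts terms that are rare in the corpus."""
    tf = Counter(document).get(term, 0)
    docs_with_term = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf

corpus = [["data", "mining"], ["data", "analysis"], ["text", "mining"]]
# "data" appears in 2 of 3 documents -> small IDF boost
# "text" appears in 1 of 3 documents -> larger IDF boost
print(tfidf("text", corpus[2], corpus) > tfidf("data", corpus[0], corpus))  # True
```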
what is the n-gram method?
- in some cases, word order is important and we want to preserve some information about it
- so we include sequences of adjacent words, called n-grams, as terms (a document of 1,000 words yields roughly 1,000 n-grams for each n)
example: the 2-grams of "bag of words" are "bag_of" and "of_words"
what is an advantage and what is a disadvanatge of n-grams?
+: easy to generate, requires no linguistic knowledge
-: the feature set grows greatly in size, so dealing with massive numbers of features requires special consideration for computation and storage space
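Generating n-grams from a token list takes only one line, which illustrates why they are so easy to produce (the join character `_` is an arbitrary choice here):

```python
def ngrams(tokens, n):
    """All sequences of n adjacent tokens, each joined into a single term."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(ngrams(["bag", "of", "words"], 2))  # ['bag_of', 'of_words']
```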
what is sentiment analysis?
the goal is to answer the question of what people feel about a certain topic
(LIWC as dictionary for sentiments and words)
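Dictionary-based sentiment scoring can be sketched as a word-score lookup; the dictionary below is a made-up toy, not the real LIWC lexicon:

```python
# toy sentiment dictionary (invented scores, stand-in for a real lexicon such as LIWC)
SENTIMENT = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}

def sentiment_score(text):
    """Sum the dictionary scores of all words: > 0 positive, < 0 negative, 0 neutral."""
    return sum(SENTIMENT.get(word, 0) for word in text.lower().split())

print(sentiment_score("I love this great product"))  # 2
print(sentiment_score("this is awful and bad"))      # -2
```

Note that simple word lookup misses sarcasm and negation ("not bad"), which is part of why text is hard to analyse.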