week 2 Nature and forms of data Flashcards
why is statistics relevant in business
Statistics plays an important role in virtually all aspects of business (e.g. strategy, marketing, operations, supply chain).
what are some common applications of statistics in business
Common applications of statistics include predictive modelling, pattern recognition, anomaly detection, classification, and sentiment analysis.
Data analysis cycle. statistical enquiry cycle.
problem: define the problem. question and hypothesis
plan: study design and variables
data: collect and treat dataset
analysis: exploratory data analysis (EDA)
Modelling effort
Relating findings with context
answer the question
present results and insights
new questions may emerge
Data science process
The data analysis process is a set of activities that business analysts/data scientists perform to gather, prepare, and analyse data, and to present the results/findings to business users.
What are the two main categories of data collection, and why is data collected?
Data is collected for specific purposes.
Data collection may be classified as primary or secondary.
what is primary data
Primary data refers to data collected directly from the data source without going through any existing sources (e.g. survey conducted by a researcher, answers of an online questionnaire).
what is secondary data
Secondary data consists of data previously collected and compiled by someone else (e.g. stock market index).
Data vs information
data:
raw facts or figures
meaningless and useless until it is organised and processed
understanding is commonly difficult
input is treated as data
information:
data with context
processed and meaningful form of data
understanding is comparably easier
output is treated as information
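The contrast above can be sketched in a few lines of Python: raw records go in as data, and an organised summary comes out as information. The sales figures and the helper name `total_by_region` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Data: hypothetical raw sales records (region, amount), hard to read as-is.
raw_sales = [("north", 120.0), ("south", 75.5), ("north", 60.0), ("south", 24.5)]


def total_by_region(records):
    """Process raw records into a summary that carries context."""
    totals = defaultdict(float)
    for region, amount in records:
        totals[region] += amount
    return dict(totals)


# Information: the processed, organised output.
sales_by_region = total_by_region(raw_sales)
print(sales_by_region)
```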
qualitative data
Qualitative data are names or labels used to identify an attribute of each element. They may be numeric or nonnumeric and use the nominal or ordinal scale.
quantitative data
Quantitative data represent measurements or counts. They are always numeric and use the interval or ratio scale.
what does the level of measurement determine
The level of measurement determines the amount of information contained in the data.
what does the level of measurement also indicate
The level of measurement also indicates the data summarisation and statistical analyses that are most appropriate.
what are the four levels of measurement
There are four levels of measurement: nominal, ordinal, interval, and ratio.
what does nominal data consist of
Nominal data consists of labels or names used for identification; the labels may be numeric or non-numeric.
information about nominal data
The categories are in no logical order and have no particular relationship. The categories are said to be mutually exclusive since an individual, object, or measurement can be included in only one category.
what does ordinal data consist of
Ordinal data has the properties of nominal data and, in addition, its categories can be rank-ordered.
what does interval data consist of
Interval data have the properties of ordinal data but also show uniform distances between successive values.
what does ratio data consist of
Ratio data have all the properties of interval data, and the ratio of two values is meaningful. The scale must have a natural (i.e. nonarbitrary) zero point.
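A small sketch of why the natural zero matters: temperature in Celsius is a common example of an interval scale (its zero is arbitrary), while Kelvin is a ratio scale. The example temperatures and the helper name `celsius_to_kelvin` are assumptions made for illustration.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (arbitrary zero) to Kelvin (natural zero)."""
    return celsius + 273.15


cool_c, warm_c = 10.0, 20.0

# Differences are meaningful on an interval scale and survive the conversion.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful on the ratio scale (Kelvin).
ratio_c = warm_c / cool_c  # 2.0, yet 20 C is not "twice as hot" as 10 C
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The differences agree on both scales, but the Celsius ratio of 2 disappears once the values are measured from a natural zero.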
nominal, ordinal, interval and ratio data
nominal: variable is only named
ordinal: variable is named and ordered
interval: variable is named, ordered, and has proportionate intervals between values
ratio: variable is named, ordered, has proportionate intervals between values, and has an absolute zero
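As a rough illustration of how the level of measurement guides the choice of summary, the sketch below picks one commonly recommended summary per level using Python's standard `statistics` module. All sample values and variable names are hypothetical.

```python
import statistics

# Hypothetical samples, one per level of measurement (illustrative only).
department = ["sales", "ops", "sales", "finance", "sales"]  # nominal
satisfaction = ["low", "high", "medium", "high", "medium"]  # ordinal
temperature_c = [18.0, 21.0, 19.5, 22.5]                    # interval
revenue = [120.0, 80.0, 100.0]                              # ratio

# Nominal: only counting makes sense, so report the most frequent label.
nominal_mode = statistics.mode(department)

# Ordinal: order is meaningful, so a median category can be reported.
scale = ["low", "medium", "high"]
ranks = [scale.index(answer) for answer in satisfaction]
ordinal_median = scale[statistics.median_low(ranks)]

# Interval: equal distances between values make the mean meaningful.
interval_mean = statistics.mean(temperature_c)

# Ratio: a natural zero also makes ratios of values meaningful.
ratio_max_to_min = max(revenue) / min(revenue)

print(nominal_mode, ordinal_median, interval_mean, ratio_max_to_min)
```

Each level also supports the summaries of the levels before it, so a mode or median could be reported for the revenue figures as well.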
Big Data
Big data refers to the large and diverse sets of information that grow at ever-increasing rates.
Three V’s of Big Data: The volume of information, velocity (or speed) at which data are created and collected, and the variety of data available.
Big data often comes from data mining and arrives in multiple formats.