Intro Flashcards
Big data: statistics
3V?
3V: volume, velocity, variety
Volume: the amount of available data
Velocity: the speed of collecting and processing data
Variety: the different types of data
What can big data do?
Big data holds great promise for understanding:
Commonality: common patterns in the presence of large variation (noise).
Heterogeneity: individual differences, e.g. for personalized medicine or services.
Applications in business and economics
Accounting
Finance
Economics
Marketing
Operations
Information systems
What is an average?
What is an average for?
An average is the mean, median, or mode.
An average is used to measure central tendency.
Mean, mode, median?
Mean: the arithmetic mean, i.e. the sum of all values divided by the number of values.
Mode: the most frequent value in the data set.
Median: the middle value that separates the higher half of the data from the lower half.
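A minimal sketch of the three averages, using a small hypothetical data set. The mean and median are written out by hand to mirror the definitions above; the mode uses Python's standard `statistics.mode`.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical sample, for illustration only

def arithmetic_mean(xs):
    """Sum of all values divided by the number of values."""
    return sum(xs) / len(xs)

def median(xs):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

mean_value = arithmetic_mean(data)      # 30 / 6
median_value = median(data)             # (3 + 5) / 2
mode_value = statistics.mode(data)      # 3 appears most often
print(mean_value, median_value, mode_value)
```

Here the mean (5.0) and the median (4.0) differ because the large value 10 pulls the mean upward, while the median only depends on the middle of the sorted data.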
Data: raw facts and figures.
Categorical (qualitative) data: nominal or ordinal scale
Quantitative data: interval or ratio scale
Simpson’s paradox
A trend appears in several separate groups of data, but when the groups are combined, the trend disappears or reverses.
Cause: confounding variables. For example, school can be a hidden variable that cannot be ignored.
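A minimal sketch of Simpson's paradox with made-up counts (the groups "A" and "B" and the two schools are hypothetical). Within each school group A has the higher pass rate, but once the counts are aggregated and school is ignored, group B looks better, because the two groups are spread very unevenly across the schools.

```python
# Hypothetical (passed, total) counts for two groups in two schools.
data = {
    "A": {"School X": (8, 10), "School Y": (40, 100)},
    "B": {"School X": (70, 100), "School Y": (3, 10)},
}

def rate(passed, total):
    """Pass rate within a single school."""
    return passed / total

def combined_rate(by_school):
    """Pass rate after aggregating over schools (the school variable is ignored)."""
    passed = sum(p for p, _ in by_school.values())
    total = sum(t for _, t in by_school.values())
    return passed / total

for school in ("School X", "School Y"):
    a = rate(*data["A"][school])
    b = rate(*data["B"][school])
    print(school, "A:", round(a, 3), "B:", round(b, 3))

print("Combined", "A:", round(combined_rate(data["A"]), 3),
      "B:", round(combined_rate(data["B"]), 3))
```

Group A wins in School X (0.8 vs 0.7) and in School Y (0.4 vs 0.3), yet the combined rates are about 0.436 for A and 0.664 for B: the trend reverses when the hidden variable is dropped.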
Types of cumulative distribution
Cumulative frequency
Cumulative relative frequency
Cumulative percent frequency
What does each cumulative distribution show?
Cumulative frequency: shows the number of items with values less than or equal to the upper class limit of each class.
Cumulative relative frequency: shows the proportion (relative frequency) of items with values less than or equal to the upper class limit of each class.
Cumulative percent frequency: shows the percentage of items with values less than or equal to the upper class limit of each class.
What does the last entry of each cumulative distribution always equal?
Cumulative frequency: the last entry always equals the total number of observations.
Cumulative relative frequency: the last entry always equals 1.00.
Cumulative percent frequency: the last entry always equals 100%.
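A minimal sketch that builds all three cumulative distributions from hypothetical class frequencies, using the standard `itertools.accumulate` for the running totals. Dividing each running total by n (rather than summing rounded fractions) makes the last entries come out as exactly n, 1.0, and 100.

```python
from itertools import accumulate

# Hypothetical classes: upper class limit -> number of items in that class.
class_upper_limits = [19, 29, 39, 49]
frequencies = [4, 8, 5, 3]

def cumulative_distributions(freqs):
    """Return cumulative frequency, cumulative relative frequency and cumulative percent frequency."""
    n = sum(freqs)
    cum_freq = list(accumulate(freqs))           # items <= each upper limit
    cum_rel = [c / n for c in cum_freq]          # proportion of items
    cum_pct = [100 * c / n for c in cum_freq]    # percentage of items
    return cum_freq, cum_rel, cum_pct

cum_freq, cum_rel, cum_pct = cumulative_distributions(frequencies)
for limit, cf, cr, cp in zip(class_upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {limit}: {cf:>3}  {cr:.2f}  {cp:.0f}%")
```

With these 20 hypothetical observations the last row is 20, 1.00 and 100%, matching the three rules above.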
Drawing a stem-and-leaf display
P.28
Each row of the display is called a stem.
The digits on each stem are called leaves.
Leaf unit?
The leaf unit can vary (for example 0.1, 1, or 10).
If the leaf unit is not shown, it is assumed to be 1.
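A minimal sketch of a stem-and-leaf display with a configurable leaf unit (default 1, as above). Each value is expressed in leaf units, the last digit becomes the leaf and the remaining digits become the stem. The function names and sample data are hypothetical, and the sketch assumes non-negative values.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Map each stem to its sorted list of leaves (non-negative values assumed)."""
    table = defaultdict(list)
    for v in sorted(values):
        scaled = int(round(v / leaf_unit))   # value measured in leaf units
        stem, leaf = divmod(scaled, 10)      # last digit is the leaf
        table[stem].append(leaf)
    return dict(table)

def render(table):
    """One row per stem, including empty stems between the smallest and largest."""
    lines = []
    for stem in range(min(table), max(table) + 1):
        leaves = "".join(str(leaf) for leaf in table.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return "\n".join(lines)

data = [12, 15, 21, 24, 24, 38, 41]  # hypothetical data
print(render(stem_and_leaf(data)))
```

With a leaf unit of 0.1, a value such as 1.5 is read as 15 leaf units, so it is shown with stem 1 and leaf 5.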
Cross tabulation
Cross tabulation is a useful analysis tool commonly used to compare the results for one or more variables with the results of another variable.
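A minimal sketch of a cross tabulation for two categorical variables, counting how often each pair of categories occurs. It uses only the standard library (`collections.Counter`); the variable values below are hypothetical. Libraries such as pandas provide a ready-made equivalent (`pandas.crosstab`).

```python
from collections import Counter

def cross_tab(rows, cols):
    """Count observations for every (row category, column category) pair."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}

# Hypothetical observations: quality rating and price range of five restaurants.
quality = ["Good", "Good", "Excellent", "Good", "Excellent"]
price = ["$10-19", "$20-29", "$20-29", "$10-19", "$10-19"]

table = cross_tab(quality, price)
for rating, row in table.items():
    print(rating, row, "total:", sum(row.values()))
```

Each cell is the number of observations with that combination of categories, so the cell counts always add up to the number of observations.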
Cross tabulation remark
Remark: when a cross tabulation involves aggregated data, we should investigate whether a hidden variable could affect the results (as in Simpson’s paradox).
How many types of data are there?
Categorical Data
Quantitative Data