Descriptive Statistics Flashcards
Categorical data?
Data that can be grouped by specific categories
Tabular display methods
Methods for summarizing categorical (qualitative) data:
Frequency distribution(频数分布)
Relative frequency distribution(相对频数分布)
Percent frequency distribution(百分数频数分布)
Graphical display methods for categorical variables
Bar chart(条形图)
Pie chart(饼图)
What is a frequency distribution (频数分布)?
A frequency distribution is a table that shows the number of observations (the frequency) in each of several non-overlapping categories or classes.
Its purpose is to provide information that cannot be picked up quickly by looking directly at the original data.
Quantitative data?
Data that use numeric values to indicate how much or how many
How to find the relative frequency distribution (相对频数分布)?
The relative frequency of a class is the fraction or proportion of the total number of observations that belong to that class.
Relative frequency = frequency / n
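A minimal Python sketch of the three tabular summaries (frequency, relative frequency, percent frequency). The purchase data are made up for illustration; only the rule relative frequency = frequency / n comes from the card.

```python
from collections import Counter

# Hypothetical sample: soft drink purchases (made-up data).
purchases = ["Coke", "Pepsi", "Coke", "Sprite", "Coke",
             "Pepsi", "Coke", "Sprite", "Pepsi", "Coke"]

n = len(purchases)
frequency = Counter(purchases)                        # count per category
relative = {k: v / n for k, v in frequency.items()}   # frequency / n
percent = {k: 100 * r for k, r in relative.items()}   # relative frequency x 100

for category in frequency:
    print(f"{category:8s} {frequency[category]:3d} "
          f"{relative[category]:.2f} {percent[category]:.0f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on the table.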
Three steps of a frequency distribution (频数分布)
Three steps are needed to define the classes of a frequency distribution for quantitative data:
Determine the number of non-overlapping classes
Determine the width of each class
Determine the class limits
How to determine the class width?
Use classes of equal width
Approximate class width = (largest data value - smallest data value) / number of classes
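The three steps can be sketched in Python as follows. The data, the choice of 5 classes, rounding the width up, and starting the first class at the smallest value are all assumptions made for this illustration.

```python
import math

# Hypothetical sample of whole-number measurements (made-up data).
data = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
        22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

num_classes = 5                                            # step 1: chosen by the analyst
width = math.ceil((max(data) - min(data)) / num_classes)   # step 2: approximate width, rounded up
lower = min(data)                                          # step 3: first lower class limit

counts = []
for k in range(num_classes):
    lo = lower + k * width
    hi = lo + width - 1            # integer data, so both limits are inclusive
    counts.append((lo, hi, sum(lo <= x <= hi for x in data)))

for lo, hi, c in counts:
    print(f"{lo}-{hi}: {c}")
```

Because the classes do not overlap and cover the whole range, the class frequencies add up to n.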
To compare two or three sets of data, what kind of chart should we draw?
To make comparisons:
Side-by-side bar chart (复合条形图)
Stacked bar chart (结构条形图)
Which diagram shows the relationship between two variables?
Scatter diagram (散点图)
What types of relationship can a scatter diagram (散点图) show?
Positive relationship (lec2b p.11)
Negative relationship
No apparent relationship
Randomly sampled?
If the measures are computed from sample data, they are called sample statistics.
Sample mean: x̄ = Σxi / n
Population parameters
If the measures are computed from population data, they are called population parameters.
Population mean: μ = Σxi / N
Weighted mean
x̄ = Σwi xi / Σwi
xi = value of observation i
wi = weight for observation i
Numerator: sum of the weighted data values
Denominator: sum of the weights
If the data are a population, write μ instead of x̄.
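A short Python sketch comparing the plain mean with the weighted mean. The costs and quantities are hypothetical; the function names are chosen here for illustration.

```python
# Hypothetical purchases: cost per unit (xi) weighted by quantity bought (wi).
costs = [3.00, 3.40, 2.80, 2.90, 3.25]
quantities = [1200, 500, 2750, 1000, 800]

def mean(values):
    """Plain mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Sum of the weighted values divided by the sum of the weights."""
    numerator = sum(w * x for w, x in zip(weights, values))
    denominator = sum(weights)
    return numerator / denominator

print(round(mean(costs), 2))                       # unweighted cost per unit
print(round(weighted_mean(costs, quantities), 2))  # cost per unit actually paid
```

The weighted mean is lower here because the largest purchase was made at the lowest price.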
Trimmed mean(截断平均值):
The mean of the data after deleting a given percentage of the smallest and largest values
The pth percentile is a value such that approximately p% of the observations are less than it.
Location of the pth percentile: Lp = (p / 100)(n + 1)
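A minimal Python sketch of a trimmed mean. It assumes the percentage is removed from each end of the sorted data and that the count to drop is rounded down; other conventions exist. The data are made up.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the observations from each end.

    Assumes 0 <= percent < 50 so that at least one value remains.
    """
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)   # observations to drop per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [2, 4, 5, 5, 6, 6, 7, 7, 8, 50]      # hypothetical, with one extreme value

print(sum(data) / len(data))   # plain mean, pulled up by the value 50
print(trimmed_mean(data, 10))  # 10% trimmed mean drops 2 and 50
```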
pth percentile = p. 33
Lower (first) quartile = Q1
Second quartile = Q2 (the median)
Upper (third) quartile = Q3
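A Python sketch of percentiles and quartiles based on the location Lp = (p / 100)(n + 1). Interpolating between the two neighboring values when Lp is not a whole number is an assumption of this sketch, and the salary figures are illustrative.

```python
def percentile(values, p):
    """pth percentile using the 1-based location Lp = (p / 100) * (n + 1)."""
    ordered = sorted(values)
    n = len(ordered)
    location = (p / 100) * (n + 1)
    i = int(location)            # whole part: position of the lower neighbor
    fraction = location - i      # fractional part: how far toward the next value
    if i < 1:                    # location falls before the first observation
        return ordered[0]
    if i >= n:                   # location falls at or past the last observation
        return ordered[-1]
    return ordered[i - 1] + fraction * (ordered[i] - ordered[i - 1])

# Illustrative sample of 12 monthly salaries.
salaries = [3310, 3355, 3450, 3480, 3480, 3490,
            3520, 3540, 3550, 3650, 3730, 3925]

q1 = percentile(salaries, 25)   # first quartile
q2 = percentile(salaries, 50)   # second quartile (median)
q3 = percentile(salaries, 75)   # third quartile
print(q1, q2, q3)
```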
What are measures of variability?
Measures of variability need to be considered together with measures of central tendency.
Measures of variability include?
Measures of variability:
Range = largest value - smallest value
Interquartile range (四分位距): IQR = Q3 - Q1, the range between the quartiles
Sample variance: s² = Σ(xi - x̄)² / (n - 1)
Population variance: σ² = Σ(xi - μ)² / N
Standard deviation: s = √(s²)
Coefficient of variation: CV = (s / x̄ × 100)%
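The variability measures can be checked with a few lines of Python. The five data values are hypothetical and the variable names are chosen for this sketch.

```python
import math

data = [46, 54, 42, 46, 32]    # hypothetical sample

n = len(data)
mean = sum(data) / n
data_range = max(data) - min(data)                  # largest - smallest

squared_deviations = sum((x - mean) ** 2 for x in data)
sample_variance = squared_deviations / (n - 1)      # divide by n - 1 for a sample
population_variance = squared_deviations / n        # divide by N if the data are a population
sample_sd = math.sqrt(sample_variance)              # standard deviation
cv = sample_sd / mean * 100                         # coefficient of variation, in percent

print(data_range, sample_variance, sample_sd, round(cv, 1))
```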
Measures of distribution shape
Skewness (偏度) = n / ((n - 1)(n - 2)) × Σ((xi - x̄) / s)³
Symmetric: skewness is 0; the mean and the median are equal.
Moderately skewed left: skewness is negative.
The mean is usually less than the median.
Moderately skewed right: skewness is positive.
The mean is usually greater than the median.
Highly skewed right: skewness is positive (often greater than 1).
The mean is usually greater than the median.
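A Python sketch of the sample skewness formula, n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations. Both data sets are made up to show a symmetric case and a right-skewed case.

```python
import math
import statistics

def sample_skewness(values):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((xi - mean) / s) ** 3)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

symmetric = [1, 2, 3, 4, 5]                # hypothetical symmetric data
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # hypothetical data with a long right tail

for name, values in [("symmetric", symmetric), ("right skewed", right_skewed)]:
    print(name, round(sample_skewness(values), 2),
          "mean:", statistics.mean(values), "median:", statistics.median(values))
```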
Measures of relative location
Z-score/standardized value (z-分数/标准化值):
zi = (xi − x̄) / s
Chebyshev’s Theorem: At least (1 − 1/z²) of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.
In other words, for any data set and any z > 1, at least (1 − 1/z²) of the values lie in the interval x̄ ± z·s.
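A Python sketch of z-scores and the Chebyshev bound. The data are hypothetical, and the helper names `chebyshev_bound` and `share_within` are introduced here only for illustration.

```python
import math

data = [46, 54, 42, 46, 32]    # hypothetical sample
n = len(data)
mean = sum(data) / n
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - mean) / s for x in data]

def chebyshev_bound(z):
    """Minimum share of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2

def share_within(scores, z):
    """Observed share of z-scores with absolute value at most z."""
    return sum(abs(v) <= z for v in scores) / len(scores)

print(z_scores)
print(chebyshev_bound(2), share_within(z_scores, 2))
```

The observed share can never fall below the bound, whatever the shape of the distribution.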
Measures of relative location: 2
Three-sigma rule of thumb (68–95–99.7 rule): for data with a bell-shaped distribution,
Approximately 68% of the data values will be within 1 standard deviation of the mean (μ ± σ).
Approximately 95% of the data values will be within 2 standard deviations of the mean (μ ± 2σ).
Almost all (approximately 99.7%) of the data values will be within 3 standard deviations of the mean (μ ± 3σ).
Detecting outliers
Outlier: a data value with a z-score less than −3 or greater than +3. It might be:
an incorrectly recorded data value
a data value that was incorrectly included in the data set
a correctly recorded data value that belongs in the data set
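A minimal Python sketch of the z-score rule for flagging outliers. The readings are made up, and the function name is hypothetical. Flagged values are candidates for review, since an outlier may still be a correct value that belongs in the data set.

```python
import statistics

def z_score_outliers(values, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)          # sample standard deviation (n - 1)
    return [x for x in values if abs((x - mean) / s) > threshold]

# Hypothetical readings: 19 values near 50 and one suspicious value.
readings = [48, 49, 50, 51, 52, 50, 49, 51, 50, 50,
            48, 52, 50, 49, 51, 50, 50, 49, 51, 150]

print(z_score_outliers(readings))
```

With very small samples no value can reach a z-score of 3, so this rule only becomes useful once there are more than about ten observations.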
What is the five-number summary?
Five-number summary: smallest value, first quartile (Q1), median (Q2), third quartile (Q3), largest value
What is a box plot?
A box plot is a graphical display based on the five-number summary.
What does the symbol * mean?
Symbol *: data values outside the lower and upper limits are considered outliers and are usually shown with the symbol * in the plot.
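A Python sketch of the five-number summary and the box plot limits. It assumes the common convention that the limits sit 1.5 × IQR below Q1 and above Q3, which the card does not state. `statistics.quantiles` with `method="exclusive"` places the quartiles at (p / 100)(n + 1). The salary figures are illustrative.

```python
import statistics

# Illustrative sample of 12 monthly salaries.
salaries = [3310, 3355, 3450, 3480, 3480, 3490,
            3520, 3540, 3550, 3650, 3730, 3925]

# Quartiles located at (p / 100)(n + 1), interpolated between neighbors.
q1, median, q3 = statistics.quantiles(salaries, n=4, method="exclusive")
five_number_summary = (min(salaries), q1, median, q3, max(salaries))

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr    # assumed convention: 1.5 x IQR below Q1
upper_limit = q3 + 1.5 * iqr    # assumed convention: 1.5 x IQR above Q3

# Values outside the limits are the ones a box plot would mark with *.
outliers = [x for x in salaries if x < lower_limit or x > upper_limit]

print(five_number_summary)
print(lower_limit, upper_limit, outliers)
```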
What is Covariance?
Covariance is a measure of the linear relationship between two variables.
Sample: sxy = Σ(xi − x̄)(yi − ȳ) / (n − 1); Population: σxy = Σ(xi − μx)(yi − μy) / N
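A Python sketch of the sample and population covariance formulas. The paired data are hypothetical, and the function names are chosen for this illustration.

```python
# Hypothetical paired observations, e.g. weekly commercials (x) and sales (y).
x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
y = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

def sum_of_cross_products(xs, ys):
    """Sum of (xi - mean of x) * (yi - mean of y) over all pairs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))

def sample_covariance(xs, ys):
    return sum_of_cross_products(xs, ys) / (len(xs) - 1)   # divide by n - 1

def population_covariance(xs, ys):
    return sum_of_cross_products(xs, ys) / len(xs)         # divide by N

print(sample_covariance(x, y))
print(population_covariance(x, y))
```

A positive value indicates that x and y tend to move in the same direction; the size of the value depends on the units of measurement.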
What is the correlation coefficient?
Correlation is a measure of linear relationship, but it does not establish cause and effect.
• A strong correlation between two variables does not mean that one variable causes the other.
Formula of the correlation coefficient?
p. 40 ppt1
How to interpret the correlation coefficient
The correlation coefficient takes values between −1 and +1.
• Values near −1 indicate a strong negative relationship.
• Values near +1 indicate a strong positive relationship.
• A correlation near 0 indicates a weak relationship.
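A Python sketch of the sample correlation coefficient. It assumes the standard Pearson formula rxy = sxy / (sx × sy), which the cards reference only by slide number. The paired data are hypothetical.

```python
import math

# Hypothetical paired observations.
x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
y = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

def correlation(xs, ys):
    """Pearson sample correlation: covariance divided by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in xs) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in ys) / (n - 1))
    return s_xy / (s_x * s_y)

r = correlation(x, y)
print(round(r, 2))                                       # near +1: strong positive relationship
print(round(correlation(x, [-2 * a for a in x]), 2))     # exact negative line gives -1
```

Unlike the covariance, the result does not depend on the units of x and y.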