Data Management Flashcards
is the process of gathering and measuring information about variables under study through an established systematic procedure, which then enables one to answer relevant questions at hand and evaluate outcomes
Data collection
What are the four types of data?
Nominal, Ordinal, Interval, Ratio
It is sometimes referred to as the classificatory scale. This scale is used for classifying and labeling variables that have no quantitative value.
nominal
Eye color, gender, VSU dormitories, and degree programs are examples of
nominal
It possesses the characteristics of the nominal scale in that it classifies data; however, the classifications have ranks. Data are shown in order of magnitude.
ordinal
Educational attainment, instructor's evaluation, emotion, and organizational structure are examples of
ordinal
This scale possesses the characteristics of the nominal and ordinal scales, where data are classified and ranked; in addition, the intervals between values are equal.
Interval
This scale doesn't have a true zero.
Interval
This scale is a classification that describes the nature of information within the values assigned to variables.
Interval
IQ, transmutation of grades, BMI, and temperature are examples of
interval
This scale possesses the characteristics of the nominal, ordinal, and interval scales, and its zero is absolute: zero is the point where the quantity being measured does not exist.
Ratio
Age, monthly income, height, and allowance are examples of
Ratio
is a grouping of the data into categories showing the number of observations in each of the non-overlapping classes
frequency distribution
is the collected data in its original form
raw data
is the difference between the highest value and the lowest value in a distribution
range
is the organization of data in a tabular form, using mutually exclusive classes showing the number of observations in each
frequency distribution
are the highest and lowest values describing a class
class limits (apparent limits)
are the upper and lower values of a class in a grouped frequency distribution; they have one more decimal place than the class limits and end in the digit 5.
class boundaries (real limits)
is the distance between the class lower boundary and the class upper boundary and it is denoted by the symbol i.
interval (width)
is the number of values in a specific class of a frequency distribution
frequency (f)
is obtained by multiplying the relative frequency by 100%
percentage
is the sum of the frequencies accumulated up to the upper boundary of a class in a frequency distribution
cumulative frequency (cf)
is the point halfway between the class limits of each class and is representative of the data within that class
midpoint
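The cards above (class limits, class boundaries, interval width, frequency, percentage, cumulative frequency, midpoint) fit together in one table. Below is a minimal Python sketch, assuming whole-number data and whole-number class limits, so each boundary sits 0.5 beyond its limit; the function name `grouped_frequency_table` and its parameters are illustrative, not taken from any particular textbook.

```python
def grouped_frequency_table(data, first_lower_limit, width, num_classes):
    """Build a grouped frequency table for whole-number data (illustrative sketch)."""
    n = len(data)
    rows = []
    cumulative = 0
    for c in range(num_classes):
        lower = first_lower_limit + c * width   # lower class limit
        upper = lower + width - 1                # upper class limit (whole numbers assumed)
        f = sum(1 for x in data if lower <= x <= upper)
        cumulative += f                          # running total up to this class
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),  # one extra decimal place, ending in 5
            "midpoint": (lower + upper) / 2,
            "f": f,
            "relative_f": f / n,
            "percent": 100 * f / n,              # relative frequency times 100%
            "cf": cumulative,
        })
    return rows


scores = [12, 15, 17, 21, 22, 25, 28, 30, 33, 34]  # hypothetical data
for row in grouped_frequency_table(scores, 10, 10, 3):
    print(row)
```

Here the interval width i (upper boundary minus lower boundary) is 10 for every class, matching the `width` argument.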
is used to organize nominal-level or ordinal-level data.
categorical frequency distribution
Gender, business type, political affiliation, and others are examples of the usage of the ___________
categorical frequency distribution
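A categorical frequency distribution is just a count per category plus its percentage. A small sketch using Python's `collections.Counter`; the function name and the sample responses are hypothetical.

```python
from collections import Counter

def categorical_frequency(values):
    """Return (category, frequency, percentage) rows, highest frequency first."""
    n = len(values)
    counts = Counter(values)
    # most_common() sorts by frequency, highest to lowest (ties keep first-seen order)
    return [(category, f, 100 * f / n) for category, f in counts.most_common()]


responses = ["Male", "Female", "Female", "Male", "Female"]  # hypothetical survey data
for category, f, pct in categorical_frequency(responses):
    print(f"{category:<8}{f:>3}{pct:>7.1f}%")
```

Because the rows come out from highest to lowest frequency, they are already in the order a Pareto chart displays.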
is used when the range of the data set is large; the data must be grouped into classes, whether they are categorical or interval data. For interval data, each class is more than one unit wide.
grouped frequency distribution
Rule 1. To determine the number of classes, use the smallest positive integer k such that 2^k ≥ n, where n is the total number of observations.
The class interval is then i = range / number of classes = (HV − LV) / k, where HV and LV are the highest and lowest values.
Rule 2. Another way to determine the class interval is to apply Sturges' formula:
i = range / (1 + 3.322 log n), where n is the total frequency and log is the base-10 logarithm.
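Both rules can be checked with a few lines of Python. This is a sketch only; the function names are hypothetical, and the data (the integers 1 to 100) are made up for illustration.

```python
import math

def number_of_classes(n):
    """Rule 1: smallest positive integer k such that 2**k >= n."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

def width_from_classes(data, k):
    """i = (HV - LV) / k."""
    return (max(data) - min(data)) / k

def width_sturges(data):
    """Rule 2: i = range / (1 + 3.322 * log10(n))."""
    n = len(data)
    return (max(data) - min(data)) / (1 + 3.322 * math.log10(n))


data = list(range(1, 101))  # hypothetical data: 1, 2, ..., 100
k = number_of_classes(len(data))
print(k, width_from_classes(data, k), width_sturges(data))
```

In practice the computed width is usually rounded up to a convenient number so that every observation falls into some class.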
is a graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis.
histogram
is a graph that displays the data using points connected by lines, where the frequencies are represented by the heights of the points at the midpoints of the classes. The vertical axis represents the frequency of the distribution, while the horizontal axis represents the class midpoints.
FREQUENCY POLYGON
is a graph that displays the cumulative frequencies for the classes in a frequency distribution. The vertical axis represents the cumulative frequency of the distribution, while the horizontal axis represents the upper class boundaries of the frequency distribution.
CUMULATIVE FREQUENCY POLYGON (OGIVE)
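The frequency polygon and the ogive differ only in which points they plot. A minimal sketch, assuming whole-number class limits (so boundaries lie 0.5 beyond the limits); it computes the points only and leaves the drawing to whatever plotting tool you use. Function names are hypothetical.

```python
def frequency_polygon_points(class_limits, freqs):
    """(midpoint, frequency) pairs for a frequency polygon."""
    return [((lower + upper) / 2, f) for (lower, upper), f in zip(class_limits, freqs)]

def ogive_points(class_limits, freqs, unit=1):
    """(upper class boundary, cumulative frequency) pairs for an ogive.

    Starts at the first lower boundary with cumulative frequency 0,
    a common textbook convention (assumed here).
    """
    half = unit / 2  # boundaries lie half a unit beyond the class limits
    points = [(class_limits[0][0] - half, 0)]
    cumulative = 0
    for (_, upper), f in zip(class_limits, freqs):
        cumulative += f
        points.append((upper + half, cumulative))
    return points


limits = [(10, 19), (20, 29), (30, 39)]  # hypothetical classes
freqs = [3, 4, 3]
print(frequency_polygon_points(limits, freqs))
print(ogive_points(limits, freqs))
```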
is a graph used to represent a frequency distribution for categorical (nominal-level) data; the frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.
PARETO CHART
is similar to a histogram. The bases of the rectangles are arbitrary intervals whose centers are codes, and the height of each rectangle represents the frequency of that category.
BAR CHART
is a circle divided into portions that represent the relative frequencies (or percentages) of the data belonging to different categories.
PIE CHART
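Each slice of a pie chart gets a central angle proportional to its relative frequency, that is, (f / n) × 360°. A short sketch; the function name and category labels are hypothetical.

```python
def pie_chart_angles(freqs):
    """Central angle of each slice in degrees: (f / n) * 360."""
    n = sum(freqs.values())
    return {category: f * 360 / n for category, f in freqs.items()}


# hypothetical frequencies for three categories
print(pie_chart_angles({"Program A": 5, "Program B": 3, "Program C": 2}))
```

The angles always add up to 360°, just as the percentages add up to 100%.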
represents data that occur over a specific period of time under observation. It also shows the trend or pattern of increase or decrease over that period.
TIME SERIES GRAPH
immediately suggests the nature of the data being shown. It combines the attention-getting quality of pictures with the accuracy of a bar chart: appropriate pictures arranged in a row represent the quantities being compared.
PICTOGRAPH
is used to examine possible relationships between two numerical variables. The two variables are plotted on the x-axis and the y-axis.
SCATTER PLOT
is a central or typical value for a probability distribution. It may also be called a center or location of the distribution.
central tendency
measures of central tendency
mean, median, mode
the arithmetic average of the numbers: their sum divided by how many there are.
mean
sample mean is denoted by
x̄ (x-bar)
population mean is denoted by
μ (mu)
is particularly useful when various classes or groups contribute differently to the total.
weighted mean
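The mean and the weighted mean in code. The formula for x̄ and μ is the same; only whether the data are a sample or a whole population changes. This is a sketch with hypothetical function names, and the grades and units in the usage lines are made up.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


print(mean([2, 4, 6, 8]))
# hypothetical grades weighted by the number of units of each course
print(weighted_mean([1.5, 2.0, 1.25], [3, 3, 5]))
```

With equal weights the weighted mean reduces to the ordinary mean.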
is the value separating the higher half of a data sample, a population, or a probability distribution, from the lower half.
median
median is denoted by
Md
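Finding the median means sorting the data and taking the middle value, or the mean of the two middle values when the number of observations is even. A minimal sketch (Python's standard `statistics.median` does the same job):

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))     # odd number of values
print(median([4, 1, 3, 2]))  # even number of values
```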
In a set of data values, it is the value that appears most often.
Mode
A mode may exist, but sometimes it does not. (T/F)
True
Mode is denoted by
Mo
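Because a mode may not exist, or there may be more than one, a sketch that returns a list fits better than one that returns a single number. It assumes the convention that a data set has no mode when no value repeats; textbooks differ on edge cases such as every value repeating equally often. The function name is hypothetical.

```python
from collections import Counter

def modes(values):
    """All values tied for the highest frequency; [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: no repeated value means no mode
    return [v for v, c in counts.items() if c == highest]


print(modes([1, 2, 3]))        # no mode
print(modes([1, 2, 2, 3]))     # one mode
print(modes([1, 1, 2, 2, 3]))  # two modes (bimodal)
```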
Measures of dispersion, which include the range, interquartile range, absolute deviation, variance, and standard deviation, are also known as the
measures of spread or variability.
Measures of dispersion include
range, variance, interquartile range, standard deviation, absolute deviation
This is the easiest measure of dispersion. It is the difference between the highest value and the lowest value.
RANGE
Range is denoted as
R = HV − LV, where HV is the highest value and LV is the lowest value
This is the expectation of the squared deviation of a random variable from its mean. It measures how far a set of numbers is spread out from its average value.
VARIANCE
This is the square root of the variance. A low standard deviation indicates that the data values tend to be close to the mean.
Standard Deviation
A high standard deviation indicates that the data points are spread out over a wider range. (T/F)
True
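The range, variance, and standard deviation can be computed directly from their definitions. One detail not on the cards: the population variance divides by n, while the sample variance conventionally divides by n − 1; the `sample` flag below switches between the two. Function names are hypothetical and the data are made up.

```python
import math

def data_range(values):
    """R = HV - LV."""
    return max(values) - min(values)

def variance(values, sample=False):
    """Average squared deviation from the mean.

    Population variance divides by n; sample variance divides by n - 1.
    """
    n = len(values)
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)

def std_dev(values, sample=False):
    """Square root of the variance."""
    return math.sqrt(variance(values, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(data_range(data), variance(data), std_dev(data))
print(variance(data, sample=True), std_dev(data, sample=True))
```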
This is the average distance of all of the elements in a data set from the mean of the same data set.
Absolute Deviation
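The (mean) absolute deviation averages the distances |x − mean| instead of squaring them. A short sketch with a hypothetical function name:

```python
def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


print(mean_absolute_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # hypothetical data, mean 5
```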
It is sometimes referred to as a measure of location and is considered an extension of the median. It describes the position of a value relative to the other values in the data set.
Measures of relative position
Measures of relative position include
quartile, percentile, z-scores, box-and-whisker plot
This measure divides the observations into four equal parts.
Quartile
The lower and upper quartile values help us find a measure of dispersion for the set of observations, which is called the
interquartile range
The interquartile range is denoted as
IQR = Q3 − Q1 (the difference between the upper and lower quartiles)
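Quartile conventions vary between textbooks. This sketch assumes the common rule that places Qk at position k(n + 1)/4 in the sorted data, interpolating between neighbors, which is what Python's `statistics.quantiles` computes with `method="exclusive"` (its default). Other methods can give slightly different Q1 and Q3. Function names are hypothetical.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q2, Q3) using (n + 1)-based positions with interpolation."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q1, q2, q3

def interquartile_range(values):
    """IQR = Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


data = list(range(1, 12))  # hypothetical data: 1, 2, ..., 11
print(quartiles(data), interquartile_range(data))
```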
This divides the observations into 100 equal parts.
Percentile
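Percentiles work the same way as quartiles, only with 99 cut points instead of 3. Under the same assumed (n + 1)-based convention, a sketch looks like this (the function name is hypothetical):

```python
import statistics

def percentile(values, k):
    """k-th percentile (1 <= k <= 99) using (n + 1)-based positions."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    cut_points = statistics.quantiles(values, n=100, method="exclusive")
    return cut_points[k - 1]


data = list(range(1, 100))  # hypothetical data: 1, 2, ..., 99
print(percentile(data, 25), percentile(data, 50), percentile(data, 90))
```

Under a consistent convention, P25 equals Q1 and P50 equals the median (Q2).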
This indicates how many standard deviations an element is from the mean. The positive or negative sign indicates the direction of the element away from the mean.
Z-scores or standard scores
Z-scores are denoted as
Z
A z-score less than 0 represents an element less than the mean.
A z-score greater than 0 represents an element greater than the mean.
A z-score equal to 0 represents an element equal to the mean.
TRUE
A z-score equal to 1 represents an element that is 1 standard deviation greater than the mean; a z-score equal to 2, 2 standard deviations greater than the mean; etc.
A z-score equal to -1 represents an element that is 1 standard deviation less than the mean; a z-score equal to -2, 2 standard deviations less than the mean; etc.
TRUE
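The statements above follow from the formula z = (x − mean) / standard deviation. A sketch using Python's `statistics` module; the `sample` flag (an assumption, not on the cards) chooses between the population and sample standard deviation.

```python
import statistics

def z_score(x, values, sample=False):
    """z = (x - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return (x - mean) / sd


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, population SD 2
for x in (3, 5, 9):
    print(x, z_score(x, data))
```

Here 3 sits one standard deviation below the mean (z = −1), 5 is the mean itself (z = 0), and 9 sits two standard deviations above it (z = 2).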
It is a graph of a data set obtained by drawing a horizontal line from the minimum data value to the first quartile, drawing a horizontal line from the third quartile to the maximum value, and drawing a box whose vertical sides pass through Q1 and Q3, with a vertical line inside the box passing through the median (second quartile).
Box-and-Whisker Plot
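A box-and-whisker plot needs only the five-number summary: minimum, Q1, median, Q3, maximum. The sketch below computes that summary (under the same assumed (n + 1)-based quartile convention) and draws a rough one-line text version: whiskers as `-`, the box as `[===]`, and the median as `|`. Function names and the text layout are hypothetical.

```python
import statistics

def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, q2, q3, max(values)

def text_box_plot(values, width=40):
    """Draw a horizontal box-and-whisker plot as one line of text."""
    lo, q1, q2, q3, hi = five_number_summary(values)
    if hi == lo:
        raise ValueError("all values are equal; nothing to draw")

    def pos(v):
        # map a data value onto a character column 0 .. width - 1
        return round((v - lo) / (hi - lo) * (width - 1))

    line = [" "] * width
    for i in range(pos(lo), pos(q1)):          # left whisker: minimum to Q1
        line[i] = "-"
    for i in range(pos(q3) + 1, pos(hi) + 1):  # right whisker: Q3 to maximum
        line[i] = "-"
    for i in range(pos(q1), pos(q3) + 1):      # the box from Q1 to Q3
        line[i] = "="
    line[pos(q1)] = "["
    line[pos(q3)] = "]"
    line[pos(q2)] = "|"                         # median line inside the box
    return "".join(line)


data = list(range(1, 12))  # hypothetical data: 1, 2, ..., 11
print(five_number_summary(data))
print(text_box_plot(data, width=11))
```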