Data Management Flashcards by Cypress Faith

is the process of gathering and measuring information about variables under study using an established systematic procedure, which then enables one to answer relevant questions at hand and evaluate outcomes

Data collection

What are the four types of data?

Nominal, Ordinal, Interval, Ratio

It is sometimes referred to as the classificatory scale. This scale is used for classifying and labeling variables without quantitative value.

nominal

eye color, gender, VSU dormitories, and degree programs are examples of

nominal

It possesses the characteristics of the nominal scale in that it classifies data; however, the classification has ranks. Data are shown in order of magnitude.

ordinal

educational attainment, instructor’s evaluation, emotion, and organizational structure are examples of

ordinal

This scale possesses the characteristics of the nominal and ordinal scales, where data are classified and ranked, and the intervals between values are equal.

Interval

This scale doesn’t have a true zero.

Interval

This scale is a classification that describes the nature of information within the values assigned to variables.

Interval

IQ, transmutation of grades, BMI and Temperature are examples of

interval

This scale possesses the characteristics of nominal, ordinal, and interval scale where zero is absolute. This is the point where the quality being measured does not exist.

Ratio

Age, Monthly Income, Height, Allowance are examples of

Ratio

is a grouping of the data into categories showing the number of observations in each of the non-overlapping classes

frequency distribution

is the data collected in its original form

raw data

is the difference between the highest value and the lowest value in a distribution

range

is the organization of data in a tabular form, using mutually exclusive classes showing the number of observations in each

frequency distribution

are the highest and lowest values describing a class

class limits (apparent limits)

are the upper and lower values of a class in a grouped frequency distribution; they have one more decimal place than the class limits and end with the digit 5.

class boundaries (real limits)

is the distance between the class lower boundary and the class upper boundary and it is denoted by the symbol i.

interval (width)

is the number of values in a specific class of a frequency distribution

frequency (f)

is obtained by multiplying the relative frequency by 100%

percentage

is the sum of the frequencies accumulated up to the upper boundary of a class in a frequency distribution

cumulative frequency (cf)

is the point halfway between the class limits of each class and is representative of the data within that class

midpoint
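
The terms on the preceding cards (class limits, class boundaries, width, frequency, percentage, cumulative frequency, midpoint) can be seen together in one small table. The sketch below is only an illustration: the scores and class limits are made up, and the boundaries are taken as the limits plus or minus 0.5 on the assumption that the data are whole numbers.

```python
# Hypothetical whole-number scores and class limits, chosen for illustration.
scores = [12, 15, 21, 24, 27, 30, 33, 35, 38, 41, 44, 49]
class_limits = [(10, 19), (20, 29), (30, 39), (40, 49)]  # apparent limits

n = len(scores)
table = []
cumulative = 0
for lower, upper in class_limits:
    f = sum(lower <= x <= upper for x in scores)  # frequency of the class
    cumulative += f                               # running total up to this class
    lower_boundary, upper_boundary = lower - 0.5, upper + 0.5  # real limits
    table.append({
        "limits": (lower, upper),
        "boundaries": (lower_boundary, upper_boundary),
        "width": upper_boundary - lower_boundary,  # interval i
        "midpoint": (lower + upper) / 2,
        "f": f,
        "percent": 100 * f / n,                    # relative frequency x 100%
        "cf": cumulative,
    })

for row in table:
    print(row)
```

The cumulative frequency of the last class equals the total number of observations, which is a quick check that no value was left out of the classes.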

is used to organize nominal-level or ordinal-level data.

categorical frequency distribution

gender, business type, political affiliation and others are examples of the usage of the ___________

categorical frequency distribution

is used when the range of the data set is large; the data must be grouped into classes whether it is categorical data or interval data. For interval data, each class is more than one unit in width.

grouped frequency distribution

Rule 1. To determine the number of classes, use the smallest positive integer k such that 2^k >= n, where n is the total number of observations.

i = range/number of classes = (hv - lv)/k

Rule 2. Another way to determine the class interval is by applying the formula

i = range / (1 + 3.322 log n), where n is the total frequency and log is the base-10 logarithm
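
Both rules are easy to try on numbers. The sketch below assumes a base-10 logarithm in rule 2, since the constant 3.322 is approximately 1/log10(2); the function names and the sample values (n = 50, highest value 98, lowest value 38) are hypothetical.

```python
import math

def number_of_classes(n):
    """Rule 1: smallest positive integer k such that 2**k >= n."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

def width_rule_1(hv, lv, n):
    """Class interval i = range / number of classes."""
    return (hv - lv) / number_of_classes(n)

def width_rule_2(hv, lv, n):
    """Class interval i = range / (1 + 3.322 * log10(n))."""
    return (hv - lv) / (1 + 3.322 * math.log10(n))

# Hypothetical data set: 50 observations ranging from 38 to 98.
print(number_of_classes(50))               # 6, because 2**5 = 32 < 50 <= 64 = 2**6
print(width_rule_1(98, 38, 50))            # 10.0
print(round(width_rule_2(98, 38, 50), 2))  # about 9.03
```

In practice the computed width is usually rounded to a convenient number, so the two rules often lead to the same class interval.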

is a graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis.

histogram

is a graph that displays the data using points connected by lines. The frequencies are represented by the heights of the points at the midpoints of the classes. The vertical axis represents the frequency of the distribution, while the horizontal axis represents the midpoints of the classes.

FREQUENCY POLYGON

is a graph that displays the cumulative frequencies for the classes in a frequency distribution. The vertical axis represents the cumulative frequency of the distribution, while the horizontal axis represents the upper class boundaries of the frequency distribution.

CUMULATIVE FREQUENCY POLYGON (OGIVE)

is a graph used to represent a frequency distribution for categorical (nominal-level) data; frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.

PARETO CHART
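
The ordering step behind a Pareto chart is just a categorical frequency distribution sorted from the highest to the lowest frequency. A minimal sketch, using made-up survey responses and Python's collections.Counter:

```python
from collections import Counter

# Hypothetical nominal-level responses.
responses = ["B", "A", "C", "A", "B", "A", "D", "A", "B", "C"]

counts = Counter(responses)          # categorical frequency distribution
pareto_order = counts.most_common()  # categories sorted by frequency, highest first

for category, f in pareto_order:
    percent = 100 * f / len(responses)
    print(f"{category}: f = {f} ({percent:.0f}%) " + "#" * f)
```

The same percentages are what a pie chart would show as slices of the circle.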

is similar to a histogram. The bases of the rectangles are arbitrary intervals whose centres are codes. The height of each rectangle represents the frequency of that category.

BAR CHART

is a circle divided into portions that represent the relative frequencies (or percentages) of the data belonging to different categories.

PIE CHART

represents data that occur over a specific period of time under observation. In addition, it shows the trend or pattern of increase or decrease over that period.

TIME SERIES GRAPH

immediately suggests the nature of the data being shown. It is a combination of attention-getting quality and the accuracy of the bar chart. Appropriate pictures arranged in a row present the quantities for comparison.

PICTOGRAPH

is used to examine possible relationships between two numerical variables. The two variables are plotted on the 𝑥-axis and the 𝑦-axis.

SCATTER PLOT

is a central or typical value for a probability distribution. It may also be called a center or location of the distribution.

central tendency

measures of central tendency

mean, median, mode

the average of the numbers.

mean

sample mean is denoted by

x̄ (x-bar)

population mean is denoted by

𝜇

is particularly useful when various classes or groups contribute differently to the total.

weighted mean
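
As an illustration of why the weights matter, the sketch below computes a weighted mean as the sum of value times weight divided by the sum of the weights. The function name and the grades and units are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical grades and the number of units each course carries.
grades = [1.5, 2.0, 1.25]
units = [3, 5, 2]

print(weighted_mean(grades, units))  # (4.5 + 10.0 + 2.5) / 10
print(sum(grades) / len(grades))     # the unweighted mean, for comparison
```

The five-unit course pulls the weighted mean toward its grade of 2.0, which the plain mean ignores.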

is the value separating the higher half of a data sample, a population, or a probability distribution, from the lower half.

median

median is denoted by

𝑴𝒅.

of a set of data values is the value that appears most often.

Mode

A mode may or may not exist. (T/F)

True

mode is denoted by

𝑴𝒐.
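
The three measures of central tendency can be compared on one small data set. This is a sketch with made-up numbers; the `modes` helper is hypothetical and follows the convention that a data set in which no value repeats has no mode.

```python
import statistics
from collections import Counter

def modes(data):
    """Return the most frequent value(s), or an empty list if nothing repeats."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: no repeated value means no mode
    return [value for value, count in counts.items() if count == highest]

data = [2, 3, 3, 5, 7, 10]  # hypothetical observations

print(statistics.mean(data))    # sum 30 divided by 6 observations
print(statistics.median(data))  # average of the two middle values, 3 and 5
print(modes(data))              # 3 appears most often
print(modes([1, 2, 3]))         # no value repeats, so no mode
```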

Measures of dispersion, which include range, interquartile range, absolute deviation, variance, and standard deviation, are also known as the

measures of spread or variability.

measures of dispersion involve

range, variance, interquartile range, standard deviation, absolute deviation

This is the easiest measure of dispersion. It is the difference between the highest value and the lowest value.

RANGE

range is denoted as

𝑅=𝐻𝑉−𝐿𝑉

This is the expectation of the squared deviation of a random variable from its mean. It measures how far a set of numbers is spread out from its average value.

VARIANCE

This is the square root of the variance. A low value indicates that the data points tend to be close to the mean.

Standard Deviation

A high standard deviation indicates that the data points are spread over a wider range. (T/F)

True

This is the average distance of all of the elements in a data set from the mean of the same data set.

Absolute Deviation
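
The measures of dispersion above can be computed side by side. The sketch below treats a made-up data set as a whole population (dividing by n); a sample variance would divide by n - 1 instead and give a slightly larger value.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, treated as a population

mean = statistics.mean(data)
data_range = max(data) - min(data)      # R = HV - LV
variance = statistics.pvariance(data)   # mean of the squared deviations
std_dev = statistics.pstdev(data)       # square root of the variance
mean_abs_dev = sum(abs(x - mean) for x in data) / len(data)  # average distance from the mean

print(mean, data_range, variance, std_dev, mean_abs_dev)
```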

It is sometimes referred to as a measure of location. It is considered an extension of the median. It describes the position of a value relative to the other values in the data set.

Measures of relative position

Measures of relative position involve

quartile, percentile, z-scores, box-and-whisker plot

This measure divides the observations into four equal parts.

Quartile

The lower and upper quartile values help us find a measure of dispersion in the set of observations, which is called the

inter-quartile range

inter quartile range is denoted as

IQR = Q3 - Q1 (the difference between the upper and lower quartiles)
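
A short sketch of the quartiles and the inter-quartile range on made-up data. Textbooks and software use several slightly different conventions for locating quartiles; this example assumes the default ("exclusive") method of Python's statistics.quantiles, so other tools may give different values for the same data.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical observations

# n=4 asks for the three cut points that split the data into four equal parts.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3)  # lower quartile, median, upper quartile
print(iqr)         # IQR = Q3 - Q1
```

Percentiles follow the same idea with n=100, which returns 99 cut points.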

This divides the observations into 100 equal parts.

Percentile

This indicates how many standard deviations an element is from the mean. The positive and negative signs indicate the direction of the point away from the mean.

Z-scores or standard scores

z-scores are denoted as

z = (x - μ) / σ, where μ is the mean and σ is the standard deviation

A z-score less than 0 represents an element less than the mean. A z-score greater than 0 represents an element greater than the mean. A z-score equal to 0 represents an element equal to the mean.

TRUE

A z-score equal to 1 represents an element that is 1 standard deviation greater than the mean; a z-score equal to 2, 2 standard deviations greater than the mean; etc. A z-score equal to -1 represents an element that is 1 standard deviation less than the mean; a z-score equal to -2, 2 standard deviations less than the mean; etc.

TRUE
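
The two statements above can be checked numerically. The sketch assumes the usual definition of a z-score, the element minus the mean, divided by the standard deviation; the function name and the mean of 5 and standard deviation of 2 are made up for the example.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev

mean, std_dev = 5, 2  # hypothetical distribution

print(z_score(9, mean, std_dev))  # above the mean: positive
print(z_score(5, mean, std_dev))  # equal to the mean: zero
print(z_score(3, mean, std_dev))  # below the mean: negative
```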

It is a graph of a data set obtained by drawing a horizontal line from the minimum data value to the first quartile, drawing a horizontal line from the third quartile to the maximum value, and drawing a box whose vertical sides pass through Q1 and Q3, with a vertical line inside the box passing through the median or second quartile.

Box-and-Whisker Plot
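
The five values needed to draw such a plot (minimum, Q1, median, Q3, maximum) can be computed before any drawing is done. The sketch below uses made-up data and assumes the default quartile method of Python's statistics.quantiles; other quartile conventions would move Q1 and Q3 slightly. The function name is hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a box-and-whisker plot."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return (min(data), q1, q2, q3, max(data))

data = [3, 5, 7, 8, 12, 13, 14, 18, 21]  # hypothetical observations

minimum, q1, median, q3, maximum = five_number_summary(data)
print(minimum, q1, median, q3, maximum)
# The whiskers run from minimum to Q1 and from Q3 to maximum; the box spans Q1 to Q3.
```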

Data Management Flashcards

(67 cards)