Introduction to EDA Flashcards
It is a collection of methods for planning experiments, obtaining data, and then organizing, summarizing, analyzing, interpreting, and drawing conclusions based on the data.
STATISTICS
The practice or science of collecting and analyzing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample
STATISTICS
are methods for organizing and summarizing data.
Descriptive statistics
consists of procedures used to make inferences about population characteristics from information contained in a sample drawn from this population
INFERENTIAL STATISTICS
covers a large variety of techniques that allow us to make claims about a population based on a sample of data
INFERENTIAL STATISTICS
The theory of statistics uses _________ to measure the uncertainty associated with an inference. It enables us to calculate the probabilities of observing specific samples under specific assumptions about the population. The statistician uses these probabilities to evaluate the uncertainties associated with sample inferences.
probability
information or facts necessary to conduct a certain study.
Data
in a statistical study, is the group of objects about which conclusions are to be drawn
POPULATION
is a subset of measurements selected from the population of interest
SAMPLE
A descriptive value for a population is called a ________
parameter
A descriptive value for a sample is called ________
statistic
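As a small illustration of the difference between a parameter and a statistic, the sketch below treats a made-up list of ten scores as the whole population and a hand-picked subset as the sample. The data and variable names are hypothetical.

```python
from statistics import mean

# Hypothetical population: every score in a very small class of ten students.
population = [72, 85, 90, 66, 78, 88, 95, 70, 81, 75]

# Hypothetical sample: four measurements selected from that population.
sample = [85, 66, 95, 75]

# A descriptive value computed from the whole population is a parameter.
population_mean = mean(population)

# The same kind of value computed from a sample is a statistic.
sample_mean = mean(sample)

print(population_mean, sample_mean)
```

The two means differ slightly, which is why inferences from a sample carry uncertainty.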
measures a quality or characteristic on each experimental unit.
Qualitative data
measures a numerical quantity or amount on each experimental unit.
Quantitative data
Results from either a finite number of possible values or a countable number of possible values (that is, the number of possible values is 0, 1, 2, and so on)
Discrete data
Results from many possible values that can be associated with points on a continuous scale in such a way that there are no gaps or interruptions
Continuous data
To establish relationships between variables, researchers must observe the variables and record their observations. This requires that the variables be _______
measured
The process of measuring a variable requires a set of categories called a _______ and a process that classifies each individual into one category
scale of measurement
is characterized by the data that consist of names, labels or categories only, and the data cannot be arranged in an ordering scheme
nominal level of measurement
involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless
ordinal level of measurement
Is like the ordinal level, with the addition that meaningful amounts of differences between data can be determined. However, there is no inherent zero starting point.
interval level of measurement
Is the interval level modified to include the inherent zero starting point. For values at this level, differences and ratios are meaningful.
ratio level of measurement
A set of measurements that has not been organized numerically is called
raw data
Data that are presented in the form of a frequency distribution are called
grouped data
The organization of raw data in table form with classes and frequencies. It is an arrangement of a large mass of data made by grouping it into different classes of the same size and determining the number of observations that fall in each of the classes.
Frequency Distribution
A frequency distribution used for data that can be placed in specific categories, such as gender, hair color or religious affiliation
Categorical Frequency Distribution
A frequency distribution of numerical data in which the raw data is not grouped
Ungrouped Frequency Distribution
A frequency distribution where several numbers are grouped into one class
Grouped Frequency Distribution
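As a rough illustration of how raw data becomes a grouped frequency distribution, the sketch below counts how many scores fall into each class. The scores and the class limits are made up for this example.

```python
# Hypothetical raw data: 20 scores that have not been organized.
raw_data = [12, 25, 31, 18, 44, 27, 35, 38, 22, 30,
            41, 15, 29, 33, 36, 24, 39, 47, 32, 34]

# Assumed classes, written as (lower class limit, upper class limit).
classes = [(10, 19), (20, 29), (30, 39), (40, 49)]

# The frequency of a class is the number of raw scores that fall into it.
frequency_table = {}
for lower, upper in classes:
    frequency_table[(lower, upper)] = sum(lower <= x <= upper for x in raw_data)

for (lower, upper), freq in frequency_table.items():
    print(f"{lower}-{upper}: {freq}")
```

Every score lands in exactly one class, so the frequencies add up to the number of raw scores.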
lists categories (or classes) of scores, along with counts (or frequencies) of the number of scores that fall into each category
frequency table
for a particular class is the number of original scores that fall into that class
frequency
distribution represents data that can be placed in specific categories, such as gender, hair color or religious affiliation
Categorical frequency
are the smallest numbers that can actually belong to the different classes
Lower class limits
are the largest numbers that can actually belong to the different classes
Upper class limits
are the numbers used to separate classes, but without the gaps created by the class limits. They are obtained by increasing the upper class limits and decreasing the lower class limits by the same amount so that there are no gaps between consecutive classes. The amount to be added or subtracted is one half the difference between the upper limit of one class and the lower limit of the following class.
Class boundaries
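The rule for class boundaries can be sketched as follows, assuming four hypothetical classes with limits 10-19, 20-29, 30-39 and 40-49.

```python
# Assumed class limits, written as (lower class limit, upper class limit).
class_limits = [(10, 19), (20, 29), (30, 39), (40, 49)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = class_limits[1][0] - class_limits[0][1]   # 20 - 19 = 1

# The adjustment is one half of that gap.
adjustment = gap / 2                             # 0.5

# Decrease each lower limit and increase each upper limit by the adjustment.
class_boundaries = [(lower - adjustment, upper + adjustment)
                    for lower, upper in class_limits]

print(class_boundaries)
```

The upper boundary of each class equals the lower boundary of the next one, so no gaps remain between consecutive classes.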
are the midpoints of the classes. They can be found by adding the lower class limit to the upper class limit of a class and dividing by 2.
Class marks
is the difference between two consecutive lower class limits or two consecutive lower class boundaries.
Class width or Class size
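A minimal sketch of class marks and class width, assuming the same kind of hypothetical classes (10-19, 20-29, 30-39, 40-49). Here the class mark is taken as the midpoint of a class's lower and upper limits.

```python
# Assumed class limits, written as (lower class limit, upper class limit).
class_limits = [(10, 19), (20, 29), (30, 39), (40, 49)]

# Class mark: midpoint of each class, (lower limit + upper limit) / 2.
class_marks = [(lower + upper) / 2 for lower, upper in class_limits]

# Class width: difference between two consecutive lower class limits.
class_width = class_limits[1][0] - class_limits[0][0]

print(class_marks)
print(class_width)
```

Note that the width is 10, not 19 - 10 = 9, because it is measured between consecutive lower limits rather than within one class.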
ratio of the class frequency to the total frequency
Relative Frequency
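Relative frequency can be computed directly from a frequency table. The class labels and frequencies below are made up for illustration.

```python
# Hypothetical class frequencies.
class_frequencies = {"10-19": 3, "20-29": 5, "30-39": 9, "40-49": 3}

# Total frequency: the number of observations across all classes.
total = sum(class_frequencies.values())

# Relative frequency: class frequency divided by total frequency.
relative_frequencies = {label: freq / total
                        for label, freq in class_frequencies.items()}

for label, rel in relative_frequencies.items():
    print(f"{label}: {rel:.2f}")
```

The relative frequencies of all classes sum to 1 (up to floating-point rounding).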
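is the running total of the class frequencies. The less than cumulative frequency is constructed if the frequencies are summed from the top down, to find the number of observations less than a particular upper class boundary. The greater than cumulative frequency is constructed if the frequencies are summed from the bottom up.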
Cumulative Frequency
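The two directions of summing can be sketched as follows. The frequencies and class boundaries are hypothetical, and the name "greater than" for the bottom-up sum is the usual textbook label rather than something stated in this deck.

```python
from itertools import accumulate

# Hypothetical classes listed from top (lowest class) to bottom (highest class).
lower_boundaries = [9.5, 19.5, 29.5, 39.5]
upper_boundaries = [19.5, 29.5, 39.5, 49.5]
frequencies = [3, 5, 9, 3]

# Summing from the top down: observations less than each upper class boundary.
less_than = list(accumulate(frequencies))

# Summing from the bottom up: observations greater than each lower class boundary.
greater_than = list(accumulate(reversed(frequencies)))[::-1]

for ub, cf in zip(upper_boundaries, less_than):
    print(f"less than {ub}: {cf}")
for lb, cf in zip(lower_boundaries, greater_than):
    print(f"greater than {lb}: {cf}")
```

The last less-than value and the first greater-than value both equal the total number of observations.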