Midterm 1 Flashcards
What is descriptive statistics?
consists of methods for organizing and summarizing information
What is inferential statistics?
consists of methods for drawing and measuring the reliability of conclusions about a population based on information obtained from a sample of the population
What is a population?
The collection of all individuals or items under consideration in a statistical study
What is a sample?
That part of the population from which information is actually obtained
What is an observational study?
- Researchers simply observe characteristics and take measurements, as in a sample survey. - Observational studies can reveal association, but not causation.
What is a designed experiment?
- Researchers impose treatments and controls and then observe characteristics and take measurements. - Designed experiments (done properly) reveal both association and causation.
What is a census?
A survey that includes every member of the population
What are the issues with censuses?
- If the population is large, it can be very costly and difficult (perhaps impossible) to collect information from every member of the population. - Since a census is usually too costly or takes too long, most statistical information is gathered by sampling or experimentation.
What is sampling?
Collecting the information from a sample rather than the entire population. - Since the purpose of sampling is to make decisions about the corresponding population, it is important that the results obtained from sampling closely match the results that we would obtain by conducting a census. - This means sampling must be done very carefully so as to obtain a representative sample. - One method of sampling is to try to choose elements of the population so each element has an equal chance of being included in the sample.
What is simple random sampling?
A sampling procedure for which each possible sample of a given size is equally likely to be the one obtained.
What is a simple random sample?
A sample obtained by simple random sampling. - Using an SRS (simple random sample) is a common way of obtaining a representative sample. - Samples may be selected with or without replacement. - In sampling with replacement, each time an element is chosen from the population, it is put back in the population; thus any element may be chosen more than once for a sample. - In sampling without replacement, an element of the population is removed from the population once it has been chosen; thus any element can appear only once in the sample.
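Example: the difference between sampling with and without replacement can be sketched in Python. The population of ten labelled individuals and the fixed seed are assumptions made only for illustration.

```python
import random

# Hypothetical population of 10 labelled individuals (illustrative only).
population = list(range(1, 11))

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Without replacement: each element can appear at most once in the sample.
sample_without = rng.sample(population, k=4)

# With replacement: a chosen element goes back, so repeats are possible.
sample_with = rng.choices(population, k=4)

print("without replacement:", sample_without)
print("with replacement:   ", sample_with)
```

A sample drawn without replacement never contains duplicates, while one drawn with replacement may.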
What is a study?
The process of sampling a population, and collecting the information of interest
What is raw data?
Data recorded in the sequence in which they are collected and before they are processed or ranked
What is a variable?
A characteristic that varies from one individual to another
What is a qualitative or categorical variable?
A non-numerically valued variable. - Examples of qualitative (i.e. categorical) variables include eye colour, the first letter of a person's last name, and the type of automobile a person drives.
What is a quantitative variable?
A numerically valued variable. - Examples of quantitative variables include height, weight, age, speed of traffic (the various vehicles) at a certain location and time, and the number of stars that can be observed in a particular part of the sky. - There are two types: discrete and continuous.
What is a discrete variable?
A quantitative variable whose possible values can be listed.
What is a continuous variable?
A quantitative variable whose possible values form some interval of numbers.
What is data?
Values of a variable
What is qualitative or categorical data?
Values of a qualitative or categorical variable
What is quantitative data?
Values of a quantitative variable
What is discrete data?
Values of a discrete variable
What is continuous data?
Values of a continuous variable
What is a data set?
The collection of all observations for a particular variable
What is an observation?
Each piece of individual data
What is a frequency distribution for qualitative (categorical) data?
lists all categories and the number of elements that belong to each of the categories
What is relative frequency?
The relative frequency of a category is obtained by dividing the frequency of that category by the sum of all frequencies. - The relative frequency shows what fractional part or proportion of the total frequency belongs to the corresponding category.
What is the relative frequency distribution?
Lists the relative frequencies for all categories. - The relative frequencies must always add up to 1.00.
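Example: a minimal Python sketch of a frequency distribution and the matching relative frequency distribution. The eye-colour observations are made-up data, not taken from the course.

```python
from collections import Counter

# Hypothetical eye-colour observations (made-up data for illustration).
eye_colours = ["brown", "blue", "brown", "green",
               "brown", "blue", "brown", "blue"]

frequency = Counter(eye_colours)   # frequency distribution
total = sum(frequency.values())    # sum of all frequencies

# Relative frequency = frequency of the category / sum of all frequencies
relative_frequency = {cat: f / total for cat, f in frequency.items()}

for cat in frequency:
    print(cat, frequency[cat], relative_frequency[cat])
```

Adding up the values in relative_frequency gives 1, which is a quick check that no category was missed.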
How is a percentage of a category obtained?
By multiplying the relative frequency of that category by 100.
What is a percentage distribution?
Lists the percentages for all categories. - The percentages must always add up to 100.
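Example: a short Python sketch that turns frequencies into a percentage distribution. The vehicle counts are hypothetical.

```python
# Hypothetical frequency distribution (made-up counts for illustration).
frequency = {"car": 12, "truck": 5, "suv": 8}
total = sum(frequency.values())

# Percentage = relative frequency * 100
percentage = {cat: 100 * f / total for cat, f in frequency.items()}

for cat, pct in percentage.items():
    print(f"{cat}: {pct:.1f}%")
```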
What is a bar graph or bar chart?
A graph made of bars whose heights represent the frequencies or relative frequencies or percent frequencies of respective categories
What is a pie chart?
A circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories. - We multiply 360 by the relative frequency of each category to obtain the degree measure or size of the angle for the corresponding category.
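Example: the angle rule for a pie chart (360 times the relative frequency) can be sketched as follows. The counts are hypothetical.

```python
# Hypothetical frequency distribution (made-up counts for illustration).
frequency = {"car": 12, "truck": 5, "suv": 8}
total = sum(frequency.values())

# Angle in degrees = 360 * relative frequency of the category
angle = {cat: 360 * f / total for cat, f in frequency.items()}

for cat, degrees in angle.items():
    print(f"{cat}: {degrees:.1f} degrees")
```

The angles should add up to 360 degrees, apart from rounding.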
What is a class, category or bin?
For quantitative data, an interval that includes all the values that fall within two numbers, the lower and upper limits. - Each class (category, bin) is defined by an interval. - These are chosen so every measurement in the data set falls into exactly one interval.
What is a frequency?
The number of values that belong to a class. Frequencies are denoted by f.
What is a frequency distribution for quantitative data?
Lists all the classes and the number of values that belong to each class. Data presented in the form of a frequency distribution are called grouped data.
What is single value grouping?
- In single-value grouping, each distinct number in the set of measurements forms a class. - Single-value grouping is appropriate for discrete data where there are only a small number of distinct values. - Make a table of frequencies of each distinct value, in order from smallest to largest.
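Example: a minimal Python sketch of single-value grouping. The sibling counts are made-up data.

```python
from collections import Counter

# Hypothetical data: number of siblings reported by 12 students.
siblings = [1, 0, 2, 1, 3, 1, 0, 2, 1, 4, 2, 1]

counts = Counter(siblings)

# One class per distinct value, ordered from smallest to largest.
table = [(value, counts[value]) for value in sorted(counts)]

for value, f in table:
    print(value, f)
```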
What is limit grouping?
- This method is appropriate for discrete data. - In this method, we define the classes by providing numbers called class limits. - For each class, there is a lower limit and upper limit, selected by the person doing the analysis. - The lower limit of a class is the smallest number that could be in the class. - The upper limit of a class is the largest number that could be in the class.
What is lower class limit?
The smallest value that could go into a class
What is upper class limit?
The largest value that could go into a class
What is class width?
The difference between the lower limit of a class and the lower limit of the next-higher class
What is class mark?
The average of the two class limits of a class
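Example: a rough Python sketch of limit grouping, showing frequency, class width and class mark for each class. The data and the class limits are assumptions chosen for illustration.

```python
# Hypothetical whole-number data, made up for illustration.
data = [36, 41, 45, 52, 55, 38, 47, 59, 63, 50, 44, 68]

# Class limits chosen by the analyst: (lower limit, upper limit), both inclusive.
classes = [(30, 39), (40, 49), (50, 59), (60, 69)]

rows = []
for i, (lower, upper) in enumerate(classes):
    frequency = sum(lower <= x <= upper for x in data)
    # Class mark: average of the two class limits.
    mark = (lower + upper) / 2
    # Class width: lower limit of the next-higher class minus this lower limit.
    # The last class has no next-higher class, so its width is left as None.
    width = classes[i + 1][0] - lower if i + 1 < len(classes) else None
    rows.append((lower, upper, frequency, width, mark))

for row in rows:
    print(row)
```

Note that the width of the class 30 to 39 is 10, not 9, because width is measured between consecutive lower limits.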
What are the guidelines for grouping?
1. The number of classes should be small enough to provide an effective summary but large enough to display the relevant characteristics of the data. Generally 5 to 20 classes are used. 2. Each observation must belong to one, and only one, class. 3. Whenever feasible, all classes should have the same width.
What is cutpoint grouping?
- This method is similar to Limit Grouping - define classes by choosing two values. - there is an important difference in how one of these values is defined. - This modification makes this method appropriate for continuous data although it may also be used for discrete data.
What is lower class cutpoint?
The smallest value that could go in a class
What is upper class cutpoint?
The smallest value that could go in the next-higher class (equivalent to the lower cutpoint of the next-higher class)
What is class width in cutpoint grouping?
The difference between the cutpoints of a class
What is class midpoint in cutpoint grouping?
The average of two cutpoints of a class
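Example: a rough Python sketch of cutpoint grouping. A value equal to an upper cutpoint goes into the next-higher class, so each class is the interval from its lower cutpoint up to, but not including, its upper cutpoint. The weights and cutpoints are made up.

```python
# Hypothetical continuous data (weights in kg), made up for illustration.
weights = [61.2, 70.0, 58.9, 64.5, 79.9, 80.0, 72.3, 66.0, 75.5, 69.9]

# Cutpoints chosen by the analyst; consecutive pairs define the classes.
cutpoints = [55, 60, 65, 70, 75, 80, 85]

rows = []
for lower, upper in zip(cutpoints, cutpoints[1:]):
    # A value belongs to the class when lower <= value < upper.
    frequency = sum(lower <= x < upper for x in weights)
    width = upper - lower            # difference between the cutpoints
    midpoint = (lower + upper) / 2   # average of the two cutpoints
    rows.append((lower, upper, frequency, width, midpoint))

for row in rows:
    print(row)
```

Here 70.0 is counted in the class 70 to under 75, and 80.0 in the class 80 to under 85.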
What is the relative frequency of a class?
The frequency of that class divided by the sum of all frequencies (i.e. the number of measurements).
What is a histogram?
displays the classes of the quantitative data on a horizontal axis and frequencies or relative frequencies or percent frequencies of those classes on a vertical axis. The frequency (relative, percent) of each class is represented by a vertical bar whose height is equal to the frequency (relative, percent) of that class. The bars should be drawn so that consecutive classes share a common side (bars are touching)
What is single-value grouping in a histogram?
For single-value grouping, we use the distinct values of the observations to label the bars, with each such value centered under its bar.
What are other groupings in a histogram?
For limit grouping or cutpoint grouping, we use the lower class limits (or equivalently, lower class cutpoints) to label the bars.
What are the different types of shapes of a histogram?
- Symmetric (identical on both sides of its central point). - Skewed (asymmetric: the tail on one side is longer than the tail on the other side; a histogram can be skewed to the left or to the right). - Uniform or rectangular (has the same frequency for each class).
What can make graphs misleading?
1. Changing the scale 2. Truncating the frequency axis 3.
What is a stem and leaf display?
For quantitative data, each value is divided into two portions: a stem and a leaf. The leaves for each stem are shown separately in a display. - A common method for stem-and-leaf displays is to use the last digit of each data value as the leaf and all the preceding digits as the stem. - If one or more of the stems has too many leaves, the stem-and-leaf display doesn’t look good and may be difficult to fit on a page horizontally. - The remedy for this is to use two stems of the same value: the first stem having leaves with digits 0 through 4 and the second stem having leaves with digits 5 through 9.
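Example: a minimal Python sketch of a stem-and-leaf display that uses the last digit as the leaf and the preceding digits as the stem. The test scores are made-up data, and the sketch assumes non-negative whole numbers.

```python
# Hypothetical two-digit test scores (made-up data for illustration).
scores = [75, 52, 80, 96, 65, 79, 71, 87, 93, 95, 69, 72, 81, 61, 76]

stems = {}
for value in sorted(scores):
    # Last digit is the leaf; all preceding digits form the stem.
    stem, leaf = divmod(value, 10)
    stems.setdefault(stem, []).append(leaf)

lines = []
# Include every stem between the smallest and largest, even if it has no leaves.
for stem in range(min(stems), max(stems) + 1):
    leaves = stems.get(stem, [])
    lines.append(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}".rstrip())

print("\n".join(lines))
```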
What is a dotplot?
- All the data values from smallest to largest are listed on a horizontal scale. - A dot is placed above each data value for each occurrence of the data value. - Dotplots can help us detect outliers (also called extreme values) in a data set. - Stem-and-leaf displays and dotplots are easy ways to graph sets of data in a comparative way. - If you graph two or more sets of data with a stem-and-leaf display or a dotplot, make sure the scales line up and the scaling is identical so a comparison of the resulting graphs can be made.
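Example: a rough text-only Python sketch of a dotplot. It assumes small, single-digit whole-number data so the columns line up; the data are made up.

```python
from collections import Counter

# Hypothetical single-digit data (made up for illustration).
data = [2, 3, 3, 4, 4, 4, 5, 7]

counts = Counter(data)
values = range(min(data), max(data) + 1)   # horizontal scale
height = max(counts.values())              # tallest stack of dots

rows = []
# Build the plot from the top row down; one "*" per occurrence of a value.
for level in range(height, 0, -1):
    rows.append(" ".join("*" if counts[v] >= level else " " for v in values))
rows.append(" ".join(str(v) for v in values))  # axis labels

print("\n".join(rows))
```

The value 7 sits apart from the rest of the dots, which is the kind of gap that can point to a possible outlier.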
What is an outlier or extreme value?
Values that are very small or very large relative to the majority of the values in a data set are called outliers or extreme values.