Midterm Flashcards
It is a branch of mathematics that deals with the scientific collection, organization, presentation, analysis and interpretation of data in order to obtain useful and meaningful information.
Statistics
It is any piece of information useful to the researcher; the measurements obtained in a research study.
Data
It is a characteristic or property of an individual to be measured or observed.
Variable
It has a value or numerical measurement for which operations such as addition or averaging make sense.
Quantitative Variable
Two Types of Quantitative Variable
a. discrete variable – is a variable that can be obtained by counting.
b. continuous variable – is a variable that can be obtained by measuring.
It describes an individual by placing the individual into a category or group.
Qualitative Variable
These are the people, places, events or objects included in the study.
Individuals
Types of Variables
QUALITATIVE (ATTRIBUTE, OR CATEGORICAL VARIABLE)
- A variable that categorizes or describes an element of a population.
QUANTITATIVE (NUMERICAL VARIABLE)
- A variable that quantifies an element of a population.
It refers to the totality of all the individuals in which one has an interest at a particular time.
Population
It is a part of a population determined by sampling procedures.
Sample
It is a value that describes an aspect of a population.
Parameter
It is a value that describes an aspect of a sample.
Statistic
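The parameter/statistic distinction can be illustrated with a minimal Python sketch. The scores are hypothetical data, and the mean is only one example of a value that can describe a population or a sample.

```python
import random
import statistics

# Hypothetical population: exam scores of all 10 students in a class.
population = [70, 75, 80, 85, 90, 65, 95, 60, 88, 72]

# Parameter: a value computed from the whole population.
population_mean = statistics.mean(population)

# Statistic: the same kind of value computed from a sample only.
random.seed(1)  # fixed seed so the sketch is repeatable
sample = random.sample(population, 4)
sample_mean = statistics.mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```

The sample mean usually differs a little from the population mean, which is why it is treated as an estimate.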
Two Areas of Statistics
DESCRIPTIVE STATISTICS - methods concerned with collecting, describing (organizing, presenting, summarizing), and analyzing a set of data without drawing conclusions (or inferences) about a larger group.
INFERENTIAL STATISTICS - methods concerned with drawing conclusions, making predictions, or making forecasts about the entire set of data.
SAMPLING METHODS CAN BE:
Random (Probability Sampling) or Nonrandom (Non-Probability Sampling)
Each sample of the same size has an equal chance of being selected.
Simple Random Sampling
Divide the population into groups called strata and then take a sample from each stratum.
Stratified Sampling
Divide the population into groups called clusters and then randomly select some of the clusters. All the members from these clusters are in the cluster sample.
Cluster Sampling (Area Sampling)
Randomly select a starting point and take every k-th piece of data from a listing of the population.
Systematic Sampling
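The four random sampling methods above can be sketched in Python as follows. The function names, the population of 20 numbered members, and the four groups are all hypothetical choices made for illustration.

```python
import random

def simple_random_sample(population, n, rng):
    # Every sample of size n is equally likely.
    return rng.sample(population, n)

def stratified_sample(groups, n_per_group, rng):
    # Take a random sample from EVERY group (stratum).
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, n_per_group))
    return sample

def cluster_sample(groups, n_groups, rng):
    # Randomly pick SOME groups, then keep ALL members of those groups.
    chosen = rng.sample(sorted(groups), n_groups)
    sample = []
    for name in chosen:
        sample.extend(groups[name])
    return sample

def systematic_sample(population, k, rng):
    # Random starting point, then every k-th item of the listing.
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(1, 21))
groups = {
    "A": population[0:5],
    "B": population[5:10],
    "C": population[10:15],
    "D": population[15:20],
}

print(simple_random_sample(population, 5, rng))
print(stratified_sample(groups, 2, rng))
print(cluster_sample(groups, 2, rng))
print(systematic_sample(population, 4, rng))
```

Note the contrast between the second and third functions: stratified sampling takes a few members from every group, while cluster sampling takes every member from a few groups.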
LEVEL OR SCALES OF MEASUREMENT
- NOMINAL LEVEL OF MEASUREMENT
- ORDINAL LEVEL OF MEASUREMENT
- INTERVAL LEVEL OF MEASUREMENT
- RATIO LEVEL OF MEASUREMENT
Nominal Level of Measurement
- it applies to data that consist of names, labels, or categories.
- there are no implied criteria by which the data can be ordered from lowest to highest.
Ordinal Level of Measurement
- it applies to data that can be arranged in order/rank.
- differences between the data values either cannot be determined or are meaningless.
Interval Level of Measurement
- it applies to data that can be arranged in order.
- differences between data values are meaningful.
- data at this level have no true, or no meaningful zero.
Ratio Level of Measurement
- it applies to data that can be arranged in order.
- in addition, both differences between data values, and ratio of data values are meaningful.
- data at the ratio level have a true or meaningful zero.
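The difference between the interval and ratio levels can be checked with a short Python sketch. Temperature is used here as an illustrative example (it is not taken from the cards): Celsius has no true zero, Kelvin does.

```python
# Two hypothetical temperature readings in degrees Celsius (interval level).
c_low, c_high = 10.0, 20.0

# Differences are meaningful at the interval level.
difference = c_high - c_low          # 10 degrees warmer

# The ratio is NOT meaningful: 20 C is not "twice as hot" as 10 C,
# because 0 C is not a true zero.
naive_ratio = c_high / c_low

# Kelvin has a true zero (ratio level), so the ratio is meaningful.
k_low, k_high = c_low + 273.15, c_high + 273.15
true_ratio = k_high / k_low

print("difference:", difference)
print("Celsius ratio (misleading):", naive_ratio)
print("Kelvin ratio (meaningful):", round(true_ratio, 4))
```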
It is the first step in conducting a study or research after the formulation of the research problem.
Collection of Data
Classifications of Data
PRIMARY DATA
- are first-hand information
- information gathered directly from the source
- data from personal interviews
SECONDARY DATA
- data that have been collected for another purpose
- data from books, encyclopedias, journals, magazines, and other researches or studies conducted by other individuals
- data taken from the internet
Methods of Collecting Data: Interview Method
- referred to as direct method
- involves interviewer and interviewee
- advantage/s: researcher can get more accurate answer/response since clarifications can be made.
- disadvantage/s: time-consuming and costly
Methods of Collecting Data: Questionnaire Method
- referred to as indirect method
- the questionnaire (also called survey) is a set of questions given to a sample of people.
- advantage/s: it can save time and money and it can cover a lot of respondents.
- disadvantage/s: questions can’t be clarified. return rate of questionnaire can be low.
Methods of Collecting Data: Observation Method
- making use of the different human senses in gathering information.
- advantage/s: data collected by observation are more objective and generally more accurate.
- disadvantage/s: there are limiting factors on what can be observed. sometimes time-consuming and costly
Methods of Collecting Data: Registration or Census
- governed by enacted laws
- covers a large scope of population or the entire population
- some government agencies are LTO, COMELEC, and PSA (formerly, NSO)
Methods of Collecting Data: Experimental Method
- collecting the data to be further analyzed.
- there are two types of experiments – laboratory and field.
- establishing cause-and-effect relationships
Measures of central tendency - typical average score
- MEAN: arithmetic average
- MEDIAN: middlemost value
- MODE: most frequently occurring value
Measures of variability - typical average variation
- RANGE: distance from the lowest to the highest (uses 2 data points)
- VARIANCE: (use all data points)
- STANDARD DEVIATION
The difference between the maximum and minimum value in a data set.
Range (R)
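The measures of central tendency and variability above can be computed with Python's standard `statistics` module. The scores are hypothetical, and the sketch assumes the sample versions of variance and standard deviation (dividing by n - 1); the cards do not say which version is intended.

```python
import statistics

scores = [4, 8, 6, 5, 3, 8, 8]  # hypothetical data set

# Measures of central tendency
mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middlemost value
mode = statistics.mode(scores)      # most frequently occurring value

# Measures of variability
data_range = max(scores) - min(scores)   # uses only 2 data points
variance = statistics.variance(scores)   # sample variance, uses all data points
std_dev = statistics.stdev(scores)       # square root of the sample variance

print(mean, median, mode)
print(data_range, round(variance, 3), round(std_dev, 3))
```

For population versions, `statistics.pvariance` and `statistics.pstdev` divide by n instead.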
It is a circular chart divided into sectors, illustrating relative magnitudes, usually in percent. The area of each sector is proportional to the quantity it represents.
Pie Chart
It is a graph which uses lines to connect individual data points that display quantitative values over a specified time interval.
Line Graph
It is a graph with rectangular bars.
Bar Graph
Types of Bar Graphs
- Grouped Bar Graph
- Stacked Bar Chart
- Segmented Bar Graph
It is a display of statistical information that uses rectangles to show the frequency of data items in successive numerical intervals of equal size.
Histogram
Histogram vs. Bar Graph
Histograms are a great way to show results of continuous data, but when the data is in categories (such as country or favorite movie), a bar graph is used.
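A rough text-only sketch of the contrast, using hypothetical data: the histogram counts continuous values in successive intervals of equal size, while the bar graph counts one bar per category.

```python
from collections import Counter

# Histogram: continuous data grouped into equal-width intervals.
heights_cm = [150, 152, 158, 161, 163, 164, 167, 170, 171, 178]  # hypothetical
bin_width = 10
start = 150
hist = Counter(start + ((h - start) // bin_width) * bin_width for h in heights_cm)

for lower in sorted(hist):
    # Each interval includes its lower bound and excludes its upper bound.
    print(f"[{lower}, {lower + bin_width}): {'#' * hist[lower]}")

# Bar graph: categorical data, one bar per category.
favorite_genre = ["action", "drama", "action", "comedy", "action", "drama"]  # hypothetical
bars = Counter(favorite_genre)

for genre, count in bars.most_common():
    print(f"{genre:8s} {'#' * count}")
```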
It is a graphic tool used to display the relationship between two quantitative variables.
Scatterplots
Tables are used for presentation of data when:
- trend is not important
- the number of values is small
- to complement other data presentation formats
Relationships between two categorical variables can be shown through…
Two-Way Tables
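A two-way table can be built by counting how often each pair of categories occurs. The sketch below uses hypothetical survey responses (year level and a yes/no answer).

```python
from collections import Counter

# Hypothetical responses: (year level, answer to a yes/no question)
responses = [
    ("freshman", "yes"), ("freshman", "no"), ("freshman", "yes"),
    ("senior", "no"), ("senior", "no"), ("senior", "yes"),
]

table = Counter(responses)  # counts each (row, column) pair
rows = sorted({r for r, _ in responses})
cols = sorted({c for _, c in responses})

# Print the table with one row per year level and one column per answer.
print(f"{'':10s}" + "".join(f"{c:>6s}" for c in cols))
for r in rows:
    print(f"{r:10s}" + "".join(f"{table[(r, c)]:>6d}" for c in cols))
```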
It is an assumption, or an idea that proposes a tentative explanation about a phenomenon or a narrow set of phenomena observed in the natural world.
Hypothesis
Two Kinds of Hypothesis
Null Hypothesis (Ho) – is the hypothesis that must always express the idea of nonsignificance of difference, lack of association or relationship between variables (dependent and independent).
Alternative Hypothesis (Ha) – is the opposite of the null hypothesis. It specifies an existence of a difference (or that one group is better than the other), association or relationship between the dependent and independent variables.
When we reject the null hypothesis when in fact Ho is true.
Type I error (α-error)
When we accept the null hypothesis when in fact Ho is false.
Type II error (β-error)
It is the maximum value of the probability of rejecting the null hypothesis when in fact it is true.
Level of significance
A function of the random sample, that is based on the observations and is used to make the decision in favor of the null or alternative hypotheses.
Test Statistic
It is a set of values of the test statistic that is chosen before the experiment to define the conditions under which the null hypothesis will be rejected.
Critical region
It is a value that separates a critical region (rejection region) from the acceptance region in a hypothesis test, usually presented in tables.
Critical value
It is used when the critical region is located at only one extreme of the distribution or range of values for the test statistic.
One-tailed test
It is used when the critical region is located on both sides of the distribution or range of values for the test statistic.
Two-tailed test
It refers to the number of free choices that can be made.
Degrees of freedom (df)
STEPS IN HYPOTHESIS TESTING:
- Formulate the null hypothesis (Ho) and alternative hypothesis (Ha), which is used in case Ho is rejected.
- Set the level of significance, α, and determine the direction of the test.
- Determine the test statistic to be used.
- Determine the tabular value for the test.
- Compute for the test statistic as needed, using the appropriate formula.
- Compare the computed value with its corresponding tabular value.
- State the conclusion.
__________ Ho if the absolute computed value is equal to or greater than the absolute tabular value.
Reject
__________ Ho if the absolute computed value is less than the absolute tabular value.
Accept
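The decision rule on the two cards above can be written as a small Python function. The function name `decide` is hypothetical; the tabular value 2.262 used in the example is the two-tailed t value for df = 9 at α = 0.05.

```python
def decide(computed, tabular):
    """Apply the decision rule using absolute values."""
    if abs(computed) >= abs(tabular):
        return "Reject Ho"
    return "Accept Ho"

tabular_value = 2.262  # two-tailed t value, df = 9, alpha = 0.05

print(decide(2.50, tabular_value))   # computed value beyond the tabular value
print(decide(-2.50, tabular_value))  # the sign does not matter
print(decide(1.00, tabular_value))   # computed value inside the acceptance region
```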
It is an inferential statistic that is used to determine the difference or to compare the means of two groups of samples which may be related to certain features.
T-Test
There are three different versions of t-tests:
→ One sample t-test, which tells whether the means of the sample and the population are different.
→ Two sample t-test, also known as the independent t-test: it compares the means of two independent groups and determines whether there is statistical evidence that the associated population means are significantly different.
→ Paired sample t-test compares the means of two variables for a single group.
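The three versions can be sketched with the usual textbook formulas. This is a minimal illustration, not the cards' own method: the function names and data are hypothetical, and the independent t-test here assumes equal variances (pooled variance). Each function returns the computed t value and the degrees of freedom.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    # t = (sample mean - hypothesized mean) / (s / sqrt(n)), df = n - 1
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / se, n - 1

def independent_t(a, b):
    # Pooled-variance version (assumes equal variances), df = n1 + n2 - 2
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * statistics.variance(a)
              + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2

def paired_t(before, after):
    # A one-sample t-test on the pairwise differences against 0.
    diffs = [y - x for x, y in zip(before, after)]
    return one_sample_t(diffs, 0)

print(one_sample_t([5, 7, 9], 5))
print(independent_t([1, 2, 3], [4, 5, 6]))
print(paired_t([10, 12, 14], [11, 14, 17]))
```

The computed t value is then compared with the tabular value for the given degrees of freedom.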
It is used to compare multiple (three or more) samples with a single test.
ANOVA (Analysis of Variance) or F-Test
The hypothesis being tested in ANOVA is
Null Hypothesis: All group means are the same, i.e. the population means of all the samples are equal.
Alternative Hypothesis: At least one group mean is significantly different from the others.
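A minimal sketch of the one-way ANOVA F statistic, assuming the standard formula F = (mean square between groups) / (mean square within groups). The function name and the three groups are hypothetical.

```python
def anova_f(groups):
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)    # df between = k - 1
    ms_within = ss_within / (n - k)      # df within = n - k
    return ms_between / ms_within, k - 1, n - k

f_value, df_between, df_within = anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_value, 4), df_between, df_within)
```

A large F value means the group means vary much more than the observations within each group do.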
It is used when we perform hypothesis testing on two categorical (qualitative) variables from a single population.
Chi-Square Test
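A minimal sketch of the chi-square statistic for a two-way table of counts, assuming the standard formula: the sum of (observed - expected)² / expected, where expected = row total × column total / grand total. The function name and the counts are hypothetical.

```python
def chi_square(observed):
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count if the two variables were independent.
            e = row_totals[i] * col_totals[j] / total
            stat += (o - e) ** 2 / e

    # df = (rows - 1) * (columns - 1)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

chi2, df = chi_square([[10, 20], [20, 10]])
print(round(chi2, 4), df)
```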
It is the test statistic that measures the statistical relationship, or association, between two quantitative variables.
Pearson’s correlation coefficient, or Pearson r
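Pearson r can be computed from the deviations of each variable from its mean. The sketch below uses the standard formula; the function name and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]
print(pearson_r(hours, [2, 4, 6, 8, 10]))  # perfect positive relationship
print(pearson_r(hours, [2, 1, 4, 3, 5]))   # strong but imperfect relationship
```

Values of r range from -1 to 1; values near 0 indicate little or no linear relationship.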