Module 1 Exam Flashcards
What is statistics?
Statistics is the science of data. It involves classifying, summarizing, organizing, analyzing, and interpreting information.
What is Data?
Data are the things about which information can be collected and analyzed, like people, numerical information, geographical areas, etc.
Data set
A data set is a rectangular array of data that contains various categories of data collected for a particular study.
Element
The element of a data set is simply the individual and unique entry in a data set about which data has been collected, analyzed, and presented in the same manner.
Variable
A variable is a particular, measurable attribute that the researcher believes is needed to describe the element in their study.
Types of data collection: cross-sectional data
Cross-sectional data are collected at the same or approximately the same point in time.
For example: the average GPA of graduating college seniors on a given date, or the split or lap times at a single race.
Types of data collection: time series data
Time series data are collected over several time periods.
For example: the average US price per gallon for conventional regular gasoline
- Often found in business and economic publications
Descriptive statistics
Descriptive statistics utilizes numerical and graphical methods to explore data.
It is simply a compiled summary of the data analysis.
- For example: to look for patterns in a data set, to summarize info in a dataset, and to present the info in a convenient form.
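The most commonly used numerical statistic is the average, or mean: the sum of the values divided by the number of values.
For example, if 10 values sum to 5829, the mean is 5829/10 = 582.9.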
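A quick check of the mean arithmetic, as a minimal Python sketch. The ten values are hypothetical (the card only gives their total, 5829), so only the sum and the count matter here.

```python
from statistics import mean

# Hypothetical data: ten made-up values chosen so that they sum to 5829.
values = [580, 590, 575, 585, 583, 581, 584, 582, 586, 583]

total = sum(values)            # sum of all entries
average = total / len(values)  # mean = sum / number of entries

print(total, average)
print(mean(values))            # the standard library gives the same mean
```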
Inferential statistics
Inferential statistics utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data.
Level of measurement: nominal
- Categorical (qualitative)
- Names, labels, or qualities (doesn’t imply order)
- For example: favorite type of chocolate or favorite color
Level of measurement: ordinal
- Qualitative or quantitative
- Can be arranged in order, or ranked BUT differences between data entries are not meaningful.
Level of measurement: interval
- Can be ordered, and meaningful differences between data entries can be calculated
- At this level, a zero entry simply represents a position on a scale; the entry is not an inherent zero
Level of measurement: ratio
- Similar to interval, but the zero entry is an inherent zero
- A ratio of two data entries can be formed, so that one data entry can be expressed as a multiple of another
- Quick test: does “twice as much” make sense?
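One way to see the "twice as much" test in action is temperature. This example is not from the cards: it assumes the usual classification of Celsius as an interval scale (its zero is just a position on the scale) and Kelvin as a ratio scale (its zero is inherent). A minimal sketch:

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to Kelvin."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0  # illustrative readings

# On the interval scale the ratio can be computed, but it is not meaningful:
naive_ratio = warm_c / cool_c

# On the ratio scale the same two readings are nowhere near "twice as much":
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(naive_ratio, round(true_ratio, 3))
```

The difference of 10 degrees is meaningful on both scales; only the ratio depends on having an inherent zero.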
Experimental studies
- Begins with a researcher making a thorough review of the given situation
- During review, the researcher identifies the variables that have an influence on the individual elements of the study
- Researcher will then attempt to manipulate some of the variables in the study while holding others constant
- Then, the researcher will compare the elements to each other to understand how the manipulated variables affect the elements being studied
Observational studies
- Researcher collects information, not by influencing the variables in the study, but by simply observing the variable in action
- Observation can be passive or active
- Passive observation is when the researcher has no intention of interacting with the subject and collects information from a distance
- Active observation is when the researcher interacts with the subject to some degree; this can be as unobtrusive as handing out a survey to fill out or as highly interactive as asking a series of questions or conducting a group analysis
Data collection methods: simulation
A simulation is the use of mathematical or physical models to reproduce a situation or process
- The use of computers allows for easy collection of data
- Simulations allow for the study of situations that may be impractical or dangerous in real life
- often employed because they are cheaper and save time
Data collection methods: surveys
A survey is an investigation of one or more characteristics of a population
- Most surveys are carried out on people by asking them questions
- They are most often conducted through carefully crafted interviews held in person, by phone, over the Internet, or by mail
- Questions have to be worded to yield unbiased responses and results
Data errors
Data errors can have a greater impact on research than omitted data.
A data error can be as simple as transposing two digits when entering a number in a spreadsheet or as complicated as a questionnaire respondent misunderstanding a question.
Researchers need to develop skills that enable them to identify when data errors occur. A researcher’s ability to notice when a value for a particular variable is out of place, either too small or too large, is vital to the validity of the study
Three key elements of a well designed experiment
Control, randomization, and replication.
Important factors affecting experimental results: Confounding variable
A confounding variable occurs when an experimenter cannot tell the difference between the effects of different factors on the variable being studied.
- For example a researcher is studying the effect of mental fatigue on study habits. The confounding effect found early on is boredom. The researcher is not sure how to distinguish between real mental fatigue or becoming bored with the test tasks.
Important factors affecting experimental results: placebo effect
The placebo effect occurs when a subject reacts favorably to a placebo, that is, when the subject has been given a fake treatment.
Blinding
Blinding is a technique in which the subjects do not know whether they are receiving a treatment or a placebo.
Double blind experiment
In a double-blind experiment, neither the experimenter nor the subjects know whether the subjects are receiving the treatment or the placebo.
- Experimenters are informed after all data have been collected
Randomization
Randomization is a process of randomly assigning subjects to different treatment groups
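Random assignment can be sketched in a few lines of Python. The subject labels, group names, and the helper `randomly_assign` are all hypothetical, and shuffling then dealing subjects out in turn is just one simple way to do it.

```python
import random

def randomly_assign(subjects, groups, seed=None):
    """Shuffle the subjects, then deal them out to the groups in turn."""
    rng = random.Random(seed)      # seeded only so the example is repeatable
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, subject in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(subject)
    return assignment

subjects = [f"S{i}" for i in range(1, 11)]   # ten made-up subject labels
assignment = randomly_assign(subjects, ["treatment", "control"], seed=1)
print(assignment)
```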
Completely randomized design
In a completely randomized design, subjects are assigned to different treatment groups through random selection.
- It may be necessary for an experimenter to use blocks: groups of subjects with similar characteristics
- Randomized block designs
Randomized block designs
Randomized block designs allow experimenters to divide the subjects with similar characteristics into blocks and then, within each block, randomly assign subjects to treatment groups.
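A rough Python sketch of blocking followed by random assignment within each block. The people, the age-group blocks, and the helper `randomized_block_assign` are invented for illustration.

```python
import random
from collections import defaultdict

def randomized_block_assign(subjects, block_of, groups, seed=None):
    """Group subjects into blocks, then randomize separately inside each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of(subject)].append(subject)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)                       # randomize within the block
        for i, subject in enumerate(members):
            assignment[subject] = groups[i % len(groups)]
    return assignment

# Hypothetical subjects as (name, age group); the age group is the block.
people = [
    ("Ana", "under 30"), ("Ben", "under 30"), ("Cara", "under 30"), ("Dev", "under 30"),
    ("Eli", "30 and over"), ("Fay", "30 and over"), ("Gus", "30 and over"), ("Hal", "30 and over"),
]
block_assignment = randomized_block_assign(
    people, lambda person: person[1], ["treatment", "control"], seed=2
)
print(block_assignment)
```

Because the shuffle happens inside each block, every block contributes equally to both groups.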
Matched-pairs design
A design in which subjects are paired according to some similarity relevant to the experiment
- In this case, one member of the pair is randomly selected to receive the treatment while the other is kept as the control.
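The pairing step depends on the study, but the random choice within each pair is easy to sketch. The pairs below and the helper `matched_pairs_assign` are hypothetical.

```python
import random

def matched_pairs_assign(pairs, seed=None):
    """For each pair, randomly pick one member for treatment; the other is the control."""
    rng = random.Random(seed)
    result = []
    for first, second in pairs:
        if rng.random() < 0.5:
            treated, control = first, second
        else:
            treated, control = second, first
        result.append({"treatment": treated, "control": control})
    return result

# Made-up pairs, assumed to be already matched on some relevant characteristic.
pairs = [("Ana", "Ben"), ("Cara", "Dev"), ("Eli", "Fay")]
pair_assignment = matched_pairs_assign(pairs, seed=3)
print(pair_assignment)
```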
Sample size; Replication
If a sample is too small, it may not provide enough information to be unbiased or valid. If it’s too large, processing the data may prove difficult.
One way to support validity is replication: the repetition of an experiment under the same or similar conditions.
Sampling techniques
- Census: a count or measure of an entire population
- Sampling: a count or measure of part of the population; commonly used in statistics
To ensure it is unbiased, the sample must be representative of the greater population.
Appropriate sampling techniques must be used, but sampling errors can occur even in a well-designed study.
- A random sample is one in which every member of the population has an equal chance of being selected.
- A simple random sample is a sample in which every possible sample of the same size has the same chance of being selected.
One way to do this is to assign a different number to each member of the population and then use a random number table or random number generator to choose numbers.
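The numbering approach can be sketched with Python's standard random number generator. The population size (100) and sample size (10) are arbitrary choices for illustration.

```python
import random

# Assume every member of the population has been assigned a number from 1 to 100.
population = list(range(1, 101))

rng = random.Random(42)            # seeded only so the example is repeatable
srs = rng.sample(population, 10)   # 10 distinct numbers, drawn without replacement

print(sorted(srs))
```

`random.sample` draws without replacement, so no member can be picked twice.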
More sampling techniques: Stratified sample
A stratified sample is used when it is important for the sample to have members from each segment of the population.
- Members of the population are divided into two or more subsets (strata) that share a similar characteristic such as age, race, gender, economic status, etc.
- A sample is then selected from each stratum.
The use of a stratified sample ensures that each segment of the population is represented. For example, if fifteen percent of the population belonged to a low-income group, then fifteen percent of the sample should be from this group.
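The fifteen percent example can be sketched as a proportional stratified sample. The member labels, the two strata, and the helper `stratified_sample` are hypothetical; the sketch assumes each stratum's share of the sample should match its share of the population.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Draw from every stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding can make the pieces miss sample_size slightly in general.
        k = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, k)
    return sample

# Made-up population of 100: 15 percent low income, 85 percent other.
strata = {
    "low income": [f"L{i}" for i in range(15)],
    "other": [f"O{i}" for i in range(85)],
}
strat = stratified_sample(strata, 20, seed=5)
print({name: len(chosen) for name, chosen in strat.items()})
```

With a sample of 20, the low-income stratum contributes 3 members, which is fifteen percent of the sample.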
More sampling techniques: cluster sample
From time to time, populations fall into naturally occurring subgroups, each having similar characteristics, so a cluster sample is more appropriate.
- To select a cluster sample, divide the population into groups called clusters, and then select all of the members in one or more, but not all, of the clusters.
- Care must be taken to ensure that all clusters have similar characteristics.
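A minimal sketch of cluster sampling: pick some clusters at random, then keep every member of the chosen clusters. The zip-code style clusters and the helper `cluster_sample` are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and include all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical clusters, for example households grouped by zip code.
clusters = {
    "Zip 1": ["a1", "a2", "a3"],
    "Zip 2": ["b1", "b2"],
    "Zip 3": ["c1", "c2", "c3", "c4"],
    "Zip 4": ["d1", "d2"],
}
chosen, members = cluster_sample(clusters, 2, seed=7)
print(chosen, members)
```

Contrast with a stratified sample, which takes some members from every group rather than all members from some groups.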
More sampling techniques: systematic sample
A systematic sample begins with each member of the population being assigned a number
- The members of the population are ordered in some way, and a starting number is randomly selected
- Then the sample members are selected at regular intervals from the starting number.
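A short sketch of systematic sampling. Choosing the random start within the first interval is an assumption (a common convention, not stated on the card), and the helper `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(population, interval, seed=None):
    """Pick a random start within the first interval, then take every interval-th member."""
    rng = random.Random(seed)
    start = rng.randrange(interval)    # index 0 .. interval - 1
    return population[start::interval]

population = list(range(1, 101))       # members numbered 1 to 100, already ordered
sys_sample = systematic_sample(population, 10, seed=11)
print(sys_sample)
```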
Index number
For business and government, the index number is the most widely utilized numerical descriptive statistic.
An index is typically a compilation of several data sources, and weighted to provide a consistent review of a measured variable.
For example, the Consumer Price Index (CPI) is compiled every month by the U.S. Bureau of Labor Statistics as a measure of the cost of living in the country.
Population versus sample
A population is the set of all the elements the study is about, and a sample is a subset of the population.
A parameter is a numerical description of a population characteristic.
A statistic is a numerical description of a sample characteristic.
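The parameter versus statistic distinction can be illustrated with a mean computed twice: once over a whole population and once over a sample. The scores below are randomly generated stand-ins, not real data.

```python
import random
from statistics import mean

rng = random.Random(0)   # seeded only so the example is repeatable

# Hypothetical population: 1000 made-up test scores between 50 and 100.
population = [rng.randint(50, 100) for _ in range(1000)]
sample = rng.sample(population, 30)    # a subset of the population

parameter = mean(population)   # describes the population
statistic = mean(sample)       # describes the sample; used to estimate the parameter

print(parameter, statistic)
```

The two means are usually close but not equal; that gap is the sampling error.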
Statistical inference
The use of a sample to make decisions about a population is called statistical inference.
Bar Chart
Categorical data (for example, number of children by favorite color)
Histogram
Quantitative data.