The Scientific Method & Statistical Analysis Flashcards
What is the definition of the Scientific Method?
A systematic investigation aimed at increasing the sum of human knowledge
What is a research hypothesis?
A research hypothesis is a statement that proposes a possible explanation for a phenomenon or event
What is a characteristic of a good hypothesis?
A good hypothesis must be testable (falsifiable/refutable)
Research question: You want to understand what causes leaves to change colour.
Which of the following hypotheses is testable?
A. Temperature may cause leaves to change colour
B. Leaves change colour as little invisible elves want them to
A, because B cannot be tested or rejected: there is no way to measure the activity of little invisible elves
What do good hypotheses often contain?
Good hypotheses often contain at least two variables and their (causal) connection
What are the two main types of research designs for collecting data?
- experimental research
- observational research
What is experimental research?
Experimental research involves deliberately manipulating one or more independent (predictor/treatment) variables and observing the effect on the response (dependent) variable.
What is observational research?
Observational research involves studying the response variable in situ (in its natural setting), without any direct manipulation of the independent/predictor variables
What are the independent and dependent variables in this research question:
How does biodiversity change with decreasing latitude?
Dependent variable: Number of butterfly species (a measure of biodiversity)
Independent variable: Latitude
What are the independent and dependent variables in this research question:
Does a higher dose of an antibiotic lead to more resistance?
Dependent variable: Bacterial survival
Independent variable: Dose levels of antibiotic
What are the independent and dependent variables in this research question:
Is a particular gene involved in eye development?
Dependent variable: Eyes present or absent
Independent variable: Gene switched on or off
What is sampling in research design?
Sampling in research design is the process of selecting a subset of individuals or units from a larger population to study
Why do we have to sample from a population?
We have to sample from a population because it is often impractical or impossible to collect data from every individual in the population
Why does the sample have to be random?
The sample has to be random to avoid bias and ensure that every individual or unit in the population has an equal chance of being selected
What are some different approaches to random sampling?
Different approaches to random sampling include:
- simple random sampling
- stratified random sampling
- cluster random sampling
- systematic random sampling
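The four approaches above differ only in how units are picked. As a minimal sketch (the population of 100 numbered trees, the "north"/"south" strata and the plots of 10 trees are all hypothetical), each approach can be written with Python's standard `random` module:

```python
import random


def simple_random_sample(population, n, rng):
    """Every unit has the same chance of ending up in the sample."""
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Split the population into strata, then draw a simple random sample from each."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


def systematic_sample(population, k, rng):
    """Pick a random start among the first k units, then take every k-th unit."""
    start = rng.randrange(k)
    return population[start::k]


rng = random.Random(42)  # fixed seed so the example is reproducible
trees = list(range(100))  # hypothetical population: 100 numbered trees
plots = {f"plot{i}": trees[10 * i:10 * i + 10] for i in range(10)}  # hypothetical plots of 10 trees

srs = simple_random_sample(trees, 10, rng)
strat = stratified_sample(trees, lambda t: "north" if t < 50 else "south", 5, rng)
clus = cluster_sample(plots, 2, rng)
syst = systematic_sample(trees, 10, rng)
print(srs, strat, clus, syst, sep="\n")
```

Stratified sampling guarantees that each stratum is represented (here 5 trees from each half), while cluster sampling trades some precision for convenience, because whole plots are surveyed at once.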
Why is random assignment used?
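Random assignment is used in experimental studies so that each participant has an equal chance of being assigned to any of the treatment groups. This tends to balance other (confounding) variables, known and unknown, across the groups, so that differences in the outcome can be attributed to the treatment.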
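A minimal sketch of random assignment, assuming hypothetical participant IDs and three hypothetical treatment groups: shuffle the participants, then deal them into the groups in turn. Because every ordering of the shuffle is equally likely, every participant has the same chance of landing in each group, and the groups end up equal in size.

```python
import random


def randomly_assign(participants, groups, rng):
    """Shuffle the participants, then deal them into the groups in turn."""
    shuffled = list(participants)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, person in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(person)
    return assignment


rng = random.Random(7)  # fixed seed for a reproducible example
participants = [f"P{i:02d}" for i in range(12)]  # hypothetical participant IDs
groups = ["control", "low dose", "high dose"]  # hypothetical treatment groups
allocation = randomly_assign(participants, groups, rng)
for group, members in allocation.items():
    print(f"{group}: {members}")
```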
Why may random sampling not be useful in an experimental study?
Random sampling may not be useful in an experimental study if the sample is already limited to a specific population that has met certain inclusion criteria
What is random sampling?
Random sampling is a sampling method in which every element of the population has an equal chance of being selected
What factors determine the choice of sampling method?
The choice of sampling method depends on:
- your population
- budgetary constraints
What is the objective of using random sampling methods?
To help minimise errors caused by chance, bias, or confounding when making inferences about the population
What is the significance of variability in statistics?
Variability is the amount of dispersion, or spread, of the data around the mean (average). The degree of variability provides important information about the data and helps in drawing meaningful conclusions from them.
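To see why spread matters, here is a small sketch using Python's `statistics` module on two hypothetical sites that share the same mean. `statistics.variance` and `statistics.stdev` are the sample versions, dividing by n - 1.

```python
import statistics


def spread(values):
    """Summarise how far the values are scattered around their mean."""
    return {
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance, divides by n - 1
        "sd": statistics.stdev(values),  # sample standard deviation
    }


# Hypothetical counts from two sites with the same mean but different spread
site_a = [10, 11, 9, 10, 10]
site_b = [2, 18, 5, 15, 10]
for name, data in [("site A", site_a), ("site B", site_b)]:
    print(name, spread(data))
```

Both sites have a mean of 10, but site B's standard deviation (about 6.7) is more than nine times site A's (about 0.71). Reporting the mean alone would hide that difference.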
Can the number of bricks eaten last week be used in statistical analysis?
No, as it is not a relevant or meaningful variable
Can the number of grapes eaten last week be used in statistical analysis?
Yes, as it is a relevant and meaningful variable
Which of the following are statistical questions?
How much does my pet grapefruit weigh?
What was the average score on the essay on the Practising Scientist last year?
How many teeth does my mother have?
How much time do members of my family spend on their phones (screen time) at dinner?
How many times have I watched Lord of the Rings?
statistical questions (the answers vary across the data collected):
- What was the average score on the essay on the Practising Scientist last year?
- How much time do members of my family spend on their phones (screen time) at dinner?
not statistical questions (each has a single answer, with no variability):
- How much does my pet grapefruit weigh?
- How many teeth does my mother have?
- How many times have I watched Lord of the Rings?
What is the difference between descriptive and inferential statistics?
Descriptive statistics: involves summarising and exploring a collection of data, either graphically or numerically.
Inferential statistics: involves estimating parameters and their confidence limits, as well as hypothesis testing.
What are some examples of descriptive statistics?
- measures of central tendency (e.g., mean, median, mode)
- measures of variability (e.g., range, standard deviation)
- graphical displays (e.g., histograms, box plots, scatter plots)
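The numerical summaries above can be computed with Python's `statistics` module. As a minimal sketch, the data below are hypothetical (grapes eaten last week by nine people), and the text histogram stands in for a graphical display. The binning assumes non-negative values.

```python
from collections import Counter
import statistics


def central_tendency(values):
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


def histogram_counts(values, width):
    """Count values per bin of the given width; bins start at 0 (non-negative data assumed)."""
    counts = Counter(width * (v // width) for v in values)
    return [(start, counts[start]) for start in range(0, max(values) + 1, width)]


# Hypothetical data: grapes eaten last week by nine people
grapes = [0, 3, 5, 5, 7, 8, 10, 12, 40]
bin_width = 10
print(central_tendency(grapes))
for start, count in histogram_counts(grapes, bin_width):
    print(f"{start:>2}-{start + bin_width - 1:<2} | {'#' * count}")
```

The single large value (40) pulls the mean (10) above the median (7), and the histogram makes that right skew visible at a glance.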
What are some examples of inferential statistics?
- hypothesis testing (e.g., t-tests, ANOVA, regression analysis)
- confidence interval estimation
- model selection
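Hypothesis testing can be illustrated without extra libraries. The card lists t-tests; this sketch instead uses a permutation test, a different test of the same null hypothesis (no difference between two group means), chosen because it needs only Python's standard library. The survival data are hypothetical and echo the antibiotic example earlier in this set.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations, rng):
    """Two-sided permutation test for a difference in group means."""
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel observations at random (null: labels don't matter)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    # the +1 counts the observed labelling itself, so p is never exactly 0
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical bacterial survival (%) under a low and a high antibiotic dose
low_dose = [62, 58, 71, 65, 60, 68]
high_dose = [41, 47, 39, 52, 44, 45]
diff, p = permutation_test(low_dose, high_dose, 5000, random.Random(3))
print(f"difference in means = {diff:.2f}, p = {p:.4f}")
```

Here every low-dose value exceeds every high-dose value, so only 2 of the 924 possible splits of the 12 values are as extreme as the observed one. The exact p-value is 2/924 (about 0.002), and the random-shuffle estimate should land close to it.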
What are the two types of variables in data?
- numerical
- categorical
What are numerical variables?
Variables that
- take on numerical values
- can sensibly have arithmetic operations performed on them
- are quantitative
What are some examples of numerical variables?
- age
- height
- weight
- income
- temperature
What are categorical variables?
Variables that
- take on a limited number of distinct categories
- cannot sensibly have arithmetic operations performed on them (even when the categories are coded as numbers)
- are qualitative
What are some examples of categorical variables?
- gender
- ethnicity
- occupation
- education level
- favourite colour
What are the subtypes of numerical variables?
- continuous
- discrete
What is a continuous variable?
A numerical variable that can take on any value within a given range
What is a discrete variable?
A numerical variable that can only take on specific, distinct values, often represented by integers
What are the subtypes of categorical variables?
- regular categorical (also called nominal)
- ordinal
- binary
What is a regular categorical variable?
A categorical variable that can take on one of several distinct categories, but these categories have no inherent order or ranking
What is an ordinal variable?
A categorical variable in which the categories have an inherent ordering or ranking
What is a binary variable?
A categorical variable that can take on only two categories, such as
- yes or no
- true or false
- male or female
How can errors in numeric data be identified and corrected in Excel?
- plot each numerical variable as a column chart or scatterplot and look for outliers
- the row number is on the x-axis, so you can go to that row and check the value
- alternatively, sort each numerical column and inspect the extreme ends for potential errors
- once errors are identified, they can be corrected directly in Excel
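The "sort and look at the extreme ends" check can be sketched outside Excel as well. This minimal Python version keeps each value's spreadsheet row number so the suspect cell can be found again; it assumes the header is in row 1, and the height column with its two entry errors is hypothetical.

```python
def extremes_with_rows(values, k, first_data_row=2):
    """Return the k smallest and k largest values, each paired with its spreadsheet row.

    Assumes the header sits in row 1, so the first value is in row 2.
    """
    rows = sorted((value, first_data_row + i) for i, value in enumerate(values))
    return rows[:k], rows[-k:]


# Hypothetical height column (cm) with two typical entry errors
heights_cm = [172, 168, 1.75, 180, 165, 1710, 177, 169]
lowest, highest = extremes_with_rows(heights_cm, 2)
print("lowest:", lowest)    # 1.75 looks like a value entered in metres
print("highest:", highest)  # 1710 looks like an extra digit
```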
What is alphanumeric data?
Data that consists of both letters and numbers
How can errors in alphanumeric data be identified and corrected in Excel?
- sorting and checking, but this method may miss subtle differences such as case changes, missing letters, and spaces
- Pivot Tables: list each distinct entry with its count, so rare variants and misspellings stand out and can be corrected quickly
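As a rough Python analogue (not Excel itself), the sketch below counts each distinct entry, much like a one-field Pivot Table, and then groups entries that become identical once spacing and letter case are ignored. This is exactly the kind of subtle difference that sorting can miss. The species column is hypothetical, and genuinely missing letters (e.g. "Quercus robr") would still need a by-eye check of the counts.

```python
from collections import Counter, defaultdict


def distinct_counts(values):
    """Each distinct entry with its count, similar to a one-field Pivot Table."""
    return Counter(values)


def near_duplicates(values):
    """Group entries that become identical once spacing and letter case are ignored."""
    groups = defaultdict(set)
    for value in values:
        key = " ".join(value.split()).casefold()  # trim/collapse spaces, ignore case
        groups[key].add(value)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


species = [
    "Quercus robur", "quercus robur", "Quercus robur ",
    "Fagus sylvatica", "Fagus  sylvatica", "Fagus sylvatica",
    "Betula pendula",
]  # hypothetical species column
print(distinct_counts(species))
print(near_duplicates(species))
```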
How are missing values represented in Excel?
By blank (empty) cells
What code is used for missing values in R?
In R, the code NA is used for missing values and should not be used for anything else
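When a sheet leaves Excel (for example, saved as CSV), its blank cells arrive as empty strings and need an explicit missing-value code. As a minimal sketch, this Python version maps blanks to `None`, playing the role that NA plays in R, and counts missing cells per column. The CSV content is hypothetical.

```python
import csv
import io


def read_with_missing(text):
    """Parse CSV text, turning blank cells into None (Python's stand-in for R's NA)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            column: value if value is not None and value.strip() else None
            for column, value in record.items()
        })
    return rows


def count_missing(rows):
    """Number of missing (None) cells in each column."""
    return {column: sum(row[column] is None for row in rows) for column in rows[0]}


# Hypothetical sheet saved from Excel as CSV; two cells were left blank
csv_text = "site,species,count\nA,Quercus robur,12\nB,,7\nC,Fagus sylvatica,\n"
rows = read_with_missing(csv_text)
print(rows)
print(count_missing(rows))
```

As with NA in R, `None` is reserved for "missing" here, so a genuine zero count stays distinguishable from an empty cell.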
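What are the $ sign rules for cell references in Excel?
A $ fixes the part of the reference it precedes when the formula is copied to another cell:
$A1 = column A is fixed, but the row changes
A$1 = row 1 is fixed, but the column changes
$A$1 = both are fixed, so it always refers to the same cell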
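The $ rules can be checked by simulating what happens to a reference when its formula is copied. This Python sketch is an illustration, not Excel's own code; the function names are hypothetical. It shifts the unlocked parts of a reference by the copy offset and leaves the $-locked parts alone.

```python
import re

# A cell reference such as A1, $A1, A$1 or $A$1
CELL_REF = re.compile(r"(\$?)([A-Z]+)(\$?)([0-9]+)")


def column_to_number(letters):
    number = 0
    for ch in letters:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def number_to_column(number):
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def copy_reference(ref, rows_down, cols_right):
    """Rewrite a reference as if its formula were copied rows_down/cols_right cells away."""
    match = CELL_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    col_lock, col, row_lock, row = match.groups()
    col_number = column_to_number(col) if col_lock else column_to_number(col) + cols_right
    row_number = int(row) if row_lock else int(row) + rows_down
    if col_number < 1 or row_number < 1:
        raise ValueError("reference would move off the sheet")  # Excel shows #REF!
    return f"{col_lock}{number_to_column(col_number)}{row_lock}{row_number}"


# Copy a formula one row down and one column to the right
for ref in ["A1", "$A1", "A$1", "$A$1"]:
    print(ref, "->", copy_reference(ref, 1, 1))
```

Copying one cell down and one to the right turns A1 into B2, $A1 into $A2, A$1 into B$1, and leaves $A$1 unchanged.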