Summarising Data Flashcards
- Understanding the importance of summarising data in neuroscience.
- Distinguishing clearly between different types of data.
- Knowing how data can be summarised.
- Understanding data variability and how to identify outliers in your data.
- Demonstrating ways of managing research data.
- Applying statistical software (e.g., Stata) to carry out exploratory analysis and display a dataset (histogram, box plot, cumulative frequency plot).
Why is it important to summarise data in neuroscience?
Statistical summaries allow data to be presented and communicated clearly, and they quantify the variation and uncertainty in the data.
CLARITY AND UNDERSTANDING
Distil large amounts of complex data into an understandable form. Allows easier identification of patterns and insights.
HYPOTHESIS TESTING
Focussing on the essential data allows more accurate conclusions to be drawn about whether hypotheses are supported or refuted.
EFFICIENT COMMUNICATION
Summarised data can be clearly and concisely presented for sharing in journals, conferences etc.
RESOURCE MANAGEMENT
Focussing on the most relevant data allows time, resources and funding to be directed where they will make the research most impactful.
META-ANALYSES AND GENERALISATION
Summarised data is needed to combine results from multiple studies to increase statistical power and generalisability of findings.
DATA INTEGRITY AND REPRODUCIBILITY
Clearly documented summarised data allows researchers to replicate experiments.
DEVELOPMENT OF THEORIES
Summarised data allows identification of consistent patterns, allowing theoretical models to be developed and refined.
What are nominal data?
Categorical data without a specific order e.g., blood types.
What are ordinal data?
Categorical data with a meaningful order but no consistent interval between categories e.g., stages of cancer.
What are discrete data?
Countable values (typically integers) e.g., number of hospital visits.
What are continuous data?
Values can fall anywhere within a specified range, e.g., blood pressure.
What are interval data?
Numerical data with meaningful, equal intervals between values but without a true zero point, e.g., temperature in °C (0 °C does not mean an absence of temperature).
What are ratio data?
Numerical data with equal intervals and a true zero point which allows for the calculation of ratios e.g., height, weight.
MEASURES OF CENTRAL TENDENCY
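Mean (sum of values divided by the number of values)
Median (middle value when the data are ordered)
Mode (most frequently occurring value)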
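A quick way to see the three measures side by side is Python's standard-library `statistics` module. This is a minimal sketch; the reaction-time values are hypothetical, chosen so that one slow trial pulls the mean away from the median.

```python
import statistics

# Hypothetical reaction times in ms (illustrative values, not real data)
reaction_times = [310, 295, 342, 295, 401, 288, 330]

mean_rt = statistics.mean(reaction_times)      # sum / count
median_rt = statistics.median(reaction_times)  # middle value once sorted
mode_rt = statistics.mode(reaction_times)      # most frequent value

print(f"mean = {mean_rt:.1f} ms, median = {median_rt} ms, mode = {mode_rt} ms")
```

The single slow trial (401 ms) raises the mean (323 ms) above the median (310 ms), which is why the median is often preferred for skewed data.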
MEASURES OF SPREAD
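Range (maximum minus minimum)
Variance (average of squared differences from the mean; the sample variance divides by n - 1)
Standard deviation (square root of variance)
IQR, interquartile range (Q3 - Q1: the range between the 25th and 75th percentiles)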
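The four measures of spread can be computed with the standard-library `statistics` module. A minimal sketch on hypothetical data; note that `statistics.variance` and `statistics.stdev` are the sample versions (dividing by n - 1).

```python
import statistics

# Hypothetical measurements (illustrative values only)
data = [4, 8, 6, 5, 3, 7, 9, 6]

data_range = max(data) - min(data)
sample_var = statistics.variance(data)  # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(data)      # square root of the sample variance
# Quartiles using the module's default ("exclusive") method
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"range={data_range}, variance={sample_var}, SD={sample_sd:.2f}")
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
```

Software packages use several slightly different quartile definitions (`statistics.quantiles` also accepts `method="inclusive"`), so the IQR reported by Stata or another package may differ a little for small samples.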
MEASURES OF SHAPE
Skewness (positive - tail on right, negative - tail on left)
Kurtosis (tailedness of data distribution)
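Both shape measures can be computed from central moments. The sketch below uses simple (population) moment estimators and reports excess kurtosis (kurtosis minus 3, so a normal distribution scores about 0); statistical packages often apply small-sample corrections or report kurtosis without subtracting 3, so their numbers may differ. The data are hypothetical.

```python
def moment_shape(values):
    """Return (skewness, excess kurtosis) from simple moment estimators."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # 2nd central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # 3rd central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # 4th central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # about 0 for normally distributed data
    return skewness, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]            # long tail on the right
left_tailed = [11 - x for x in right_tailed]  # mirror image: tail on the left

for name, values in [("symmetric", symmetric), ("right", right_tailed), ("left", left_tailed)]:
    skew, kurt = moment_shape(values)
    print(f"{name:>9}: skewness={skew:+.2f}, excess kurtosis={kurt:+.2f}")
```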
GRAPHICAL SUMMARIES
Histograms (frequency distributions)
Box plots (median, quartiles, potential outliers)
Scatter plots (relationship between 2 quantitative variables)
Bar charts (categorical data)
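A histogram is a frequency table of equal-width bins drawn as bars. Plotting software (for example Stata's `histogram` command or matplotlib's `hist`) does the binning for you; the sketch below does it by hand and draws a text bar chart so the logic is visible. The half-open bin convention and the spike counts are assumptions for illustration.

```python
def histogram_counts(values, start, bin_width):
    """Count values in equal-width, half-open bins [lower, lower + bin_width)."""
    if min(values) < start:
        raise ValueError("start must not be greater than the smallest value")
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for x in values:
        counts[int((x - start) // bin_width)] += 1  # index of the bin holding x
    return [(start + i * bin_width, c) for i, c in enumerate(counts)]

# Hypothetical spike counts per trial (illustrative values only)
spikes = [2, 3, 3, 5, 6, 6, 6, 7, 9, 12]
width = 3

for lower, count in histogram_counts(spikes, start=0, bin_width=width):
    print(f"{lower:>2}-{lower + width:<2} | {'#' * count}")
```

The choice of bin width changes how the distribution looks, so it is worth trying a few widths during exploratory analysis.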
SUMMARY TABLES
Frequency tables (number of occurrences in each category)
Contingency tables (joint frequency distribution of two or more categorical variables - shows the relationship between them)
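Both kinds of table are counts, so `collections.Counter` covers them. A minimal sketch with hypothetical data: the frequency table adds a cumulative-frequency column (the quantity a cumulative frequency plot displays), and the contingency table cross-tabulates two categorical variables.

```python
from collections import Counter

def frequency_table(values):
    """Rows of (value, frequency, cumulative frequency) in ascending order."""
    rows, cumulative = [], 0
    for value, count in sorted(Counter(values).items()):
        cumulative += count
        rows.append((value, count, cumulative))
    return rows

def contingency_table(pairs):
    """Cross-tabulate two categorical variables as {row: {column: count}}."""
    row_levels = sorted({r for r, _ in pairs})
    col_levels = sorted({c for _, c in pairs})
    counts = Counter(pairs)  # missing combinations count as 0
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}

# Hypothetical data: seizures per month, and (diagnosis, sex) per participant
seizures = [0, 1, 0, 2, 1, 0, 3, 1]
participants = [("MS", "F"), ("MS", "F"), ("MS", "M"),
                ("Epilepsy", "M"), ("Epilepsy", "F"), ("Epilepsy", "M")]

for value, freq, cum in frequency_table(seizures):
    print(f"{value}: frequency {freq}, cumulative {cum}")
print(contingency_table(participants))
```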
CORRELATION AND ASSOCIATION
Correlation Coefficient (measures strength and direction of relationship between 2 variables)
Covariance (indicates direction of linear relationship between 2 variables)
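Covariance and the Pearson correlation coefficient are closely linked: correlation is covariance divided by the product of the two standard deviations, which removes the units and bounds the result between -1 and 1. A minimal sketch on hypothetical paired data:

```python
import math

def covariance(x, y):
    """Sample covariance: its sign gives the direction of a linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both SDs, so -1 <= r <= 1."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical paired measurements (illustrative values only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(f"covariance = {covariance(x, y):.2f}, r = {pearson_r(x, y):.3f}")
```

Because covariance depends on the units of measurement, only its sign is easy to interpret; the correlation coefficient also conveys strength. From Python 3.10 the standard library offers `statistics.covariance` and `statistics.correlation` for the same calculations.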
REGRESSION MODELS
Linear Regression (models relationship between dependent and independent variable(s) by fitting a linear equation to data - predicts value of dependent variable based on value(s) of independent variable(s))
Logistic Regression (models the probability of an event for a binary outcome variable)
Poisson Regression (models count data and rates - for example, the number of event occurrences within a fixed period)
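With one predictor, linear regression has a closed-form least-squares solution: slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x). A minimal sketch on hypothetical dose-response data; logistic and Poisson regression have no closed form and are fitted iteratively by maximum likelihood, so in practice they are run in statistical software (e.g., Stata's `regress`, `logit` and `poisson` commands).

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical dose (mg) and response values, for illustration only
dose = [0, 1, 2, 3, 4]
response = [1.0, 3.1, 4.9, 7.2, 8.8]

intercept, slope = fit_line(dose, response)
predicted = intercept + slope * 5  # predicted response at a dose of 5 mg
print(f"response = {intercept:.2f} + {slope:.2f} * dose; prediction at 5 mg = {predicted:.2f}")
```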
LONGITUDINAL DATA ANALYSIS
Mixed-Effects Models (accounts for fixed and random effects - useful for measurements taken on the same subjects over time)
Generalised Estimating Equations (estimate the parameters of a generalised linear model while allowing for a possibly unknown correlation between repeated outcomes)
SURVIVAL ANALYSIS
Kaplan-Meier Estimator (estimates survival function from lifetime data)
Cox Proportional Hazards Model (assesses effect of variables on survival time and estimates hazard ratios)
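The Kaplan-Meier estimator multiplies together, at each time an event occurs, the proportion of at-risk subjects who survive past that time. A minimal sketch with hypothetical follow-up data; it uses the usual convention that subjects censored at a given time are still counted as at risk at that time.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event occurred, 0 if the subject was censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for ti in times if ti >= t)  # censored at t still at risk
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve

# Hypothetical follow-up (months) for six patients; 0 marks censoring
times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"S({t}) = {s:.3f}")
```

Censored subjects never cause a drop in the curve, but they shrink the number at risk for later event times.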
MULTIVARIATE ANALYSIS
Principal Component Analysis (reduces dimensionality of data whilst retaining most of the variance - identifies patterns and simplifies datasets)
Factor Analysis (identifies underlying relationships between variables by grouping them into factors)
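PCA finds the directions (principal components) along which the data vary most; the variance along each component is an eigenvalue of the covariance matrix. For just two variables the eigenvalues have a closed form, which keeps this sketch dependency-free. With more variables one would typically use a linear-algebra routine such as `numpy.linalg.eigh` or a dedicated PCA command (Stata has `pca`), and variables on different scales are usually standardised first. The example data are hypothetical.

```python
import math

def pca_two_variables(x, y):
    """Eigenvalues of the 2x2 sample covariance matrix and PC1's share of variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Closed-form eigenvalues of the symmetric matrix [[var_x, cov_xy], [cov_xy, var_y]]
    centre = (var_x + var_y) / 2
    half_gap = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    pc1_var, pc2_var = centre + half_gap, centre - half_gap
    return pc1_var, pc2_var, pc1_var / (pc1_var + pc2_var)

# Hypothetical examples: two perfectly related measures vs two unrelated ones
related = pca_two_variables([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
unrelated = pca_two_variables([1, -1, 1, -1], [1, 1, -1, -1])
print(f"related: PC1 explains {related[2]:.0%} of the variance")
print(f"unrelated: PC1 explains {unrelated[2]:.0%} of the variance")
```

When the variables are strongly related, one component captures almost all the variance and the second can be dropped, which is the dimensionality reduction the card describes.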
BAYESIAN METHODS
Bayesian Inference (updates probability of a hypothesis as more evidence becomes available)
Markov Chain Monte Carlo (samples from a probability distribution to perform Bayesian inference)
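For a proportion, Bayesian updating has a closed form when a Beta prior is combined with binomial data: the posterior is Beta(a + successes, b + failures). The sketch below uses hypothetical treatment-response data and a uniform prior chosen for illustration; it updates the belief twice as evidence accumulates, then approximates the same posterior with a basic random-walk Metropolis sampler (one MCMC algorithm among several) to show that sampling recovers the known answer.

```python
import math
import random

def update_beta(a, b, successes, failures):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return a + successes, b + failures

# Hypothetical data: does a patient respond to a treatment?
a, b = 1, 1                                   # uniform Beta(1, 1) prior
a, b = update_beta(a, b, 7, 3)                # first 10 patients: 7 respond
print("posterior mean after 10 patients:", a / (a + b))
a, b = update_beta(a, b, 12, 8)               # 20 more patients: 12 respond
posterior_mean = a / (a + b)                  # Beta(20, 12) has mean 20/32

def log_density(p, a, b):
    """Unnormalised log density of Beta(a, b); -inf outside (0, 1)."""
    if not 0 < p < 1:
        return -math.inf
    return (a - 1) * math.log(p) + (b - 1) * math.log(1 - p)

def metropolis(a, b, n_samples=20000, step=0.1, seed=1):
    """Random-walk Metropolis sampler targeting Beta(a, b)."""
    rng = random.Random(seed)
    p = 0.5
    current = log_density(p, a, b)
    samples = []
    for _ in range(n_samples):
        proposal = p + rng.gauss(0, step)     # symmetric random-walk proposal
        candidate = log_density(proposal, a, b)
        # Accept with probability min(1, density ratio); -inf is never accepted
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        samples.append(p)
    return samples

draws = metropolis(a, b)[2000:]               # discard the first 2000 as burn-in
mcmc_mean = sum(draws) / len(draws)
print(f"exact posterior mean {posterior_mean:.3f}, MCMC estimate {mcmc_mean:.3f}")
```

MCMC is unnecessary for this conjugate case; it earns its keep in models where no closed-form posterior exists.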
ADVANCED VISUALISATION TECHNIQUES
Heatmaps (use colour to show the magnitude of values across a grid of data points)
Network Analysis (visualises relationships between entities)
What type of outlier falls outside the inner fences but within the outer fences?
A minor outlier.
What type of outlier falls outside the outer fences?
A major outlier.
How do you calculate the inner fence boundaries?
- Multiply the IQR (Q3 - Q1) by 1.5
- Upper inner fence: add this to Q3
- Lower inner fence: subtract this from Q1
How do you calculate the outer fence boundaries?
- Multiply the IQR (Q3 - Q1) by 3
- Upper outer fence: add this to Q3
- Lower outer fence: subtract this from Q1
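The fence rules translate directly into code. A minimal sketch with hypothetical data; quartiles come from `statistics.quantiles` with its default method, and because quartile definitions vary between packages, Stata or other software may place the fences slightly differently.

```python
import statistics

def tukey_fences(values):
    """Inner (1.5 x IQR) and outer (3 x IQR) fences, plus flagged outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    flagged = []
    for x in values:
        if x < outer[0] or x > outer[1]:
            flagged.append((x, "major"))   # beyond the outer fences
        elif x < inner[0] or x > inner[1]:
            flagged.append((x, "minor"))   # between inner and outer fences
    return inner, outer, flagged

# Hypothetical scores with two suspicious values (illustrative only)
scores = [10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 22, 45]
inner, outer, flagged = tukey_fences(scores)
print("inner fences:", inner)
print("outer fences:", outer)
print("outliers:", flagged)
```

Here 22 lies between the inner and outer upper fences (a minor outlier) while 45 lies beyond the outer fence (a major outlier). Flagged values should be checked, not automatically deleted.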
Why would you transform data?
To re-express the data on a different scale so that they are easier to interpret and/or better meet the assumptions of statistical analyses.
What reasons are there for transforming data?
- To improve normality (to allow use of parametric tests)
- To reduce skewness
- To linearise the relationship between 2 variables
- To make multiplicative relationships additive
What are some commonly used data transformations?
- Natural logarithm transformations
- Power transformations (e.g., square root, reciprocal)
When would you perform a log transformation?
If the data are positive values and positively skewed - log transformations stretch the scale at the lower end and compress the scale at the upper end.
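The effect of a log transformation on positively skewed data can be checked numerically by computing skewness before and after taking natural logs. A minimal sketch with hypothetical data, using a simple moment-based skewness (statistical packages may apply small-sample corrections, so exact values can differ).

```python
import math

def skewness(values):
    """Moment-based skewness: positive means a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical positively skewed, strictly positive data
raw = [1, 2, 2, 3, 4, 8, 16, 40]
logged = [math.log(x) for x in raw]  # natural log; raises ValueError for x <= 0

print(f"skewness before: {skewness(raw):.2f}, after log: {skewness(logged):.2f}")
```

Because `math.log` is undefined for zero and negative values, the transformation only applies directly to strictly positive data, as the card states.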