Chapter 1: Data Analysis Flashcards
What is Data Analysis?
Data analysis is the process of collecting, transforming, cleaning, and interpreting data to make informed decisions or draw meaningful insights.
What is descriptive analysis?
Descriptive analysis is the type of data analysis that summarizes data to make it more understandable, often using averages, percentages, and trends.
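A minimal Python sketch of descriptive statistics, using only the standard library. The order values, the `describe` helper and the threshold are hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical example data: order values in dollars
orders = [12.5, 40.0, 7.25, 40.0, 100.0, 18.0]

def describe(values, threshold):
    """Summarize a list of numbers with a few descriptive statistics."""
    above = sum(1 for v in values if v > threshold)
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        # Percentage of values strictly above the threshold
        "pct_above": 100 * above / len(values),
    }

summary = describe(orders, threshold=20)
print(summary)
```

The mean (about 36.3) is pulled upward by the single 100.0 order, while the median (29.0) is not. This is why descriptive summaries often report both.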
What is inferential analysis?
Inferential analysis uses a sample of data to draw conclusions about a larger population, relying on techniques such as parameter estimation and hypothesis testing.
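Parameter estimation can be illustrated with a confidence interval for a population mean. This is a sketch under a stated assumption: it uses a normal (z) approximation, which is reasonable for larger samples. Small samples would normally use a t-interval instead. The sample data and the function name are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Estimate the population mean from a sample as an interval.

    Assumes a normal approximation (z-interval); small samples would
    normally use a t-interval instead.
    """
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m - z * se, m + z * se

# Hypothetical sample of 40 measured heights (cm)
sample = [160 + (i * 7) % 30 for i in range(40)]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the population mean: ({low:.1f}, {high:.1f})")
```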
What are some key techniques used in inferential analysis?
Key techniques include parameter estimation, hypothesis testing, and regression analysis.
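Hypothesis testing can be sketched with a permutation test, which is one of several ways to test a difference between groups. The null hypothesis is that the group labels do not matter. The test shuffles the labels many times and measures how often a difference at least as large as the observed one appears by chance. The data, the function name and the number of permutations are assumptions made for illustration.

```python
import random

def _avg(xs):
    return sum(xs) / len(xs)

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Null hypothesis: group labels are exchangeable (no real difference).
    Returns the observed difference and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = _avg(a) - _avg(b)
    pooled = list(a) + list(b)
    k = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = _avg(pooled[:k]) - _avg(pooled[k:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add 1 to numerator and denominator so the p-value is never exactly 0
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

control = [12, 14, 11, 13, 15, 12, 14, 13]
treated = [16, 18, 17, 15, 19, 17, 18, 16]
diff, p = permutation_test(treated, control)
print(f"difference = {diff:.2f}, p-value = {p:.4f}")
```

A small p-value suggests that a difference this large would rarely arise from random labeling alone.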
What is predictive analytics?
Predictive analytics combines historical data and machine learning to build models that predict future events or outcomes.
What are the steps in the data analysis process?
The steps are, in order:
- Define the objective
- Identify data requirements
- Collect data
- Process and format data
- Clean data
- Explore data
- Analyze data
- Model data
- Communicate results
- Monitor and update
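Several of these steps can be chained together in code. The sketch below is hypothetical and covers only processing, cleaning, analysis and communication. The records, field names and function names are invented for illustration and are not a standard pipeline API.

```python
from statistics import mean

# Hypothetical raw records, as they might arrive from the collection step
RAW = [
    {"region": "north", "sales": "120"},
    {"region": "south", "sales": ""},  # missing value
    {"region": "north", "sales": "95"},
    {"region": "south", "sales": "130"},
]

def process(records):
    """Process/format: convert sales strings to numbers (None if blank)."""
    return [{**r, "sales": float(r["sales"]) if r["sales"] else None} for r in records]

def clean(records):
    """Clean: drop records with missing sales."""
    return [r for r in records if r["sales"] is not None]

def analyze(records):
    """Analyze: average sales per region."""
    by_region = {}
    for r in records:
        by_region.setdefault(r["region"], []).append(r["sales"])
    return {region: mean(vals) for region, vals in by_region.items()}

def communicate(results):
    """Communicate: format results as report lines."""
    return "\n".join(f"{region}: {avg:.1f}" for region, avg in sorted(results.items()))

results = analyze(clean(process(RAW)))
print(communicate(results))
```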
What are the key considerations when collecting data?
Key considerations include whether collection is manual or automated, the known limitations of the data, whether data is validated at the source, and how accurately manually collected data is converted into electronic form.
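Validation at the source means checking each record as it is collected, before it enters the dataset. Below is a minimal sketch that assumes a hypothetical temperature logger. The field names and the plausible range (-50 to 60 °C) are invented rules, not a standard.

```python
def validate_reading(record):
    """Return a list of problems with one incoming record (empty if valid).

    Hypothetical rules: a sensor ID must be present and the temperature
    must be a number within a plausible range.
    """
    problems = []
    if not record.get("sensor_id"):
        problems.append("missing sensor_id")
    temp = record.get("temp_c")
    if not isinstance(temp, (int, float)):
        problems.append("temp_c is not a number")
    elif not -50 <= temp <= 60:
        problems.append(f"temp_c out of range: {temp}")
    return problems

incoming = [
    {"sensor_id": "A1", "temp_c": 21.5},
    {"sensor_id": "", "temp_c": 19.0},
    {"sensor_id": "B2", "temp_c": 999},
]
# Only records with no problems are accepted into the dataset
accepted = [r for r in incoming if not validate_reading(r)]
```

Rejecting or flagging bad records at entry is usually cheaper than trying to repair them later during cleaning.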
What is randomization in an experiment?
Randomization is the process of assigning participants or samples to groups or conditions at random, so that the groups are expected to be similar in every respect except the treatment being tested.
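Random assignment can be sketched by shuffling the participant list and dealing it into groups. The participant IDs, group names and seed below are hypothetical. The seed is fixed only so that the assignment can be reproduced.

```python
import random

def randomize(participants, groups=("control", "treatment"), seed=None):
    """Randomly assign participants to groups of (near-)equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal shuffled participants round-robin into the groups
    assignment = {g: [] for g in groups}
    for i, p in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(p)
    return assignment

people = [f"P{i}" for i in range(1, 11)]
assignment = randomize(people, seed=42)
print(assignment)
```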
How does randomization help reduce bias?
Randomization reduces bias by making the groups comparable at the start, so that differences in outcomes can be attributed to the treatment rather than to systematic differences between the groups.
What are confounding variables?
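Confounding variables are factors other than the one being studied that are related to both the treatment and the outcome, and so can distort the results. Randomization helps distribute these variables evenly across groups on average, including factors that were never measured.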
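The balancing effect of randomization can be seen in a small simulation. Here age is a hypothetical confounder with simulated values. After random assignment, the average age is close in both groups. A single randomization still leaves some imbalance by chance, and that imbalance shrinks as the groups get larger.

```python
import random
from statistics import mean

rng = random.Random(7)
# Hypothetical participants whose age could confound the outcome
ages = [rng.randint(20, 80) for _ in range(1000)]

indices = list(range(len(ages)))
rng.shuffle(indices)  # random assignment to two groups of 500
treatment = [ages[i] for i in indices[:500]]
control = [ages[i] for i in indices[500:]]

gap = abs(mean(treatment) - mean(control))
print(f"mean age: treatment {mean(treatment):.1f}, control {mean(control):.1f}")
```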
What is simple random sampling?
Simple random sampling involves randomly selecting participants from the entire population so that each person has an equal chance of being chosen.
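Python's standard library supports simple random sampling directly: `random.sample` draws items without replacement, so every member of the population has the same chance of being selected. The population and the seed below are hypothetical.

```python
import random

# Hypothetical population of 1,000 customers
population = [f"customer_{i}" for i in range(1, 1001)]

rng = random.Random(123)  # fixed seed for reproducibility
sample = rng.sample(population, k=50)  # 50 distinct customers, chosen at random
```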
What is stratified sampling?
Stratified sampling involves dividing the population into subgroups (strata) and then randomly sampling from each subgroup to ensure all subgroups are represented in the sample.
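Here is a minimal sketch of stratified sampling with proportional allocation, which draws the same fraction at random from every stratum. The population, the urban/rural strata and the function name are hypothetical. Other allocation schemes, such as equal numbers per stratum, are also common.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 900 urban and 100 rural residents
population = [("urban", i) for i in range(900)] + [("rural", i) for i in range(100)]
sample = stratified_sample(population, stratum_of=lambda p: p[0], fraction=0.1, seed=1)
```

The 10% sample contains exactly 90 urban and 10 rural residents. A simple random sample of 100 could, by chance, contain noticeably more or fewer rural residents.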
What are the types of data determined by the collection process, and how do they affect analysis?
- Cross-sectional data captures a snapshot at a single point in time; it is useful for comparisons but cannot show trends.
- Longitudinal data tracks changes over time, showing trends.
- Sensor data provides real-time measurements for monitoring.
- Truncated data occurs when observations outside a certain range (for example, below a cutoff) are never recorded at all, which can lead to incomplete or biased results.
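The bias caused by truncation can be shown with simulated data. The values are hypothetical measurements drawn from a normal distribution, and the cutoff of 40 is arbitrary. Because every value below the cutoff is missing, the observed mean overstates the true mean.

```python
import random
from statistics import mean

rng = random.Random(0)
# Hypothetical true measurements
true_values = [rng.gauss(50, 15) for _ in range(10_000)]

# Truncation: values below the cutoff never make it into the dataset
cutoff = 40
observed = [v for v in true_values if v >= cutoff]

print(f"true mean {mean(true_values):.1f}, observed mean {mean(observed):.1f}")
```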
What are the key characteristics of big data?
Big data is characterized by its volume (the sheer size of the data), velocity (how fast data is generated and processed), variety (different types of data, such as text, images, and numbers), and veracity (reliability, meaning the quality and accuracy of the data).
What is replication?
Replication means an independent party repeating the original experiment or analysis to see whether it produces consistent results; a successful replication supports the reliability of the original findings.