Intro to Statistics Flashcards

Question

Characteristics of Inferential Statistics: Methods

Answer 1

Use probability theory to estimate population parameters, test hypotheses, and make predictions.

Answer 2

Use statistical models and algorithms to forecast future events.

Answer 3

Central Tendency: Mean, median, mode. Dispersion: Range, variance, standard deviation, interquartile range. Data Visualization: Charts, graphs, tables.
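As a minimal sketch of the central tendency and dispersion measures listed on this card, the following uses Python's standard `statistics` module. The data values and variable names are assumptions made up for illustration.

```python
import statistics

# Hypothetical sample values (assumed for illustration only)
values = [2, 4, 4, 5, 7, 9, 11]

# Central tendency
mean = statistics.mean(values)      # arithmetic average
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequent value

# Dispersion
data_range = max(values) - min(values)
variance = statistics.variance(values)         # sample variance (n - 1 denominator)
std_dev = statistics.stdev(values)             # sample standard deviation
q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
iqr = q3 - q1                                  # interquartile range

print(mean, median, mode, data_range, variance, iqr)
```

Note that `statistics.quantiles` uses its default "exclusive" method here; other quartile conventions can give slightly different IQR values.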

Answer 4

Hypothesis Testing: T-tests, chi-square tests, ANOVA, regression analysis. Confidence Intervals: Range within which a population parameter is expected to lie. Sampling Methods: Random sampling, stratified sampling, etc.
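The confidence interval idea can be sketched as follows, assuming a normal approximation for the sample mean. The sample values are hypothetical, and for a sample this small a t-based interval would usually be somewhat wider; the z-based version is used here only because it needs nothing beyond the standard library.

```python
import math
import statistics

# Hypothetical sample drawn from a larger population (assumed values)
sample = [118, 122, 125, 130, 121, 127, 119, 124, 126, 128]

n = len(sample)
sample_mean = statistics.mean(sample)
std_error = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

# z critical value for a two-sided 95% interval (about 1.96)
z = statistics.NormalDist().inv_cdf(0.975)

ci_low = sample_mean - z * std_error
ci_high = sample_mean + z * std_error
print(f"95% CI for the population mean: ({ci_low:.1f}, {ci_high:.1f})")
```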

Answer 5

Regression Analysis: Linear regression, logistic regression. Machine Learning Models: Decision trees, neural networks, support vector machines. Time Series Analysis: ARIMA, exponential smoothing.
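To make the first method concrete, here is a small sketch of simple linear regression fitted by ordinary least squares and then used for a prediction. The function name `fit_line` and the data points are assumptions for illustration.

```python
# Hypothetical paired observations (assumed for illustration)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # predicted y for a new x = 6
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={prediction:.2f}")
```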

Answer 6

In a study of patients' blood pressure readings, descriptive statistics might report the average (mean) blood pressure, the most common (mode) reading, and the range of readings.

Answer 7

In healthcare, predictive statistics might be used to predict the likelihood of a patient developing a certain disease based on their medical history and other risk factors.

Answer 8

In a clinical trial, inferential statistics might be used to determine whether a new medication significantly lowers blood pressure compared to a placebo, using data from a sample of patients.
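One way to sketch such a comparison without external libraries is a permutation test. This technique is swapped in here for illustration; the card does not name a specific test. The blood pressure reductions, the function name, and the number of permutations are all assumptions.

```python
import random

# Hypothetical reductions in systolic blood pressure (mmHg); assumed data
drug = [12, 9, 14, 11, 10, 13, 15, 8]
placebo = [3, 5, 2, 6, 4, 1, 5, 3]

def mean(xs):
    return sum(xs) / len(xs)

def permutation_test(a, b, n_perm=2000, seed=0):
    """Return (observed difference in means, one-sided p-value).

    The p-value is the share of random relabelings of the pooled data
    whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if diff >= observed:
            hits += 1
    return observed, hits / n_perm

observed, p_value = permutation_test(drug, placebo)
print(f"observed difference = {observed}, p-value = {p_value}")
```

A small p-value suggests the difference seen in the sample would be unlikely if the medication had no effect, which is the kind of conclusion about the population that inferential statistics aims for.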

Answer 9

Also known as primary data, is data that has not been altered, cleaned, or processed after it has been collected. It is the original, unmodified data gathered directly from a source. Summary: Unprocessed and original, requiring cleaning and organizing.

Answer 10

Data that has been cleaned, transformed, and organized to make it suitable for analysis. Involves steps like validation, sorting, aggregation, and normalization. Summary: Cleaned and transformed, ready for analysis.
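The processing steps named on this card (validation, sorting, aggregation, normalization) can be sketched on a few raw records. The records, field names, and the choice of min-max scaling for normalization are assumptions for illustration.

```python
# Hypothetical raw survey records (assumed for illustration)
raw = [
    {"id": 3, "age": "41", "score": "88"},
    {"id": 1, "age": "29", "score": "72"},
    {"id": 2, "age": "", "score": "95"},     # missing age
    {"id": 4, "age": "35", "score": "abc"},  # invalid score
]

def is_valid(rec):
    """Validation: keep records whose age and score are whole numbers."""
    return rec["age"].isdigit() and rec["score"].isdigit()

# Validation and type conversion
clean = [
    {"id": r["id"], "age": int(r["age"]), "score": int(r["score"])}
    for r in raw if is_valid(r)
]

# Sorting
clean.sort(key=lambda r: r["id"])

# Aggregation
mean_score = sum(r["score"] for r in clean) / len(clean)

# Normalization (min-max scaling of score to the range 0..1)
lo = min(r["score"] for r in clean)
hi = max(r["score"] for r in clean)
for r in clean:
    r["score_norm"] = (r["score"] - lo) / (hi - lo)

print(clean, mean_score)
```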

Answer 11

Unprocessed: Data in its original form, without any modifications. Detailed: Contains all the collected details, including potential errors and irrelevant information. Unorganized: Often unstructured and may need cleaning or organizing before analysis.

Answer 12

Responses from survey participants. Sensor readings from scientific instruments. Transaction logs from an online store.

Answer 13

Authentic: Provides a true representation of the collected information. Comprehensive: Contains all the nuances and details of the original data.

Answer 14

Cumbersome: Can be large and unwieldy, requiring significant preprocessing. Error-Prone: Likely to contain errors, outliers, and irrelevant information.

Answer 15

Organized: Data is structured and ready for analysis. Cleaned: Errors and irrelevant information are removed. Transformed: Data may be aggregated, summarized, or converted into a different format.

Answer 16

Average scores of survey responses. Monthly sales reports. Cleaned and formatted datasets used in machine learning models.

Answer 17

Usable: Ready for analysis, visualization, and interpretation. Accurate: Errors and irrelevant information have been removed.

Answer 18

Abstracted: Some details of the raw data may be lost in the processing. Dependent on Method: Quality and usefulness depend on the processing methods used.

Answer 19

Data collected directly from the source for a specific research purpose. It is original data gathered firsthand by the researcher. Summary: Collected firsthand for a specific research purpose, used by the researcher for new analysis.

Answer 20

Original: Collected directly from the source by the researcher. Specific Purpose: Gathered for a specific research question or objective. Control: Researcher has control over the data collection process.

Answer 21

Data collected from experiments or clinical trials. Surveys and questionnaires filled out by participants. Observations recorded by researchers in the field.

Answer 22

Relevant: Specifically collected for the research purpose. Current: Data is up-to-date and reflects current conditions.

Answer 23

Time-Consuming: Can take a lot of time to collect. Costly: Often requires significant resources to gather.

Answer 24

Data that was collected by someone else for a different purpose but is being used by a researcher for a new analysis. It is not collected firsthand by the current researcher. Summary: Collected by someone else for a different purpose, used by the researcher for new analysis.

Answer 25

Pre-Existing: Already collected and available for use. Broad Purpose: Originally gathered for a different research question or objective. Accessibility: Researcher uses data collected by others.

Answer 26

Data from government reports and censuses. Research articles and academic journals. Historical records and archived data.

Answer 27

Time-Saving: Readily available, saving time on data collection. Cost-Effective: Often free or less expensive than collecting primary data.

Answer 28

Less Control: Researcher has no control over how the data was collected. Potentially Outdated: May not reflect the current conditions or context.

Answer 29

Is collected at a single point in time from multiple subjects or entities. This type of data provides a snapshot of a particular phenomenon at a specific moment. Summary: Snapshot of multiple subjects at a single time point.

Answer 30

Single Time Point: Data is collected at one specific point in time. Multiple Subjects: Includes data from different entities, such as individuals, companies, or countries. Snapshot: Provides a snapshot view of the situation at the moment.

Answer 31

A survey conducted on a group of people to gather their opinions on a specific topic at a certain point in time. The demographic information of a population collected in a census. The financial statements of different companies for a particular fiscal year.

Answer 32

Quick to Collect: Since data is collected at one point in time, it is often faster to gather. Simple Analysis: Often simpler to analyze compared to time series or panel data.

Answer 33

Limited Insight: Does not provide information about changes over time. Snapshot View: Only provides a single view of the phenomenon, potentially missing temporal dynamics.

Answer 34

Data collected over time, capturing how a particular variable changes at different time points. This type of data helps in understanding trends, patterns, and forecasting future values. Summary: Data on a single subject collected at multiple time points.

Answer 35

Multiple Time Points: Data is collected at regular intervals over a period. Single Subject: Usually focuses on one subject or entity. Temporal Order: The order of data points is important as it represents changes over time.

Answer 36

Daily stock prices of a company over a year. Monthly unemployment rates over several years. Annual rainfall data for a specific region over decades.

Answer 37

Trend Analysis: Useful for identifying trends, cycles, and seasonal patterns. Forecasting: Can be used to make predictions about future values.
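As a rough sketch of forecasting from a time series, the following implements simple exponential smoothing, where each smoothed value is a weighted blend of the newest observation and the previous smoothed value. The sales figures, the function name, and the smoothing weight `alpha=0.5` are assumptions for illustration.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing.

    Returns the list of smoothed values; the last one serves as the
    one-step-ahead forecast.
    """
    smoothed = [series[0]]  # start from the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

monthly_sales = [100, 110, 105, 120, 130]  # hypothetical values, in time order
smoothed = exponential_smoothing(monthly_sales, alpha=0.5)
forecast = smoothed[-1]  # forecast for the next month
print(smoothed, forecast)
```

A larger `alpha` makes the forecast react faster to recent changes; a smaller one smooths more heavily. Because the method depends on the order of observations, shuffling the series would change the result.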

Answer 38

Complex Analysis: Requires more sophisticated statistical techniques. Time-Consuming: Collecting data over a long period can be time-consuming.

Answer 39

AKA longitudinal data, combines elements of both cross-sectional and time series data. It consists of multiple subjects measured repeatedly over time. Summary: Combines cross-sectional and time series data, tracking multiple subjects over multiple time points.

Answer 40

Multiple Subjects and Time Points: Data is collected from several subjects over multiple time periods. Rich Information: Provides both cross-sectional and temporal insights. Complex Structure: Each subject has its own time series of data points.

Answer 41

Annual income and expenditure data for a sample of households over several years. Health records of patients measured at regular intervals over time. Employee performance metrics tracked quarterly over several years.
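The relationship between panel, cross-sectional, and time series data can be sketched with a small household income panel. The household labels, years, income figures, and helper names are hypothetical.

```python
# Hypothetical panel: (household, year) -> annual income; assumed values
panel = {
    ("A", 2021): 50_000, ("A", 2022): 52_000, ("A", 2023): 55_000,
    ("B", 2021): 40_000, ("B", 2022): 41_000, ("B", 2023): 43_000,
}

def cross_section(data, year):
    """All subjects at one time point (cross-sectional slice)."""
    return {subject: value for (subject, y), value in data.items() if y == year}

def time_series(data, subject):
    """One subject across all time points, in time order (time series slice)."""
    return [value for (s, _), value in sorted(data.items()) if s == subject]

print(cross_section(panel, 2022))
print(time_series(panel, "B"))
```

Fixing the year gives a cross-section, fixing the subject gives a time series, and keeping both dimensions gives the panel.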

Answer 42

Comprehensive Analysis: Allows for the study of dynamics and causal relationships. Control for Variability: Can control for individual heterogeneity, improving the robustness of statistical analysis.

Answer 43

Data Collection: Collecting panel data can be challenging and resource-intensive. Complexity: Analyzing panel data often requires advanced statistical methods and software.

Answer 44

Data that is organized and formatted in a way that makes it easily searchable and analyzable by traditional databases and tools. It follows a predefined schema and is usually stored in tabular form with rows and columns.

Answer 45

Format: Organized in a clear, predictable structure, often in tables. Schema: Follows a predefined schema or model. Ease of Access: Easily searchable and analyzable using traditional database systems and query languages such as SQL.
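A minimal sketch of these characteristics, using Python's built-in `sqlite3` module with an in-memory database. The table name, column names, and rows are hypothetical.

```python
import sqlite3

# In-memory database; table and column names are hypothetical
conn = sqlite3.connect(":memory:")

# Predefined schema: every row has the same typed columns
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ana", "Lisbon"), (2, "Ben", "Oslo"), (3, "Chloe", "Lisbon")],
)

# Because every row follows the same schema, a query can filter by column
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Lisbon",)
).fetchall()
conn.close()

print(rows)
```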

Answer 46

Relational Database: Data stored in tables within a database, such as customer information in a CRM system. Spreadsheets: Data organized in rows and columns in applications like Microsoft Excel or Google Sheets. Sensor Data: Time-stamped readings from IoT devices that are stored in a structured format.

Answer 47

Efficiency: Easy to enter, store, query, and analyze. Consistency: Structured format ensures data integrity and consistency. Automation: Can be processed using automated tools and algorithms.

Answer 48

Flexibility: Limited flexibility in terms of the types of data that can be stored. Scalability: May become complex and less efficient with very large datasets.

Answer 49

Data that doesn't have a predefined format or organization. It is more complex to analyze and search because it doesn't fit neatly into tables of rows and columns.

Answer 50

Format: Lacks a predefined structure; can come in various formats such as text, images, audio, and video. Schema: Does not follow a predefined schema or model. Complexity: Requires advanced tools and techniques for analysis and processing.

Answer 51

Text Documents: Word documents, PDF files, and text files. Multimedia: Images, videos, and audio recordings. Emails and Social Media: Messages, posts, tweets, and comments. Web Pages: HTML pages and web content.

Answer 52

Richness: Can capture a wide variety of information, providing richer insights. Flexibility: No need for a predefined structure, allowing for more diverse data types.

Answer 53

Analysis: More challenging to analyze and process; requires advanced techniques such as natural language processing (NLP) and machine learning. Storage: Often requires more storage space and sophisticated management.
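A very small taste of why text needs extra processing: before anything can be counted, free text has to be split into tokens. The sketch below does a basic word-frequency count; the comments and the regular-expression tokenizer are assumptions for illustration and fall far short of real NLP tooling.

```python
import re
from collections import Counter

# Hypothetical free-text comments (assumed for illustration)
comments = [
    "Great service, great price!",
    "The service was slow.",
    "Price is fine; service could be better.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Count how often each word appears across all comments
word_counts = Counter(word for c in comments for word in tokenize(c))
top_word, top_count = word_counts.most_common(1)[0]
print(top_word, top_count)
```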

Answer 54

The degree to which data serves its intended purpose. It refers to the degree of:
- Timeliness
- Accuracy
- Completeness
- Reliability
- Consistency

Answer 55

- Quality of Measuring Devices, Questionnaires Used, and Approaches to Data Collection
- Clarity of the Information Needed and Its Communication to Personnel Involved in Data Collection
- Elimination of Outliers and Non-Representative Data
- Use of Appropriate Formats
- The Expertise of Individuals Involved in Data Collection
- Willingness to Provide Data to Concerned Parties

Answer 56

- Allocate adequate time for data collection
- Track and eliminate outliers
- Train data collection personnel
- Pretest and evaluate questionnaires
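Tracking outliers can be sketched with the common 1.5 × IQR rule, which flags values far below the first quartile or far above the third. The rule is one convention among several and is not named on the card; the readings and the function name are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical readings; 240 looks like a data-entry error
readings = [120, 118, 125, 122, 119, 121, 240, 123]
flagged = iqr_outliers(readings)
kept = [v for v in readings if v not in flagged]
print(flagged, kept)
```

Flagged values are usually worth reviewing before removal, since an extreme reading may be genuine rather than an error.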

(80 cards)