Flash cards made by GPT, based on a summary
What are the four Vs of big data?
Volume (vast amounts of data), Variety (different types of data), Velocity (speed of data generation and movement), and Veracity (quality of data).
What is the difference between supervised and unsupervised learning in data analytics?
Supervised learning involves training a model on a labeled dataset, while unsupervised learning involves the model trying to understand and structure unlabeled data.
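As a minimal sketch of the contrast (the one-dimensional data, the labels "low" and "high", and the helper names fit_centroids, predict, and two_means are all made up for illustration), the supervised part learns from labeled examples, while the unsupervised part groups the same values without ever seeing a label:

```python
from statistics import mean

# Supervised: every training value comes with a known label.
labeled = [(1.0, "low"), (1.2, "low"), (0.8, "low"),
           (5.0, "high"), (5.3, "high"), (4.7, "high")]

def fit_centroids(examples):
    """Learn one average value (centroid) per label."""
    groups = {}
    for x, label in examples:
        groups.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in groups.items()}

def predict(centroids, x):
    """Assign the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

centroids = fit_centroids(labeled)

# Unsupervised: same values, labels withheld. A tiny 2-means loop
# (assumes at least two distinct values) finds the groups itself.
def two_means(values, iterations=10):
    lo, hi = min(values), max(values)  # initial cluster centers
    for _ in range(iterations):
        left = [v for v in values if abs(v - lo) <= abs(v - hi)]
        right = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = mean(left), mean(right)
    return sorted(left), sorted(right)

unlabeled = [x for x, _ in labeled]
clusters = two_means(unlabeled)

print(predict(centroids, 1.1))
print(clusters)
```

The supervised model can name its prediction because it was given names to learn; the clustering only returns two unnamed groups that a person still has to interpret.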
Describe the OSEMN steps in data science.
Obtain (extract, import, scrape), Scrub (clean and manage), Explore, Model (analyze), and Interpret (communicate).
What is the challenge in organizational data management?
The main challenge is making effective use of organizational data. Collecting data is not enough; it must be used effectively.
What are the features of organizational data management systems?
Features include a storage medium, a common structure for the dataset, and an interface for rapid entry and retrieval. (NB: acknowledging trade-offs in system design is also important.)
What are the types of organizational data management systems?
Types include Transaction Processing Systems, Management Information Systems, Decision Support Systems, Business Intelligence, Online Analytical Processing, Data Mining, and Machine Learning.
What are desirable attributes of data in data management?
Data should be shareable, transportable, secure, accurate, timely, and relevant.
How do XML and HTML differ in data exchange?
XML is an extensible language without predefined tags, used for data exchange, while HTML focuses more on presentation and formatting with predefined tags.
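To illustrate the "no predefined tags" point, here is a small sketch using Python's standard xml.etree.ElementTree module. The invoice, customer, and amount tags are invented for this example; XML lets the author define whatever tags describe the data:

```python
import xml.etree.ElementTree as ET

# The tag names below are hypothetical; they describe the data,
# not how it should be displayed.
xml_text = """
<invoice>
  <customer>Acme Ltd</customer>
  <amount currency="EUR">1250.00</amount>
</invoice>
""".strip()

root = ET.fromstring(xml_text)
customer = root.find("customer").text
amount = float(root.find("amount").text)
currency = root.find("amount").get("currency")

print(customer, amount, currency)
```

An HTML page showing the same invoice would instead use fixed tags such as table or p, which say how to present the content rather than what it means.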
What is JSON and its significance in data interchange?
JSON (JavaScript Object Notation) is a text-based, human-readable format that is widely used for data interchange in web applications. It’s based on JavaScript object syntax.
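A short sketch of a JSON round trip with Python's standard json module. The record and its field names are made up for illustration:

```python
import json

# Hypothetical record; any dict of strings, numbers, booleans,
# lists, and nested dicts can be serialized the same way.
record = {
    "customer": "Acme Ltd",
    "amount": 1250.0,
    "paid": False,
    "items": ["A1", "B2"],
}

text = json.dumps(record)    # Python object -> JSON text
restored = json.loads(text)  # JSON text -> Python object

print(text)
print(restored == record)
```

Because the serialized form is plain text, it can be sent over HTTP or stored in a file and read back by a program written in a different language.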
Why is XBRL (Extensible Business Reporting Language) important in digital reporting?
XBRL standardizes digital reporting, making it more accurate and secure. It’s crucial for facilitating standardized information exchange.
What are the main components of an XBRL instance?
An XBRL instance includes values (text or numbers), context and variables (like entity, period, decimals, currency), concepts (business terms representation), and a dictionary (linking concepts to business terms).
How are data attributes stored in XBRL elements?
Attributes in XBRL elements are stored as dimensions, including label, ID, definition, type, period, balance, and reference.
What is the role of schema documents in XBRL?
Schema documents in XBRL describe the structure of an XBRL instance document, list the taxonomies used, and declare unique company-specific elements with attributes.
What are some common mistakes to avoid in data visualization?
Avoid having too much or too little information, inconsistency, ignoring the limits of human perception, misrepresenting data, using inappropriate data, and bad taste.
What are the different types of data in visualization?
Data types include items (individual entities), attributes (properties measured or observed), links (relationships between items), and positions (spatial data providing location).
What is the significance of data visualization in analyzing data?
Data visualization helps in finding relationships, discovering structure, quantifying values and influences, and effectively communicating data insights.
Explain the concept of ‘Anscombe’s quartet’ in data visualization.
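Anscombe’s quartet is a set of four datasets with nearly identical summary statistics (means, variances, correlation, regression line) that look very different when plotted. It demonstrates how a lack of visualization can lead to misinterpretation of data, and highlights the effects of outliers and influential observations on statistical properties.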
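The effect can be checked numerically. The sketch below uses the first two of Anscombe's four datasets, which share the same x values, and a hand-written pearson helper (a name chosen for this example) to show that mean and correlation barely differ:

```python
from math import sqrt
from statistics import mean

# Datasets I and II of Anscombe's quartet (shared x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sqrt(sum((a - mx) ** 2 for a in xs))
    sy = sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

for name, ys in (("I", y1), ("II", y2)):
    print(name, round(mean(ys), 2), round(pearson(x, ys), 2))
```

Both rows print the same rounded mean and correlation, yet a scatter plot shows dataset I as a noisy linear trend and dataset II as a smooth curve.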
What is the CRAP principle in data visualization?
CRAP stands for Contrast, Repetition, Alignment, and Proximity, guiding the effective design and layout of visual data presentations.
How does the ‘Gestalt principle’ apply to data visualization?
The Gestalt principle suggests that our brain tends to organize visual elements into structured groups based on proximity, similarity, connection, and other factors.
What is Exploratory Data Analysis (EDA)?
EDA is an approach used to analyze, investigate, and summarize the main characteristics of data sets, often using visualization methods.
Define Sampling Bias and its types.
Sampling bias is a systematic error in selecting participants for a sample. Types include self-selection, nonresponse, undercoverage, and survivorship biases.
What are the key elements of Descriptive Statistics?
Key elements include typical values (mean, median), variation (standard deviation), distribution (skewness, quantiles), abnormalities (outliers, missing values), and variable relationships (correlation).
How do mean and median differ, and when is each more appropriate?
The mean is the arithmetic average, best for symmetric distributions without outliers. The median is the middle value of the sorted data, better for skewed distributions or data with outliers.
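A quick sketch of why outliers matter, using Python's standard statistics module. The salary figures (in thousands) are invented for illustration:

```python
from statistics import mean, median

# Hypothetical salaries in thousands; the last value is an outlier.
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [400]

print(mean(salaries), median(salaries))
print(round(mean(with_outlier), 2), median(with_outlier))
```

One extreme value pulls the mean from 35 to about 95.83, while the median only moves from 35 to 36.5, so the median remains a better description of a typical salary here.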
Explain the concept of a Boxplot.
A Boxplot displays the distribution of data based on five summary statistics: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.
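The five values behind a boxplot can be computed directly. This is a sketch assuming Python 3.8 or later (for statistics.quantiles); the helper name five_number_summary and the sample data are made up, and the "inclusive" quartile method is only one of several conventions, so other tools may report slightly different Q1 and Q3 values:

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(values)
print(summary)
```

In the plot, the box spans Q1 to Q3 with a line at the median, and the whiskers extend toward the minimum and maximum.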