Chapter 1: Big Data Analysis and 4 Extraction Techniques Flashcards
1
Q
What is Big Data?
A
- Extremely large and diverse collections of structured, unstructured, and semi-structured data that continue to grow exponentially over time.
- These datasets are so large and complex in volume, velocity, and variety that traditional data management systems cannot store, process, or analyze them.
2
Q
What are the most common Big Data Analysis Techniques?
A
- Association rule
- Classification tree analysis
- Genetic algorithms
- Machine Learning (ML)
- Regression analysis
- Sentiment analysis
- Social network analysis
- Data Mining
- Natural Language Processing (NLP)
3
Q
Define Association rule:
A
- Analysis technique used to find patterns in large databases through correlations between variables.
- First used by major supermarket chains to find patterns in data from their point-of-sale (POS) systems.
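To make "patterns through correlations" concrete, here is a minimal sketch that scores rules of the form {A} -> {B} with the two standard measures, support and confidence. The shopping baskets, the helper names (`support`, `confidence`, `pair_rules`), and the thresholds are all hypothetical; production tools typically use algorithms such as Apriori or FP-Growth to search larger itemsets efficiently.

```python
from itertools import combinations

# Hypothetical point-of-sale baskets (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(A | C) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, confidence) for single-item rules passing both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        if support({a, b}, transactions) < min_support:
            continue  # pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            conf = confidence({lhs}, {rhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, round(conf, 2)))
    return rules

for lhs, rhs, conf in pair_rules(transactions):
    print(f"{{{lhs}}} -> {{{rhs}}}  confidence={conf}")
```

In this toy data every basket containing beer also contains diapers, so the rule {beer} -> {diapers} gets confidence 1.0.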
4
Q
Define Classification tree analysis:
A
- Type of machine learning algorithm that maps a sequence of binary decisions into a tree structure, leading to a decision about the class of an object.
- Also known as Decision Tree.
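The tree of binary decisions can be sketched as follows. This assumes a CART-style learner that picks each "feature <= threshold" question by minimizing Gini impurity; the card does not name a split criterion, and the training rows (age, income) and labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Find the (feature, threshold) whose binary split minimizes weighted Gini."""
    best = None  # (score, feature, threshold)
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must send rows both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively grow a tree of binary 'feature <= threshold' questions."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf: the predicted class
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the binary decisions from the root down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# Hypothetical rows: [age, income in thousands] -> did the customer buy?
rows = [[25, 30], [35, 60], [45, 80], [20, 20], [50, 90], [30, 40]]
labels = ["no", "yes", "yes", "no", "yes", "no"]
tree = build_tree(rows, labels)
print(tree)
print(predict(tree, [40, 70]))
```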
5
Q
Define Genetic algorithms:
A
- Inspired by inheritance, mutation, and natural selection.
- Used to find effective solutions to optimization problems.
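The three biological ideas map onto three operators in a genetic algorithm. The sketch below assumes the toy "OneMax" problem (maximize the number of 1-bits), tournament selection, single-point crossover, and bit-flip mutation; these are common textbook choices picked for illustration, and all names and parameter values are hypothetical.

```python
import random

def fitness(bits):
    """OneMax objective: the number of 1-bits (maximum = len(bits))."""
    return sum(bits)

def tournament(pop, rng, k=3):
    """Natural selection: the fittest of k randomly chosen individuals wins."""
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    """Inheritance: a child takes a prefix from one parent and a suffix from the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rng, rate):
    """Mutation: flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def run_ga(length=20, pop_size=30, generations=40, rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = [fitness(max(pop, key=fitness))]
    for _ in range(generations):
        elite = max(pop, key=fitness)  # elitism: the best individual survives unchanged
        children = [
            mutate(crossover(tournament(pop, rng), tournament(pop, rng), rng), rng, rate)
            for _ in range(pop_size - 1)
        ]
        pop = [elite] + children
        history.append(fitness(max(pop, key=fitness)))
    return max(pop, key=fitness), history

best, history = run_ga()
print("best fitness per generation:", history)
print("best individual:", best)
```

Keeping the elite individual guarantees that the best fitness never decreases from one generation to the next.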
6
Q
Define Machine Learning (ML):
A
- Machine learning uses computer algorithms to generate insights from collected data, enabling predictions at a scale and speed beyond what human analysts could achieve manually.
- It enables computers to learn from data without being explicitly programmed, so they can make predictions based on patterns learned from training data.
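"Learning without explicit programming" can be illustrated with a perceptron, one of the simplest learning algorithms (chosen here for illustration; the card does not name one). The program is never told the rule for logical AND; it adjusts its weights from labeled examples until its predictions match. Function names and the training set are hypothetical.

```python
def predict(w, b, x):
    """Output 1 if the weighted sum w.x + b is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(samples, labels, epochs=20):
    """Perceptron rule: nudge weights toward each misclassified example."""
    w = [0] * len(samples[0])
    b = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(w, b, x)  # 0 if correct, +1 or -1 if wrong
            w = [wi + error * xi for wi, xi in zip(w, x)]
            b += error
    return w, b

# Training data for logical AND: only examples are given, never the rule itself.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print("learned weights:", w, "bias:", b)
print([predict(w, b, x) for x in X])
```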
7
Q
Define Regression analysis:
A
- Technique that examines the relationships between two or more variables.
- It estimates how one or more independent variables affect a dependent variable.
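For the simplest case (one independent variable), ordinary least squares gives closed-form estimates: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The sketch below assumes this linear model; regression with several independent variables uses the matrix form of least squares instead. The ad-spend and sales figures are hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sxy: co-variation of x and y; Sxx: variation of x around its mean.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (independent) vs. sales (dependent).
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
slope, intercept = fit_simple_linear_regression(ad_spend, sales)
print(f"sales = {intercept} + {slope} * ad_spend")
```

The slope is the estimated change in the dependent variable per one-unit change in the independent variable.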
8
Q
Define Sentiment analysis:
A
- Also known as opinion mining.
- Popular task in natural language processing (NLP).
- Process of classifying whether a block of text is positive, negative, or neutral.
- Its aim is to help businesses understand people’s opinions to support growth.
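A minimal lexicon-based classifier shows the positive/negative/neutral idea: count positive and negative words, flip the polarity of a word preceded by a negator, and label the text by the sign of the total. The word lists are hypothetical and tiny; practical systems usually rely on curated lexicons or trained NLP models.

```python
import re

# Hypothetical, hand-built word lists (illustration only).
POSITIVE = {"good", "great", "excellent", "love", "happy", "fast"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow", "broken"}
NEGATORS = {"not", "never", "no"}

def classify_sentiment(text):
    """Label text positive, negative, or neutral from its net lexicon score."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)  # +1, -1, or 0
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity  # "not good" counts as negative
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for review in [
    "I love this phone, the battery is great",
    "Delivery was slow and the box arrived broken",
    "The package arrived on Tuesday",
    "The screen is not good",
]:
    print(classify_sentiment(review), "|", review)
```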
9
Q
Define Social network analysis:
A
- Studies relationships and interactions within large networks.
- Looks at how people or entities are connected.
- Analyzes how information or influence spreads.
- Helps understand social structures and key influencers.
- Provides insights for marketing, communication, and strategy.
- Uses big data tools to manage and analyze complex data.
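Two basic social network measures can be sketched on a small graph: degree centrality (how connected each person is, a simple proxy for a key influencer) and breadth-first search (how many hops information needs to spread from one person to everyone else). The friendship edges and names are hypothetical; large networks are normally handled with dedicated graph libraries or big data tools.

```python
from collections import defaultdict, deque

# Hypothetical undirected friendship edges.
edges = [("ana", "ben"), ("ana", "cai"), ("ana", "dee"),
         ("ben", "cai"), ("dee", "eve"), ("eve", "fay")]

def build_graph(edges):
    """Adjacency sets: who is directly connected to whom."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Fraction of the other n - 1 people each person is directly connected to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

def hops_from(graph, source):
    """Breadth-first search: hops for information from `source` to reach each person."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

graph = build_graph(edges)
centrality = degree_centrality(graph)
influencer = max(centrality, key=centrality.get)
print("most connected:", influencer, centrality[influencer])
print("spread from influencer:", hops_from(graph, influencer))
```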
10
Q
Define Data Mining:
A
- Data mining is the use of machine learning and statistical analysis to uncover patterns and other valuable information from large data sets.
- Data mining models can either describe the target data set or predict outcomes using machine learning algorithms.
11
Q
What are the main Data Mining techniques?
A
- Association rules
- Classification
- Clustering
- Decision tree
- K-nearest neighbor (KNN)
- Neural networks
- Predictive analytics
- Regression analysis
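As an illustration of one technique from this list, here is a minimal k-nearest neighbor (KNN) classifier: a new point gets the majority label of its k closest training points. Euclidean distance, k = 3, and the two-cluster training data are assumptions made for this sketch.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Majority vote among the k training points closest (Euclidean) to query."""
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-D data forming two groups, "A" and "B".
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, (1.5, 1.5)))
print(knn_predict(points, labels, (6.5, 6.5)))
```

KNN does no training step; all the work happens at prediction time, which is why it is often paired with indexing structures on large data sets.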
12
Q
Define Natural Language Processing (NLP):
A
- Subfield of computer science and artificial intelligence (AI).
- Uses machine learning to enable computers to understand and communicate with human language.
- Combines computational linguistics (the rule-based modeling of human language) with statistical modeling, machine learning (ML), and deep learning.
13
Q
What is Computational linguistics?
A
- Discipline of linguistics that uses data science to analyze language and speech.
14
Q
What are the two types of analysis in Computational linguistics?
A
- Syntactical analysis: determines the meaning of a word, phrase or sentence by parsing the syntax of the words and applying preprogrammed rules of grammar.
- Semantical analysis: uses the syntactic output to draw meaning from the words and interpret their meaning within the sentence structure.
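"Applying preprogrammed rules of grammar" can be sketched with a tiny hand-written parser. The grammar (S -> NP VP, NP -> Det N | Det Adj N, VP -> V NP | V) and the lexicon are hypothetical toys; the parser either returns the sentence's structure as nested tuples or rejects input that breaks the rules.

```python
# Hypothetical toy grammar and lexicon (illustration only):
#   S  -> NP VP
#   NP -> Det N | Det Adj N
#   VP -> V NP | V
LEXICON = {
    "the": "Det", "a": "Det",
    "big": "Adj", "small": "Adj",
    "dog": "N", "cat": "N", "ball": "N",
    "chased": "V", "saw": "V", "slept": "V",
}

def tag(tokens, i):
    """Part of speech of token i, or None if out of range or unknown."""
    return LEXICON.get(tokens[i]) if i < len(tokens) else None

def parse_np(tokens, i):
    """NP -> Det N | Det Adj N; returns (tree, next_index) or None."""
    if tag(tokens, i) != "Det":
        return None
    children = [("Det", tokens[i])]
    i += 1
    if tag(tokens, i) == "Adj":
        children.append(("Adj", tokens[i]))
        i += 1
    if tag(tokens, i) != "N":
        return None
    children.append(("N", tokens[i]))
    return ("NP", *children), i + 1

def parse_vp(tokens, i):
    """VP -> V NP | V; returns (tree, next_index) or None."""
    if tag(tokens, i) != "V":
        return None
    verb = ("V", tokens[i])
    obj = parse_np(tokens, i + 1)
    if obj is not None:
        return ("VP", verb, obj[0]), obj[1]
    return ("VP", verb), i + 1

def parse_sentence(sentence):
    """S -> NP VP; returns the parse tree, or None if the grammar rejects the input."""
    tokens = sentence.lower().split()
    subject = parse_np(tokens, 0)
    if subject is None:
        return None
    predicate = parse_vp(tokens, subject[1])
    if predicate is None or predicate[1] != len(tokens):
        return None  # missing or leftover words violate the grammar
    return ("S", subject[0], predicate[0])

print(parse_sentence("The big dog chased a cat"))
print(parse_sentence("dog the chased"))  # rejected: breaks the word-order rules
```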
15
Q
Define the two main types of parsing:
A
- Dependency parsing: Looks at the grammatical relationships between words, such as which noun is the subject or object of a verb.
- Constituency parsing: Builds a parse tree (or syntax tree): a rooted and ordered representation of the syntactic structure of the sentence or string of words.
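The two outputs can be compared on one sentence. Below, "the dog chased a cat" is annotated by hand (hypothetical annotations, not produced by a real parser): the dependency parse stores each word's head and relation (labels in the style of Universal Dependencies), while the constituency parse is a rooted, ordered tree of nested tuples. Reading the tree's leaves left to right recovers the original word order.

```python
# Dependency parse: (word, 1-based index of its head word, relation); head 0 = root.
DEPENDENCIES = [
    ("the", 2, "det"),
    ("dog", 3, "nsubj"),
    ("chased", 0, "root"),
    ("a", 5, "det"),
    ("cat", 3, "obj"),
]

# Constituency parse: (label, children...), where a leaf is (part_of_speech, word).
CONSTITUENCY = ("S",
                ("NP", ("Det", "the"), ("N", "dog")),
                ("VP", ("V", "chased"), ("NP", ("Det", "a"), ("N", "cat"))))

def dependency_arcs(deps):
    """List (head word, relation, dependent word) triples."""
    return [("ROOT" if head == 0 else deps[head - 1][0], rel, word)
            for word, head, rel in deps]

def is_leaf(tree):
    return len(tree) == 2 and isinstance(tree[1], str)

def to_brackets(tree):
    """Render a constituency tree in bracket notation."""
    if is_leaf(tree):
        return f"[{tree[0]} {tree[1]}]"
    label, *children = tree
    return f"[{label} " + " ".join(to_brackets(c) for c in children) + "]"

def leaves(tree):
    """Words of the tree in their original (left-to-right) order."""
    if is_leaf(tree):
        return [tree[1]]
    return [w for child in tree[1:] for w in leaves(child)]

for arc in dependency_arcs(DEPENDENCIES):
    print(arc)
print(to_brackets(CONSTITUENCY))
print(" ".join(leaves(CONSTITUENCY)))
```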