Chapter 1: Big Data Analysis and 4 Extraction Techniques Flashcards

1
Q

What is Big Data?

A
  1. Extremely large and diverse collections of structured, unstructured, and semi-structured data that continue to grow exponentially over time.
  2. Datasets so large and complex in volume, velocity, and variety that traditional data management systems cannot store, process, or analyze them.
2
Q

What are the most common Big Data Analysis Techniques?

A
  1. Association rule
  2. Classification tree analysis
  3. Genetic algorithms
  4. Machine Learning (ML)
  5. Regression analysis
  6. Sentiment analysis
  7. Social network analysis
  8. Data Mining
  9. Natural Language Processing (NLP)
3
Q

Define Association rule:

A
  1. Analysis technique used to find patterns in data through correlations between variables in large databases.
  2. First used by major supermarket chains to find purchasing patterns using their point-of-sale (POS) systems.
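
A minimal sketch of how such a pattern could be scored, assuming the two standard rule measures, support and confidence. The baskets and item names are invented for illustration and are not from the source.

```python
# Hypothetical point-of-sale baskets (invented data, one set of items per sale).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Share of baskets with the antecedent that also contain the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

# Score the candidate rule "butter -> bread".
rule_support = support({"bread", "butter"}, baskets)
rule_confidence = confidence({"butter"}, {"bread"}, baskets)
```

In this toy data every basket with butter also contains bread, so the rule has high confidence even though it covers only half of the baskets.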
4
Q

Define Classification tree analysis:

A
  1. Type of machine learning algorithm that uses a structured map of binary decisions leading to a decision about the class of an object.
  2. Also known as Decision Tree.
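
A minimal sketch of the idea, assuming a hand-written tree rather than one learned from data. The questions, thresholds, and class labels are hypothetical.

```python
# Hypothetical tree: internal nodes ask a yes/no question, leaves hold a class label.
tree = {
    "question": lambda sample: sample["weight_g"] > 150,
    "yes": {
        "question": lambda sample: sample["smooth"],
        "yes": "apple",
        "no": "orange",
    },
    "no": "cherry",
}

def classify(node, sample):
    """Follow one binary decision per level until a leaf (class label) is reached."""
    while isinstance(node, dict):
        node = node["yes"] if node["question"](sample) else node["no"]
    return node

label = classify(tree, {"weight_g": 170, "smooth": True})
```

A real classification tree algorithm would choose the questions and thresholds from training data; this sketch only shows how a finished tree assigns a class.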
5
Q

Define Genetic algorithms:

A
  1. Inspired by inheritance, mutation, and natural selection.
  2. Used to develop effective solutions for problems that need optimization.
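
A minimal sketch of those three ingredients on a toy optimization problem: maximizing the number of 1s in a bit string. The problem, parameter values, and function names are assumptions chosen for illustration.

```python
import random

def fitness(bits):
    """Toy objective: count of 1s in the bit string."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    initial_best = max(map(fitness, population))
    for _ in range(generations):
        # Natural selection: the fitter half survives and becomes the parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)
            child = mom[:cut] + dad[cut:]  # inheritance via one-point crossover
            # Mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = parents + children
    return initial_best, max(map(fitness, population))

initial_best, final_best = evolve()
```

Because the parents survive unchanged into the next generation, the best fitness in the population can never decrease from one generation to the next.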
6
Q

Define Machine Learning (ML):

A
  1. Machine learning utilizes computer algorithms to generate insights from collected data, enabling predictions that would be beyond the reach of human analysts.
  2. It enables computers to learn autonomously without explicit programming, allowing them to make predictions based on patterns and insights gained from training data.
7
Q

Define Regression analysis:

A
  1. Technique that examines the relationships between two or more variables.
  2. It analyzes how one or more independent variables impact a dependent variable.
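
A minimal sketch of the simplest case, assuming one independent variable and an ordinary least-squares line fit. The data points are invented so that they lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4]  # independent variable (invented)
ys = [3, 5, 7, 9]  # dependent variable (invented)
slope, intercept = fit_line(xs, ys)
```

The slope estimates how much the dependent variable changes for each unit change in the independent variable.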
8
Q

Define Sentiment analysis:

A
  1. Also known as opinion mining.
  2. Popular task in natural language processing (NLP).
  3. Process of classifying whether a block of text is positive, negative, or neutral.
  4. Its aim is to help businesses understand people’s opinions to support growth.
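
A minimal sketch of the classification step, assuming a simple word-list (lexicon) approach. The word lists are hypothetical and far too small for real use; production systems typically rely on trained models.

```python
# Hypothetical sentiment lexicons (invented for illustration).
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def classify_sentiment(text):
    """Label text by counting positive words minus negative words."""
    words = [word.strip(".,!?").lower() for word in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

review_label = classify_sentiment("I love this great phone!")
```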
9
Q

Define Social network analysis:

A
  1. Studies relationships and interactions within large networks.
  2. Looks at how people or entities are connected.
  3. Analyzes how information or influence spreads.
  4. Helps understand social structures and key influencers.
  5. Provides insights for marketing, communication, and strategy.
  6. Uses big data tools to manage and analyze complex data.
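
A minimal sketch of one way to find a key influencer, assuming degree centrality (the share of other nodes a node is directly connected to) as the measure. The names and connections are invented.

```python
from collections import defaultdict

# Hypothetical undirected connections between people (invented data).
edges = [("ana", "ben"), ("ana", "cy"), ("ana", "dee"), ("ben", "cy"), ("dee", "eli")]

def degree_centrality(edges):
    """Map each node to (number of neighbors) / (number of other nodes)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbors.items()}

centrality = degree_centrality(edges)
key_influencer = max(centrality, key=centrality.get)
```

Degree centrality is only one of several measures; others weigh how often a node sits on paths between other nodes or how close it is to everyone else.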
10
Q

Define Data Mining:

A
  1. Data mining is the use of machine learning and statistical analysis to uncover patterns and other valuable information from large data sets.
  2. Data mining techniques can either describe the target data set or predict outcomes by using machine learning algorithms.
11
Q

Data Mining Techniques:

A
  1. Association rules
  2. Classification
  3. Clustering
  4. Decision tree
  5. K-nearest neighbor (KNN)
  6. Neural networks
  7. Predictive analytics
  8. Regression analysis
12
Q

Define Natural Language Processing (NLP):

A
  1. Natural language processing
  2. Subfield of computer science and artificial intelligence (AI).
  3. Uses machine learning to enable computers to understand and communicate in human language.
  4. Combines computational linguistics (the rule-based modeling of human language) with statistical modeling, machine learning (ML), and deep learning.
13
Q

What is Computational linguistics?

A

Discipline of linguistics that uses data science to analyze language and speech.

14
Q

What are the two types of analysis in Computational linguistics?

A
  1. Syntactical analysis: determines the meaning of a word, phrase or sentence by parsing the syntax of the words and applying preprogrammed rules of grammar.
  2. Semantical analysis: uses the syntactic output to draw meaning from the words and interpret their meaning within the sentence structure.
15
Q

Define parsing:

A
  1. Dependency parsing: Looks at the relationships between words, such as identifying nouns and verbs.
  2. Constituency parsing: Builds a parse tree (or syntax tree): a rooted and ordered representation of the syntactic structure of the sentence or string of words.
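
A minimal sketch of how the two outputs might be represented as data, assuming a hand-written analysis of the sentence "the dog barks". The tags and relation names follow common conventions but are chosen for illustration; no parser is run here.

```python
# Dependency parse as (head, relation, dependent) triples (hand-written).
dependencies = [
    ("barks", "nsubj", "dog"),  # "dog" is the subject of "barks"
    ("dog", "det", "the"),      # "the" is the determiner of "dog"
]

# Constituency parse as a rooted, ordered tree: (label, child, child, ...).
parse_tree = ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBZ", "barks")))

def leaves(node):
    """Read the words back off the tree, left to right."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]
    words = []
    for child in children:
        words.extend(leaves(child))
    return words

sentence = leaves(parse_tree)
```

Because the tree is ordered, reading its leaves from left to right recovers the original word sequence.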
16
Q

How does NLP work?

[NLP](https://www.ibm.com/topics/natural-language-processing)

A

  1. Combines computational linguistics (the rule-based modeling of human language) with statistical modeling, machine learning (ML), and deep learning.
  2. Self-supervised learning (SSL).

17
Q

What are the three different approaches to NLP?

A
  1. Rules-based NLP.
  2. Statistical NLP.
  3. Deep learning NLP.