Big Data Flashcards
What is FinTech?
Technology-driven innovation in the financial services industry
What were early forms of FinTech?
Data processing and automation of routine tasks
What are two important applications of FinTech in quantitative analysis?
Analysis of large datasets (including alternative data)
Analytical tools such as AI
What is meant by Big Data?
The vast amount of information being generated by industry, governments, individuals, and electronic devices. Includes data from traditional sources (stock exchanges, companies) as well as from non-traditional sources (social media, sensor networks).
What are the four characteristics of Big Data?
Volume (large amounts of data)
Velocity (high speed and frequency, real-time data)
Variety (many different sources)
Veracity (credibility, reliability)
What are sources of Big Data?
Financial markets
Businesses
Governments
Individuals
Sensors
Internet of Things
What is the difference between traditional business intelligence and Big Data?
Traditional business intelligence relies on traditional data sources; Big Data incorporates alternative data sources as well.
What are the three main sources of alternative data?
Individuals
Business processes
Sensors
What are challenges of Big Data?
Quality of data
Volume of data
Appropriateness of data
What is Artificial Intelligence?
Computer systems that are capable of performing tasks that traditionally have required human intelligence.
What are neural networks?
Programming based on how the human brain learns and processes information
What is Machine Learning?
Computer-based techniques that seek to extract knowledge from large amounts of data without making any assumptions on the data’s underlying probability distribution.
What is an expert system?
A type of computer programming that attempted to simulate the knowledge base and analytical abilities of human experts in specific problem-solving contexts
What is the goal of Machine Learning?
Generate structure or predictions from data without any help from a human. Find the pattern, apply the pattern.
What are the three datasets involved in Machine Learning?
Training dataset: identify relationships between inputs and outputs.
Validation dataset: Validate relationships and tune the model.
Test dataset: Test the model’s ability to predict well on new data.
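The three datasets above can be illustrated with a minimal sketch. The 60/20/20 proportions, the function name split_dataset, and the stand-in observations are assumptions made for illustration, not part of the original cards.

```python
import random

def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and split them into training, validation, and test sets.

    The fractions are illustrative assumptions; whatever is left after the
    training and validation slices becomes the test set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test

observations = list(range(100))  # stand-in for 100 observations
train, validation, test = split_dataset(observations)
print(len(train), len(validation), len(test))  # 60 20 20
```

Each observation lands in exactly one of the three sets, so the test set stays unseen while the model is fitted and tuned.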
What is overfitting of the data in ML?
The ML model learns the input and target dataset too precisely (it has been “overtrained”) and treats noise in the data as true parameters.
What is underfitting of the data in ML?
The model is too simplistic: it treats true parameters as noise and fails to recognize relationships in the data.
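Overfitting and underfitting can be seen side by side in a small sketch. Everything here is an illustrative assumption: the made-up data follow y = 2x with alternating noise of +3 and -3, the "overfit" model simply memorizes the training points, and the "underfit" model always predicts the training mean.

```python
def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up training data: true relationship y = 2x plus alternating noise.
train_x = [0, 1, 2, 3, 4, 5, 6, 7]
noise = [3, -3, 3, -3, 3, -3, 3, -3]
train_y = [2 * x + e for x, e in zip(train_x, noise)]

# New data drawn from the true relationship, at points not seen in training.
test_x = [x + 0.4 for x in train_x]
test_y = [2 * x for x in test_x]

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)

def underfit_model(x):
    # Too simplistic: ignores x entirely and always predicts the mean.
    return mean_y

def overfit_model(x):
    # Memorizes the training set: returns the target of the nearest
    # training point, noise included.
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

# Ordinary least-squares line as a reasonable middle ground.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
sxx = sum((x - mean_x) ** 2 for x in train_x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

def linear_model(x):
    return intercept + slope * x

for name, model in [("underfit", underfit_model),
                    ("overfit", overfit_model),
                    ("linear", linear_model)]:
    print(name, mse(model, train_x, train_y), mse(model, test_x, test_y))
```

The memorizing model has zero error on the training data but a larger error than the fitted line on new data, while the mean-only model does poorly on both.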
What are the three classes of ML techniques?
Supervised learning: model receives labeled data
Unsupervised learning: model receives unlabeled data
Deep learning: use of neural networks
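The difference between labeled and unlabeled data can be sketched with two toy routines. The data, the "low"/"high" labels, and the function names are hypothetical; the supervised part is a nearest-centroid classifier and the unsupervised part is a two-cluster k-means on one-dimensional values. Deep learning is not illustrated here.

```python
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["low", "low", "low", "high", "high", "high"]

# Supervised: the labels are given, so the model learns one centroid per label.
def fit_centroids(xs, ys):
    grouped = {}
    for x, y in zip(xs, ys):
        grouped.setdefault(y, []).append(x)
    return {label: sum(members) / len(members)
            for label, members in grouped.items()}

def predict(centroids, x):
    # Assign the label whose centroid is closest to x.
    return min(centroids, key=lambda label: abs(centroids[label] - x))

centroids = fit_centroids(values, labels)
print(predict(centroids, 4.5))  # high

# Unsupervised: no labels, so the algorithm has to discover the groups itself.
def two_means(xs, iterations=10):
    centers = [min(xs), max(xs)]  # simple initialization (an assumption)
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            idx = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[idx].append(x)
        centers = [sum(g) / len(g) for g in groups]
    return groups

clusters = two_means(values)
print(clusters)
```

Both routines end up separating the same two groups, but only the supervised one can name them, because only it was given labels.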
What is data science?
An interdisciplinary field that harnesses advances in computer science, statistics and other disciplines to extract information from Big Data.
What are the five data processing methods used by data scientists?
Capture: how data is collected and transformed into a format that can be analyzed.
Curation: ensuring data quality and accuracy through data cleaning.
Storage: how data will be recorded, archived and accessed.
Search: how to query data.
Transfer: how to move data.
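The five methods above can be walked through in one small sketch using only the Python standard library. The tickers, prices, table name, and the price threshold are made up for illustration; a real pipeline would capture data from an actual feed and store it in a persistent database.

```python
import csv
import io
import json
import sqlite3

# Made-up raw input with a missing price and a duplicate row.
RAW_CSV = """ticker,price
AAA,10.5
BBB,
CCC,7.25
AAA,10.5
"""

# Capture: read the raw text into rows that can be analyzed.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Curation: drop rows with missing prices and exact duplicates.
seen = set()
clean = []
for row in rows:
    key = (row["ticker"], row["price"])
    if not row["price"] or key in seen:
        continue
    seen.add(key)
    clean.append((row["ticker"], float(row["price"])))

# Storage: record the cleaned rows in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (ticker TEXT, price REAL)")
conn.executemany("INSERT INTO prices VALUES (?, ?)", clean)

# Search: query the stored data.
result = conn.execute(
    "SELECT ticker, price FROM prices WHERE price > ? ORDER BY ticker",
    (8,),
).fetchall()

# Transfer: serialize the query result so it can be moved to another system.
payload = json.dumps([{"ticker": t, "price": p} for t, p in result])
print(payload)
```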
What is data visualization?
How the data will be formatted, displayed, and summarized in graphical form.
What is text analytics?
The use of computer programs to analyze and derive meaning from typically large, unstructured text- or voice-based datasets, such as company filings or social media posts.
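A very simple form of text analytics is word-list sentiment scoring, sketched below. The word lists, the sample sentence, and the function name sentiment_score are hypothetical; real applications use far larger dictionaries or trained NLP models.

```python
import re
from collections import Counter

# Tiny hypothetical word lists, for illustration only.
POSITIVE = {"growth", "strong", "beat"}
NEGATIVE = {"loss", "weak", "decline"}

def sentiment_score(text):
    """Count positive words minus negative words in a piece of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    positive_hits = sum(counts[word] for word in POSITIVE)
    negative_hits = sum(counts[word] for word in NEGATIVE)
    return positive_hits - negative_hits

sample = "Strong revenue growth this quarter, despite a small loss in one segment."
print(sentiment_score(sample))  # 1
```

A positive score suggests the text leans favorable and a negative score suggests it leans unfavorable, under the assumption that the word lists capture the relevant vocabulary.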
What is Natural Language Processing (NLP)?
Field of research at the intersection of computer science, AI and linguistics that focuses on developing computer programs to analyze and interpret human language.
Name five programming languages.
Python
R
Java
C
Excel VBA
Name three common databases.
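SQL (query language for relational databases)
SQLite (embedded relational database)
NoSQL (non-relational databases)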