11. Introduction to Big Data Techniques Flashcards
Alternative data
Data that are generated from non-traditional sources, such as social media and sensor networks.
Artificial intelligence (AI)
Computer systems that are capable of performing tasks that previously required human intelligence. AI methods are sometimes better suited than traditional quantitative and statistical methods to identifying complex, non-linear relationships.
Big data
The vast amount of information being generated by both traditional sources—for example, stock exchanges, companies, governments—and non-traditional sources—for example, electronic devices, social media, sensor networks, and company exhaust.
Data science
An interdisciplinary field that harnesses advances in computer science, statistics, and other disciplines for the purpose of extracting information from big data (or data in general).
Deep learning
An area of artificial intelligence in which a system uses neural networks to perform multistage, non-linear data processing to identify patterns. Also called deep learning nets.
Expert system
A type of computer programming, often based on “if–then” rules, that attempts to simulate the knowledge base and analytical abilities of human experts in specific problem-solving contexts.
Fintech
Technological innovation in the financial services industry, specifically with the design and delivery of financial services and products. It may also refer more broadly to companies involved in developing the new technologies and their applications, as well as the business sector that includes such companies.
Internet of Things (IoT)
The vast array of physical devices, home appliances, smart buildings, vehicles, and other items that are embedded with electronics, sensors, software, and network connections that enable the objects in the system to interact and share information.
Machine learning (ML)
Involves computer-based techniques that seek to extract knowledge from large amounts of data without making any assumptions about the data’s underlying probability distribution. The goal of ML algorithms is to automate decision-making processes by generalizing, or “learning,” from known examples to determine an underlying structure in the data.
Natural language processing (NLP)
A field of research within the field of text analytics and at the intersection of computer science, AI, and linguistics that focuses on developing computer programs to analyze and interpret human language.
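One of the most basic NLP preprocessing steps is tokenization: splitting raw text into word tokens that a program can count and analyze. A minimal sketch using only the Python standard library (the transcript snippet is hypothetical):

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

# Hypothetical snippet from a quarterly earnings-call transcript.
transcript = "Revenue grew this quarter, and revenue guidance was raised."
tokens = tokenize(transcript)
frequencies = Counter(tokens)  # e.g., frequencies["revenue"] == 2
```

Real NLP pipelines go much further (stemming, stop-word removal, sentiment scoring), but they build on this same tokenize-and-count foundation.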
Neural networks
Computer programs whose design is loosely modeled on how the human brain learns and processes information.
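The basic unit of a neural network is a single artificial neuron: a weighted sum of inputs plus a bias, passed through a non-linear activation function. A minimal sketch (the weights, bias, and inputs below are hypothetical, not a trained model):

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum passed through a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid squashes output to (0, 1)

# Hypothetical inputs and weights; a network stacks many such neurons
# in layers, and training adjusts the weights and biases.
output = neuron(inputs=[0.5, -1.0], weights=[0.8, 0.3], bias=0.1)
```

Deep learning networks (see above) chain many layers of such neurons, which is what enables the multistage, non-linear processing the definition describes.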
Overfitting
When a machine learning model learns the input and target dataset too precisely, making the system more likely to discover false relationships or unsubstantiated patterns that will lead to prediction errors.
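A deliberately contrived sketch of the idea: a lookup-table "model" that memorizes every training example achieves a perfect training fit, noise included, but cannot generalize to inputs it has never seen. All labels below are hypothetical:

```python
# Hypothetical labeled training data; the label for input 2 is noise.
train = {1: "up", 2: "down", 3: "up", 4: "up"}

def memorizer(x):
    # Overfitted model: reproduces the training data exactly, noise included.
    return train.get(x, "unknown")

def simple_rule(x):
    # Deliberately simple model: always predict the majority class.
    return "up"

train_accuracy = sum(memorizer(x) == y for x, y in train.items()) / len(train)
# The memorizer scores 1.0 on the training data, yet returns "unknown"
# for any new input, while the simple rule still gives a usable answer.
```

Real overfitting is subtler than pure memorization, but the symptom is the same: excellent in-sample fit paired with poor out-of-sample prediction.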
Scraping
An automated, large-scale, algorithm-driven approach that retrieves otherwise unstructured data available on websites and creates data in a more structured format.
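A minimal sketch of the "unstructured HTML in, structured data out" step, using only the standard library's `html.parser`. The HTML price table here is a hypothetical hardcoded string; a real scraper would fetch pages over the network:

```python
from html.parser import HTMLParser

# Hypothetical web page fragment: a price table to be scraped.
HTML = """
<table>
  <tr><td>ABC</td><td>101.50</td></tr>
  <tr><td>XYZ</td><td>47.20</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collect each table row as a structured (ticker, price) tuple."""
    def __init__(self):
        super().__init__()
        self.rows = []      # structured output
        self.current = []   # cells of the row being parsed
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
        elif tag == "tr" and self.current:
            self.rows.append((self.current[0], float(self.current[1])))
            self.current = []

    def handle_data(self, data):
        if self.in_cell:
            self.current.append(data.strip())

scraper = TableScraper()
scraper.feed(HTML)
# scraper.rows is now [("ABC", 101.5), ("XYZ", 47.2)]
```

Production scrapers add request scheduling, error handling, and site-specific parsing rules, but the core transformation, from markup to rows and columns, is the one shown here.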
Supervised learning
A type of machine learning in which the system attempts to learn to model relationships based on labeled training data.
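A minimal supervised-learning sketch: a one-nearest-neighbor classifier that "learns" from labeled examples by predicting the label of the most similar training point. The features and labels below are hypothetical:

```python
# Hypothetical labeled training data:
# each example is ([P/E ratio, dividend yield], style label).
labeled_training_data = [
    ([8.0, 4.0], "value"),
    ([10.0, 3.5], "value"),
    ([40.0, 0.0], "growth"),
    ([55.0, 0.5], "growth"),
]

def predict(features):
    """Return the label of the closest training example (1-nearest neighbor)."""
    def distance(example):
        xs, _ = example
        return sum((a - b) ** 2 for a, b in zip(xs, features))
    _, label = min(labeled_training_data, key=distance)
    return label

prediction = predict([9.0, 3.8])  # near the "value" examples
```

The defining feature of supervised learning is visible in the data structure itself: every training example carries a known target label.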
Text analytics
Involves the use of computer programs to analyze and derive meaning typically from large, unstructured text- or voice-based datasets, such as company filings, written reports, quarterly earnings calls, social media, email, internet postings, and surveys.
Underfitting
When a machine learning model treats true parameters as if they were noise and fails to recognize relationships in the training data, making it likely to miss the patterns that underlie the data.
Unsupervised learning
A type of machine learning in which the system tries to learn the structure of unlabeled data.
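A minimal unsupervised-learning sketch: k-means clustering (here k = 2) on unlabeled one-dimensional data, using only the standard library. Note that, unlike the supervised example, no labels appear anywhere; the data points are hypothetical daily returns in percent:

```python
# Hypothetical unlabeled data: daily returns (%) with two apparent regimes.
data = [0.1, 0.2, 0.15, 5.0, 5.2, 4.9]

def k_means(points, centroids, iterations=10):
    """Alternate assignment and update steps to find cluster centers."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = k_means(data, centroids=[0.0, 1.0])
# The algorithm discovers the two groups on its own, with no labels given.
```

The contrast with the supervised sketch is the essential point: the structure (two clusters) is inferred from the data rather than taught from labeled examples.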