LM 11: Introduction to Big Data Techniques Flashcards
What is fintech?
refers to technological innovation in the design and delivery of financial services and products
What are the 4 characteristics or V’s of big data?
- volume
- velocity
- variety
- veracity (accuracy)
What are the 6 main sources of big data? FBGISI
- financial markets
- businesses
- governments
- individuals
- sensors
- internet of things
What are the 3 main sources of alternative data? IBS
- individuals
- business processes
- sensors
What are 3 ways data can be organized? SSU
- structured
- semi-structured
- unstructured
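A minimal Python sketch of the three forms. The tickers, prices, and field names are made up for illustration:

```python
import csv
import io
import json

# Structured: fixed rows and columns, as in a database table or CSV file.
structured = io.StringIO("ticker,price\nABC,10.5\nXYZ,20.0\n")
rows = list(csv.DictReader(structured))

# Semi-structured: fields are tagged, but records need not share one shape.
semi_structured = json.loads(
    '[{"ticker": "ABC", "price": 10.5}, {"ticker": "XYZ", "tags": ["tech"]}]'
)
fields_per_record = [sorted(record.keys()) for record in semi_structured]

# Unstructured: free text with no predefined fields.
unstructured = "ABC shares rallied after the earnings call, while XYZ was flat."
```

Every row in `rows` has the same columns, while `fields_per_record` shows the two JSON records carrying different fields.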
What are the 3 challenges to big data? QVA
- quality
- volume
- appropriateness
What is artificial intelligence?
enables computers to perform tasks that traditionally have required human intelligence
What is machine learning (ML)?
computer programs that learn how to complete tasks, improving over time as more data become available
What are the 3 types of machine learning? SUD
- supervised learning (algorithm is given labeled inputs and outputs as training data and tries to find the model that best maps inputs to outputs)
- unsupervised learning (algorithm is given unlabeled data and seeks to describe the data and find patterns)
- deep learning (utilizes neural networks to identify patterns; used in image & speech recognition)
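A toy Python sketch of the first two types, assuming one-dimensional data. The helper names `predict_nearest` and `two_means` are hypothetical, the data are made up, and deep learning is not shown:

```python
# Supervised: labeled (input, output) pairs are given as training data.
labeled = [(1.0, "low"), (1.2, "low"), (8.9, "high"), (9.4, "high")]

def predict_nearest(x, training):
    """1-nearest-neighbor: return the label of the closest training input."""
    return min(training, key=lambda pair: abs(pair[0] - x))[1]

# Unsupervised: only inputs are given; the algorithm finds the groups itself.
def two_means(values, iterations=10):
    """Split 1-D values into two clusters around two moving centers.

    Assumes at least two distinct values, so neither cluster is empty.
    """
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        near_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        near_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo = sum(near_lo) / len(near_lo)
        hi = sum(near_hi) / len(near_hi)
    return near_lo, near_hi

unlabeled = [1.0, 1.2, 8.9, 9.4]
clusters = two_means(unlabeled)
label_for_new_point = predict_nearest(2.0, labeled)
```

The supervised function needs the "low"/"high" labels to work; the clustering function never sees them.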
What is the difference between underfit and overfit machine learning?
underfit: model is too simple and fails to recognize true relationships in the training data set
overfit: model fits the training data too closely, treating noise as true relationships, so it predicts poorly on new data
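A small Python sketch of the contrast, assuming made-up data whose true relationship is y = 2x with noise in the training targets. The three models are illustrative stand-ins, not a standard method:

```python
# Hypothetical data: true relationship is y = 2x; training targets carry noise.
train = [(0, 0.5), (1, 1.5), (2, 4.5), (3, 5.5), (4, 8.5)]
test = [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0), (3.5, 7.0)]

mean_x = sum(x for x, _ in train) / len(train)
mean_y = sum(y for _, y in train) / len(train)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x

models = {
    # Underfit: ignores x entirely, so it misses the true relationship.
    "constant": lambda x: mean_y,
    # Reasonable fit: least-squares straight line.
    "linear": lambda x: intercept + slope * x,
    # Overfit: memorizes each training point, noise included.
    "memorize": lambda x: min(train, key=lambda p: abs(p[0] - x))[1],
}

def mse(model, data):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

train_err = {name: mse(m, train) for name, m in models.items()}
test_err = {name: mse(m, test) for name, m in models.items()}
```

Comparing `train_err` with `test_err` shows the pattern: the memorizing model has zero error on the training set but a larger error than the straight line on the test set, while the constant model has a large error on both.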
What is data science?
uses computer science and statistics to extract information from big data
Describe the 5 data processing methods data scientists use. DDDST
- data capture (how data is collected & transformed into a usable format)
- data curation (cleaning data to ensure high quality)
- data storage (recording, archiving, & accessing data)
- search (locating specific information in large datasets)
- transfer (moving data from their source or storage location to the analytical tool)
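The five methods can be sketched as one small Python pipeline. The records, table name, and price threshold are made up, and an in-memory SQLite database stands in for real storage:

```python
import csv
import io
import sqlite3

# Capture: raw data arrives as text and is parsed into usable records.
raw = "ticker,price\nABC,10.5\nXYZ,\nABC,10.5\nDEF,7.25\n"
records = list(csv.DictReader(io.StringIO(raw)))

# Curation: drop rows with a missing price and exact duplicates.
seen = set()
clean = []
for r in records:
    key = (r["ticker"], r["price"])
    if r["price"] and key not in seen:
        seen.add(key)
        clean.append((r["ticker"], float(r["price"])))

# Storage: record the cleaned data in a database (in-memory here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE prices (ticker TEXT, price REAL)")
db.executemany("INSERT INTO prices VALUES (?, ?)", clean)

# Search: locate specific rows in the stored data.
hits = db.execute(
    "SELECT ticker, price FROM prices WHERE price > ?", (8,)
).fetchall()

# Transfer: hand the result to the analysis step (a simple average here).
average_price = sum(price for _, price in hits) / len(hits)
```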
What is data visualization?
how data will be displayed & summarized in graphical form
What is the difference between text analytics and natural language processing?
text analytics uses computer programs to analyze unstructured text or voice-based datasets
natural language processing (NLP) is an application of text analytics that focuses on interpreting human language
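A toy Python sketch of the distinction. The sentence and word lists are hypothetical, and real NLP systems use far richer models than this keyword count:

```python
import re
from collections import Counter

text = "Profit rose and margins rose, but the outlook is weak."

# Text analytics: turn unstructured text into countable tokens.
tokens = re.findall(r"[a-z]+", text.lower())
counts = Counter(tokens)

# Toy interpretation step: score tone from hypothetical word lists.
positive = {"rose", "strong", "profit"}
negative = {"weak", "fell", "loss"}
tone = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
```

Here `counts` summarizes what the text contains, while `tone` is a crude attempt to interpret what it means.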
What is corporate exhaust?
refers to the trail of data left by business activities and transactions. Examples include supply chain information, banking transactions, and point-of-sale data.