Sources of Data Flashcards
What are the three data sources
Internal Data, Existing external data, Proprietary collection
What is Internal Data
Data from within the company/institution
What is Existing external data
Data that third parties have already collected (often their own internal data) and made ready to use; it may be free or paid
What is Proprietary collection
Provided by third parties but requiring effort to go out and collect or process
Define API
Application Programming Interface
What are the ways data can be collected
Hosted files for download (CSV, text files, directories of images, etc.); an API, which allows interaction with a provider's information or services through predefined functionality (these often cost money to access); web scraping, i.e. obtaining public-facing information from websites. These sources may have restrictions on usage
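As a minimal sketch of the "hosted file" route, the snippet below parses CSV text with Python's standard library. The sample text and the helper name `load_rows` are hypothetical stand-ins for a file you would actually download.

```python
import csv
import io

# Hypothetical CSV contents; in practice this text would come from a downloaded file.
raw = "name,value\na,1\nb,2\n"


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(raw)
print(rows[0]["name"], rows[0]["value"])  # values are read as strings
```

Note that `csv.DictReader` returns every field as a string, so numeric columns need explicit conversion.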
What are the four data structures, ordered by dimensionality
Scalar, Vector, Matrix, Tensor
Define Scalar
A single numerical value, written in italics and lowercase
Define Vector
1D structure of values, vector in bold and lowercase
Define Matrix
2D Structure of values, matrix in bold and uppercase
Define Tensor
N-D structure of values, tensor in bold and uppercase
What is a formal way to describe these structures with their shape
By their shape, e.g. 1x3, 3x2, 1x2x3; this can be combined with notation that describes their content
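The four structures and their shapes can be inspected in NumPy, as in this sketch (assuming NumPy is installed; the values are illustrative only). One caveat: a 1-D NumPy array of three values has shape `(3,)`, whereas a shape of 1x3 would be a 2-D array with one row.

```python
import numpy as np

scalar = np.array(5.0)               # 0-D: a single value, shape ()
vector = np.array([1.0, 2.0, 3.0])   # 1-D: shape (3,)
matrix = np.zeros((3, 2))            # 2-D: 3 rows, 2 columns
tensor = np.zeros((1, 2, 3))         # N-D (here 3-D)

# ndim is the number of dimensions; shape gives the size along each one.
for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(name, arr.ndim, arr.shape)
```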
How do you represent categories of numbers
Use symbols to represent categories (Whole, Natural, Integer, Real, …) and use superscripts to describe size
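As an illustration of this notation, the lines below combine the typographic conventions from the earlier cards with number-set symbols and superscript sizes. The particular sizes and sets are arbitrary examples, not taken from the deck.

```latex
x \in \mathbb{R}, \qquad
\mathbf{x} \in \mathbb{R}^{3}, \qquad
\mathbf{X} \in \mathbb{R}^{3 \times 2}, \qquad
\mathbf{T} \in \mathbb{Z}^{1 \times 2 \times 3}
```

Read left to right: a real scalar, a real vector of 3 values, a real 3x2 matrix, and an integer 1x2x3 tensor.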
Is NumPy “row-major” or “column-major”
NumPy is row-major (C order) by default
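A small sketch of what row-major means in practice, assuming NumPy is installed (the array values are illustrative): flattening in the default order walks the array row by row, while asking for Fortran order walks it column by column.

```python
import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6]])

row_major = X.ravel()           # default order="C": row by row
col_major = X.ravel(order="F")  # column by column, for comparison

print(row_major)                # [1 2 3 4 5 6]
print(col_major)                # [1 4 2 5 3 6]
print(X.flags.c_contiguous)     # True: stored row-major in memory
```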
What does X[0,1] access in NumPy
It accesses the element in row 0 and column 1, i.e. the second element of the first row (this is the top-right element only when the array has exactly two columns)
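The indexing rule can be checked with a short sketch, assuming NumPy is installed. The array here is illustrative and has three columns so that row 0, column 1 and the top-right corner are different elements.

```python
import numpy as np

X = np.array([[10, 20, 30],
              [40, 50, 60]])

value = X[0, 1]       # row 0, column 1
same = X[0][1]        # equivalent: take row 0, then its element 1
top_right = X[0, -1]  # last column of row 0

print(value, same, top_right)  # 20 20 30
```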