Unit 1 Flashcards
Algorithms
A set of step-by-step instructions to solve a problem or complete a task.
Model
A representation of the relationships and patterns found in data, used to make predictions or analyze complex systems while retaining the essential elements needed for analysis.
Outliers
Data points that fall significantly outside the range of most of the other data in a data set.
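Example: a minimal sketch of one common rule of thumb for "significantly outside," which flags values more than 1.5 times the interquartile range (IQR) beyond the first or third quartile. The 1.5 x IQR cutoff is an assumption here, not the only definition, and the name find_outliers and the sample scores are hypothetical.

```python
import statistics


def find_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample: six typical scores and one extreme value.
scores = [10, 12, 11, 13, 12, 11, 95]
print(find_outliers(scores))
```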
Quantitative Analysis
A systematic approach that uses mathematical and statistical analysis to interpret numerical data.
Structured Data
Data that is organized and formatted into a predictable schema, usually related tables with rows and columns.
Unstructured Data
Data that does not fit neatly into rows and columns; it often includes text, images, videos, and other content.
CSV/TSV
A commonly used format for storing tabular data as plain text, where a comma (CSV) or a tab (TSV) separates each value.
Data File Types
A computer file configuration designed to store data in a specific way.
Data Format
How data is encoded so it can be stored within a data file type.
Data Visualization
A readily understandable representation of data that makes it easier to see trends in the data.
Delimited Text File
A plain text file where a specific character separates the data values.
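Example: a minimal sketch showing that comma-separated and tab-separated text differ only in the separating character. It uses Python's standard csv module; the helper name parse_delimited and the sample rows are hypothetical.

```python
import csv
import io


def parse_delimited(text, delimiter):
    """Split delimited plain text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# The same hypothetical table, once comma-separated and once tab-separated.
csv_text = "name,score\nAda,91\nLin,84\n"
tsv_text = "name\tscore\nAda\t91\nLin\t84\n"

rows_from_csv = parse_delimited(csv_text, ",")
rows_from_tsv = parse_delimited(tsv_text, "\t")
print(rows_from_csv == rows_from_tsv)
```

Every value is read as a string; converting "91" to a number is a separate step.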
Extensible Markup Language (XML)
A language designed to structure, store, and enable data exchange between various technologies.
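Example: a minimal sketch of structuring a record as XML and reading it back, using Python's standard xml.etree.ElementTree module. The element names (student, name, score) and the values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Build a small XML element with one attribute and two child elements.
student = ET.Element("student", attrib={"id": "1"})
ET.SubElement(student, "name").text = "Ada"
ET.SubElement(student, "score").text = "91"

# Serialize to text, as it might be stored or sent to another program.
document = ET.tostring(student, encoding="unicode")
print(document)

# Parse the text back into an element tree and read a value from it.
parsed = ET.fromstring(document)
print(parsed.find("name").text)
```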
Hadoop
An open-source framework designed to store and process large datasets across clusters of computers.
JavaScript Object Notation (JSON)
A data format, compatible with various programming languages, that applications use to exchange structured data.
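Example: a minimal sketch of one program encoding structured data as JSON text and another decoding it, using Python's standard json module. The record contents are hypothetical.

```python
import json

# Hypothetical structured record held by the sending application.
record = {"name": "Ada", "scores": [91, 84], "active": True}

# Encode to a JSON string, the form that would be written to a file or sent.
message = json.dumps(record)
print(message)

# The receiving application decodes the text back into native data structures.
received = json.loads(message)
print(received["scores"])
```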
Jupyter Notebooks
A computational environment that allows users to create and share documents containing code, equations, visualizations, and explanatory text.
Nearest Neighbor
A machine learning algorithm that predicts the target value for a data point based on the most similar data points in the dataset.
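Example: a minimal sketch of a k-nearest-neighbor classifier, assuming similarity is measured by straight-line (Euclidean) distance and the prediction is a majority vote among the k closest points. The function name knn_predict, the choice k=3, and the training points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for query by majority vote among its k nearest points.

    train is a list of (features, label) pairs; features are numeric tuples.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled data: two well-separated groups of points.
training = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((2.0, 1.5), "small"),
    ((8.0, 8.0), "large"),
    ((9.0, 7.5), "large"),
    ((8.5, 9.0), "large"),
]

print(knn_predict(training, (2.0, 2.0)))
print(knn_predict(training, (8.0, 9.0)))
```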
Neural Networks
A computational model used in deep learning that mimics the structure and functioning of the human brain’s neural pathways. It takes an input, processes it using previous learning, and produces an output.
Pandas
An open-source Python library that provides tools for working with structured data; it is often used for data manipulation and analysis.
R
An open-source programming language used for statistical computing, data analysis, and data visualization.
Python Notebooks
A computational environment that allows users to create and share documents containing code, equations, visualizations, and explanatory text.
Recommendation Engine
A computer program that analyzes user input, such as behaviors or preferences, and makes personalized recommendations based on that analysis.
Regression
A statistical model that shows the relationship between one or more predictor variables and a response variable.
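Example: a minimal sketch of the simplest case, a straight line fitted by ordinary least squares with one predictor variable. The function name fit_line and the data (hours studied as the predictor, test score as the response) are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that lies exactly on the line y = 2x + 50.
hours = [1, 2, 3, 4, 5]
test_scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, test_scores)
print(slope, intercept)

# Use the fitted model to predict the response for a new predictor value.
print(slope * 6 + intercept)
```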
Tabular Data
Data that is organized into rows and columns.
XLSX
The Microsoft Excel spreadsheet file format.
Open Source
Software whose source code is made freely available for any third party to review and modify.