IBM Data Science Professional Certificate Flashcards
Data Science
Data science uses math, statistics, programming, and tools such as artificial intelligence (AI) and machine learning, along with subject matter expertise (knowledge of a specific field), to find useful information in an organization’s data. This information helps guide decisions and plan strategies.
Process of uncovering insights from data
- Clarifying the problem
- Collecting the data
- Analyzing the data
- Recognizing patterns
- Storytelling based on the data
- Visualizing the data
What is an algorithm?
A set of step-by-step instructions to solve a problem or complete a task.
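As an illustration (the card itself names no specific algorithm), binary search is a classic algorithm: a fixed sequence of steps that finds an item in a sorted list by repeatedly halving the range it searches. The function name and sample list below are hypothetical.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2          # Step 1: look at the middle item
        if sorted_items[mid] == target:  # Step 2: stop if it is the target
            return mid
        if sorted_items[mid] < target:   # Step 3: target is larger, drop the lower half
            low = mid + 1
        else:                            # Step 4: target is smaller, drop the upper half
            high = mid - 1
    return -1                            # Search range is empty: target not present

print(binary_search([2, 5, 8, 12, 16, 23], 12))  # 3
```

Because each step discards half of the remaining items, the number of steps grows only logarithmically with the length of the list.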
What is a model?
A simplified representation of the relationships and patterns found in data, used to make predictions or analyze complex systems while retaining the essential elements needed for analysis.
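A minimal sketch of one kind of model, assuming a simple linear relationship: ordinary least squares fits a line y = slope * x + intercept to the data, and that fitted line is then used to predict new values. The spend/sales numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the n terms cancel out
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x  # line passes through the point of means
    return slope, intercept

# Hypothetical data: advertising spend vs. sales
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
slope, intercept = fit_line(spend, sales)
print(slope, intercept)          # 2.0 1.0
print(slope * 6 + intercept)     # predicted sales for spend = 6: 13.0
```

The two numbers returned are the whole model: they summarize the pattern in the data and can be reused for prediction.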
What are outliers?
Data points that fall significantly outside the range of most other data in a dataset. Outliers can indicate anomalies, errors, or unique phenomena, and they can distort statistical analysis or modeling.
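One common way to flag outliers (not specified on the card) is the interquartile range (IQR) rule: values more than 1.5 × IQR below the first quartile or above the third quartile are treated as outliers. A minimal sketch using Python’s standard `statistics.quantiles`; the readings are invented.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sensor readings with one suspicious value
readings = [10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 10, 100]
print(iqr_outliers(readings))  # [100]
```

The multiplier `k=1.5` is a convention, not a law; whether a flagged value is an error or a genuine rare event still needs human judgment.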
What is quantitative analysis?
A systematic approach that uses mathematical and statistical analysis to interpret numerical data.
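A small example of quantitative analysis, assuming a hypothetical week of daily sales: descriptive statistics from Python’s standard `statistics` module summarize the center and spread of the numbers.

```python
import statistics

# Hypothetical daily sales figures for one week
daily_sales = [120, 135, 128, 150, 142, 138, 160]

mean_sales = statistics.mean(daily_sales)      # average value
median_sales = statistics.median(daily_sales)  # middle value, less sensitive to extremes
spread = statistics.stdev(daily_sales)         # sample standard deviation

print(f"mean={mean_sales}, median={median_sales}, stdev={spread:.2f}")
```

Comparing the mean and median is a quick check for skew: when they are close, as here, the data is roughly symmetric.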
What is structured data?
Data that is organized and formatted according to a predictable schema, usually as related tables with rows and columns.
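To illustrate structured data, the sketch below uses Python’s built-in `sqlite3` module to create a table with a fixed schema (named, typed columns) and query it. The `customers` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
# The schema fixes the columns and their types before any data arrives
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ana", "Lisbon"), ("Ben", "Boston"), ("Chen", "Boston")],
)
# Because every row has the same shape, filtering by column is straightforward
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Boston",)
).fetchall()
print(rows)  # [('Ben',), ('Chen',)]
conn.close()
```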
What is unstructured data?
Data that lacks a predefined data model or organization, making it harder to analyze using traditional methods. This type of data often includes text, images, videos, and other content that doesn’t fit neatly into rows and columns the way structured data does.
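A common first step with unstructured text is to extract structure from it. This minimal sketch, assuming a single hypothetical product review, turns free text into word counts using the standard `re` module and `collections.Counter`.

```python
import re
from collections import Counter

review = "Great phone. The battery is great, but the camera is only okay."

# Lowercase, then keep only runs of letters/apostrophes as words
words = re.findall(r"[a-z']+", review.lower())
counts = Counter(words)  # word -> number of occurrences

print(counts["great"], counts["camera"])  # 2 1
```

The resulting counts can be stored as rows of (word, count), which is structured data that traditional tools can analyze.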
Data file type
A computer file configuration that is designed to store data in a specific way.
Data format
How data is encoded so it can be stored within a data file type.
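To show the difference a data format makes, this sketch encodes the same hypothetical records in two formats, CSV and JSON, using Python’s standard `csv` and `json` modules. Either string could be written to a file; the format determines how the data is laid out inside it.

```python
import csv
import io
import json

records = [{"name": "Ana", "age": 34}, {"name": "Ben", "age": 28}]

# CSV: a header row, then one comma-separated line per record
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

# JSON: self-describing key/value pairs that keep numbers as numbers
json_text = json.dumps(records)

print(csv_text)
print(json_text)
```

One practical difference: reading the CSV back with `csv.DictReader` yields the ages as strings, while `json.loads` restores them as integers.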
Data visualization
A visual way of representing data, such as a graph, that makes the data readily understandable and trends easier to see.
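In practice, charts are usually drawn with a plotting library such as matplotlib. To stay dependency-free, this sketch renders a plain-text horizontal bar chart instead; the function name and monthly figures are hypothetical.

```python
def text_bar_chart(data, width=20):
    """Render a dict of label -> value as horizontal bars of '#' characters."""
    peak = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / peak * width)  # scale bar length to the largest value
        lines.append(f"{label:<4} {bar} {value}")
    return "\n".join(lines)

monthly_sales = {"Jan": 40, "Feb": 55, "Mar": 80, "Apr": 100}
chart = text_bar_chart(monthly_sales)
print(chart)
```

Even in this crude form, the steadily lengthening bars make the upward trend visible at a glance, which is harder to see in the raw numbers.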
Hadoop
An open-source framework designed to store and process large datasets across clusters of computers.
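Hadoop processes data with the MapReduce programming model: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step combines each group. The sketch below is a single-process Python simulation of a word count under that model; it does not use Hadoop’s actual API, and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit (word, 1) for every word in one line of input."""
    return [(word, 1) for word in line.lower().split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values for one key."""
    return key, sum(values)

# Each string stands in for one block of a large file spread across a cluster
blocks = ["big data big ideas", "data beats opinions"]

mapped = [pair for block in blocks for pair in map_phase(block)]
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)  # {'big': 2, 'data': 2, 'ideas': 1, 'beats': 1, 'opinions': 1}
```

On a real cluster, mappers run in parallel on the machines that hold each block, which is what lets the framework scale to very large datasets.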
Jupyter notebooks
A computational environment that allows users to create and share documents containing code, equations, visualizations, and explanatory text. See Python notebooks.
Nearest neighbor
A machine learning algorithm that predicts a target value for a new data point based on that point’s similarity to other data points in the dataset.
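A minimal sketch of k-nearest neighbors classification, assuming two numeric features, Euclidean distance, and a majority vote among the k closest labeled points. The training points and labels are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest neighbors.

    `train` is a list of ((x, y), label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

training_data = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((2.0, 1.5), "small"),
    ((6.0, 7.0), "large"),
    ((7.0, 6.5), "large"),
    ((6.5, 8.0), "large"),
]
print(knn_predict(training_data, (1.8, 1.6)))  # small
print(knn_predict(training_data, (6.2, 7.1)))  # large
```

Choosing an odd `k` avoids tied votes when there are two classes; in practice features are usually scaled first so that no single feature dominates the distance.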
Neural networks
A computational model used in deep learning that is loosely inspired by the structure and functioning of the human brain’s neural pathways. It takes an input, processes it through layers of interconnected nodes using parameters learned during training, and produces an output.
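The sketch below shows only the input-to-output pass of a tiny neural network with one hidden layer and sigmoid activations. Training is not shown: the weights are hand-picked stand-ins for values a trained network would have learned, chosen here so the network computes XOR of two binary inputs.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum of inputs plus bias, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Hand-picked parameters (hypothetical stand-ins for learned weights)
HIDDEN_WEIGHTS = [[20.0, 20.0], [-20.0, -20.0]]  # hidden neurons act like OR and NAND
HIDDEN_BIASES = [-10.0, 30.0]
OUTPUT_WEIGHTS = [[20.0, 20.0]]                  # output neuron acts like AND
OUTPUT_BIASES = [-30.0]

def forward(x1, x2):
    """Input -> hidden layer -> single output neuron."""
    hidden = layer([x1, x2], HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(forward(a, b)))
```

XOR is a useful example because no single neuron can compute it; combining neurons in a hidden layer is what makes it possible.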