IBM Data Science Professional Certificate Flashcards

1
Q

Data Science

A

Data science uses math, statistics, programming, and tools like artificial intelligence (AI) and machine learning, along with subject matter expertise (knowledge of a specific field), to find useful information in an organization’s data. This information helps guide decisions and plan strategies.

2
Q

Process of uncovering insights from data

A
  • Clarifying the problem
  • Collecting the data
  • Analyzing the data
  • Recognizing patterns
  • Storytelling based on the data
  • Visualizing the data
3
Q

What is an algorithm?

A

A set of step-by-step instructions to solve a problem or complete a task.
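As a minimal illustration (the task and the function name `find_largest` are hypothetical, chosen only for this example), the Python sketch below writes out an algorithm for finding the largest number in a list as explicit, ordered steps.

```python
def find_largest(numbers):
    """Return the largest value in a non-empty list, one step at a time."""
    largest = numbers[0]          # Step 1: assume the first value is the largest
    for value in numbers[1:]:     # Step 2: examine each remaining value in turn
        if value > largest:       # Step 3: keep whichever value is bigger
            largest = value
    return largest                # Step 4: report the result


print(find_largest([3, 17, 8, 12]))  # 17
```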

4
Q

What is a model?

A

A representation of the relationships and patterns found in data, used to make predictions or analyze complex systems while retaining the essential elements needed for analysis.

5
Q

What are outliers?

A

One or more data points that lie significantly outside most of the other data in a dataset, potentially indicating anomalies, errors, or unique phenomena that could affect statistical analysis or modeling.
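The card does not name a detection method. One common convention, assumed here, is the interquartile range (IQR) rule (Tukey's fences): flag values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR. The sketch below uses Python's standard `statistics.quantiles`; the function name `iqr_outliers` and the sensor readings are made up for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


readings = [10, 12, 11, 13, 12, 95, 11, 10]  # illustrative data only
print(iqr_outliers(readings))  # [95]
```

The multiplier `k=1.5` is a rule of thumb, not a law; whether a flagged point is an error or a genuine rare event still has to be judged in context.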

6
Q

What is quantitative analysis?

A

A systematic approach that uses mathematical and statistical analysis to interpret numerical data.
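A small taste of quantitative analysis is summarizing numbers with descriptive statistics. This sketch assumes invented monthly sales figures and uses only Python's standard `statistics` module.

```python
import statistics

monthly_sales = [120, 135, 128, 150, 142, 138]  # illustrative numbers only

mean = statistics.mean(monthly_sales)      # central tendency: 135.5
median = statistics.median(monthly_sales)  # middle value: 136.5
stdev = statistics.stdev(monthly_sales)    # sample standard deviation (spread)
print(f"mean={mean:.1f}, median={median:.1f}, stdev={stdev:.1f}")
```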

7
Q

What is structured data?

A

Data that is organized and formatted according to a predictable schema, usually as related tables with rows and columns.

8
Q

What is unstructured data?

A

Unorganized data that lacks a predefined data model or organization, making it harder to analyze using traditional methods. This data type often includes text, images, videos, and other content that doesn’t fit neatly into rows and columns the way structured data does.

9
Q

Data file type

A

A computer file configuration that is designed to store data in a specific way.

10
Q

Data format

A

How data is encoded so it can be stored within a data file type.

11
Q

Data visualization

A

A way of representing data visually, such as with a graph, that makes the data readily understandable and its trends easier to see.

12
Q

Hadoop

A

An open-source framework designed to store and process large datasets across clusters of computers.

13
Q

Jupyter notebooks

A

A computational environment that allows users to create and share documents containing code, equations, visualizations, and explanatory text. See Python notebooks.

14
Q

Nearest neighbor

A

A machine learning algorithm that predicts the target value for a data point based on its similarity to other data points in the dataset.
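A minimal k-nearest-neighbors classifier makes the idea concrete. This sketch assumes Euclidean distance and a simple majority vote among the `k` closest points; the function name `knn_predict` and the labeled points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; features are tuples of numbers.
    """
    # Sort training examples by Euclidean distance to the query point
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # Majority vote among the k closest labels
    return Counter(nearest_labels).most_common(1)[0][0]


train = [((1.0, 1.0), "small"), ((1.5, 2.0), "small"), ((2.0, 1.0), "small"),
         ((8.0, 9.0), "large"), ((9.0, 8.5), "large"), ((8.5, 8.0), "large")]
print(knn_predict(train, (2.0, 2.0)))  # small
print(knn_predict(train, (7.5, 8.0)))  # large
```

Sorting the whole training set is fine for a toy example; larger datasets usually rely on spatial indexes to find neighbors faster.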

15
Q

Neural networks

A

A computational model used in deep learning that is inspired by the structure and functioning of the human brain’s neural pathways. It takes an input, processes it using what it learned during training, and produces an output.
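The "input, process, output" flow can be sketched as a forward pass through a tiny network with one hidden layer. The weights below are hypothetical values standing in for "previous learning"; no training happens here. The sigmoid activation and the names `dense_layer` and `predict` are assumptions made for the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases):
    """Each neuron computes a weighted sum of its inputs plus a bias, then applies sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
            for neuron_w, b in zip(weights, biases)]


# Hypothetical weights standing in for what training would have learned
hidden_w = [[0.5, -0.2], [0.3, 0.8]]
hidden_b = [0.1, -0.1]
output_w = [[1.2, -0.7]]
output_b = [0.05]


def predict(inputs):
    hidden = dense_layer(inputs, hidden_w, hidden_b)   # input -> hidden layer
    return dense_layer(hidden, output_w, output_b)[0]  # hidden -> single output


print(predict([1.0, 0.0]))  # a value between 0 and 1
```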

16
Q

Pandas

A

An open-source Python library that provides tools for working with structured data, often used for data manipulation and analysis.

17
Q

Python notebooks

A

Also known as a “Jupyter” notebook, this computational environment allows users to create and share documents containing code, equations, visualizations, and explanatory text.

18
Q

R

A

An open-source programming language used for statistical computing, data analysis, and data visualization.

19
Q

Recommendation engine

A

A computer program that analyzes user input, such as behaviors or preferences, and makes personalized recommendations based on that analysis.
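Here is one simple way a recommendation engine could use behavior data. This sketch assumes each user is described by the set of items they liked, measures user similarity with the Jaccard index, and scores unseen items by the similarity of the users who liked them. The users, items, and function names are invented for illustration; production systems use far richer models.

```python
def jaccard(a, b):
    """Share of items two users have in common (0 = none, 1 = identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(likes, user, top_n=2):
    """Rank items the user hasn't liked yet by how similar their fans are to the user."""
    scores = {}
    for other, items in likes.items():
        if other == user:
            continue
        sim = jaccard(likes[user], items)
        if sim == 0:
            continue  # users with nothing in common contribute no evidence
        for item in items - likes[user]:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for a stable result
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


likes = {
    "ana":  {"Python 101", "Intro to SQL", "Data Viz"},
    "ben":  {"Python 101", "Intro to SQL", "Machine Learning"},
    "cara": {"Python 101", "Data Viz", "Deep Learning"},
    "dev":  {"Cooking", "Gardening"},
}
print(recommend(likes, "ana"))  # ['Deep Learning', 'Machine Learning']
```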

20
Q

Regression

A

A statistical model that describes the relationship between one or more predictor variables and a response variable.
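For the simplest case, one predictor and one response, ordinary least squares gives a closed-form slope and intercept. The sketch below assumes that case; the function name `fit_line` and the study-hours data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


hours = [1, 2, 3, 4, 5]          # predictor (illustrative)
scores = [52, 55, 61, 64, 68]    # response (illustrative)
b0, b1 = fit_line(hours, scores)
print(f"predicted score = {b0:.1f} + {b1:.1f} * hours")  # predicted score = 47.7 + 4.1 * hours
```

With more than one predictor the same idea generalizes to multiple regression, usually solved with a linear-algebra library rather than by hand.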

21
Q

Tabular data

A

Data that is organized into rows and columns.

22
Q

Delimited text files: benefits and examples

A

They are plain text files where each line represents a record and the values within a line are separated by a delimiter (for example, a comma).
Delimited files are versatile, allowing field values of any length, and can be processed by nearly all applications.
CSV (Comma-Separated Values) and TSV (Tab-Separated Values) are the most common types.
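Python's standard `csv` module reads both formats; only the delimiter changes. This sketch uses small in-memory sample files (the column names and rows are invented) and shows that quoting lets a CSV field contain the delimiter itself.

```python
import csv
import io

csv_text = "name,city,score\nAna,Lisbon,88\nBen,\"Austin, TX\",92\n"
tsv_text = "name\tcity\tscore\nAna\tLisbon\t88\nBen\tAustin, TX\t92\n"

# Same reader, different delimiter: one record per line, fields split on the delimiter
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))
tsv_rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

print(csv_rows[1]["city"])  # Austin, TX  (quotes protect the embedded comma)
print(csv_rows == tsv_rows)  # True
```

Note that every value comes back as a string; converting `score` to a number is left to the caller.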

23
Q

Extensible Markup Language (XML)

A

A markup language for encoding documents in a format that is both human-readable and machine-readable.
Unlike HTML, XML lets you define custom tags and is designed for data sharing across different systems, making it platform- and programming-language-independent.
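To see custom tags in practice, the sketch below parses a small XML document with Python's standard `xml.etree.ElementTree`. The tags `<catalog>`, `<course>`, `<title>`, and `<hours>` are invented for this example and are not part of any standard schema.

```python
import xml.etree.ElementTree as ET

# Custom tags invented for this example; XML lets each application define its own
xml_text = """<catalog>
  <course id="DS101">
    <title>Data Science Basics</title>
    <hours>12</hours>
  </course>
  <course id="PY201">
    <title>Python for Data Science</title>
    <hours>20</hours>
  </course>
</catalog>"""

root = ET.fromstring(xml_text)
for course in root.findall("course"):
    # Attributes via .get(), child element text via .findtext()
    print(course.get("id"), course.findtext("title"), int(course.findtext("hours")))
```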

24
Q

What is Microsoft Excel Open XML Spreadsheet (XLSX)?

A
  • An XML-based file format for spreadsheets, consisting of multiple worksheets with rows and columns.
  • Each cell in a worksheet can contain data. XLSX supports all Excel functions, can be opened by most spreadsheet applications, and is considered relatively secure because it cannot store macros, a common carrier of malicious code.
25
Q

What is a Portable Document Format (PDF)?

A
  • PDFs present documents in a consistent format across various software, hardware, and operating systems.
  • This format is widely used for legal and financial documents and forms due to its universal compatibility and security features.
26
Q

What is JavaScript Object Notation (JSON)?

A
  • A text-based format designed for transmitting structured data over the web.
  • JSON is easy to read and write, works across many browsers, and is language-independent, making it ideal for data interchange, including APIs and web services that handle complex data types like audio and video.
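A round trip through Python's standard `json` module shows how structured data becomes text for transmission and back again. The record's fields are made up for illustration; note how Python's `True` and `None` map to JSON's `true` and `null`.

```python
import json

record = {"id": 7, "name": "Ana", "skills": ["Python", "SQL"], "active": True, "manager": None}

payload = json.dumps(record)   # Python dict -> JSON text, e.g. for an API request body
print(payload)
# {"id": 7, "name": "Ana", "skills": ["Python", "SQL"], "active": true, "manager": null}

restored = json.loads(payload)  # JSON text -> Python objects
print(restored == record)       # True
```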
27
Q

What is a Data Scientist?

A

Someone who solves problems by analyzing data (big or small) using appropriate tools and communicates findings effectively to stakeholders.

28
Q

What’s the difference between Data Science and Statistics?

A

In general, statistics is the study of numerical or quantitative data to make predictions or draw conclusions about a population. Data science is an applied subset of statistics that uses statistical methods to analyze large amounts of data and understand the results better.

29
Q

What is Digital Transformation?

A

The integration of digital technology into all areas of a business, fundamentally altering how operations are conducted and how value is delivered. It is driven by Big Data analysis.

30
Q

What is cloud computing?

A

Cloud computing provides computing services on-demand via the Internet, offering a cost-effective way to access applications, storage, and more, without installing them locally.

31
Q

What are the three deployment models of cloud computing?

A

Public, private, and hybrid.

32
Q

What are the three service models of cloud computing?

A

Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).

33
Q

What are the main features of cloud computing?

A

On-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.

34
Q

How does cloud computing change how businesses and individuals utilize technology?

A

It allows for scalable resources and applications based on a pay-as-you-go model.