What Data Scientists Do Flashcards

1
Q

What example did Dr. Murtaza Haider investigate to demonstrate the role of a data scientist?

A

Dr. Haider found a relationship between unexpected bad weather and the number of public transit complaints in Toronto.

2
Q

How can data scientists help tackle environmental challenges like water toxicity?

A

By using artificial neural networks, data scientists can help predict algae blooms and safeguard ecosystems.

3
Q

What did Norman White build that simplified intricate problems across departments?

A

He built a recommendation engine.

4
Q

What educational tools does Dr. White use to teach future data scientists?

A

Python notebooks, Unix, Linux, relational databases, and tools like Pandas.

5
Q

What educational backgrounds does Dr. Vincent Granville list as necessary for a data scientist?

A

Algebra, calculus, and training in probability and statistics.

6
Q

What is the difference between a statistician and a data scientist according to Dr. Granville?

A

A data scientist uses statistics, but is not only a statistician.

7
Q

What is statistical regression used for?

A

To show the probable relationship between two variables, such as distance driven and gas used.
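
As a minimal sketch of this idea, the following fits a straight line by ordinary least squares to made-up figures. The numbers and the names `fit_line`, `distance_km`, and `fuel_litres` are illustrative assumptions, not data from the source.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: distance driven (km) and gas used (litres).
distance_km = [10, 20, 30, 40, 50]
fuel_litres = [1.0, 2.1, 2.9, 4.2, 4.8]

slope, intercept = fit_line(distance_km, fuel_litres)

# Use the fitted line to estimate gas used for a trip not in the data.
predicted = slope * 60 + intercept
print(f"Estimated gas for 60 km: {predicted:.2f} litres")
```

The slope is the estimated extra gas per additional kilometre; the fit describes a probable relationship, not an exact one.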

8
Q

What machine learning algorithm is mentioned in the text for processing big data?

A

Nearest neighbor.

9
Q

Why should the term ‘big data’ be used with caution?

A

Because what counts as big data keeps changing as innovation advances.

10
Q

What tools have expanded the possibilities for handling big data?

A

Tools like Hadoop, along with other software advances, have expanded the limits of how much data can be handled.

11
Q

What sets a data scientist apart according to Dr. Patel?

A

Their ability to unlock insights and convey compelling narratives to stakeholders.

12
Q

What types of data do data scientists work with?

A

Data from a wide variety of sources, including video, audio, and text (structured and unstructured).

13
Q

What are some common data formats used by data scientists?

A

Delimited text files, spreadsheets, XML, PDFs, and JSON.

14
Q

What quality does Rachel Schutt highlight as making a data scientist exceptional?

A

Curiosity.

15
Q

What skills and roles does a data scientist combine, according to Rachel Schutt?

A

A blend of computer scientist, software engineer, and statistician.

16
Q

What defines a data scientist’s prowess according to Rachel Schutt?

A

Their ability to transform unstructured solutions into structured insights.

17
Q

What are Comma-separated values (CSV) / Tab-separated values (TSV)?

A

Commonly used formats for storing tabular data as plain text, in which a comma (CSV) or a tab (TSV) separates each value.
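
A small sketch of the difference, using Python's standard `csv` module: the same table is written once with commas and once with tabs, and only the delimiter passed to the reader changes. The sample rows and the helper name `read_delimited` are hypothetical.

```python
import csv
import io

# The same hypothetical table, once comma-separated and once tab-separated.
csv_text = "city,complaints\nToronto,42\nOttawa,17\n"
tsv_text = "city\tcomplaints\nToronto\t42\nOttawa\t17\n"


def read_delimited(text, delimiter):
    """Parse delimited text into a list of row dictionaries keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


csv_rows = read_delimited(csv_text, ",")
tsv_rows = read_delimited(tsv_text, "\t")

# Both files carry identical data; every value is read back as a string.
print(csv_rows == tsv_rows)
```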

18
Q

What are data file types?

A

A computer file configuration designed to store data in a specific way.

19
Q

What is a data format?

A

How data is encoded so it can be stored within a data file type.

20
Q

What is data visualization?

A

A visual way of representing data, such as a graph, that makes the data readily understandable and makes trends easier to see.

21
Q

What is a delimited text file?

A

A plain text file where a specific character separates the data values.

22
Q

What is Extensible Markup Language (XML)?

A

A language designed to structure, store, and enable data exchange between various technologies.
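
For illustration, a short sketch that reads a small XML document with Python's standard `xml.etree.ElementTree` module. The element names, attribute names, and values are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: tags and attributes give the data its structure.
xml_text = (
    "<readings>"
    '<reading site="lake-a"><toxin>0.8</toxin></reading>'
    '<reading site="lake-b"><toxin>2.3</toxin></reading>'
    "</readings>"
)

root = ET.fromstring(xml_text)

# Map each site attribute to the numeric value of its <toxin> child element.
toxin_by_site = {
    reading.get("site"): float(reading.findtext("toxin"))
    for reading in root.findall("reading")
}
print(toxin_by_site)
```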

23
Q

What is Hadoop?

A

An open-source framework designed to store and process large datasets across clusters of computers.

24
Q

What is JavaScript Object Notation (JSON)?

A

A data format, compatible with many programming languages, that applications use to exchange structured data.
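
A minimal sketch of such an exchange with Python's standard `json` module: one side serializes a record to JSON text, the other parses it back. The record contents are made up for the example.

```python
import json

# Hypothetical structured record that one application wants to send.
record = {"card": 24, "topic": "JSON", "tags": ["data format", "exchange"]}

# Sender: serialize the record to a JSON string (plain text).
payload = json.dumps(record)

# Receiver: parse the JSON string back into native data structures.
received = json.loads(payload)
print(received == record)
```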

25
Q

What are Jupyter notebooks?

A

A computational environment that allows users to create and share documents containing code, equations, visualizations, and explanatory text.

26
Q

What is nearest neighbor in machine learning?

A

A machine learning algorithm that predicts the target variable for a data point based on its similarity to other points in the dataset.
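
One way to picture this is a 1-nearest-neighbor sketch that measures similarity by Euclidean distance and copies the label of the closest training point. The function name `nearest_neighbor_predict`, the points, and the labels are all hypothetical.

```python
import math


def nearest_neighbor_predict(train_points, train_labels, query):
    """Return the label of the training point closest to `query` (Euclidean distance)."""
    distances = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(point, query)))
        for point in train_points
    ]
    closest_index = distances.index(min(distances))
    return train_labels[closest_index]


# Hypothetical training data: two features per point, one label per point.
train_points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_labels = ["low", "low", "high", "high"]

prediction_a = nearest_neighbor_predict(train_points, train_labels, (5.5, 5.0))
prediction_b = nearest_neighbor_predict(train_points, train_labels, (1.2, 1.1))
print(prediction_a, prediction_b)
```

Practical variants usually consult several neighbors (k > 1) and take a vote or an average; the single-neighbor version is shown only because it is the simplest case.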

27
Q

What are neural networks?

A

A computational model used in deep learning that mimics the structure and functioning of the human brain’s neural pathways.

28
Q

What is Pandas?

A

An open-source Python library that provides tools for working with structured data, often used for data manipulation and analysis.