Chap 2 Flashcards

1
Q

What is data science?

A

A multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured, semi-structured, and unstructured data.

2
Q

What is a data scientist?

A

A person who engages in systematic activity to acquire knowledge from data.

3
Q

What is the role of data scientists?

A

They perform research toward a more comprehensive understanding of products, systems, or nature, including the physical, mathematical, and social realms.

4
Q

What skill set does a data scientist need?

A

A strong background in:
1. statistics and linear algebra
2. programming
3. data warehousing, mining, and modeling, in order to build and analyze algorithms

5
Q

What is an algorithm?

A

A set of instructions designed to perform a specific task.
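
Example (an illustration added here, not from the chapter): finding the largest number in a list is a small algorithm. The function name `find_largest` is made up for this sketch.

```python
def find_largest(numbers):
    """Return the largest value in a non-empty list by scanning it once."""
    largest = numbers[0]          # step 1: assume the first item is the largest
    for value in numbers[1:]:     # step 2: look at every remaining item
        if value > largest:       # step 3: keep the bigger of the two
            largest = value
    return largest                # step 4: report the result


print(find_largest([3, 9, 4]))
```

Each commented step is one instruction, and together they perform one specific task.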

6
Q

What is data?

A

Data are unprocessed facts and figures. They can exist in any form.

7
Q

What is information?

A

Data that has been given meaning: processed data on which decisions and actions are based.

8
Q

What is data processing?

A

The restructuring of data by people or machines to increase its usefulness and add value for a particular purpose.

9
Q

What are the basic steps of data processing?

A

1. input
2. processing
3. output

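
Example (a minimal sketch, not from the chapter): the three steps can be traced in a few lines of Python. The exam scores and the `process` function are invented for illustration.

```python
# Input: unprocessed facts (hypothetical exam scores)
raw_scores = [72, 85, 90, 64]


def process(scores):
    """Processing: restructure the raw numbers into something a person can act on."""
    average = sum(scores) / len(scores)
    return {"count": len(scores), "average": average, "highest": max(scores)}


# Output: information on which a decision can be based
summary = process(raw_scores)
print(summary)
```

The list is data; the summary is information, because the processing step gave the numbers meaning.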
10
Q

What are some material forms of data?

A

numbers
text
symbols
images
sound

11
Q

What are the 2 categories of data forms?

A

qualitative = descriptions
quantitative = numeric records

12
Q

What is a data type?

A

An attribute of data that tells the compiler or interpreter how the programmer intends to use the data.

13
Q

What are the common data types from a computer programming perspective?

A

integers
booleans
characters
floating-point numbers
alphanumeric strings

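
Example (a sketch in Python, with made-up values): each value below has one of the types named on this card, and `type()` shows how the interpreter treats it. Python has no separate character type, so a one-character string stands in for it here.

```python
# Hypothetical values, one per data type named on the card
samples = {
    "integer": 42,
    "boolean": True,
    "character": "A",      # assumption: a 1-length str stands in for a char
    "string": "Room 12B",
    "float": 3.14,
}

for label, value in samples.items():
    # type() reports the data type the interpreter has attached to each value
    print(f"{label:10} {value!r:12} {type(value).__name__}")
```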
14
Q

What are the 3 common types of data?

A

structured
semi-structured
unstructured

15
Q

What is structured data?

A

Data that can be easily organized, stored, and transferred in a defined data model.

16
Q

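What is semi-structured data?

A

A mix of structured and unstructured data: it has no rigid data model, but it carries tags or markers (as in JSON or XML) that organize its elements.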

17
Q

What is unstructured data?

A

Information that either does not have a predefined data model or is not organized in a predefined manner.
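
Example (a sketch with invented sample values): the same fact about one person, held as structured, semi-structured, and unstructured data. The CSV columns, JSON keys, and sentence are all made up for illustration.

```python
import csv
import io
import json

# Structured: fixed columns that fit a predefined data model (a table)
table = io.StringIO("id,name,age\n1,Abebe,30\n")
rows = list(csv.DictReader(table))

# Semi-structured: self-describing keys, but fields may vary from record to record
record = json.loads('{"id": 1, "name": "Abebe", "tags": ["student", "part-time"]}')

# Unstructured: no predefined model; meaning must be extracted from the text itself
note = "Abebe, who is thirty, signed up for the evening class last week."

print(rows[0]["age"])      # column lookup works because the schema is known
print(record["tags"])      # keys are discoverable from the data itself
print(len(note.split()))   # only generic operations (such as a word count) apply directly
```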

18
Q

What is metadata?

A

Data about data.
It provides additional information about a specific set of data.
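
Example (a small sketch, not from the chapter): a file's contents are data, and the details the operating system keeps about the file are metadata. The file contents here are invented.

```python
import os
import tempfile

# Write a small file; its contents are the data
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("hello data")
    path = f.name

# os.stat returns data about that data: size, timestamps, permissions
info = os.stat(path)
print(info.st_size)     # size in bytes
print(info.st_mtime)    # last-modified time, in seconds since the epoch

os.remove(path)         # clean up the temporary file
```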

19
Q

What is the data value chain?

A

It describes the process of data creation and reuse. Its steps are acquisition, analysis, curation, storage, and usage.

20
Q

What is data acquisition?

A

The process of gathering, filtering, and cleaning data.
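
Example (a minimal sketch with hypothetical sensor readings): gathering, filtering, and cleaning in plain Python. The helper `is_number` and the choice to drop duplicates are assumptions made for this illustration.

```python
# Gathering: raw readings as they arrived (hypothetical values)
raw = [" 21.5", "22.1 ", "", "n/a", "19.8", "22.1 "]


def is_number(text):
    """Return True if the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


# Filtering: keep only entries that hold a usable number
filtered = [r for r in raw if is_number(r.strip())]

# Cleaning: strip whitespace, convert to float, drop repeats while keeping order
cleaned = list(dict.fromkeys(float(r.strip()) for r in filtered))

print(cleaned)
```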

21
Q

What is data analysis?

A

The process of evaluating data using analytical and statistical tools to discover useful information.
It involves:
exploring
transforming
modeling data
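
Example (a sketch on made-up numbers): the three activities on a tiny data set of study hours and exam scores. The least-squares line is just one simple choice of model, picked for illustration.

```python
from statistics import mean

# Hypothetical data: hours studied and the exam score obtained
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 75]

# Exploring: summarize the data to see its range and center
print(min(scores), max(scores), mean(scores))

# Transforming: rescale the scores to the range 0..1
lo, hi = min(scores), max(scores)
scaled = [(s - lo) / (hi - lo) for s in scores]

# Modeling: fit a straight line, score = slope * hours + intercept (least squares)
mx, my = mean(hours), mean(scores)
slope = sum((x - mx) * (y - my) for x, y in zip(hours, scores)) / sum(
    (x - mx) ** 2 for x in hours
)
intercept = my - slope * mx

print(slope, intercept)
```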

22
Q

What is data curation?

A

The active management of data over its life cycle to ensure that it meets the data quality requirements for its effective usage.

23
Q

What are the different activities in the data curation process?

A

content creation
selection
classification
transformation
validation
preservation

24
Q

What is data storage?

A

The persistence and management of data.

25
Q

What is data usage?

A

The use of data for the required purpose.

26
Q

What is big data?

A

A collection of data sets so large and complex that they are hard to process using conventional database management tools or traditional data processing applications.

27
Q

What 3 things is the definition of big data based on?

A

volume
velocity
variety

28
Q

What is resource pooling?

A

Combining the available storage space of multiple machines to hold data.

29
Q

What is high availability?

A

Protection against hardware or software failures affecting access to data and processing.

30
Q

What is easy scalability?

A

The ability to scale horizontally by adding machines to the group.
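
Example (a toy sketch only): pooling storage and scaling horizontally can be pictured as spreading records over a list of machines and then adding one more machine. The node names, record IDs, and hash-based placement rule are assumptions for illustration; real cluster software uses its own placement and replication logic.

```python
import zlib


def place(keys, machines):
    """Assign each key to one machine by hashing the key (a toy placement rule)."""
    placement = {m: [] for m in machines}
    for key in keys:
        # crc32 is stable across runs, unlike Python's built-in hash() for strings
        index = zlib.crc32(key.encode("utf-8")) % len(machines)
        placement[machines[index]].append(key)
    return placement


keys = [f"record-{i}" for i in range(12)]   # hypothetical record IDs

two = place(keys, ["node-a", "node-b"])
three = place(keys, ["node-a", "node-b", "node-c"])   # scale out: add one machine

for name, plan in (("2 machines", two), ("3 machines", three)):
    print(name, {m: len(ks) for m, ks in plan.items()})
```

Every record still has a home after the third machine joins; the pool simply has more room to share.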

31
Q

What is Hadoop?

A

An open-source software framework that stores and processes large volumes of non-relational data.

32
Q

What are the 4 characteristics of Hadoop?

A

1. economical
2. reliable
3. scalable
4. flexible