Lecture 5 – Data sources Flashcards

1
Q

Why share data?

A
  • working together on a project
  • common needs, common resource
  • data as a product
  • data-based service
2
Q

What opportunities does shared data create?

A
  • new combinations of data
  • new relationships of data
  • new visualisations of data
  • new understanding of data
3
Q

What is open data?

A

data that is freely available to everyone

4
Q

What is the problem with open data?

A

Its abundance and complexity

(many datasets, with different definitions and access points)

5
Q

What is machine-readable data?

A

data in a format that a computer can process automatically
(e.g. JSON, XML, CSV)
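A minimal sketch, assuming a POSIX shell: the same hypothetical record (CityA, 100 are made-up values) written in three machine-readable formats. Because the CSV layout is fixed, a program can extract a field without a human reading the file.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1   # work in a throwaway directory

# The same hypothetical record stored in three machine-readable formats.
printf 'city,population\nCityA,100\n' > record.csv
printf '{"city": "CityA", "population": 100}\n' > record.json
printf '<record><city>CityA</city><population>100</population></record>\n' > record.xml

# The CSV layout is fixed (comma-separated, header first), so a tool
# can pull out the second field of the data row mechanically:
cut -d, -f2 record.csv | tail -n 1   # prints 100
```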

6
Q

What is a markup language?

A

system for annotating a document in a way that is syntactically distinguishable from the text (e.g. HTML, XML)

7
Q

What is a digital container?

A

file format whose specification describes how different elements of data coexist in a computer file

8
Q

What is metadata?

A

structured data that describes other data

Types:
  • descriptive (title, author, file size)
  • structural (relationships, chapters, elements of JSON)
  • administrative (version number, archiving data, createDate)

9
Q

What is Predictive Model Markup Language (PMML)?

A

an XML-based standard language for describing predictive models, so that a model can be exchanged between different analytics software

10
Q

Unix: pwd

A

print the path of the current (working) directory

11
Q

Unix: cd DIRPATH

A

change directory to DIRPATH
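A minimal sketch of pwd and cd used together, assuming a POSIX shell; the directory is a throwaway one created with mktemp -d purely for illustration.

```shell
#!/bin/sh
# Create a throwaway directory; mktemp -d prints its path.
demo_dir=$(mktemp -d)

cd "$demo_dir" || exit 1   # change into the new directory
pwd                        # prints the path of the current directory
cd ..                      # ".." refers to the parent directory
pwd                        # now prints the parent's path
```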

12
Q

Unix: ls DIRPATH

A

list the names of the files and subdirectories in DIRPATH (the current directory if DIRPATH is omitted)
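A minimal sketch, assuming a POSIX shell; the directory and the files a.txt and b.txt are hypothetical and created only for the demonstration.

```shell
#!/bin/sh
# Hypothetical sample directory with two empty files.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt"

ls "$demo_dir"      # lists the names; one per line when output is not a terminal
ls -l "$demo_dir"   # long format: permissions, owner, size, date, name
```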

13
Q

Unix: cp FILENAME NEWFILENAME

A

copy FILENAME to NEWFILENAME
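A minimal sketch, assuming a POSIX shell; notes.txt and notes_backup.txt are hypothetical file names. After cp both files exist with the same content.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1     # work in a throwaway directory
echo "hello" > notes.txt        # hypothetical source file

cp notes.txt notes_backup.txt   # notes.txt itself is left untouched
ls                              # both files are now listed
```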

14
Q

Unix: mv FILENAME NEWFILENAME

A

move or rename FILENAME to NEWFILENAME (if NEWFILENAME is an existing directory, the file is moved into it)
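A minimal sketch, assuming a POSIX shell, showing that the same mv command renames or moves depending on the target; old_name.txt, new_name.txt and archive are hypothetical names. Unlike cp, the original name no longer exists afterwards.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1     # work in a throwaway directory
echo "draft" > old_name.txt     # hypothetical file

mv old_name.txt new_name.txt    # rename: old_name.txt no longer exists
mkdir archive
mv new_name.txt archive/        # move: the target is a directory
```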

15
Q

Unix: echo "TEXT"

A

output TEXT
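A minimal sketch, assuming a POSIX shell; the variable topic is hypothetical. The shell only recognises straight quotes, and double and single quotes behave differently when the text contains a variable.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1    # work in a throwaway directory

echo "Hello, world"            # prints: Hello, world
topic="open data"              # hypothetical variable
echo "Topic: $topic"           # double quotes expand it: Topic: open data
echo 'Topic: $topic'           # single quotes print it literally: Topic: $topic
echo "first line" > out.txt    # > redirects the output into a file instead
```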

16
Q

Unix: cat FILENAME

A

output the content of FILENAME (given several files, cat concatenates them)
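A minimal sketch, assuming a POSIX shell; part1.txt, part2.txt and all.txt are hypothetical files. The name cat comes from "concatenate", which is what it does when given more than one file.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1               # work in a throwaway directory
printf 'line 1\nline 2\n' > part1.txt     # hypothetical sample files
printf 'line 3\n' > part2.txt

cat part1.txt                             # prints the content of one file
cat part1.txt part2.txt > all.txt         # concatenates both into all.txt
```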

17
Q

Unix: less FILENAME

A

output the content of FILENAME one screen at a time (space shows the next page, q quits)

Advantage: less does not have to read the whole file before displaying it, so even very large files open quickly and can be browsed page by page.

18
Q

Unix: wc FILENAME

A

count the lines, words, and bytes in FILENAME (printed in that order by default)

Options: -l lines, -w words, -c bytes, -m characters
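A minimal sketch, assuming a POSIX shell; sample.txt is a hypothetical two-line file (2 lines, 4 words, 22 bytes). For plain ASCII text, bytes and characters coincide; -m differs from -c only for multibyte characters.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1                      # work in a throwaway directory
printf 'open data\nshared data\n' > sample.txt   # 2 lines, 4 words, 22 bytes

wc sample.txt      # lines, words, bytes, then the file name
wc -l sample.txt   # only the line count
wc -w sample.txt   # only the word count
wc -c sample.txt   # only the byte count
wc -m sample.txt   # only the character count
```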

19
Q

Unix: grep "PATTERN" FILENAME

A

output lines in FILENAME that match PATTERN
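A minimal sketch, assuming a POSIX shell and a hypothetical cities.csv; it shows plain matching plus three common options (-c, -v, -i).

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1   # work in a throwaway directory
printf 'id,city\n1,Vienna\n2,Graz\n3,Vienna\n' > cities.csv   # hypothetical data

grep "Vienna" cities.csv      # prints the two lines containing Vienna
grep -c "Vienna" cities.csv   # -c: print the number of matching lines (2)
grep -v "Vienna" cities.csv   # -v: print the lines that do NOT match
grep -i "vienna" cities.csv   # -i: ignore upper/lower case
```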

20
Q

Unix: head FILENAME

A

output the first lines of FILENAME (10 by default; head -n N prints the first N)

21
Q

Unix: tail FILENAME

A

output the last lines of FILENAME (10 by default; tail -n N prints the last N)
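A minimal sketch of head and tail side by side, assuming a shell where seq is available (it is on common Linux and macOS systems, though not part of POSIX); numbers.txt is a hypothetical file holding the lines 1 to 20.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1   # work in a throwaway directory
seq 1 20 > numbers.txt        # hypothetical file with the lines 1..20

head numbers.txt              # first 10 lines (1..10)
head -n 3 numbers.txt         # first 3 lines (1..3)
tail numbers.txt              # last 10 lines (11..20)
tail -n 3 numbers.txt         # last 3 lines (18..20)
```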

22
Q

Unix: awk

A

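pattern-scanning and text-processing language: reads a file line by line, splits each line into fields ($1, $2, …) and runs an action on every line that matches a pattern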
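A minimal sketch, assuming a POSIX awk and a hypothetical stats.csv; it selects a column, filters rows by a condition, and sums a column in an END block.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1   # work in a throwaway directory
printf 'name,downloads\nsetA,120\nsetB,45\nsetC,300\n' > stats.csv   # hypothetical data

# -F, sets the field separator to a comma; NR > 1 skips the header line.
awk -F, 'NR > 1 { print $1 }' stats.csv                      # first column
awk -F, 'NR > 1 && $2 > 100 { print $1 }' stats.csv          # rows with > 100 downloads
awk -F, 'NR > 1 { sum += $2 } END { print sum }' stats.csv   # total: 465
```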

23
Q

How can one standardize data?
(not the calculations themselves, but the aspects around them for which standards can be developed)

A
  • access
  • format
  • value & vocabulary
  • metadata
  • software
  • process & workflow
24
Q

What role do data scientists play in standardizing data?

A
  • establishing the standards
  • enacting the standards