Data formats Flashcards

1
Q

What types of data are there?

A
  • Unstructured
  • Semi-structured
  • Structured
2
Q

What are the characteristics of structured data?

A
  • Easy to analyse, query and store
  • Easy to clean and to keep consistent and secure

3
Q

What are the characteristics of unstructured data?

A
  • Hard to index
  • Hard to organise
  • Lacks regularity and decomposable internal structure
4
Q

What are some examples of semi-structured data?

A
  • CSV
  • HTML
  • XML
  • JSON
5
Q

What are the characteristics of CSV files?

A
  • Stores tabular data
  • Just a delimited text file
  • Lacks format information
  • Contains no formulas or macros
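
The points above can be illustrated with a minimal sketch using Python's standard csv module. The sample data and column names are made up for illustration. Because the file is only delimited text with no format information, every value arrives as a string and the reader has to apply the types.

```python
import csv
import io

# Hypothetical sample data; the column names are made up for illustration.
raw = "name,qty,price\nwidget,3,2.50\ngadget,10,0.99\n"

# A CSV file is just delimited text, so it can be read from any text stream.
rows = list(csv.DictReader(io.StringIO(raw)))

# No format information is stored: every value comes back as a string.
all_strings = all(isinstance(v, str) for row in rows for v in row.values())

# Types have to be applied by whoever reads the file.
total = sum(int(r["qty"]) * float(r["price"]) for r in rows)

print(rows[0])
print(all_strings, round(total, 2))
```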
6
Q

What are the characteristics of HTML?

A
  • Marked up with elements delineated by start and end tags
  • Elements correspond to logical units
  • Tags are keywords enclosed in angle brackets
  • Browser determines how to display logical units
  • Not all elements need both a start and end tag
  • Elements have attributes
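
As a rough illustration of these points, the sketch below feeds a small HTML fragment to Python's standard html.parser module and records the tags it sees. The TagLogger class and the fragment are hypothetical, written only for this example. Note that br appears as a start tag with no matching end tag.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper that records start tags, attributes and end tags."""

    def __init__(self):
        super().__init__()
        self.starts = []
        self.ends = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs taken from the start tag.
        self.starts.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.ends.append(tag)


snippet = '<p class="intro">Hello<br>world <a href="https://example.com">link</a></p>'

logger = TagLogger()
logger.feed(snippet)
logger.close()

print(logger.starts)  # start tags with their attributes
print(logger.ends)    # end tags; br never appears here
```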
7
Q

What are the limitations of HTML?

A
  • Designed for presentation purposes
  • Not concerned with meaning, just formatting
  • Not extensible
  • Inconsistently applied
8
Q

What are the characteristics of XML?

A
  • Meta-markup language
  • User-defined tags
  • Facilitates better encoding of semantics
9
Q

How are XML elements structured and what syntax is used?

A
  • One root element
  • Appropriate nesting of elements
  • Start and end tags
  • Attributes in quotes
  • Case sensitive
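
These rules can be checked with a small sketch that relies on Python's standard xml.etree.ElementTree parser, which rejects documents that are not well-formed. The is_well_formed helper and the sample documents are hypothetical, written only for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule.
good = '<library><book id="1"><title>Example</title></book></library>'
two_roots = "<book/><book/>"          # more than one root element
bad_nesting = "<a><b></a></b>"        # elements closed in the wrong order
case_mismatch = "<Book></book>"       # tag names are case sensitive
unquoted_attr = "<book id=1/>"        # attribute value not in quotes

for name, doc in [("good", good), ("two_roots", two_roots),
                  ("bad_nesting", bad_nesting),
                  ("case_mismatch", case_mismatch),
                  ("unquoted_attr", unquoted_attr)]:
    print(name, is_well_formed(doc))
```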
10
Q

How must an XML document begin?

A

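With an XML declaration, which must be the very first thing in the document when present, for example: <?xml version="1.0" encoding="UTF-8"?>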

11
Q

How do you comment in XML?

A

Between <!-- and -->, for example: <!-- This is a comment -->

12
Q

Are all characters available to use in XML?

A

Not directly. Some characters, such as < and &, have special meaning in XML, but they can be encoded with entity references such as &lt; and &amp;.
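
One such alternative encoding is the set of predefined entity references. The sketch below, using the standard xml.sax.saxutils module, escapes a made-up string containing special characters and confirms that a parser turns it back into the original text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Made-up text containing characters that are special in XML.
text = "if a < b && b > c"

# escape() replaces &, < and > with entity references.
encoded = escape(text)
print(encoded)

# Embedding the raw text would break the document; the escaped text parses,
# and the parser decodes the entities back to the original characters.
doc = f"<code>{encoded}</code>"
decoded = ET.fromstring(doc).text
print(decoded == text)
```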

13
Q

How do you insert large amounts of text in XML?

A

Using a CDATA section: <![CDATA[ ... ]]>
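
A common way to embed a large block of literal text in XML is a CDATA section, inside which markup characters are not interpreted. The sketch below builds one with the standard xml.dom.minidom module; the element name and the text are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Made-up text full of characters that would otherwise need escaping.
body = "Use <b> & <i> freely; 1 < 2 && 3 > 2."

doc = minidom.Document()
root = doc.createElement("note")  # hypothetical element name
doc.appendChild(root)

# A CDATA section holds the text verbatim, with no entity escaping.
root.appendChild(doc.createCDATASection(body))

xml_text = root.toxml()
print(xml_text)

# A parser returns the CDATA content as ordinary text.
round_trip = ET.fromstring(xml_text).text
print(round_trip == body)
```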

14
Q

What is JSON?

A

A data interchange format that is built for lightweight data storage

15
Q

What are the advantages of JSON over XML?

A
  • JSON is more streamlined, lightweight and compact
  • Is generally easier to parse
  • Commonly used to read and display data from a web server
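
A rough comparison, using only Python's standard library, shows what these points look like in practice. The record and its field names are made up for illustration. The JSON text maps straight onto dictionaries, lists and numbers, while the XML version needs explicit navigation and type conversion.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record in both formats.
as_json = '{"name": "widget", "qty": 3, "tags": ["a", "b"]}'
as_xml = ("<item><name>widget</name><qty>3</qty>"
          "<tags><tag>a</tag><tag>b</tag></tags></item>")

# JSON: one call yields native dicts, lists, strings and numbers.
record = json.loads(as_json)

# XML: the tree has to be walked and the types converted by hand.
root = ET.fromstring(as_xml)
from_xml = {
    "name": root.findtext("name"),
    "qty": int(root.findtext("qty")),
    "tags": [t.text for t in root.iter("tag")],
}

print(record == from_xml)
print(len(as_json) < len(as_xml))  # the JSON text is also shorter here
```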
16
Q

What are the advantages of XML?

A
  • Comes with a large family of other standards for querying and transforming
  • XML allows complex schema definitions
17
Q

What is ARFF?

A

It's a data format made for Weka that has a header section (keys) and a data section (rows)
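
The header/data split can be sketched with a deliberately minimal reader. The parse_arff function and the toy data are hypothetical, and the sketch assumes simple files only: it ignores quoted values, sparse rows and other features of the full format.

```python
def parse_arff(text):
    """Minimal sketch: split ARFF text into (attributes, rows).

    Assumes simple, unquoted, comma-separated data lines.
    """
    attributes, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([v.strip() for v in line.split(",")])
        elif lower.startswith("@attribute"):
            # Header line: "@attribute <name> <type>"
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True  # everything after this line is a data row
    return attributes, rows


# Toy data made up for illustration.
sample = """% toy weather data
@relation weather
@attribute outlook {sunny,rainy}
@attribute temperature numeric
@data
sunny,85
rainy,65
"""

attributes, rows = parse_arff(sample)
print(attributes)
print(rows)
```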

18
Q

Why is it hard to extract data from PDFs?

A
  • Limited consistency
  • Text stored as images
  • Contains different types of data
  • Data unstructured