Introduction to Modelling Flashcards

1
Q

what is metadata

A

data about data

2
Q

what is data

A

unprocessed information

3
Q

what is information

A

data associated together

4
Q

what is knowledge

A

understanding information

5
Q

what sort of software manages data

A

file formats for particular applications, eg .xls, .doc, .mp4, .jpg

specialist data management applications, eg a COVID tracker

group project last year

6
Q

4 ways of adding structure to data files

A

delimited text field
fixed length field
length-based field
identified field

7
Q

what is a delimited text field

A

choose a special character (eg a comma or a question mark) that will not appear as a legitimate character within any information field; this character separates the individual data entries, eg a comma-separated values (CSV) file
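A minimal Python sketch of a delimited text field, assuming a comma delimiter and one record per line (the function names and sample values are hypothetical). It also shows the disadvantage from the next cards: a value that legitimately contains a comma gets split apart. The last lines use Python's standard csv module, which quotes such values; quoting is a common workaround that goes slightly beyond what the card describes.

```python
import csv
import io

def write_delimited(records, delimiter=","):
    """Join each record's fields with the delimiter, one record per line."""
    return "\n".join(delimiter.join(fields) for fields in records)

def read_delimited(text, delimiter=","):
    """Split each line back into fields at every occurrence of the delimiter."""
    return [line.split(delimiter) for line in text.splitlines()]

# Hypothetical records: name, city, year of study
records = [["Ada", "London", "2"], ["Grace", "Leeds", "1"]]
text = write_delimited(records)
print(text)
assert read_delimited(text) == records  # round trip works for "safe" values

# Disadvantage: a value that legitimately contains the delimiter is split apart.
broken = read_delimited(write_delimited([["Smith, John", "Leeds"]]))
print(broken)  # [['Smith', ' John', 'Leeds']]

# The csv module quotes such values, so the comma survives the round trip.
buf = io.StringIO()
csv.writer(buf).writerow(["Smith, John", "Leeds"])
quoted_row = next(csv.reader(io.StringIO(buf.getvalue())))
print(quoted_row)  # ['Smith, John', 'Leeds']
```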

8
Q

What is a fixed length field

A

use a fixed length for each information field (eg 20 characters), padding out the value when it is shorter than the fixed length
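A minimal Python sketch of fixed length fields, assuming the card's 20-character width, space padding, and that over-long values are rejected rather than truncated (those choices and the function names are hypothetical). Reading needs no delimiter at all: the record is simply cut every 20 characters. The trade-offs show up here too: short values waste space, and stripping the padding also removes any trailing spaces a value really had.

```python
WIDTH = 20  # field width taken from the card's example

def to_fixed(fields, width=WIDTH):
    """Pad each field with spaces to exactly `width` characters and concatenate."""
    for field in fields:
        if len(field) > width:
            raise ValueError(f"{field!r} is longer than {width} characters")
    return "".join(field.ljust(width) for field in fields)

def from_fixed(record, width=WIDTH):
    """Cut the record every `width` characters and strip the padding."""
    return [record[i:i + width].rstrip() for i in range(0, len(record), width)]

row = to_fixed(["Ada", "London"])
print(repr(row))
print(len(row))         # 40
print(from_fixed(row))  # ['Ada', 'London']
```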

9
Q

what is the disadvantage of delimited text

A

the delimiter character cannot be used legitimately within the information

10
Q

what is a length-based field

A

writing the length of the information field before the information so we know exactly how much space it takes up
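A minimal Python sketch of length-based fields, assuming each length is written as a 3-digit, zero-padded number directly in front of the value (this prefix format and the function names are hypothetical; other schemes exist). Because the reader always knows how many characters to take, the value may contain any character, including commas, unlike a delimited field.

```python
def encode_length_based(fields):
    """Prefix each field with its length as a 3-digit number (assumed format)."""
    out = []
    for field in fields:
        if len(field) > 999:
            raise ValueError("field too long for a 3-digit length prefix")
        out.append(f"{len(field):03d}{field}")
    return "".join(out)

def decode_length_based(data):
    """Repeatedly read a 3-digit length, then exactly that many characters."""
    fields, pos = [], 0
    while pos < len(data):
        n = int(data[pos:pos + 3])
        pos += 3
        fields.append(data[pos:pos + n])
        pos += n
    return fields

encoded = encode_length_based(["Smith, John", "Leeds"])
print(encoded)                       # 011Smith, John005Leeds
print(decode_length_based(encoded))  # ['Smith, John', 'Leeds']
```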

11
Q

what is an identified field

A

write the name of the information field followed by its value, both represented as delimited text fields
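A minimal Python sketch of identified fields, assuming "=" separates a field's name from its value and ";" separates one name/value pair from the next (both delimiters and the function names are hypothetical). Because every value carries its name, fields can arrive in any order or be missing altogether.

```python
def encode_identified(record, pair_sep=";", kv_sep="="):
    """Write each field as name<kv_sep>value, with pairs joined by pair_sep."""
    return pair_sep.join(f"{name}{kv_sep}{value}" for name, value in record.items())

def decode_identified(text, pair_sep=";", kv_sep="="):
    """Split into pairs, then split each pair once into its name and value."""
    record = {}
    for pair in text.split(pair_sep):
        if not pair:
            continue  # tolerate empty input or a trailing separator
        name, value = pair.split(kv_sep, 1)
        record[name] = value
    return record

line = encode_identified({"name": "Ada", "city": "London"})
print(line)                                       # name=Ada;city=London
print(decode_identified("city=London;name=Ada"))  # {'city': 'London', 'name': 'Ada'}
print(decode_identified("city=London"))           # {'city': 'London'}
```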

12
Q

what are the two types of approach to turning data into information

A

structured and unstructured

13
Q

what is a structured way of turning data into information

A

deliberately associate data together into information, eg Excel, databases, data warehouses

14
Q

examples of structured approaches to turning data into information

A

Excel
databases
data warehouses

15
Q

what is an unstructured way of turning data into information

A

loosely associate data together to serve a specific information need, eg a search engine

16
Q

example of an unstructured approach to turning data into information

A

search engines

17
Q

example of structured querying

A

SQL, eg a SELECT statement that specifies exact criteria on typed data fields
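A minimal sketch of a structured query, using Python's built-in sqlite3 module with an in-memory database. The `student` table and its rows are hypothetical. The WHERE clause states exact criteria, so the result is the complete set of rows that satisfy them, with no relevance estimate involved.

```python
import sqlite3

# Hypothetical table and rows, held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, year INTEGER, module TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("Ada", 2, "Modelling"), ("Alan", 1, "Modelling"),
     ("Grace", 2, "Networks"), ("Joan", 2, "Modelling")],
)

# Exact criteria: a row either satisfies both conditions or it does not.
rows = conn.execute(
    "SELECT name FROM student WHERE year = ? AND module = ? ORDER BY name",
    (2, "Modelling"),
).fetchall()
print(rows)  # [('Ada',), ('Joan',)]
```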

18
Q

What does SQL stand for

A

Structured Query Language

19
Q

example of unstructured querying

A

keyword-based or phrase-based queries, eg a search engine
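A minimal Python sketch contrasting keyword-based and phrase-based matching over a tiny hypothetical document collection (the documents, tokenizer and function names are assumptions). A keyword query matches documents that contain all of its words in any order; a phrase query only matches when the words appear consecutively and in order.

```python
import re

# Hypothetical documents
docs = {
    "d1": "Search engine results are ranked",
    "d2": "An engine to search log files",
    "d3": "Data warehouses store timestamped data",
}

def tokens(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_match(query, docs):
    """Ids of documents containing every query word, in any order."""
    terms = set(tokens(query))
    return [doc_id for doc_id, text in docs.items() if terms <= set(tokens(text))]

def phrase_match(query, docs):
    """Ids of documents containing the query words consecutively, in order."""
    q = tokens(query)
    hits = []
    for doc_id, text in docs.items():
        t = tokens(text)
        if any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1)):
            hits.append(doc_id)
    return hits

print(keyword_match("search engine", docs))  # ['d1', 'd2']
print(phrase_match("search engine", docs))   # ['d1']
```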

20
Q

characteristics of structured results

A

exact, so there is no need to estimate relevance; returns the complete set of data that matches the query criteria

21
Q

characteristics of unstructured results

A

unsure of relevance, so we must estimate relevance ourselves, eg a Google results page

22
Q

DBs stands for

A

databases

23
Q

ACID stands for

A

atomic
consistent
isolated
durable
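A minimal sketch of the "atomic" (and "consistent") properties, using Python's built-in sqlite3 module with a hypothetical `account` table. Using the connection in a `with` block commits the transaction if the block succeeds and rolls it back if it raises. The second transfer credits B and then fails the balance check on A, so the whole transfer is undone and no half-finished state is left behind. Isolation and durability are not demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the consistency rule: balances may never go negative.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    # `with conn:` commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))

transfer(conn, "A", "B", 30)  # succeeds: A=70, B=30
try:
    transfer(conn, "A", "B", 500)  # B is credited, then A would go negative
except sqlite3.IntegrityError:
    pass  # the whole transfer, including B's credit, was rolled back

balances = conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall()
print(balances)  # [('A', 70), ('B', 30)]
```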

24
Q

different types of database models

A

relational
networked
hierarchical
object-oriented

25
Q

most popular type of database model

A

relational

26
Q

why didn’t the other types of database models take off

A

most investment had already been made in relational databases

27
Q

DWs stands for

A

data warehouses

28
Q

what is a data warehouse

A

a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management decisions

29
Q

uses of data warehouses

A

data mining
decision support
OLAP (online analytical processing)

30
Q

what do data warehouses allow

A

a trend view of data over time, since the data is timestamped
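A minimal Python sketch of why timestamps give a trend view, assuming a handful of hypothetical timestamped sales records. Because the warehouse keeps every dated row rather than overwriting old values, the records can be grouped by month and compared over time.

```python
from collections import defaultdict
from datetime import date

# Hypothetical timestamped records: (date of sale, amount)
sales = [
    (date(2021, 1, 5), 120),
    (date(2021, 1, 20), 80),
    (date(2021, 2, 3), 150),
    (date(2021, 3, 14), 90),
    (date(2021, 3, 28), 140),
]

def monthly_trend(rows):
    """Total the amounts per (year, month), ordered by time."""
    totals = defaultdict(int)
    for day, amount in rows:
        totals[(day.year, day.month)] += amount
    return sorted(totals.items())

for (year, month), total in monthly_trend(sales):
    print(f"{year}-{month:02d}: {total}")
# 2021-01: 200
# 2021-02: 150
# 2021-03: 230
```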

31
Q

information retrieval cycle in the unstructured approach

A
information needed
>>>
query
>>>
query indexing
>>>
refined query
>>>
matching/retrieval
>>>
recommended objects
>>>
user review
>>> back to information needed
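A minimal Python sketch of the machine side of this cycle, assuming a tiny hypothetical document collection, a made-up stopword list and a simple term-count score (real search engines estimate relevance far more carefully). Indexing builds an inverted index, the query is refined by lowercasing and dropping stopwords, matching adds up term counts, and the ranked ids stand in for the recommended objects; user review is left to the person reading them.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "the", "of", "for", "to", "and", "in"}  # assumed, tiny list

def tokenize(text):
    """Lowercase, split into words and drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]

def build_index(docs):
    """Inverted index: term -> Counter of {doc_id: number of occurrences}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] += 1
    return index

def retrieve(query, index, k=3):
    """Refine the query, match it against the index, return the top-k doc ids."""
    scores = Counter()
    for term in tokenize(query):  # refined query
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count  # crude relevance estimate
    return [doc_id for doc_id, _ in scores.most_common(k)]

# Hypothetical documents
docs = {
    "d1": "Data warehouses hold timestamped data for trend analysis",
    "d2": "Search engines rank pages by estimated relevance",
    "d3": "Relational databases answer exact SQL queries",
}
index = build_index(docs)
print(retrieve("the relevance of search results", index))  # ['d2']
print(retrieve("data trends", index))                      # ['d1']
```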
32
Q

what are four common challenges in managing data for enterprises and individuals

A

volume
validity
variety
velocity

33
Q

how is volume a challenge in managing data

A

the amount of data keeps getting bigger and bigger

34
Q

what are legacy systems

A

old information systems: technology changes, but information about enterprises, customers etc stays constant, so this information remains stored in legacy systems

35
Q

how is velocity a challenge in managing data

A

data is often time-sensitive, so it must be processed in real time as it streams in to maximize its value

36
Q

how is variety a challenge in managing data

A

data comes in many forms, eg text, audio, video, click streams, log files and more, so it is difficult to label it all and to agree on the labels

37
Q

how is validity a challenge in managing data

A

trade-offs between data privacy and data protection

38
Q

solutions for coping with variety challenges

A

natural language processing

semantic web technologies

39
Q

what is the difference between natural language processing and semantic web technologies

A

NLP is about understanding the context and matching based on that

semantic web technologies are more about labelling things correctly and agreeing on the labels

40
Q

solution for coping with the volume challenge

A

outsource information management and technologies, eg to the cloud

41
Q

solution for coping with validity challenges

A

GDPR
lots of work is going on about this at EU level

data ethics

data privacy

42
Q

what does GDPR stand for

A

General Data Protection Regulation