Information Systems Flashcards

1
Q

What is an Algorithm?

A

A finite set of rules that gives a sequence of operations for solving a specific type of problem.

2
Q

What is a Computer Program?

A

An instance, or concrete representation, of an algorithm in some programming language.

3
Q

Name 5 features of algorithms

A

They are finite, definite and effective, and they have zero or more inputs and one or more outputs.

4
Q

What is data?

A

Raw facts, e.g. alphanumeric data, image data, etc.

5
Q

What is information?

A

Data with some meaning

6
Q

Name 7 aspects of data and information

A
  1. Storing and Processing data
  2. Encryption and security of data
  3. Information theory and communication theory
  4. Value of Information
  5. Frequency
  6. Linguistic theories
  7. Human cognition
7
Q

What is a web search engine?

A

An online web information retrieval system that, given a query representing a user's information need, returns a list of web pages that match that query.

8
Q

Name the 3 types of data

A

Structured, unstructured and semi-structured

9
Q

Define Structured Data

A

Data that resides in fixed fields within a record or file, e.g. data held in a relational (or other) database.

10
Q

Define Unstructured Data

A

Data that isn’t organised in any obviously meaningful way.

11
Q

Define semi-structured Data

A

Data that doesn’t have a formal structure but does have tags or other information that convey the meaning of the data, e.g. XML or RDF, documents with headings/sections, emails, etc.
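
As a rough illustration, the sketch below parses a small XML fragment with Python's standard library. The email record and its tag names are invented for this example; the point is only that the tags let a program pick out meaning even though no fixed schema is enforced.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment: the tags convey meaning, but no fixed schema is enforced.
doc = """
<email>
  <from>alice@example.com</from>
  <subject>Meeting</subject>
  <body>Can we meet on Friday to discuss the project?</body>
</email>
"""

root = ET.fromstring(doc.strip())

# Tagged parts can be looked up by name; the body itself remains free text.
fields = {child.tag: child.text for child in root}
print(fields["subject"])
```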

12
Q

What is the most common type of data today?

A

Unstructured data

13
Q

What is organic content?

A

Unpaid marketing content that potential and existing customers can find naturally.

14
Q

What is sponsored content?

A

Ads with words matching the query words that are ranked above the web documents returned.

15
Q

What is ranking?

A

Ordering the results returned in response to a user query.

16
Q

Discuss the web link distribution

A

Web page links are not randomly distributed. The distribution is widely reported to follow a power law, in which the total number of web pages with in-degree i is proportional to 1/i^c (where c is a constant),
i.e. only a small proportion of web pages have a huge number of links.
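
To make the shape of such a distribution concrete, here is a small numeric sketch. The exponent c = 2 and the maximum in-degree of 1000 are assumed values chosen purely for illustration, not measurements of the real web.

```python
# Illustrative only: c and max_degree are assumed values, not measurements.
c = 2.0
max_degree = 1000

# Unnormalised weight 1 / i^c for each in-degree i = 1..max_degree.
weights = [1 / i**c for i in range(1, max_degree + 1)]
total = sum(weights)

# Share of pages with a given in-degree under this model.
share = [w / total for w in weights]

low = sum(share[:10])    # pages with in-degree 1..10
high = sum(share[99:])   # pages with in-degree 100 or more
print(f"in-degree <= 10: {low:.3f}, in-degree >= 100: {high:.3f}")
```

Under these assumptions the great majority of pages have very few incoming links, and only a tiny fraction have a hundred or more.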

17
Q

What does an index do?

A

Associates a web page with one or more terms
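
One common way to hold this association is an inverted index, which maps each term to the pages containing it. The sketch below is a minimal version; the page names and text are invented, and splitting on whitespace stands in for proper pre-processing.

```python
from collections import defaultdict

# Hypothetical pages; the names and text are invented for illustration.
pages = {
    "a.html": "web search engine",
    "b.html": "search algorithms",
    "c.html": "web page ranking",
}

# Inverted index: each term maps to the set of pages that contain it.
index = defaultdict(set)
for url, text in pages.items():
    for term in text.split():
        index[term].add(url)

print(sorted(index["search"]))
```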

18
Q

Explain pre-processing

A
  1. Case folding (words are changed to lowercase)
  2. Punctuation is removed
  3. “stop words” are removed
  4. “Stemming” is performed
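
The four steps above can be sketched as a short pipeline. The stop list and suffix list here are tiny, made-up stand-ins, and `naive_stem` is a hypothetical helper that is far cruder than a real stemmer such as Porter's.

```python
import string

# Tiny illustrative stop list and suffix list; real systems use much larger ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}
SUFFIXES = ("ing", "ed", "s")

def naive_stem(word):
    """Strip one common suffix, keeping at least three characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    # 1. Case folding
    text = text.lower()
    # 2. Remove (ASCII) punctuation
    text = text.translate(str.maketrans("", "", string.punctuation))
    # 3. Remove stop words
    terms = [t for t in text.split() if t not in STOP_WORDS]
    # 4. Stemming
    return [naive_stem(t) for t in terms]

print(preprocess("The system consisted of indexing and ranking."))
```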
19
Q

What are “stop words” and why are they removed?

A

Words that do not provide any extra information about the meaning of the document. They are removed in order to save storage space and speed up searches.

20
Q

What is “stemming”?

A

Stemming tries to find the “stem” of each word. A stem represents variant forms of a word that share a common meaning, e.g. consist, consisted and consisting have the same stem “consist”.

21
Q

Describe Lemmatisation.

A

A lemma is the base form of a word, and it is what we look up in a dictionary, e.g. walking -> walk. Lemmatisation is the conversion of a word to its lemma. It is harder than finding the word's stem.
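
A small sketch of why lemmatisation is the harder task: suffix stripping handles regular forms, but irregular forms need a dictionary. The `LEMMAS` table and both helper functions are hypothetical; a real lemmatiser relies on a full lexicon and part-of-speech information.

```python
# Hypothetical mini-dictionary standing in for a full lexicon.
LEMMAS = {"walking": "walk", "went": "go", "mice": "mouse"}

def lemmatise(word):
    """Look the word up; fall back to the word itself if it is unknown."""
    return LEMMAS.get(word, word)

def strip_suffix(word):
    """Crude suffix stripping, standing in for a stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

for w in ("walking", "went", "mice"):
    print(w, strip_suffix(w), lemmatise(w))
```

Suffix stripping gets "walking" right but leaves "went" and "mice" unchanged, whereas the dictionary lookup maps them to "go" and "mouse".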

22
Q

What are tf and idf?

A

tf is the term frequency, i.e. how often a term occurs in a document.
idf is the inverse document frequency, which shows whether the term occurs often across all the documents being searched.

23
Q

What is tf-idf and how do you calculate it?

A

tf-idf is a real number that acts as a weight: the higher the weight, the more important the term is in describing the meaning of the document.
The tf is calculated as follows:
tf = (number of times term t occurs in the document) / (number of terms in the document)
The tf-idf is then calculated as:
tf-idf = tf × log10(N/c + 1)
where N is the number of documents and c is the number of documents the term occurs in.
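
The calculation can be sketched as follows. The three documents are invented and assumed to be already pre-processed, and the code reads the card's formula as log10(N/c + 1), which is one possible interpretation of how it is written.

```python
import math

# Hypothetical, already pre-processed documents.
docs = [
    ["web", "search", "engine"],
    ["search", "ranking", "search"],
    ["web", "page", "links"],
]

def tf(term, doc):
    """Number of times the term occurs divided by the number of terms in the document."""
    return doc.count(term) / len(doc)

def tf_idf(term, doc, docs):
    n = len(docs)                          # N: number of documents
    c = sum(1 for d in docs if term in d)  # c: number of documents containing the term
    if c == 0:
        return 0.0                         # term appears nowhere, so it carries no weight
    # Assumption: the formula is read as log10(N/c + 1).
    return tf(term, doc) * math.log10(n / c + 1)

print(round(tf_idf("ranking", docs[1], docs), 3))
print(round(tf_idf("search", docs[1], docs), 3))
```

Here "search" occurs twice in the second document but also appears in another document, while "ranking" occurs once and nowhere else, so the idf factor partly offsets the difference in raw frequency.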