Web mining Flashcards

1
Q

What four primary groups of data are of interest in web usage mining?

A

usage data - server logs, navigational data, HTTP requests
content data - a combination of textual and image data
structure data - inter-page and intra-page linkage structure
user data - profile information, demographics, cookies, past purchases

2
Q

What is a server log file?

A

The primary data source used in web usage mining: a list of the activities performed on a site, as recorded by the server.
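
As a rough illustration of what one recorded activity looks like, the sketch below parses a single line in the Common Log Format. The card does not say which log format is meant, so the format, the sample line, and the name `parse_log_line` are assumptions made for this example.

```python
import re
from datetime import datetime

# Assumed layout (Common Log Format):
# host ident authuser [timestamp] "method path protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the timestamp and status code into typed values.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry

# Made-up sample line, not taken from a real server.
sample = '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
entry = parse_log_line(sample)
```

Each parsed entry corresponds to one HTTP request; the later preparation steps work on lists of such entries.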

3
Q

In web usage mining, what is the most basic level of abstraction (isolating the important information)?

A

The pageview. Depending on the goals of the analysis, the usage data needs to be transformed and aggregated at different levels of abstraction, and the pageview is the most basic of these.

4
Q

What is a user activity record?

A

A way to distinguish users without needing to know their identity. It shows the sequence of logged activities for each user.

5
Q

What are the names of the five types of analysis used in web usage mining?

A
  • classification and prediction of user transactions
  • cluster analysis
  • sequential patterns
  • session and visitor analysis
  • association and correlation analysis
6
Q

What are the parts of data preparation for web usage mining?

A
  1. data collection: the types of data are usage data, content data, structure data and user data
  2. data fusion and cleaning: merge log files, remove crawler references and irrelevant data fields
  3. data segmentation: depending on the goals of the analysis, the data need to be transformed and aggregated. The variables of interest for analysis are users and their behaviors (pageviews, sessions, episodes)
  4. path completion: imputation of references that are missing from the log because of caching or proxies
  5. data integration: assembling the set of user sessions or episodes that are useful for pattern discovery
  6. data modelling: the data can be represented as a transaction matrix (an ordinary table) or, as an enriched representation, with an n × r pageview-feature matrix
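
The two representations in step 6 can be sketched as follows. This is a minimal illustration, assuming binary weights (1 if the page was visited in the session) and a made-up set of pages and features; the names `build_transaction_matrix` and `enrich` are hypothetical and not from the course material.

```python
# Hypothetical sessions, each a list of visited pages.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/blog"],
    ["/products", "/cart", "/checkout"],
]

# The n distinct pageviews, in a fixed column order.
pageviews = sorted({page for session in sessions for page in session})

def build_transaction_matrix(sessions, pageviews):
    """m x n matrix: one row per session, one column per pageview (1 = visited)."""
    index = {page: col for col, page in enumerate(pageviews)}
    matrix = [[0] * len(pageviews) for _ in sessions]
    for row, session in enumerate(sessions):
        for page in session:
            matrix[row][index[page]] = 1
    return matrix

# Assumed n x r pageview-feature matrix with r = 2 features:
# column 0 = "commerce", column 1 = "content".
pageview_features = {
    "/blog": [0, 1],
    "/cart": [1, 0],
    "/checkout": [1, 0],
    "/home": [0, 1],
    "/products": [1, 0],
}

def enrich(transactions, pageviews, pageview_features):
    """Multiply the m x n transaction matrix by the n x r feature matrix."""
    feature_rows = [pageview_features[page] for page in pageviews]
    r = len(feature_rows[0])
    return [
        [sum(row[i] * feature_rows[i][k] for i in range(len(pageviews)))
         for k in range(r)]
        for row in transactions
    ]

transactions = build_transaction_matrix(sessions, pageviews)
enriched = enrich(transactions, pageviews, pageview_features)
```

In the enriched matrix each session is described by features rather than by individual pages, which lets sessions that visit different pages on the same topic still look similar.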
7
Q

What are the parts of pattern discovery and analysis?

A
  1. Session and visitor analysis - basic statistics, such as the most frequently accessed pages
  2. Cluster analysis and visitor segmentation - clustering groups similar items together into page clusters or user clusters (personalized web content, demographics)
  3. Association and correlation analysis - items or pages accessed together, frequent itemsets
  4. Sequential and navigational patterns - consider time order as well as itemsets; the techniques are used to create inter-session patterns for predicting future visit patterns
  5. Classification and prediction of user transactions - by categorising items into bigger groups one can predict user behaviour, for example “other people also purchased this”
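
To make item 3 concrete, the sketch below counts how often pairs of pages occur in the same session and keeps the pairs whose support reaches a threshold. It is a simplified illustration restricted to itemsets of size two, not a full frequent-itemset algorithm such as Apriori; the sessions and the name `frequent_pairs` are invented for the example.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(sessions, min_support):
    """Return {pair: support} for page pairs seen together in enough sessions."""
    counts = Counter()
    for session in sessions:
        # Use the set of pages so repeated visits count once per session.
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    total = len(sessions)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }

# Hypothetical sessions.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/blog"],
    ["/products", "/cart"],
]

pairs = frequent_pairs(sessions, min_support=0.5)
```

Here support is the fraction of sessions that contain both pages of the pair.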
8
Q

What is pageview identification (data segmentation)?

A

Identifying pageviews, the most basic level of abstraction. Each pageview represents a specific user event. Pageviews can be identified based on knowledge of the domain and the page content.

9
Q

What is user identification (data segmentation)?

A

Distinguishing between different users by their user activity records: the sequence of logged activities belonging to the same user.
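
One way to build such activity records is sketched below. It assumes that the combination of IP address and user agent is a good enough stand-in for a user, which is only a heuristic (several people can share an IP address, and one person can use several devices). The field names and `build_activity_records` are hypothetical.

```python
from collections import defaultdict

def build_activity_records(entries):
    """Group log entries by (ip, agent) and order each group by time."""
    records = defaultdict(list)
    for entry in sorted(entries, key=lambda e: e["time"]):
        user_key = (entry["ip"], entry["agent"])  # assumed proxy for one user
        records[user_key].append((entry["time"], entry["path"]))
    return dict(records)

# Made-up log entries; "time" is in seconds for simplicity.
entries = [
    {"ip": "192.0.2.1", "agent": "Firefox", "time": 30, "path": "/products"},
    {"ip": "192.0.2.1", "agent": "Firefox", "time": 10, "path": "/home"},
    {"ip": "192.0.2.1", "agent": "Safari", "time": 20, "path": "/blog"},
    {"ip": "198.51.100.7", "agent": "Firefox", "time": 15, "path": "/home"},
]

records = build_activity_records(entries)
```

No names or logins are needed: each key simply labels one presumed user and maps to that user's time-ordered activities.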

10
Q

What is session identification (data segmentation)?

A

Sessionization: the identification of a single visit to a site. It is the process of segmenting the activity record of each user into sessions.
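
A minimal sketch of sessionization follows, assuming a time-based heuristic: a new session starts whenever the gap between two consecutive requests exceeds a timeout. The 30-minute value is a commonly used default rather than something stated on the card, and `sessionize` is a hypothetical name.

```python
def sessionize(activity, timeout=30 * 60):
    """Split one user's (timestamp, path) records into sessions.

    A gap longer than `timeout` seconds between consecutive requests
    starts a new session (assumed inactivity heuristic).
    """
    sessions = []
    for timestamp, path in sorted(activity):
        last_time = sessions[-1][-1][0] if sessions else None
        if last_time is not None and timestamp - last_time <= timeout:
            sessions[-1].append((timestamp, path))
        else:
            sessions.append([(timestamp, path)])
    return sessions

# Made-up activity record for one user, timestamps in seconds.
activity = [(0, "/home"), (600, "/products"), (4000, "/home"), (4300, "/cart")]
user_sessions = sessionize(activity)
```

In this sample the 3400-second pause between the second and third request exceeds the timeout, so the record is split into two sessions of two pageviews each.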

11
Q

What is episode identification (data segmentation)?

A

A subset of the activities performed when the user is looking for certain information: a “subset of a session comprised of semantically related pageviews”, for example how many times the user entered topic-related domains.
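
As a small illustration, episodes can be pulled out of a session once each page has been assigned a topic. The page-to-topic mapping below is invented for the example; in practice it would come from domain knowledge or content analysis. `extract_episodes` is a hypothetical name.

```python
def extract_episodes(session, page_topics):
    """Group the pageviews of one session by topic; pages without a topic are skipped."""
    episodes = {}
    for page in session:
        topic = page_topics.get(page)
        if topic is not None:
            episodes.setdefault(topic, []).append(page)
    return episodes

# Hypothetical session and topic mapping.
session = ["/home", "/laptops", "/laptops/reviews", "/blog", "/phones"]
page_topics = {
    "/laptops": "laptops",
    "/laptops/reviews": "laptops",
    "/phones": "phones",
}

episodes = extract_episodes(session, page_topics)
```

Each value is one episode: the semantically related pageviews of the session for that topic.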
