HBase Concepts Flashcards

1
Q

What is a node?

A

A single computer

2
Q

What is a cluster?

A

A group of connected nodes that work together on tasks, coordinated by designated master nodes

3
Q

What is a Master Node?

A

A node performing coordination tasks

4
Q

What is a Slave Node?

A

A worker node performing tasks assigned to it

5
Q

What is a daemon?

A

A process or program that runs in the background

6
Q

Where is table data stored?

A

In HDFS

7
Q

How is HBase data stored in HDFS?

A

The data is split into HDFS blocks and stored on multiple nodes in the cluster

8
Q

What is an HBase table split into?

A

Regions
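
As a rough illustration of what a region is: each region covers a contiguous range of row keys, so finding the region for a row is a lookup against the sorted region start keys. The split points below (`b"g"`, `b"p"`) and the `find_region` helper are made up for this sketch and are not part of any HBase API.

```python
from bisect import bisect_right

# Hypothetical start keys of three regions; the first region starts at the empty key.
region_starts = [b"", b"g", b"p"]


def find_region(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    # A region holds keys >= its own start key and < the next region's start key.
    return bisect_right(region_starts, row_key) - 1


first = find_region(b"apple")    # falls in the region starting at b""
second = find_region(b"g")       # a start key belongs to its own region
third = find_region(b"zebra")    # falls in the last region
```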

9
Q

What serves Regions to clients?

A

Region Servers

10
Q

Can a RegionServer have regions for more than one table?

A

Yes

11
Q

What is the HBase Master responsible for?

A

1 - Coordinates which regions are managed by each Region Server
2 - Handles new table creation and other housekeeping operations

12
Q

Can an HBase cluster have multiple Masters?

A

Yes, for high availability. But only one can be active at a time.

13
Q

What service handles the coordination of the Masters?

A

ZooKeeper

14
Q

When a cluster has multiple Masters, how is the active Master determined?

A

On startup, all Masters connect to ZooKeeper. The first Master to connect becomes the active Master.

15
Q

What happens if the controlling Master fails?

A

If additional Masters are running, they compete to become the new active Master and take over the cluster.
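
The election and failover behaviour can be sketched with a toy in-memory coordinator. `FakeCoordinator` and its methods are hypothetical names for this example only; this is not the ZooKeeper API, and real standby Masters race for the role rather than queue for it.

```python
class FakeCoordinator:
    """Toy stand-in for a coordination service (hypothetical, not ZooKeeper)."""

    def __init__(self):
        self.active = None   # name of the current active master, if any
        self.standby = []    # masters waiting to take over

    def register(self, name):
        # The first master to register claims the active role; later ones wait.
        if self.active is None:
            self.active = name
        else:
            self.standby.append(name)
        return self.active == name

    def fail_active(self):
        # Drop the active master and promote a waiting one.
        # The earliest standby is chosen here only to keep the sketch deterministic.
        self.active = self.standby.pop(0) if self.standby else None
        return self.active


coord = FakeCoordinator()
first = coord.register("master-a")    # becomes active
second = coord.register("master-b")   # waits as standby
new_active = coord.fail_active()      # master-a fails, master-b takes over
```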

16
Q

Which two daemons typically run together on the slave nodes?

A

The HDFS DataNode and the HBase RegionServer

17
Q

Which four daemons typically run on master nodes?

A

NameNode, Secondary NameNode, HBase Master, ZooKeeper

18
Q

What are tables comprised of?

A

Rows, columns, and column families

19
Q

How are rows sorted?

A

They are sorted in rowkey order
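
Row keys are byte arrays, so "rowkey order" means byte-wise lexicographic order, not numeric order. A small sketch with made-up keys shows the usual surprise and one common workaround (zero-padding):

```python
# Made-up row keys; numeric suffixes do not sort numerically as bytes.
row_keys = [b"row2", b"row10", b"row1"]
sorted_keys = sorted(row_keys)   # b"row10" lands before b"row2"

# Zero-padding the numeric part makes byte order agree with numeric order.
padded_keys = sorted(f"row{n:03d}".encode("utf-8") for n in (2, 10, 1))
```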

20
Q

Can columns in HBase be created on the fly?

A

Yes

21
Q

If a row has no value for a column, is that column stored for the row?

A

No

22
Q

What is a column family?

A

A collection of columns

23
Q

What is the minimum number of column families that a table must have?

A

One

24
Q

What delimits the column family from the qualifier?

A

A colon (:)

25
Q

Do all column family members have the same prefix?

A

Yes.

e.g. contactinfo:fname and contactinfo:lname
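
A minimal sketch of splitting a full column name into its family and qualifier at the colon. `split_column` is a hypothetical helper written for this card, not an HBase client function.

```python
def split_column(name: str):
    """Split 'family:qualifier' at the first colon."""
    # Only the first colon separates the two parts; any later colon stays in the qualifier.
    family, _, qualifier = name.partition(":")
    return family, qualifier


fname = split_column("contactinfo:fname")
lname = split_column("contactinfo:lname")
same_family = fname[0] == lname[0]
```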

26
Q

Can you specify the tuning and storage settings at the column family level?

A

Yes

27
Q

Can you specify the tuning and storage settings at the column level?

A

No

28
Q

Is there a limit on the number of columns that a column family can have?

A

No

29
Q

How are columns stored within a column family?

A

Columns within a family are sorted and stored together.

30
Q

Give two examples of when separate column families are useful.

A

1 - Data that is not frequently accessed together
2 - Data that uses different column family options (such as compression)

31
Q

What data type is data in HBase tables stored as?

A

A byte array

32
Q

Are empty cells stored?

A

No

33
Q

How are tables physically stored?

A

They are stored on a per-column family basis
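
A toy model of per-family storage, assuming one in-memory dictionary per column family. The `stores` layout and `put` helper are illustrative only and say nothing about HBase's actual file format. It also shows that a cell with no value simply has no entry.

```python
from collections import defaultdict

# One separate store per column family: family -> {(row, qualifier): value}
stores = defaultdict(dict)


def put(row: str, column: str, value: bytes) -> None:
    """Write a value into the store that belongs to the column's family."""
    family, _, qualifier = column.partition(":")
    stores[family][(row, qualifier)] = value


put("r1", "contactinfo:fname", b"Ada")
put("r1", "stats:visits", b"\x01")
put("r2", "contactinfo:fname", b"Bob")   # r2 has no stats:visits cell at all

families = set(stores)
r2_has_visits = ("r2", "visits") in stores["stats"]
```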

34
Q

Is there a limit on the kind of data that can be stored in HBase?

A

It can store anything that can be serialized into a byte array.
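
As a sketch of what "serialized into a byte array" can look like, the hypothetical `to_bytes` helper below picks one encoding per Python type. The choices (UTF-8 text, big-endian 8-byte numbers, JSON for everything else) are assumptions for this example; an application is free to use any encoding as long as readers and writers agree.

```python
import json
import struct


def to_bytes(value) -> bytes:
    """Serialize a few common Python types to bytes (illustrative encodings)."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, bool):          # checked before int: bool is an int subclass
        return b"\x01" if value else b"\x00"
    if isinstance(value, int):
        return struct.pack(">q", value)  # big-endian signed 64-bit
    if isinstance(value, float):
        return struct.pack(">d", value)  # big-endian 64-bit float
    # Fallback for lists and dicts: JSON text encoded as UTF-8.
    return json.dumps(value, sort_keys=True).encode("utf-8")


name_bytes = to_bytes("Ada")
count_bytes = to_bytes(1)
doc_bytes = to_bytes({"a": 1})
```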

35
Q

In HBase, what is the equivalent of a primary key?

A

Rowkey

36
Q

What are the five basic HBase operations?

A

1 - Get
2 - Scan
3 - Put
4 - Delete
5 - Increment

37
Q

What does a Get do?

A

Retrieves a single row using the row key

38
Q

What does a scan do?

A

Retrieves multiple rows in rowkey order. With no constraints, it returns every row in the table.

39
Q

How can a scan be constrained?

A

By specifying a start and end key
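
A range-constrained scan can be sketched over an in-memory table kept in rowkey order. The `scan` function and sample data are hypothetical; the sketch assumes the start key is inclusive and the end key exclusive, which is the usual convention for HBase scans.

```python
from bisect import bisect_left


def scan(table: dict, start=None, stop=None):
    """Return (key, value) pairs in key order with start <= key < stop."""
    keys = sorted(table)
    lo = 0 if start is None else bisect_left(keys, start)
    hi = len(keys) if stop is None else bisect_left(keys, stop)
    return [(k, table[k]) for k in keys[lo:hi]]


table = {b"a": 1, b"b": 2, b"c": 3, b"d": 4}
everything = scan(table)                       # unconstrained: all rows
middle = scan(table, start=b"b", stop=b"d")    # rows b and c only
```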

40
Q

What does a Put do?

A

Writes a new row, or updates an existing one, identified by its row key

41
Q

Can you do multiple puts at one time?

A

Yes

42
Q

What does a delete do?

A

Marks the row identified by the row key as deleted. The data is hidden from reads rather than physically removed right away.

43
Q

When you do a delete is the data removed from HDFS immediately?

A

No
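
The "mark now, remove later" behaviour can be sketched with a toy store that records delete markers and only drops data in a separate cleanup step. `ToyStore` and its `compact` method are hypothetical and heavily simplified; in particular, timestamps are ignored.

```python
class ToyStore:
    """Toy key-value store with delete markers (illustrative only)."""

    def __init__(self):
        self.cells = {}          # row key -> value, as physically stored
        self.deleted = set()     # row keys that carry a delete marker

    def put(self, row, value):
        self.deleted.discard(row)
        self.cells[row] = value

    def delete(self, row):
        # Only a marker is recorded; the stored value stays in place.
        self.deleted.add(row)

    def get(self, row):
        # Reads skip anything hidden by a delete marker.
        return None if row in self.deleted else self.cells.get(row)

    def compact(self):
        # Cleanup step: physically drop marked rows, then clear the markers.
        for row in self.deleted:
            self.cells.pop(row, None)
        self.deleted.clear()


store = ToyStore()
store.put(b"r1", b"v1")
store.delete(b"r1")
visible_after_delete = store.get(b"r1")
still_stored = b"r1" in store.cells
store.compact()
stored_after_compact = b"r1" in store.cells
```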

44
Q

What does an Increment do?

A

Provides atomic counters. An Increment sets the counter's initial value or adds to the existing one. The amount can be negative, which decrements the counter.

45
Q

What server is responsible for the counters’ consistency?

A

RegionServer

46
Q

How is an increment cell stored?

A

As a 64-bit integer (a long)
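
A sketch of a counter cell held as an 8-byte value. The hypothetical `increment` helper assumes a big-endian signed encoding and does not model atomicity, which in HBase is handled on the server side.

```python
import struct


def increment(cell, amount: int) -> bytes:
    """Add amount to a counter stored as 8 bytes; a missing cell starts at 0."""
    current = 0 if cell is None else struct.unpack(">q", cell)[0]
    # struct.pack raises struct.error if the result does not fit in 64 bits.
    return struct.pack(">q", current + amount)


cell = increment(None, 5)    # counter is created with the value 5
cell = increment(cell, -2)   # a negative amount decrements it
value = struct.unpack(">q", cell)[0]
```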

47
Q

What is the default number of versions kept by HBase?

A

3 in releases before HBase 0.96; 1 from HBase 0.96 onward

48
Q

How are versions stored?

A

They are stored by their timestamp in descending order
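
Newest-first version ordering can be sketched as a list of (timestamp, value) pairs that is re-sorted and trimmed on each write. The `put_version` helper is hypothetical, and `max_versions=3` is used purely for illustration.

```python
def put_version(versions: list, timestamp: int, value: bytes, max_versions: int = 3):
    """Add a version, keep the list newest-first, and drop the oldest extras."""
    versions.append((timestamp, value))
    versions.sort(key=lambda tv: tv[0], reverse=True)   # descending timestamp
    del versions[max_versions:]                         # keep only the newest ones
    return versions


cell_versions = []
for ts, val in [(1, b"a"), (2, b"b"), (3, b"c"), (4, b"d")]:
    put_version(cell_versions, ts, val)

latest = cell_versions[0]   # a plain read would return this newest version
```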

49
Q

What is responsible for serving table data?

A

A region server