Pig Scripts Flashcards

1
Q

How to filter

A

filtered = FILTER dataset BY expression;
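A minimal sketch of a filter in context. The file name `grades.csv`, its columns, and the threshold of 70 are assumptions made for illustration, not part of the original card:

```pig
-- Assumed input: comma-separated file with columns id, exam, project (hypothetical)
grades = LOAD 'grades.csv' USING PigStorage(',') AS (id:int, exam:int, project:int);

-- Keep only rows where exam is at least 70 and project is present
high_exam_students = FILTER grades BY exam >= 70 AND project IS NOT NULL;

DUMP high_exam_students;
```

Conditions can be combined with AND, OR, and NOT; rows for which the expression is false or null are dropped.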

2
Q

Generate the min/max value of a column

A

FOREACH (GROUP dataset ALL) GENERATE MAX(dataset.column) AS var_name;

3
Q

Calculate the difference between a max and a min

A

-- Calculate the best 'project' value
max_project = FOREACH (GROUP grades ALL) GENERATE MAX(grades.project) AS max_val;

-- Calculate the worst 'project' value
min_project_high_exam = FOREACH (GROUP high_exam_students ALL) GENERATE MIN(high_exam_students.project) AS min_val;

-- Calculate the difference (a single-row relation can be read as a scalar via relation.field)
difference = FOREACH max_project GENERATE max_val - min_project_high_exam.min_val AS project_difference;

4
Q

Storing results for further processing

A

STORE var_name INTO 'output_dir' USING PigStorage(',');
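A short sketch of storing a result and reading it back in a later script. The paths `grades.csv` and `max_project_out` and the column names are hypothetical:

```pig
-- Assumed input: comma-separated file with columns id, exam, project (hypothetical)
grades = LOAD 'grades.csv' USING PigStorage(',') AS (id:int, exam:int, project:int);
max_project = FOREACH (GROUP grades ALL) GENERATE MAX(grades.project) AS max_val;

-- Writes part files into the directory 'max_project_out'
STORE max_project INTO 'max_project_out' USING PigStorage(',');

-- A later script can reload the stored result, declaring the schema again
reloaded = LOAD 'max_project_out' USING PigStorage(',') AS (max_val:int);
```

The target of STORE is a quoted directory path rather than a table name, and the job fails if that directory already exists.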

5
Q

Load dataset with schema

A

X = LOAD 'table' USING PigStorage(',') AS (ID:int, ca1:int);

6
Q

Key functions {average, concatenate, count, difference, max/min, sum}

A

AVG
CONCAT
COUNT
DIFF (compares two bags in a tuple; for a numeric difference, use the - operator)
MAX/MIN
SUM

7
Q

For Loop

A

FOREACH X GENERATE expression [AS name], ...;
(runs once per tuple of X; each generated field can be a scalar, a tuple, or a bag)
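As an illustration, FOREACH can project existing fields and compute new ones for every row. The file `grades.csv`, its columns, and the pass mark of 50 are assumptions:

```pig
-- Assumed input: comma-separated file with columns id, exam, project (hypothetical)
grades = LOAD 'grades.csv' USING PigStorage(',') AS (id:int, exam:int, project:int);

-- One output tuple per input tuple: keep id, add a computed total and a pass/fail label
totals = FOREACH grades GENERATE
    id,
    exam + project AS total,
    (exam >= 50 ? 'pass' : 'fail') AS result;

DUMP totals;
```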

8
Q

How to group

A

grouped = GROUP dataset BY column;
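A sketch of grouping followed by per-group aggregation, assuming a hypothetical `grades.csv` with a `course` column:

```pig
-- Assumed input: comma-separated file with columns id, course, project (hypothetical)
grades = LOAD 'grades.csv' USING PigStorage(',') AS (id:int, course:chararray, project:int);

by_course = GROUP grades BY course;

-- Each tuple of by_course has two fields: 'group' (the key) and a bag named 'grades'
course_stats = FOREACH by_course GENERATE
    group AS course,
    COUNT(grades) AS n_students,
    AVG(grades.project) AS avg_project;

DUMP course_stats;
```

The `relation.column` form belongs inside the aggregate call after grouping, where it projects a column out of the bag; the GROUP statement itself takes the bare column name.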

9
Q

When to define the schema

A

When loading the dataset, in the AS clause of LOAD

10
Q

How to sort

A

sorted = ORDER dataset BY column [ASC|DESC];
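A minimal sketch of sorting and then keeping the top rows. The file `grades.csv` and its columns are hypothetical:

```pig
-- Assumed input: comma-separated file with columns id, exam, project (hypothetical)
grades = LOAD 'grades.csv' USING PigStorage(',') AS (id:int, exam:int, project:int);

-- Sort by project score, highest first; ties are broken by id ascending
sorted = ORDER grades BY project DESC, id ASC;

-- Keep the three highest-scoring rows
top3 = LIMIT sorted 3;

DUMP top3;
```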
