Data Engineering Fundamentals Flashcards

1
Q

What is Avro?

A

A row-oriented binary storage format that stores the schema alongside the data.

2
Q

What is Parquet?

A

A columnar storage format optimized for analytical queries.

3
Q

What does random sampling do?

A

It gives every item in the population an equal chance of being selected.

4
Q

What is stratified sampling?

A

It splits the population into subgroups (strata) and samples from each one, so every subgroup is represented.

5
Q

What is systematic sampling?

A

Selecting every Nth item from an ordered list.
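
As a rough illustration of the three sampling cards (random, stratified, every Nth item), here is a minimal Python sketch. The `records` data, the `region` field, and the function names are hypothetical and exist only for this example.

```python
import random
from collections import defaultdict

def simple_random_sample(items, k, seed=0):
    """Random sampling: every item has an equal chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(items, k)

def stratified_sample(items, key, per_group, seed=0):
    """Stratified sampling: draw from each subgroup so all are represented."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample

def every_nth_sample(items, n, start=0):
    """Select every n-th item, beginning at index `start`."""
    return items[start::n]

# Hypothetical data: 20 records, 15 tagged "east" and 5 tagged "west".
records = [{"id": i, "region": "east" if i < 15 else "west"} for i in range(20)]

random_pick = simple_random_sample(records, 5)
stratified_pick = stratified_sample(records, key=lambda r: r["region"], per_group=2)
nth_pick = every_nth_sample(records, n=5)
```

With a small random sample, the minority "west" group can be missed entirely; the stratified version always returns rows from both regions.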

6
Q

What is data skew?

A

An unequal distribution of data across partitions, so some partitions hold far more records than others.

7
Q

What can be done to address data skew?

A

Adaptive partitioning

Salting

Repartitioning
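
Salting is the least self-explanatory of these, so here is a minimal Python sketch of the idea. The toy partitioner, the key names, and the bucket counts are assumptions made for illustration; real engines use their own hash partitioners.

```python
import random
from collections import Counter

NUM_PARTITIONS = 4   # hypothetical partition count
SALT_BUCKETS = 4     # hypothetical number of salt values per key

def partition_for(key):
    # Toy deterministic partitioner: sum of character codes modulo the partition count.
    return sum(ord(c) for c in key) % NUM_PARTITIONS

def salted_key(key, rng):
    # Append a random suffix so one hot key becomes several distinct keys.
    return f"{key}_{rng.randrange(SALT_BUCKETS)}"

rng = random.Random(42)
# Skewed input: one "hot" key dominates the data set.
keys = ["hot"] * 1000 + ["a", "b", "c"] * 10

unsalted = Counter(partition_for(k) for k in keys)
salted = Counter(partition_for(salted_key(k, rng)) for k in keys)
```

Without the salt, all 1,000 "hot" records land in one partition; with it they are spread over several. The trade-off is that any aggregation per original key needs a second step that strips the salt and combines the partial results.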

8
Q

What does the YEAR() function in SQL do?

A

It extracts the year from a date value.

9
Q

What does a pivot table do?

A

It summarizes row-level data by turning the distinct values of one field into columns and aggregating the values that fall under each.
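
A minimal Python sketch of what a pivot does, using made-up sales rows (the years, regions, and amounts are hypothetical):

```python
from collections import defaultdict

# Row-level data: (year, region, amount)
rows = [
    ("2023", "east", 10),
    ("2023", "west", 20),
    ("2024", "east", 30),
    ("2024", "west", 40),
    ("2024", "east", 5),
]

# Pivot: one entry per year, one "column" per region, amounts summed.
pivot = defaultdict(lambda: defaultdict(int))
for year, region, amount in rows:
    pivot[year][region] += amount

table = {year: dict(columns) for year, columns in pivot.items()}
```

Each distinct region value becomes a column, and the five input rows collapse into two summary rows.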

10
Q

What is the default SQL join?

A

An inner join. Writing JOIN without a qualifier is the same as INNER JOIN.

11
Q

How does inner join work?

A

It returns only the rows from Table A that have a matching identifier in Table B, combined with the matching rows from Table B.

12
Q

How does a left outer join work?

A

It returns every row in Table A regardless of whether there is a match in Table B. Table B columns are filled in only for rows with a match; otherwise they are NULL.

13
Q

How does a right outer join work?

A

It returns every row in Table B regardless of whether there is a match in Table A. Table A columns are filled in only for rows with a match; otherwise they are NULL. It is the mirror image of a left join.

14
Q

How does a full outer join work?

A

All rows from both Table A and Table B are returned. Matching rows are combined; for rows without a match, the other table's columns are NULL.
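
To compare the four join types side by side, here is a small sketch using Python's built-in `sqlite3` module. The tables `a` and `b` and their contents are hypothetical. Because older SQLite versions lack RIGHT and FULL OUTER JOIN, this sketch emulates them with a swapped LEFT JOIN and a UNION; that is a workaround for portability, not how you would write it in most databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, a_val TEXT);
    CREATE TABLE b (id INTEGER, b_val TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (2, 'b2'), (3, 'b3');
""")

# Inner join (the default JOIN): only ids present in both tables.
inner = conn.execute(
    "SELECT a.id, a_val, b_val FROM a JOIN b ON a.id = b.id"
).fetchall()

# Left outer join: every row of a; b columns are NULL (None) when unmatched.
left = conn.execute(
    "SELECT a.id, a_val, b_val FROM a LEFT JOIN b ON a.id = b.id ORDER BY a.id"
).fetchall()

# Right outer join, emulated by swapping the tables in a LEFT JOIN.
right = conn.execute(
    "SELECT b.id, a_val, b_val FROM b LEFT JOIN a ON a.id = b.id ORDER BY b.id"
).fetchall()

# Full outer join, emulated as the UNION of the two queries above.
full = conn.execute("""
    SELECT a.id, a_val, b_val FROM a LEFT JOIN b ON a.id = b.id
    UNION
    SELECT b.id, a_val, b_val FROM b LEFT JOIN a ON a.id = b.id
    ORDER BY 1
""").fetchall()
```

Only id 2 exists in both tables, so it is the single inner-join row, while the outer joins keep ids 1 and 3 with `None` in the columns of the table that had no match.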

15
Q

What does Regex do?

A

It matches text against a pattern (a regular expression).

16
Q

In PostgreSQL, what is the regex operator for a case-insensitive match?

A

~*

17
Q

In PostgreSQL, what is the regex match operator (case-sensitive)?

A

~

18
Q

In PostgreSQL, what is the regex operator for "does not match" (case-insensitive)?

A

!~*
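
Assuming the operators on these cards are PostgreSQL's regex match operators, their behavior can be approximated with Python's `re` module as below. The helper names are hypothetical, and the two regex dialects differ in some details, so treat this as an analogy rather than an exact equivalent.

```python
import re

def matches(text, pattern):
    """Roughly: text ~ pattern (case-sensitive match)."""
    return re.search(pattern, text) is not None

def imatches(text, pattern):
    """Roughly: text ~* pattern (case-insensitive match)."""
    return re.search(pattern, text, re.IGNORECASE) is not None

def not_imatches(text, pattern):
    """Roughly: text !~* pattern (no match, ignoring case)."""
    return not imatches(text, pattern)

case_sensitive = matches("Parquet", "^par")      # capital P does not match "par"
case_insensitive = imatches("Parquet", "^par")   # case is ignored
no_match = not_imatches("Avro", "^par")          # "Avro" does not start with "par"
```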

19
Q

In Git, how do you get files from the remote repository into your local workspace?

A

git pull

20
Q

How would you initialize a new Git repository?

A

git init

21
Q

What does git config do?

A

Sets configuration values for user info and aliases.

22
Q

How do you clone or download a repository from an existing URL?

A

git clone

23
Q

What does git status do?

A

It shows the state of the changes in your working directory and staging area. This is local.

24
Q

How do you view commit logs in git?

A

git log

25
Q

What does git branch do?

A

It lists your local branches. Add -a to include remote-tracking branches as well.

26
Q

How would you create a new branch?

A

git branch newBranchName

27
Q

How do you switch branches?

A

git checkout branchname

28
Q

How do you create a new branch and switch to it?

A

git checkout -b newBranchName

29
Q

How do you delete a branch?

A

git branch -d branchname

30
Q

How do you push your changes to the remote repository?

A

git push

31
Q

What does git pull do?

A

Fetches changes from a remote repository branch and merges them into the current local branch.

32
Q

What is a transition action in S3?

A

It is a lifecycle rule action that moves objects from one storage class to another.

33
Q

What are expiration actions in S3?

A

They are lifecycle rule actions that expire (delete) objects after a specified period of time.

34
Q

Can lifecycle rules be created based on tags or prefixes?

A

Yes, on both.
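
The transition, expiration, and filter cards can be tied together in one lifecycle rule. The sketch below builds the rule as a plain Python dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts for its `LifecycleConfiguration` argument. The rule ID, prefix, tag, and day counts are hypothetical values chosen for illustration, and nothing is sent to AWS.

```python
# Hypothetical rule: the ID, prefix, tag, and day counts are illustrative only.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Status": "Enabled",
            # Filter on a prefix AND a tag; either can also be used on its own.
            "Filter": {
                "And": {
                    "Prefix": "logs/",
                    "Tags": [{"Key": "retention", "Value": "short"}],
                }
            },
            # Transition actions: move objects to cheaper storage classes over time.
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # Expiration action: delete objects after one year.
            "Expiration": {"Days": 365},
        }
    ]
}

rule = lifecycle_configuration["Rules"][0]
transition_days = [t["Days"] for t in rule["Transitions"]]
```

Applying it would be a single call such as `s3_client.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=lifecycle_configuration)`, where the client and bucket name are yours to supply.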

35
Q

What is the storage class hierarchy for S3 (the order in which lifecycle rules can transition objects downward)?

A

Standard

Standard-IA

Intelligent-Tiering

One Zone-IA

Glacier Instant Retrieval

Glacier Flexible Retrieval

Glacier Deep Archive

36
Q

What does S3 analytics do?

A

Helps you decide when to transition objects to the right storage class.

37
Q

What are the targets for S3 event notifications?

A

Lambda, SNS, SQS, and EventBridge