Instructor's Method - 6/15/2021 Flashcards

1
Q

PolyBase

A

A technology for loading data from a data lake into a dedicated SQL pool

2
Q

Different distribution methods

A

Round Robin - data is (almost) equally divided amongst different distributions

Hash Distributed - rows are co-located by the value of the hash column: rows with the same value are stored in the same distribution

3
Q

Round Robin

A

You don’t have to decide or analyze which distribution each row is stored in

But reads can be slow, because joins and aggregations often have to move data between distributions

4
Q

Hash Distribution

A

The dedicated SQL pool reads the value of the hash column, hashes it, and stores the row in the distribution the hash maps to

Writes are slower, because the dedicated SQL pool has to apply this logic (the hash) to decide which distribution each row goes to
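
The two methods can be sketched in a few lines of Python. This is a simplified illustration, not how the service is implemented: the hash function a dedicated SQL pool uses is internal, so `zlib.crc32` stands in for it, and the function names and sample rows are hypothetical.

```python
import zlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def round_robin_assign(rows):
    """Assign rows to distributions in turn, ignoring their content."""
    return [i % NUM_DISTRIBUTIONS for i, _ in enumerate(rows)]


def hash_assign(rows, key):
    """Assign each row by hashing its distribution column (crc32 is a stand-in)."""
    return [zlib.crc32(str(row[key]).encode()) % NUM_DISTRIBUTIONS for row in rows]


# Hypothetical sample data: 6,000 rows spread over 500 customers.
rows = [{"customer_id": i % 500, "amount": i} for i in range(6000)]

rr = round_robin_assign(rows)
hashed = hash_assign(rows, "customer_id")

# Round robin: row counts per distribution are even.
rr_counts = Counter(rr)

# Distributions that hold rows for customer 42 under each method.
customer_42_rr = {d for d, row in zip(rr, rows) if row["customer_id"] == 42}
customer_42_hash = {d for d, row in zip(hashed, rows) if row["customer_id"] == 42}
```

Under round robin the rows for one customer end up scattered over several distributions, so a join or aggregation on `customer_id` has to gather them first. Under hash distribution they all sit in one place, at the cost of computing the hash on every write.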

5
Q

Best Practice

A

Make sure the hash column has at least 60 distinct values, since a dedicated SQL pool has 60 distributions and fewer values would leave some of them empty

The column you choose should spread the data “as equally as possible”

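Choose a column that is frequently used in JOIN and GROUP BY clauses, so matching rows are already co-located. Be careful with columns used mainly in WHERE filters: filtering on the hash column can narrow a query to only a few distributions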
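
A rough way to see why the number of distinct values matters is to count how many distributions stay empty for a given column. The sketch below is an assumption-laden illustration: `zlib.crc32` stands in for the internal hash, and the two columns are made up.

```python
import zlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # fixed number of distributions in a dedicated SQL pool


def rows_per_distribution(values):
    """Count rows per distribution for a list of hash-column values."""
    counts = Counter(zlib.crc32(str(v).encode()) % NUM_DISTRIBUTIONS for v in values)
    return [counts.get(d, 0) for d in range(NUM_DISTRIBUTIONS)]


def empty_distributions(values):
    """Number of distributions that receive no rows at all."""
    return sum(1 for n in rows_per_distribution(values) if n == 0)


# Hypothetical columns on a 60,000-row table.
low_cardinality = [i % 5 for i in range(60_000)]   # only 5 distinct values
high_cardinality = list(range(60_000))             # 60,000 distinct values

# With 5 distinct values, at most 5 distributions can receive rows.
few = empty_distributions(low_cardinality)
many = empty_distributions(high_cardinality)
```

With the low-cardinality column, at least 55 of the 60 distributions hold nothing and the rest carry the whole table, which is the skew the best practice is meant to avoid.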

6
Q

CTAS

A

CREATE TABLE AS SELECT - creates a new table from the results of a SELECT statement

7
Q

Concurrency slots

A

A unit of compute resources that a query reserves while it runs

8
Q

Resource Classes

A

Defines the number of concurrency slots allocated to a query

9
Q

Resource Classes

A

Defines the number of concurrency slots a user's queries are allowed to use

10
Q

Data warehousing unit

A

Determines the number of concurrency slots you get

It is the performance level of the dedicated SQL pool
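
How the last few cards fit together can be sketched as simple slot accounting: the performance level fixes the total number of slots, the resource class fixes how many slots each query takes, and a query waits when too few slots are free. Everything below is a simplified model; the class `SlotPool`, the slot counts, and the resource class names are hypothetical and not taken from the service.

```python
from dataclasses import dataclass, field


@dataclass
class SlotPool:
    """Tracks concurrency slots for one pool at a given performance level."""
    total_slots: int
    in_use: int = 0
    queued: list = field(default_factory=list)

    def submit(self, query_name, slots_needed):
        """Run the query if enough slots are free; otherwise queue it."""
        if self.in_use + slots_needed <= self.total_slots:
            self.in_use += slots_needed
            return "running"
        self.queued.append((query_name, slots_needed))
        return "queued"

    def finish(self, slots_released):
        """Return slots to the pool when a query completes."""
        self.in_use -= slots_released


# Hypothetical numbers, for illustration only.
RESOURCE_CLASS_SLOTS = {"small": 1, "large": 16}
pool = SlotPool(total_slots=40)

results = [
    pool.submit("load_a", RESOURCE_CLASS_SLOTS["large"]),
    pool.submit("load_b", RESOURCE_CLASS_SLOTS["large"]),
    pool.submit("load_c", RESOURCE_CLASS_SLOTS["large"]),  # would need 48 slots in total
    pool.submit("report", RESOURCE_CLASS_SLOTS["small"]),
]
```

In this model a larger resource class gives each query more resources but lowers how many queries can run at once, while a higher performance level raises the total number of slots.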

11
Q

Synapse Pipelines

A

Shares code base with Azure Data Factory
