SnowFlake University - Snowflake Intro Flashcards
Databases with tables made up of rows and columns are called
RDBMS’s
Relational
Database
Management
Systems
Tables are situated within
Schemas
What are schemas?
Schemas are logical and organizational buckets/objects; a schema can have multiple tables and views.
Schemas are listed in parentheses after the table name.
Common RDBMS objects:
Tables
Views
Schemas
What are the 3 parts of Snowflake Worksheet?
- Navigational Tree Area (half of it turns to a properties panel when a table is chosen)
- SQL Entry Pane
- Result/Preview Pane
Warehouse provide…
the processing power to execute a command
To set a worksheet’s context using code, type
USE WAREHOUSE [warehouse name];
into the SQL Pane
Turning on/off the Code Highlight shortcut is….
and what does Code Highlight do?
ctrl + shift + k
It highlights the statement you click on, and only executes it.
Shortcut to running the whole SQL code is
ctrl + shift + Enter
Which editions of Snowflake has multi-cluster available?
Snowflake’s Enterprise Edition product (or above)
Sometimes a data file has values that contain quotes, apostrophes and/or commas.
These can cause problems when ingesting data. For that reason, some files will be…
formated so that fields that have string values are enclosed.
When creating a file format, where can you indicate what encloses some of the values that contain quotes, apostrophes and/or commas?
The option is called:
Field optionally enclosed by…
When creating a file format, and some of the values contain quotes, apostrophes and/or commas: what are the options to enclosing them?
- None
- Double Quote
- Single Quote
Snowflake doesn’t support other control (enclosing) character, so you must deal with them differently.
What is an ELT Process
Extract, Load, Transformation
What is considered part of the final step in the ELT Process?
Updating/changing a table( i.e. Transforming it)
What is an ETL Process
Extract, Transform, and Load
What are the names given to the two tables in an ETL process?
1st table (Extracted): Source 2nd table (Loaded): Target
What command to use in order LOAD a table?
INSERT INTO
Cleaning the data up where it already sits without moving it is called:
UPDATE in PLACE
What command to use in order EXTRACT a table?
FROM
What command to use in order TRANSFORM a table?
REPLACE
What does SQL stand for?
Structured Query Language
What SQL version Snowflake uses?
Standard SQL: ANSI 1999 and SQL: 2003 extensions
What does DML stand for and what does it allow us to do?
Data Management Language
- let users retrieve and edit data in databases
What does DDL stand for and what does it allow us to do?
Data Definition Language
- let users creat and edi database objects such as tables, indexes, and uses
Typical clauses for DML
SELECT, UPDATE, INSERT, DELETE
Typical clauses for DDL
CREATE, ALTER, DROP
What are the three layers of Snowflake architecture?
Cloud Services
Query Processing
Database Storage
What does Snowflake utilize to offer on-demand computing power?
MPP (Massively Parallel Processing) compute clusters where each node in the cluster store a portion of the entire data locally
Billing is based on
Credits (servers/cluster) consumed per second
Data loading efficiency does/doesn’t scale w/ WH size, while query efficiency does/doesn’t?
doesn’t
does
3 Layers in which data exist:
- Organizing/removing Data (DDL)
- Storing/working with Data (DML)
- Querying Data
What is a subquery?
A query within another query - could be used in statements such as SELECT, FROM, AND WHERE
Types of subqueries?
- Correlated
- Uncorrelated
- Scalar
- Non-Scalar
What does correlated subquery mean?
Refers to one or more columns from the outer query
What does uncorrelated subquery mean?
Has no reference to the outer query
What does scalar subquery mean?
returns a single value, it is not a correlated subquery, because it can be executed independently
What does non-scalar subquery mean?
Return 0 or more values (multiple rows and columns)
In the warehouse analogy what are the following called and what are they called in Snowflake?
Workers
Goods
Workers - Servers - Compute
Goods - Data - Storage
What problems are caused by separating data into data marts?
Getting integrated data (from another department or data mart) is inefficient
What is the temporary solution used to solve data marts issue?
Keeping copies of subsets of data in the various data mart– but that leads to numbers that does not match across departments
Snowflake section off and silos data and thus there is a need to replicate the data?
NO
How is Snowflake warehouse not the same as data marts?
Every wawrhouse has access to all you data, all the time
How many clusters of servers are there in a Snowflake warehouse?
1
When changing warehouse sizes from small to medium you are…
adding servers, not clusters
Scaling up/down means
Changing the size of a warehouse
Scaling out is sometimes called
Elastic data warehousing
Why is calling out sometimes called elastic data warehousing?
because the warehouse will stretch out to take extra work and then snap back to its original size once the work is done
A warehouse can be stretched to up to
10 clusters during peak workloads
Elastic warehouses are also called
Multi-cluster warehouses
What are the three things warehouse power is used for:
loading data
uploading data
querying data
What does the warehouse not do:
contain or store data
What is a Stage or Staging Area?
A place to put things temporarily before moving them to a more stable location
What are stages in Old School Data Warehouses?
A middle stop bw transactions (OLTP) and Reporting (OLAP)
What are stages in Snowflake?
Cloud folders or directories where you place files so that Snowflake services can pick them up and pull them up in to a database.
What is FTP?
File Transfer Protocol
What are the two types of stages in Snowflake?
Internal stages
external stages
What are inside stages in Snowflake?
they act like directories inside a Snowflake account’s local storage
What are external stages in Snowflake?
Act like a secure gateway between cloud storage services and Snowflake services
What are the big three cloud services?
Amazon S3
Google Cloud Platform (GCP)
Microsoft Azure BLOB Storage
What are the three needed components of an external stage?
- Cloud storage location
- Cloud storage access credentials (ex. IAM User & Policy for S3)
- Stage Definition (a stage object that contains references to those two things)
What does pre-stage data movement:
where data is coming from sources external to Snowflake (or cloud service)
What are the 4 applications provided by Amazon for pre-stage data movement?
- FTP tool called Amazon Transfer for SFTP
- Command-line Interface: Amazon CLI
- Website Interface
- Could also load data pragmatically (ex. Python)
T/F: Snowflake tracks whether a file has been loaded and doesn’t let you load it twice by accident.
True
What code to use if one wants to force a file to reload, which was already loaded?
FORCE = TURE
at the end of the COPY INTO statement
But this will result in having double the number of rows
How to query a data that doesn’t have column name?
We select them using a dollar sign and their sequence number
List data type trend levels from smaller to larger data quantity supported + examples
Structured Data - Tables
Semi-Structured Data
Quasi-Structured Data - Clickstream
Unstructured Data - Images, PDFs, Videos
Entity Relationship Diagrams (ERD)
They show relationship between different entities/tables, such as “authors” and “books” (logical infrastructure)
What are the following match to in a database:
Entity
Attribute
Value
Table
Column
Rows
Normalizing the data means (3 things):
- Different entities in separate tables
- Same info not repeated unnecessarily
- Unique IDs for each row
Creating a sequence to
act as a counter that give unique ID to each record
How to create a sequence
Through Wizard Web UI:
Go to Databases and click on Sequences in ribbon
What code line would show you the next value in a sequence?
SELECT sequence_name.nextval;
Every time you refer to the Sequence’s Next Value in a select statement, it will increment. And thus, it is recommended that you reset the counter before you use the value in a table
the 5 semi-structured data supported by Snowflake:
- JSON - JavaScript Object Notation
- XML
- Parquet
- Avro
- ORC
Name the data type used when setting up a table column that will be home to semi-structured data?
Variant
What is an Object Model?
It is the nesting pattern of any dataset
In JASON, attributes are called:
Keys
Key-Value Pair in JASON syntax is
A term used when referring to a key and its value
In JASON, every entity is surrounded by…
curly braces
The key and the value in a key-value pair are separated by…
a colon :
Between each key-value pairs there is a…
a comma
JASON syntax to retrieve a nested value
main_table:nested_table[sequence number of key].key
Sequence of keys start by 0, aka the first key in a nested table has the value of 0