snowflake Trial Edition Flashcards
What are the main pushbuttons in Snowflake main screen
Projects
Data
Data products
AI & ML
Monitoring
Admin
What are the options under projects
Worksheets
Streamlit
Dashboards
App packages
What is VPS
Virtual private Snowflake
What are the main cloud hosting snowflake
AWS, Azure and GCP
Each snowflake account is hosted in a single region
True
What are the unique layers of Snowflake
Storage layer
Compute layer
Cloud service layer
What is share disc architecture
one data base with several services
What is share nothing architecture
several dbs in paralel with many services
What are the characteristics of Data Storage layer
Underlying cloud layer
Virtually infinite storage
Compressed & Encrypted
Cloud Data Redundancy
pay only for stored data
What are the characteristics of Compute & Processing Layer
Query engine or virtual warehouse
Underlying virtual machines
Scale up and down as needed
different size to serve different loads
What are the characteristics of Cloud Service Layer (Brain of snowflake)
Authentication and Authorisation
User and session management
Query compilation, Optimization & Data Caching
virtual Warehouse Management
Metadata management
Can each layer be scaled independently
True
What does snowflake do to the loaded data
It compresses and store data in columnar form.
What can we do in the compute layer
Select queries
join queries
data loading
What is VWH
Virtual warehouse and before any query is executed they need to be provisioned.
Which layer takes care of the authentication 7 authorisation
Cloud service layer for all incoming request from:
Web UI
SnowSQL
JDBC
ODBC
what is another name of Query in snowflake
Work Load
What attributes makes Snowflake a true SAAS Solution
No hardware to purchase or configured
No maintenance upgrades or patches
Transparent releases don’t required user intervention.
Which instalment options are available
AWS, Azure
Three terms are used to described the same compute layer
Compute Layer
Virtual Warehouse Layer
Query processing layer
Two terms are used to described the same Cloud Services Layer
Cloud Services Layer
Virtual Warehouse Layer
Two of these terms are used to described the same data layer
Data layer
Storage layer
Which tasks are performed by the Cloud Services Layer
Metadata management
User authentication
Metadata storage
Data security
What is DML
Data Manipulation language
What is DDL
Data Definition Language
What are the method types for loading data
Bulk loading and continuous loading
what the stages types
External and Internal
produce a syntax of an inner join
select A.C_CUSTOMER_SK, A.C_CUSTOMER_ID, A.C_CURRENT_CDEMO_SK, B.CA_STREET_NUMBER, B.CA_STREET_NAME from SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL_OLD.CUSTOMER as A
inner join SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL_OLD.CUSTOMER_ADDRESS as B
on A.C_CUSTOMER_SK = B.CA_ADDRESS_SK;
what did you get as soon as Database is created in snowflake
Information Schema and Public
What are the options of the submenu for creating a table
Standard,
As select
External
From file
What is the qualification when creating tables
DB.Schema.Table name
What is the acceptable syntax for naming surrogate key
_PK, _FK
What are the benefits of surrogate keys
performance improvement
handle dummy values
integrate multiple systems
easier admin and update
useful when there are no natural keys available
What is type 0
Retain original
What is SCD type 1
The values are overwritten and no history is maintain.
What is SCD type 2
We maintain a complete history of changes. Every time, there is a change a new row will be added to the table. A new column is added to the table to indicate witch record is active. This column(s) could be a flag (true or false) or version number (0,1,2, etc), or time columns to indicate from/to.
What is SCD type 3
A partial history is maintained and not a complete history. an extra row is added to indicate the previous state.
What is SCD type 4
The dim table changes frequently. There is a need for creating a new table or redesign existing model.
What is SCD type 6
This is a combination of 1,2,3. Along with the addition of a new row, we also update the latest value in all the rows.
What are the stages types
External and Internal
Where is the data flowing to Snowflake to form External Stage
S3, External cloud Provider, Google Cloud, Microsoft Azure
What is the copy syntax
copy into <table name>
from externalStage
files = (‘file_name1’,’file_name2’)
file_format = <file_format_name>
copyoptions</file_format_name>
Where can I find Snowflake documentation
docs.snowflake.com
What is the syntax for creating a schema
create or replace schema qualified address + name
What is the syntax for creating a file format
create or replace file format qualified address + name
How to determine a file definition
by using desc file format qualified address + name
What is the syntax for changing a file properties
alter file format qualified address + name
for example SKIP_HEADER = 1
Alter cannot be used for changing properties that are not inherently to its native format.
True
What is the syntax for setting up csv file properties
create or replace file format qualified address + name
TYPE = CSV,
FIELD_DELIMITER = “,”
SKIP_HEADER = 1;
What is the syntax for loading data into a table
copy into DB.Schema.Table name
from @qualified name
file_format = (format_name = qualified name + csv file format)
files = (‘file name.csv’)
What does the truncate command does
Delete the content of a table
How to get rid of double quotes in Json file - column
:: String
What command is used to select one row from a Json file
raw_file:
How to use raw_file for extracting a Json object
raw_file object name.first property, raw_file object name.second property,
What is the syntax for creating a role
create role <data_scientist>
grant usage on <warehouse> to role <data_scientist></data_scientist></warehouse></data_scientist>
How can you delete an user
drop user <me></me>
How long does the cash storage last
24 hrs or an soon the data is changed
Is clustering for all tables
False
What are the most suitable field for clustering
Date or those fields used in the where clause or in joins
What is the syntax for creating clusters in tables
create table name ….. cluster by (column1, column2 …column) or expression <>
What is the syntax for changing clusters in tables
alter < table name> cluster by (expression 1, … expression n)
alter <table name> drop clustering key
What is the process for building Snowpipe from S3
1.Create Stage
2. Test copy command
3. Create pipe
4. S3 notification
What is Snowpipe
It enables loading once a new data is storing in a third party source bucket.
What happens when auto_infest = True
Data will be push from source system bucket automatically to Snowflake.
What is the syntax for creating a pipe
create or replace pipe <schema><pipe>
auto_ingest=true
as
copy into our_first
from @manage_DB_external</pipe></schema>
Mentioned the time travel type
Standard
Enterprise
Business Critical
Virtual private
What is the retention period - default
Data can be recovered up to 1 day only
Who can perform the fail safe process
Restoring only by Snowflake support
How many methods are there to recover time travel data
2
Can undrop work schemas and tables
True
What are the table types
Permanent
Transient
Temporary
what does the field kind contain
Table type
Can the Transient schema pass on the same properties to created tables under this schema
True
is cloned object independent from original
True
What is the syntax for creating a clone object
create table <name>
clone <source></source>
before (timestamp => (timestamp))</name>
Is data sharing involve the recipient with the right to update the data in the original table
False
What is the syntax for sharing data on a database
grant usage on database <data-s> to share orders_share;</data-s>
what objects can you data share
Database
Schema
Table
What is behind_by
it is the time difference between a query and an materialised view.
What does masking policy do
hire some field content to unauthorised users
What is a classic console
This is the old UI
Where can you get more information of how to do things in Snowflake
docs.snowflake.com
Where can you get more information about courses
learn.snowflake.com
What is identity all about
identifying the person
What is authentication
What the user can do
what is the role with more power
accountadmin
What is the role given for all account trial users as default
SYSADMIN
In which direction the inpersonation flows
Downward
If you had been awarded SYSADMIN directly, you would not be able to impersonate ACCOUNTADMIN.
True
What is DAC
Discrecionary Access Control: you created you owned it
After creating a database, what are the default schemas provided
Information schema and public
What are databases created for
Databases are used to group datasets (tables) together.
Can an information schema be dropped
False
What is the content of information schemas
The INFORMATION_SCHEMA schema holds a collection of views.
How many menus are included in worksheet
Role
Warehouse
Database
Schema
what does Running a SHOW DATABASES command do
It is just like being at the first level of an Object Picker (but with more details, and the ability to cut and paste the info into a spreadsheet).
Tsai points out that every worksheet
has four configuration settings that
are saved with the worksheet. How are they known as
Context settings, 2 provide shot cuts for the data storage location. The other are for the user role and warehouse.
What is the warehouse
The machine that crunches data (computer power).
What is the idle time allows for the machine to turn of wait for any code running
10 minutes
Can Select statement run without role or datawarehouse
False
Is the user role and warehouse mandatory
True
What can you do with Worksheets
Write code
Run code
Save code
Share
Up to what cluster can a warehouse stretch
10
Team is also known as
Clusters
Team members are also known as
Servers
Warehouses can be manually scaled UP or DOWN. They can be set up so that they automatically scale OUT. What did the video call the opposite of SCALING OUT?
Snapping back.
What is the meaning of cluster
Group of servers
XS not scaling how many servers does it have
1 cluster
M not scaling how many servers does it have
1 cluster
Is the number of servers different based on XS, S, M, etc
True
A XS warehouse when it is scale out would it has more than 1 cluster
True
A M warehouse when it is scale out would it has more than 1 cluster
True
do clusters hold more than 1 server
True
Has any worksheet include a default role
True
So what is a stage?
A stage or staging area
is a place we put things temporarily
with the intention to later move
them to a more stable, longer
term location.
What are the Stages type
Internal and External (Cloud storage services)
What are the requirments for creating an external AWS Cloud services
Storage location - S3 bucket
Cloud storage access
AWS IAM user and policy and
In snowflake define stage object with reference to the AWS S3 bucket
does snowflake care about capitalization
False
snowflake will type table names in uppercase if the name is not in double quotes
True
What the tools used for loading data into AWS S3 buckets
Amazon transfer for AFP
Amazon CLI
Web browser interface
Python
Java
REST API
Is there any load wizard for loading data from S3 to Snowflake
False
To run the COPY INTO command, you
need to have three database objects
defined.
Table
Stage
File format
snowflake tools allow you to import 5 popular semi structure data formats like
Json
xml
Parquet
Avro
Orc
A Snowflake tool used for ingesting semi structure data format is known as
Snowflake variant data type
When loading semi structure data, what is the data type of the column holding the upload data
Variant
What are the data type trend
Unstructured
Quasi structured
Semi structured
Structured
What are some examples of Unstructured
Images, PDF, videos
What are some examples of Quasi structured
ClikStream
What are some examples of Semi structured
XML, JSON, Parquet, AVRO, ORC
What are some examples of Structured
RDBMS AND CSV files
What the flatten command do
It flatten for example list arrays in a Json file. This is similar to the Russian nested dolls.
Usually how are the twit entities called
Statuses
What are the layers of Snowflake architecture
Storage
Query Processing
cloud Services
What is the query processing
It is the muscle of the system and performs massive parallel system
Which layer is the brain of the system
Cloud Services: access control, security, optimizer, metadata, manage the infrastructure.
What is Data Warehouse
DB that is used for reporting and data analysis.
VALIDATION_MODE = RETURN_ERRORS; What is this
When loading the file will be checked for errors.
VALIDATION_MODE = RETURN_5_ROWS ; What is this
When loading the file; the first 5 lines will be checked for errors.
SIZE_LIMIT = 2000; What is this
The system will load up to 20000 mb
RETURN_FAILED_ONLY = TRUE; What is this
The system will indicate the files containing errors.
TRUNCATE COLUMNS = TRUE; What is this
Strings are automatically truncated to the target column length
TRUNCATE COLUMNS = FALSE; What is this
Copy pro9duces an error if a loaded string exceed the target column length
FORCE=TRUE; What is this
Indicates tot he system to load the files(s) regardless of whether the file has been loaded previously and have not change since loaded previously. It has the potential to duplicate the data. Default is FALSE.
What is Loading History
Enables you to retrieve the history of data loaded into tables using the COPY INTO <Table> command
We have 3 files in our external stage with 20MB (~20 000 000 bytes) each.
How many files will be loaded if we set SIZE_LIMIT = 30 000 000 ?
2
how do you change from variant to string
SELECT RAW_FILE:first_name::string as first_name FROM OUR_FIRST_DB.PUBLIC.JSON_RAW;
How to create a table by using select statement
How to format a date column - parquet data to readable date
DATE($1:date::int ) as Date
What is this statement METADATA$FILENAME AS FILENAME do in an sql query
Indicate the file name loaded to a stage or table
What is this statement TO_TIMESTAMP_NTZ(CURRENT_TIMESTAMP) AS LOAD_DATE,
W
What measures can be taken to improve performance
- Create virtual warehouses
- Scale up during season of high demand
- Scale out to unknow patters of workload
- Maximise cached usage
- Cluster keys - large tables
What is a cluster
Subset of rows t olocate the data in micro partitions. For large tables improves the table scan.