BI Flash Cards - BI Flash Cards (2)
SSDT
Sql Server Data Tools
Can you build packages without connecting to an SQL instance?
Yes
SSIS architecture
Five main components
SSIS Service (for legacy ssis packages)
SSIS Runtime engine and runtime executables
SSIS Catalog
SSIS Data Flow engin and the Data Flow Components
SSIS Clients
What handles the operational aspects of SSIS
SSIS catalog and SSIS service (for legacy mode packages)
Do you need the service to run packages?
no
What happens if you stop the service?
any packages running will stop
Package Catalog
Has new features/ like running packages with T-SQL
What runs the packages?
SSIS Runtime engine and complementary programs
Containers
Add structure and scope to packages
Tasks
functionality
Event Handlers
Respond to raised events in your package
Precedence Constraints
Provide ordinal relationship between various items in your package/ the order of execution (green arrow)
Package
Collection of tasks that execute in an orderly fashion
Package is saved where?
SQL Server (MSDB) or package catalog database. It can also be saved as a .dtsx file (xml structured)
Data flow tab is enabled when?
When a data flow task is created
Control Flow tab does..
handles the main workflow of the package
Data flow tab does
handles data transformation
How many control flows does a package have
1
How many data flows?
many
Excel Destination
outputs data from data flow to excel which must already exist
SQL Server Destination
Package/ must be executed at the destination
Transformations run in-memory or on disk?
In memory. Incurring I/O on the page file on disk will slow things way down
Aggregate transform
Similar to group by
Fuzzy Grouping
Finding rows that are likely duplicates
Fuzzy Lookup
Matches and standardizes data
Ole DB Command
Executes an OLE DB command for each row in the data flow/ can be used to run an update or delete statement inside the data flow
Sort
Sorts by a column and removes exact duplicates
Project
Container of one or more packages and other SSIS components
Project Deployment model
New model that enables many of the new SSIS 2012 features. Deploys the entire project of packages/ and packages can be configured by the DBA
Package Deployment model
Old model where packages are deployed one at a time
Solution
Container of one or more projects. Can house a C# project/ an SSRS project etc all together
Can a solution house more than one type of project?
Yes
Shared parameters and shared connections are only available in the
Project deployment model
.dtsx
SSIS Package
.ds
Shared Data Source file
.sln
solution file containing one or more projects
.dtproj
SSIS project file
.params
shared parameter file
.conmgr
shared connection manager
Miscellaneous folder
holds documentation/ etc. can be checked into version control software
How to convert back and forth between package and project deployment
Right click on the project and select “convert to (product or package)”
Two tabs in SSIS are..
Control flow and data flow
Data flow tab consists of what tasks?
sources/ transforms and destinations
Enter debug mode by doing what?
Execute the package by right clicking on it in the solution explorer and hit execute package. Opens progress tab
Progress tab
Where you go when you have a problem you need to debug
Server Explorer is accessed by
Tools - connect to database
Server explorer lets you
run queries/ create stored procedures/ redesign tables
Connection manager
connection that can be leveraged and consumed once or many times in a package
Can a connection manager connect to the internet? If so/ what ways?
Yes/ SMTP/ FTP/ and HTTP
Are connection managers externally available to the DBA
Yes
Externally available (connection manager) means
that the connection manager can be repointed
How to create a project connection manager
right click on connection managers in the solution explorer
convert a connection manager to a project cm
right click on it in the connection manager pane
configure tasks in control flow by
double clicking
Naming tasks
Example SCR - Encrypt file. Helps you self document your package and gives clarity to the logs
How is a package encrypted by default
With your windows user key
EncryptSensitiveWithUserKey
Encrypts credentials with your user key/ also Encryptallwithuserkey does the whole package
EncryptSensitiveWithPassword
Adds a password so that anyone who runs the package has to have the password
What helps with packages running in production that fail due to permissions
select encryptallwithpassword and pass the password in when scheduling or running
Arrow color meanings in control flow
Green - success
red - failure
blue completion
When upgrading SSIS
Connection strings are updated/ but if your package is dynamically changing the strings it won’t upgrade that
old packages aren’t converted to the new deployment model
Arrow color meanings in data flow
red - bad data / green - good data
Predecessor tasks
any task with a precedence constraint that is connected to another task
How many evaluation operations are there in precedence constraints
4
What are the different evaluation options in a precedence constraint
Constraint, Expression, Expression and Constraint, Expression or constraint
ForceExecutionResult
A property in a task that overrides the execution result to a desired result
Operation Property in File System Task
Decides what the file system task will do
File System Task: IsSourcePathVariable
Is the source of the file system task a variable or a location
File System Task: OverwriteDestination caveats
If you are copying a directory and set this to yes, you may overwrite a needed file, set to no and if the directory exists then your package will fail
Will the rename file operation property move a file?
If the source is different from the destination
UseDirectoryIfExists
File system task option under create directory. Either uses the directory or fails
Script Task is broken down into
Script / General / Expressions node
Dts.Variables() method
used to change package level variables
Common Uses for Script tasks
Changing connection strings, whether a file exists, whether it is in use, if-then logic
TypeConversionMode (Execute SQL Task)
Allows for datatypes to be converted for SSIS tasks
Resultset property (exec sql task)
Can be XML, single row, none, which can be stored in an object variable
Exec SQL task takes what as the SQL Source Type
Direct input (type it in), File connection (command saved in a file), and variable
Parameter mapping in Sql Task starts at
zero
SQL Task - isquerystoredprocedure
Only available on the ADO and ADO.net option
BypassPrepare Query property
“compiles” the query so it doesn’t need to be analyzed every time it is used, won’t parse unless it is set to false
Execute Process Task
Allow for windows and console applications to be used from within the package. Encryption is an example
Expression Task
Manipulates variables in the control flow without having to use VB.NET, good for incrementing variables in a loop, like a set of files
Send Mail Task
Information about the package passed through system variables or user defined variables
Where to place the send mail task
At the end of the package, or in the event handler area
Send Mail OnPreExecute
Will send an email saying when the package starts
Send Mail OnPostExecute
Send mail indicating the start and end time of the package
OnError or OnWarning
Send Mail task will send mail on these conditions
Can the sendmail task send files
Yes
SendMail connects how?
Through SMTP connection manager
SendMail Credentials
Either sent through SSL or windows authentication
SendMail message sourcetype Direct Input
Message is typed in the send mail task
sendmail MessageSourceType File connection property setting
Message is saved in an external file
sendmail message sourcetype variable property
Message is stored in a package variable
FTP Task
Sends and receives files via FTP
FTP Port
Usually 21
FTP Chunk Size
The chunk of file (similar to packet) to be sent, helps avoid exceeding network limitations
FTP Stop on Failure
Do you want the transfer to continue if part of it fails?
FTP Operation options
Send, receive, create local directory, create remote directory, remove local/remote directory, delete local/remote files
Data Flow Task
transfers data from source to destination and can transform data as needed
A Dataflow task can write to multiple locations T/F
TRUE
OLE DB Provider Is being deprecated because of…
Cloud computing, but you should still use it over ODBC because of higher functionality
Most common source used
Ole db
OLE DB has 4 data access modes
Table or view, table or view indicatied in a variable, sql results, or sql statement initiated in a variable
Can SSIS pass a variable into an SQL query in the oledb source?
Yes, by using the ?
three nodes in an oledb source
connection manager, columns (only check the ones you want), Error output (redirects data, ignores, or fails the component)
Can you run a query against an excel source?
Yes, it treats excel like a database
Native Excel Driver is
32 bit
If you use the native excel driver
your packages will have to run in 32 bit mode, even on a 64 bit workstation
to change to 32 bit mode
right click on the project file and select properties, select the debugging tab change run64bitruntime to false
32 bit runtime on Excel Source only applies
in SSDT because it is a 32 bit application. On a server you can download the microsoft access database engine which includes 64 bit driver
How do you correct datatypes in a flat file source
in the advanced tab of the flat file connection manager
Destinations cannot be configured until
you drag the blue/red line to it
in the destination flow if you don’t see the destination you want
simply uncheck “show only installed destinations”
Destination Flow - Table lock
prevents anyone from accessing the table targeted, speeds up the load
What does the sheet creation look like in an excel destination?
It looks like a create table command in SQL
Data conversion transform does what
changes datatypes
output alias in Data Conversion Transform
defaults to copy of…
length precision and scale
used in a data conversion transform
length
number of bytes to store the number, or the number of characters in a string
precision
total number of digits
scale
number of digits to the right of the decimal
Fast parse is available in which transforms?
Flat file source and data conversion transform.
Fast parse does what?
Disables verification of a particular column
When would you use fast parse?
Only if you are sure of the quality of the data
synchronous transform
all the rows don’t need to loaded into memory before being manipulated. Output is synchronous with input
asynchronous transform
all the rows need to be loaded into memory before being manipulated, sort and aggregate are examples
The function of a derived column transform is?
To either replace or create a column in the data stream
Common uses for derived column transform
Auditing rows and editing incoming data using the SSIS expressions
Variables are preceded with what sign?
@
what function parses the middle of a string
substring
Number of keys text box in the aggregate transform
put the number of groups you estimate will be created, found in the advanced tab
Aggregate transform is what kind of transform?
Asynchronous and fully blocking
Sort Transform
sorts by multiple columns, May choose which columns will pass through, can remove rows with duplicate sort values
How do you tell the package that the data is presorted?
Right click the source, click advanced editor, go to the input and output properties and select oledb source output. In the properties pane, change issorted property to true, then you go to output columns and click on each sort column and put in the order
Lookup transform can join how many data sets?
two
no cache mode in lookup transform
High performance overhead on the system
Full cached mode in lookup transform
One table loaded entirely from memory
Is a lookup transform synchronous?
Yes
Partial cache
Middle ground between no cache and full cache in a lookup transform
Match output and no match output
Chooses which rows go to the next destination, the matching or the non matching
Cache Connection Manager and Transform
Allows you to persist lookup information in the cache (same or different package) so you don’t have to reload the cache every time the lookup information is used
Where does the cache transform store data
in either the pagefile (virtual memory) or in the permanent file storage
Cache Connection Manager can…
reference the in-memory cache and read and write the cache to a disk based file
How does a CCM optimize itself?
by allowing you to decide which columns are indexed and which are referenced
how to use cached connection
open the cache transform and create a new cached connection manager, choose which columns are index columns and what position they are in the index order
Script task in a data flow can perform these functions
Transform, source, or destination
data flow script task use as a source
May need to add some business rules as it comes out of the source, sometimes necessary with COBOL files
data flow script task as a destination
use the script to write to non OLEDB destinations, like XML or Sharepoint
selecting readWrite for a variable does what
Locks during the duration of the execution
Input Columns tab of Data Flow Script Task caveat
make sure to only mark columns you need access to writing as such, as this slow things down
Can a script task in a data flow support multiple outputs?
Yes
debugging script task in a data flow
Doesn’t allow for breakpoints, so you will need to use more arcane methods like msgboxes
Conditional Split uses what language?
SSIS expression language
What can you use as a temporary destination
a union all transform
when should you use .NET in a package?
Only when necessary, as the person who inherits the package might not know it
OLE DB Transform
executes an SQL statement for every row in the data stream, ie an update for every row!
fuzzy lookup
minimizes unmatched rows with fuzzy logic also indicates the quality of the match
when should you use a fuzzy lookup transform
only after you have used a regular lookup transform
fuzzy lookup requires at least one field to be
a string to be mapped to string in the reference table
fuzzy lookup passthrough
column is not analyzed but appears in the output stream
fuzzy grouping
eliminates possible duplicates
Is fuzzy lookup/grouping expensive?
Yes
similarity threshold
likeliness of match in percentage
what are the two outputs of a lookup
match output and no match output
How do you use an expression for the variable?
Press f4 to bring up properties after selecting the variable
How do you change the scope of a variable?
Move Variable icon in the variables dialog
Fuzzy grouping key in and key out does what?
Key in is the value in, and the key out is the best match of the possible matches
How do you use a conditional split to clean up fuzzy grouping?
Only save rows where the key in matches the key out
Connection managers function only as sources
false, they are used in both sources and destinations
How do you get a variable to evaluate as an expression?
select it and press f4, this brings up properties. Change evaluate as expression to true, then enter the expression in the expression property
Are variable names case sensitive?
Yes
parameter value in ssdt
design or default value, may be replaced prior to execution
paramter sensitive property
marked as sensitive, its value is encrypted
parameter required
you must provide the parameter prior to execution, rather than using the design parameter
parameter scope
package scope or project scope
To create a project scoped parameter
select project.params in the solution explorer
Uses for parameters
Can be used to provide encrypted information, like passwords. Also, it is much easier to set runtime values for parameters over variables
you can change any available property…
using an expression
When using expressions for properties remember to
convert to the appropriate datatype
Common use of expressions in a connection manager
is to import a collection of flat files using the same package (foreach loop container)
how do you use an expression for multiple files
change the connection string between iterations
To use a backslash in SSIS expressions
write a double backslash, even in quotes
Can you use expressions in tasks?
Yes
How do you use expressions in tasks?
select the task and hit f4, find the expression property, click the ellipsis
What should you do before you write an expression?
Determine the datatype of the property
Two ways to track changes to the source table
Keep a shadow copy of the table, or use change data capture
how to handle deletes in ETL?
either scan the table or keep a list of keys.
Establishing the start and end dates
Search the destination table for the max modified date, set the start date to now. Use andExecuteSQL task with resultset set to one row
Logical vs Physical Deletes
Logical uses a delete flag, physical is just a delete
Passing values from executeSQL task to data flow OLEDB source
uses parameters set by executesqltask
CDC Control Task
Responsible for INITIAL loading of data and maintaining incremental load status by keeping with the log serial number associated with changes, does error handling
CDC Source Task
Reads data that has been inserted, updated, or deleted within a range and provides those rows to downstream consumers
CDC Splitter
Accepts dataflow from the CDC source task and splits it into three data flows, insert, update, delete
How does CDC work?
By reading the log files
Uses for CDC
data marts and data warehouses, also replication and auditing
CDC uses how many packages?
two, one for initial, and one for incremental
Data Quality Services
Helps improve data in the organization
Master Package does what?
Runs child packages, in certain order
Master Package variables
are controlled globally and shared with the child packages
Master packages allow you to control parallelism t f
True
sequence containers
group tasks to disable a part of the package that isn’t needed
Narrow the scope of a variable to just the container
collapsing and expanding the container to hide the tasks within
managing the properties of multiple tasks in one step by setting the container
using one method to make sure multiple tasks execute successfully
creating a transaction across data related tasks, not the entire package
event handlers on a single container
for loop container
loops through control flow tasks
Types of iterating containers
for and foreach
for each has what types of objects?
File, Item (set manually), ADO (list of tables or rows), SSIS variable, XML nodelist, SMO (SQL Management Objects)
Slowly changing dimension transform
Inserts and updates to dimension tables, only works with sql server
Bulk Insert Task
Loads directly into SQL Server without transformation
Execute Process Task
Opens application or batch file
Expression task
builds and evaluates expressions that set variables at runtime
Parameters can be scoped to
Package or project level
How do you create a package level parameter
parameter tab
How do you create a project level parameter?
Project.params under solution explorer
Data Quality Services Checks for
Completeness, Consistency, Conformity, Validity, Accuracy, Duplication
Data Quality Services keeps decisions in
DQKB
DQKB
Data Quality Knowledge Base
Data Quality Services has what components?
Data Quality Server and Client
=
Set a variable to a value
==
checks to see if there is equality between two values
Configuration Tables are used for
Storing values to be used for production deployment, for example connection strings, etc
Configuration tables are only available in the
package deployment model
Best practice for configuration manager for configuration table
It should have its own connection manager, even if it is in the source database
Many packages may use the same configuration table?
True
How does each package filter out its configurations?
With the filter name on the configuration table
How do you save configuration information in an SSIS package?
Configuration tables, or XML configuration files
advantages of using XML to store configuration info
compatible with source control systems, easier than adding rows to a database
using xml files for configuration requires which deployment model?
Package
Ways to troubleshoot
Logging, event handlers, data viewers, breakpoints
how to enable loggoing
Right click in the control flow and select logging
places logs can be stored
Sql Server, SQl Server Profiler, Windows Event Log, Text File, XML file
two common uses for an event handler
notification and logging
how to create custom logging
Use Execute SQL Tasks in the Event Handlers tab
When should data viewers be removed?
When sending a package to production
How do you create a data viewer
Double click on the red or blue line connecting two tasks, select the dataviewer node
Once in the dataviewer node of the data flow path editor, how do you proceed
enable the dataviewer with the checkbox, and then select the columns you want
What does the detach button do in the data viewer
It allows the package to continue running without triggering the dataviewer
How do breakpoints work
allow you to inspect the state of tasks, variables, and data at different points in the package
To use configuration tables, your package must
use the package deployment model
Configuration tables can be used for
any property for the package, connection, container, variable, or task
Project deployment model uses what for configuration information
parameters and environments
How to create configuration table for a package
Right click on the canvas in the control flow and choose package configurations