SAS Data Curation Practice Exam Flashcards
Which statement best describes the difference between the Inventory tab and the Folder tab in the SAS Data Integration Studio interface?
Select one:
a. The Folder tab contains metadata objects in a customizable folder structure. The Inventory tab contains objects in pre-determined folders based on the object type.
b. The Folder tab contains objects created by the current user, while the Inventory tab contains objects created by all users.
c. The Folder tab contains jobs and stored process objects, while the Inventory tab contains source tables and target tables.
d. The Folder tab contains metadata items related to jobs that are open in Job Designer windows. Other objects are contained within the Inventory tab.
a. The Folder tab contains metadata objects in a customizable folder structure. The Inventory tab contains objects in pre-determined folders based on the object type.
The Folder tab displays the metadata folder hierarchy that is shared across the SAS platform and it can be customized and used to organize metadata in a structure that meets the needs of the user. The Inventory tab displays metadata in predefined categories that organize the metadata by type. The other options are not true statements about the Folder tab and the Inventory tab.
Identify the format used to move metadata for SAS platform metadata objects within the same metadata repository OR between different SAS metadata repositories.
Select one:
a. Star Schema
b. SAS Package
c. Metadata Bridge
d. Common Warehouse Metamodel
b. SAS Package
The file created to move metadata within a metadata repository, or from one metadata repository to another, is known as a “SAS Package.” SAS Metadata Bridges allow users to move third-party metadata from a third-party database to SAS or from SAS to a third-party database. When third-party metadata is imported, the format used is a version of the Common Warehouse Metamodel format. Therefore, C and D describe technologies that help users move third-party metadata. A star schema is a paradigm for organizing data in a data warehouse and is unrelated to importing metadata. Therefore, A is also false.
What is defined by the Location field in the New Library Wizard?
Select one:
a. The server where the data is located.
b. The file path where the data is located.
c. The metadata folder where the library metadata is stored.
d. The connection profile where the library is stored.
c. The metadata folder where the library metadata is stored.
The Location field does not contain information about how to assign the library; rather, it contains the location in metadata where the library assignment definition should reside. The server and file path describe the actual library assignment. The connection profile tells SAS Management Console where to look for metadata and how to access it, and is already opened and accessed before invoking the New Library Wizard.
Assume that you have completed the Register Tables wizard in SAS Data Integration Studio. Which statement is true?
Select one:
a. The physical table(s) selected are copied to the application server specified in the library.
b. The physical table(s) selected are copied to the SAS Folders location specified in the wizard.
c. Metadata for the physical table(s) selected is stored on the application server specified in the library.
d. Metadata for the physical table(s) selected is stored in the SAS Folders location specified in the wizard.
d. Metadata for the physical table(s) selected is stored in the SAS Folders location specified in the wizard.
The Register Tables wizard prompts the user for a library and table to register, then gathers information about that table and stores it in metadata at the specified location. Apart from gathering this metadata, the wizard does not move or copy the table anywhere, so the “copied” choices are invalid. Metadata is stored on the metadata server, not on the application server, so that choice is also invalid.
How do you register column metadata from a comma-separated file where the top of the data file looks like this:
To: A.Person
Subject: your data
date: 01/01/1960
fld1, fld2, fld3, fld4
1,1,1,1
2,2,2,2
3,3,3,3
Select one:
a. Get the column names from column headings in this file.
b. Use PROC CIMPORT to read the column headers.
c. Get the column definitions from a COBOL format file.
d. It is not possible to register column metadata from this file.
a. Get the column names from column headings in this file.
Customers often need to read flat files that have irregular content at the beginning of the file. The easiest way in DI Studio to get column header information from the middle of a file is to use “Get column headings from file” in the “Import column definitions” dialog.
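The idea of taking column names from a heading row that is not the first line of the file can be illustrated outside DI Studio. The following is a minimal Python sketch, not the mechanism DI Studio uses; the function name read_with_preamble and its header_line parameter are hypothetical, and the sample text is the file excerpt from the question.

```python
import csv

# Sample file content from the question: three preamble lines,
# then the heading row on line 4, then the data rows.
RAW = """To: A.Person
Subject: your data
date: 01/01/1960
fld1, fld2, fld3, fld4
1,1,1,1
2,2,2,2
3,3,3,3
"""


def read_with_preamble(text, header_line):
    """Return (columns, rows), taking column names from the 1-based header_line."""
    lines = text.splitlines()
    # Skip everything before the heading row; skipinitialspace drops the
    # blank that follows each comma in "fld1, fld2, ...".
    reader = csv.reader(lines[header_line - 1:], skipinitialspace=True)
    columns = next(reader)
    rows = list(reader)
    return columns, rows


columns, rows = read_with_preamble(RAW, header_line=4)
print(columns)
print(len(rows), "data rows")
```

The caller has to say which line holds the headings, which mirrors the point of the flashcard: the headings exist in the file, but not where a reader would look by default.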
Which definition cannot be imported using a SAS Metadata Bridge?
Select one:
a. Column definition
b. Index definition
c. Server definition
d. Table definition
c. Server definition
SAS Data Integration Studio can import column, table, and index definitions using a SAS metadata bridge. It cannot import server properties from a SAS metadata bridge.
Which two statements are true regarding columns when using the SAS Data Integration Studio New Table wizard?
Select one or more:
a. You can access metadata for indexes from tables already registered in metadata.
b. You can access metadata for any column from tables already registered in metadata.
c. You can define new columns for the table.
d. You cannot override imported columns for the table.
b. You can access metadata for any column from tables already registered in metadata.
c. You can define new columns for the table.
The New Table wizard provides access to column metadata in already registered tables and it allows new columns to be defined. The New Table wizard does not provide access to index metadata in already registered tables and it does allow imported column metadata to be overridden.
After opening a Join in Data Integration Studio, the following options are available in which parts of the Join’s designer window? (Choose two.)
Choose Columns…
CASE…
Subquery…
Advanced…
Select one or more:
a. the “Expression” window
b. a field’s “Expression” definition
c. a “Group By” clause definition
d. an “Operand” definition
b. a field’s “Expression” definition
d. an “Operand” definition
Users can specify these options in a field’s Expression definition and when setting an Operand definition for a clause within the Join transformation, such as Where or Having.
Within SAS Data Integration Studio, which types of expression can you create on the Where tab of the Extract transformation?
1. a SAS expression
2. an SQL expression
3. an XML expression
4. a constant
Select one:
a. 1 and 2 only
b. 1 and 3 only
c. 1, 2 and 4 only
d. 1, 2, 3 and 4
c. 1, 2 and 4 only
A SAS expression, an SQL expression, and a constant are valid on the Where tab in the Extract transformation. An XML expression is invalid.
Which statement is true regarding the Set Operators transformation?
Select one:
a. The Set Operators transformation performs set operations on queries of the source tables.
b. By default, the set operations keep non-common columns.
c. All of the set operations keep duplicate rows by default.
d. By default, the set operations match columns from different source tables by name.
a. The Set Operators transformation performs set operations on queries of the source tables.
Each table has a Select, Where, Having, Group By and Order By tab in the Set Operators tab where users can configure a query for each source table. By default, the set operations match columns from different source tables by position in the table, not by name. Only the Outer Union set operation keeps duplicate rows by default.
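Matching by position rather than by name is also how set operators behave in standard SQL, so it can be demonstrated with Python's built-in sqlite3 module. This is only an analogy under stated assumptions: SQLite is not the engine behind the Set Operators transformation, it has no Outer Union operator, and the tables a and b are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (code INTEGER, label TEXT);
    INSERT INTO a VALUES (1, 'x'), (2, 'y');
    INSERT INTO b VALUES (2, 'y'), (3, 'z');
    """
)

# The column names differ (id/name vs. code/label), yet the rows combine,
# because the first column pairs with the first and the second with the second.
union_rows = conn.execute(
    "SELECT id, name FROM a UNION SELECT code, label FROM b ORDER BY 1"
).fetchall()

# UNION removes the duplicate row (2, 'y'); UNION ALL keeps it.
union_all_rows = conn.execute(
    "SELECT id, name FROM a UNION ALL SELECT code, label FROM b ORDER BY 1"
).fetchall()

print(union_rows)
print(union_all_rows)
conn.close()
```

Each side of the set operation is itself a query, which corresponds to answer a: the operation is applied to queries of the source tables, not to the tables directly.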
How can you test the options of a new transformation in SAS Data Integration Studio?
Select one:
a. The transformation has to be used within a job to interactively test the options with the Test Prompt button in the Options window from the properties of the transformation within the job editor.
b. The options can be tested after finalizing and saving the transformation with the Test Prompt item on the Tools menu.
c. The options can be tested only after adding all options that are assigned to the source code with the Test Prompt button in the Options window of the New Transformation wizard.
d. The options can be tested at any time after adding them to the transformation with the Test Prompt button in the Options window of the New Transformation wizard.
d. The options can be tested at any time after adding them to the transformation with the Test Prompt button in the Options window of the New Transformation wizard.
When creating a custom transformation, a user can check how the options will appear in the final transformation at any time by clicking the Test Prompt button in the New Transformation wizard.
The New Transformation wizard allows you to extend the capabilities of SAS Data Integration Studio and create new transformations. Which language does it support?
Select one:
a. SAS
b. Java
c. Python
d. C++
a. SAS
Transformations created with the New Transformation wizard are written in SAS code.
Which statement is TRUE regarding Java Plug-in Transformation templates and New Transformation templates?
Select one:
a. Java Plug-in Transformation templates and New Transformation templates cannot be stored in the same Transformation Category on the Transformations tab.
b. Java Plug-in Transformation templates and New Transformation templates have different options in their right-click pop-up menus.
c. Java Plug-in Transformation templates and New Transformation templates all have the same icon.
d. Both Java Plug-in Transformation templates and New Transformation templates are created with a wizard in SAS Data Integration Studio.
b. Java Plug-in Transformation templates and New Transformation templates have different options in their right-click pop-up menus.
Java Plug-in Transformation templates and New Transformation templates can be stored in the same Transformation Category on the Transformations tab. An example is the Analysis Category, which comes with the Java Plug-in Transformation template “Forecasting” and the New Transformation template “One-Way Frequency.” They also have different icons. Java transformation icons vary, while all New Transformation templates share the same icon. Finally, only New Transformation templates can be created in SAS Data Integration Studio with a wizard.
Refer to the screen capture below, which shows the contents of a Quality Knowledge Base:
What do the icons to the left of the objects represent?
Select one:
a. the type of object in the Quality Knowledge Base
b. the data type associated with the object
c. the locale level associated with the object
d. the type of definition in the Quality Knowledge Base
c. the locale level associated with the object
The icons to the left of each item in the list indicate the locale level associated with the object in the QKB.
Which two types of items comprise the Quality Knowledge Base (QKB)?
Select one:
a. files and repositories
b. files and definitions
c. files and reference data sources
d. definitions and reference data sources
b. files and definitions
In the QKB (Quality Knowledge Base), various definitions are listed for each locale. The definitions can be “opened” where a sequence of processing will then be surfaced in a “flow” diagram. Many of the items listed in these “flow” diagrams are pointing to various files of the QKB (schemes, chop tables, phonetics libraries, regex libraries, vocabulary libraries, grammars).
You are working with the data below, which represents people’s last names (or surnames). You would like to apply the proper casing to the data using a Case Definition. Which two Quality Knowledge Base (QKB) file components can you use within the Case Definition to accomplish this task?
MacAlister
MacDonald
McCarthy
McDonald
McNeill
Select one or more:
a. Regular Expression library
b. Standardization Scheme
c. Vocabulary
d. Phonetics library
a. Regular Expression library
b. Standardization Scheme
In a Case Definition, the user can either use a Standardization Scheme or a Regular Expression library for ensuring the proper casing of words with “tricky” casing in the string.
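The two approaches can be compared with a small Python sketch. It is an illustration of the idea only, not how a QKB Case Definition is implemented: the lookup table SCHEME stands in for a Standardization Scheme, the pattern MC_PREFIX stands in for a Regular Expression library rule, and both names are hypothetical.

```python
import re

# Hypothetical stand-in for a Standardization Scheme: exact lookups for
# names that a general rule cannot case safely ("Mac" is also the start
# of names such as "Mack" or "Macy").
SCHEME = {
    "macalister": "MacAlister",
    "macdonald": "MacDonald",
}

# Hypothetical stand-in for a regular-expression rule: capitalize the
# letter that follows a leading "Mc".
MC_PREFIX = re.compile(r"^(Mc)([a-z])")


def proper_case(surname):
    """Proper-case a surname using the scheme first, then the regex rule."""
    lowered = surname.lower()
    if lowered in SCHEME:
        return SCHEME[lowered]
    cased = lowered.capitalize()
    return MC_PREFIX.sub(lambda m: m.group(1) + m.group(2).upper(), cased)


for raw in ["MACALISTER", "macdonald", "MCCARTHY", "mcdonald", "McNeill"]:
    print(raw, "->", proper_case(raw))
```

The lookup handles specific known values, while the pattern generalizes to any name with the same prefix; that trade-off is why a Case Definition can draw on either component.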
What is the first step for parsing a data string using a Parse definition?
Select one:
a. Chopping
b. Word Categorization
c. Tokenization
d. Preprocessing
d. Preprocessing
The template for Parse definitions begins, for every Parse definition, with a Preprocessing step (which, if specified, involves one or more regex libraries).
How do you detach a data job tab in DataFlux Data Management Studio?
Select one:
a. On the tool bar, click the Detach tool.
b. Drag the tab with the mouse cursor from its docked position.
c. From the File menu, select Detach.
d. You cannot detach a primary tab in DataFlux Data Management Studio.
a. On the tool bar, click the Detach tool.
The Detach tool opens a data job in a new window so that it can be compared side by side with another job.
Consider the partial screen capture from a Profile Report below:
Why is the PRODUCT CODE column not a valid candidate for a primary key?
Select one:
a. There are not enough observations in the table to declare a primary key.
b. The pattern count metric is greater than 1.
c. The data type is “string,” and only numeric data can be considered for primary keys.
d. The data values are not 100% unique across all of the rows.
d. The data values are not 100% unique across all of the rows.
For a column to be a valid primary key candidate, its values must be 100% unique across the rows and must contain no null values. Although this column has no null values, it is not 100% unique across rows, as you can see by examining the Uniqueness metric (96.55).
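The two criteria can be expressed as a short check. The sketch below is a Python illustration under assumptions: it computes uniqueness as distinct values divided by row count, which may differ in detail from how the profile report derives its metric, and the 29 sample product codes are hypothetical, chosen only so that one duplicate gives a uniqueness close to the 96.55 quoted above.

```python
def profile_column(values):
    """Return the null count and the uniqueness percentage of a column."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    distinct = len(set(non_null))
    uniqueness_pct = round(100.0 * distinct / total, 2) if total else 0.0
    return {
        "rows": total,
        "null_count": total - len(non_null),
        "distinct": distinct,
        "uniqueness_pct": uniqueness_pct,
    }


def is_primary_key_candidate(values):
    """A candidate key has no nulls and as many distinct values as rows."""
    metrics = profile_column(values)
    return metrics["null_count"] == 0 and metrics["distinct"] == metrics["rows"]


# Hypothetical data: 28 distinct codes plus one repeated code, 29 rows in all.
product_codes = [f"P{i:03d}" for i in range(28)] + ["P000"]

print(profile_column(product_codes))
print(is_primary_key_candidate(product_codes))
```

The candidate test compares counts rather than the rounded percentage, so a very large column with a single duplicate cannot round up to 100 and pass by accident.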
A Frequency Distribution node can assist in determining how well standardization rules and QKB customizations are addressing data quality issues. By selecting the frequency count of the standardization flag field, what do the Preview window results represent?
Select one:
a. 28.57 percent of the data were affected by the Standardization node.
b. 71.43 percent of the data still needs to be cleansed so that data quality is improved.
c. The Field Layout node will only process the 6 standardized rows.
d. Standardization rules were applied to 15 of the 21 rows.
a. 28.57 percent of the data were affected by the Standardization node.
A standardization flag of true represents those rows whose value has been altered from its original value by the standardization rules. A standardization flag of false does not indicate that the data value is bad. All rows are passed along through the data job flow.
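The percentages in the answer options follow from simple counts. The Python sketch below assumes, based on the figures quoted in options c and d, that the preview shows 6 rows flagged true and 15 flagged false out of 21; the preview itself is not reproduced here, so treat those counts as an inference.

```python
from collections import Counter

# Assumed flag values: 6 rows changed by standardization, 15 unchanged.
flags = [True] * 6 + [False] * 15

counts = Counter(flags)
total = sum(counts.values())
percentages = {flag: round(100 * n / total, 2) for flag, n in counts.items()}

print(counts[True], "of", total, "rows changed:", percentages[True], "percent")
print(counts[False], "of", total, "rows unchanged:", percentages[False], "percent")
```

The 71.43 percent figure counts rows that standardization left as they were, which is not the same as rows that still need cleansing.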
You are building a data job to prepare data for future analysis and reporting. A sample of the company name data is shown below. Which SAS-supplied Case Definition do you use to ensure the proper casing of your company names?
Kaiser-Permanente
McDonalds
The SAS Institute
O’Neil Deli and Pub
SAS
Select one:
a. Proper
b. Proper (Company)
c. Proper (Name)
d. Proper (Organization)
d. Proper (Organization)
Proper (Organization) is the Case Definition used for casing organization names. Proper and Proper (Company) are not valid definition names, and Proper (Name) is used to properly case the names of individuals.
You are creating a data job to apply a data cleansing process to an input data field containing city, state and postal code data. You would like to create individual fields from the components of the data values, with the resulting data being written into individual fields for City, State/Province and Postal Code. Which node would you use to accomplish this result?
Select one:
a. Right Fielding node
b. Parsing node
c. Identification Analysis node
d. Standardization node
b. Parsing node
The Parsing node must be used to accomplish this task. Although the Identification Analysis and Right Fielding nodes can recognize the full CSZ field, they cannot break the data into its component parts.
A table has these fields:
CUSTOMER_ID
FIRST_NAME
LAST_NAME
COMPANY
ADDRESS
CITY
POSTAL_CODE
PHONE
The table has many duplicate records for customers. You will use a Clustering node to find the duplicates using the conditions shown below:
Note that the name information in the table is parsed into two fields; however, the match code for name information is contained in a single field.
After reading the table with a Data Source node, what is the minimum number of nodes (not counting the Clustering node) necessary to prepare the data for these clustering conditions?
Select one:
a. 1
b. 2
c. 3
d. 4
c. 3
For the conditions shown and the field names given, you would need three nodes:
1. a Match Codes (Parsed) node to generate a match code string for the parsed name fields,
2. a Match Codes node to generate the remaining match code string fields, and finally
3. a Standardization node to generate the standardized value for the phone field.
A sample of data has been clustered and found to contain many multi-row clusters. To construct a “best” record for each multi-row cluster, you need to select information from other records within a cluster.
Which type of rule allows you to perform this task?
Select one:
a. Clustering rule
b. Record rule
c. Business rule
d. Field rule
d. Field rule
Field rule is the correct answer because field rules are the only mechanism available for selecting information from other records in a cluster.
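The division of labor between choosing a surviving record and filling one of its fields from another record can be sketched in Python. This is a conceptual illustration only, not the rule syntax of DataFlux Data Management Studio; the two functions, the specific rules they apply, and the sample cluster are all hypothetical.

```python
def pick_survivor(cluster):
    """Hypothetical record-level rule: keep the record with the most non-empty fields."""
    return max(cluster, key=lambda rec: sum(1 for v in rec.values() if v))


def apply_field_rule(survivor, cluster, field):
    """Hypothetical field-level rule: if the survivor's field is empty,
    take the longest non-empty value for that field from the cluster."""
    best = dict(survivor)  # copy so the input record is left unchanged
    if not best.get(field):
        candidates = [rec[field] for rec in cluster if rec.get(field)]
        if candidates:
            best[field] = max(candidates, key=len)
    return best


# Hypothetical two-row cluster for the same customer.
cluster = [
    {"CUSTOMER_ID": "101", "FIRST_NAME": "Ann", "LAST_NAME": "Lee", "CITY": "Cary", "PHONE": ""},
    {"CUSTOMER_ID": "205", "FIRST_NAME": "", "LAST_NAME": "Lee", "CITY": "", "PHONE": "919-555-0100"},
]

survivor = pick_survivor(cluster)
best_record = apply_field_rule(survivor, cluster, "PHONE")
print(best_record)
```

Selecting the survivor alone would leave PHONE empty; only the field-level step reaches into the other record in the cluster, which is the behavior the question asks about.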
In a data job that performs entity resolution for a single table, which node do you use to select a best record from a group of clustered records?
Select one:
a. Data Sorting node with NODUPKEY option
b. Match Report node with surviving option
c. Master Data Record node
d. Surviving Record Identification node
d. Surviving Record Identification node
The Surviving Record Identification node provides properties for selecting the best record from a group of clustered records. There is no NODUPKEY option on the Data Sorting node. There is no surviving option on the Match Report node. There is no Master Data Record node.
Which match code fields were generated using the lowest sensitivity?
Select one:
a. “M@27B43743@Y87BB&YR8ß$$$$$$” and “B7W7P@3_~M4P4$$$$$$$$$$$$$$$$$$$$”
b. “43Y$$$$$$$$$$$$$$$$$$$$” and “BW$$$$$$$$$$$$$$$$$$$$$$”
c. “43&Y886437$$$$$B8íYR$$$$$$$$” and “B£W£P£3£~M4P4$$$$$$$$$$$$$$$$$$$$$$”
d. “43Y8B4$$$$$$$$B&Y$$$$$$$$$” and “#YVY@7$$$$$$$$$$$$$$$$$$$”
b. “43Y$$$$$$$$$$$$$$$$$$$$” and “BW$$$$$$$$$$$$$$$$$$$$$$”
The match codes in option b were generated using the lowest sensitivity, as they are the shortest: they contain the fewest significant characters before the $ padding.
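The comparison can be made concrete by counting the characters in each match code that are not the $ padding character. The Python sketch below uses the strings from the answer options; treating the non-$ count as a proxy for sensitivity is an assumption drawn from the explanation above, and the helper name significant_chars is hypothetical.

```python
# Match code pairs copied from answer options a through d.
OPTIONS = {
    "a": ("M@27B43743@Y87BB&YR8ß$$$$$$", "B7W7P@3_~M4P4$$$$$$$$$$$$$$$$$$$$"),
    "b": ("43Y$$$$$$$$$$$$$$$$$$$$", "BW$$$$$$$$$$$$$$$$$$$$$$"),
    "c": ("43&Y886437$$$$$B8íYR$$$$$$$$", "B£W£P£3£~M4P4$$$$$$$$$$$$$$$$$$$$$$"),
    "d": ("43Y8B4$$$$$$$$B&Y$$$$$$$$$", "#YVY@7$$$$$$$$$$$$$$$$$$$"),
}


def significant_chars(match_code):
    """Count the characters that are not the $ padding character."""
    return sum(1 for ch in match_code if ch != "$")


totals = {
    option: sum(significant_chars(code) for code in codes)
    for option, codes in OPTIONS.items()
}
lowest = min(totals, key=totals.get)

for option in sorted(totals):
    print(option, totals[option])
print("fewest significant characters:", lowest)
```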
What are two SAS system options used for interacting with the Quality Knowledge Base (QKB) in SAS code and applications?
Select one or more:
a. DQSETUPLOC
b. DQLOCALE
c. DQQKBLOC
d. DQQKBPATH
a. DQSETUPLOC
b. DQLOCALE
The two system options that can be set programmatically in SAS to access the QKB components are DQSETUPLOC and DQLOCALE.
What is the name of the language and geography component of the Quality Knowledge Base that is used to organize the available definitions?
Enter your answer in the space below. Case is ignored.
locale
A “locale” is the organizational component of the QKB that is used to group definitions by language and geography (for example, English (United States)).