SAS Data Curation Professional Flashcards

1
Q

Data curation?

A

The process of preparing data for analytics is referred to as data curation.

2
Q

Data scientist?

A

Data scientists take an enormous mass of messy data points (unstructured and structured) and use their formidable skills in math, statistics, and programming to clean, manage, and organize them. Then they apply their analytic powers (industry knowledge, contextual understanding, and skepticism of existing assumptions) to uncover hidden solutions to business challenges.

3
Q

Data curation lifecycle?

A

Data curation refers to the process of finding, exploring, structuring, cleansing, updating, and eventually archiving data. This process can be viewed as the data curation life cycle.

4
Q

SAS client applications?

A

On the SAS Platform, users with different roles each use specialized client applications designed to accomplish specific types of tasks. With these client applications, users access application servers and data sources in order to execute processes. The client applications offer programming or point-and-click interfaces.

5
Q

SAS administrator job?

A

Users with an administrative role use client applications to define the application servers and data source connections. The administrators also define user and group identities, logins, and permissions in the metadata to control access to application servers and data sources.

6
Q

Components of a computing environment?

A

The components include the processors (also referred to as central processing units, or CPUs), memory, storage, and network.

7
Q

Different data storage methods

A

Relational database management systems (RDBMS), Hadoop, data lakes, and cloud storage.

8
Q

RDBMS

A

Structured data
Predefined Schemas
SQL programming language
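
The three properties above can be sketched with Python's built-in sqlite3 module. This is only an illustration, not part of the SAS material: the `customers` table, its columns, and the helper names are hypothetical. The schema is declared before any rows are loaded, and the data is queried with SQL.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a predefined schema (hypothetical table)."""
    conn = sqlite3.connect(":memory:")
    # The schema is fixed up front: column names, types, and constraints.
    conn.execute(
        "CREATE TABLE customers ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " country TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customers (id, name, country) VALUES (?, ?, ?)",
        [(1, "Ada", "UK"), (2, "Grace", "US"), (3, "Linus", "FI")],
    )
    return conn


def customers_in(conn, country):
    """Return customer names for one country, using a parameterized SQL query."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE country = ? ORDER BY name",
        (country,),
    )
    return [name for (name,) in rows]


conn = build_demo_db()
print(customers_in(conn, "US"))
```

A row that violates the schema (for example, a NULL name) is rejected by the database rather than stored, which is the practical meaning of "predefined schema".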

9
Q

Hadoop

A

Open Source Software
Computer cluster
Distributed Storage
Parallel processing

10
Q

Data Lakes

A

Unstructured and structured data

Large variety and volume of data

11
Q

Cloud storage

A

Method for storing data off site
Allows for scalability depending on the amount of storage a company needs
Data is often stored across machines in the cloud

12
Q

Parallel processing

A

The concept of breaking jobs into tasks that run simultaneously is referred to as parallel processing. Parallel processing, or parallel computing, allows for jobs to execute faster and processing to happen simultaneously on smaller tasks.
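
As a rough sketch of the idea, the job "sum the squares of a list" can be split into chunks that are handed to a pool of workers, with the partial results combined at the end. The function names are hypothetical. Threads are used only to keep the example self-contained; CPU-bound work in CPython would more typically use separate processes or a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, n_chunks):
    """Split a list into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum_of_squares(values, n_workers=4):
    """Break one job into smaller tasks, run them on a worker pool, combine results."""
    chunks = chunked(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each chunk is an independent task submitted to the pool.
        partials = pool.map(lambda chunk: sum(v * v for v in chunk), chunks)
        return sum(partials)


print(parallel_sum_of_squares(list(range(10))))
```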

13
Q

Grid computing

A

Grid computing systems link computer resources together in a way that lets someone use one computer to access and leverage the collected power of all the computers in the system. To the individual user, it’s as if the user’s computer has transformed into a supercomputer.

14
Q

Cloud computing

A

Cloud computing is a broad term that refers to the immediate access to computing resources hosted over the internet. These resources can include software, data storage, processing power, and more. Amazon Web Services defines cloud computing as follows: “Cloud computing is the on-demand delivery of computer power, database, storage, applications, and other IT resources via the internet with pay-as-you-go pricing.”

15
Q

IaaS

A

Providers of Infrastructure as a Service supply the infrastructure, which includes the basic computing resources and storage, and the users then build everything else that they need. When companies rely on IaaS providers, it can be thought of as renting servers, and their users can install operating systems and programs on the servers.

16
Q

PaaS

A

With Platform as a Service (PaaS), a provider offers more of the application stack than IaaS providers, adding operating systems, middleware (such as databases), and other runtimes to the cloud environment.

17
Q

SaaS

A

With Software as a Service, cloud providers host software applications. These applications are available to customers via the internet. SAS offers some SaaS products, including SAS Visual Analytics for SAS Cloud, SAS Visual Statistics for SAS Cloud, SAS Visual Data Mining and Machine Learning for SAS Cloud, and more.

18
Q

SAS metadata server

A

The SAS Metadata Server controls access to a central repository of metadata that is shared by all SAS applications in the deployment. The metadata repository includes information about the following:
• libraries and tables that are accessed by your SAS applications
• content created and used by SAS applications, including reports and queries
• SAS and third-party servers that participate in the system
• users and groups and associated permissions
When you log on to SAS applications that are part of the SAS Platform, you first authenticate to the SAS Metadata Server.

19
Q

Workspace Server

A
  1. When users of client applications submit SAS code, it is executed by a SAS Workspace Server session.
  2. The workspace server supports registering tables in metadata and importing data, tasks that you learn about later.
  3. When you submit SAS code, the metadata server starts a workspace server session that executes the code.
  4. SAS deployments can have multiple users submitting SAS code from client sessions, and each user is provided his or her own workspace server session.
  5. In addition, SAS deployments can be implemented with multiple workspace servers.
20
Q

Exploring the data?

A
  1. Visualize and plot the data
  2. Identify anomalies and inconsistencies (missing values, incorrect data entries, spelling mistakes, casing)
  3. Calculate descriptive statistics
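
Steps 2 and 3 above can be sketched in plain Python (plotting is left out to keep the example dependency-free). The sample records and the function names are made up for illustration. The sketch counts missing values, computes a few descriptive statistics, and flags values that differ only by casing.

```python
import statistics

# Hypothetical sample data with one missing amount and one casing inconsistency.
records = [
    {"city": "Paris", "amount": 120.0},
    {"city": "paris", "amount": 80.0},
    {"city": "Berlin", "amount": None},
    {"city": "Berlin", "amount": 100.0},
]


def describe(rows, column):
    """Descriptive statistics for a numeric column, ignoring missing values."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


def casing_variants(rows, column):
    """Group spellings of a text column that differ only by letter case."""
    groups = {}
    for r in rows:
        value = r[column]
        if value is not None:
            groups.setdefault(value.lower(), set()).add(value)
    return {key: sorted(v) for key, v in groups.items() if len(v) > 1}


print(describe(records, "amount"))
print(casing_variants(records, "city"))
```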
21
Q

Data Governance?

A

The overall management of the availability, usability, integrity, and security of data used in an enterprise.

22
Q

SAS/ACCESS technology

A
  1. SAS/ACCESS technology enables users to query and manage data stored in databases and other data sources.
  2. Users can manage, update, and query data using SQL that is native to the database or using SAS language.
23
Q

SAS Data Integration Studio

A
  1. SAS Data Integration Studio is a SAS platform application interface that enables users to manage their data integration processes across an organisation.
  2. Users can create jobs using a drag-and-drop interface. These jobs (read data from the source, transform the data, and load the data into SAS tables) generate SAS code to access, manipulate, integrate, and store their data across a wide variety of data formats.
24
Q

DataFlux Data Management Studio

A
  1. DataFlux Data Management Studio is a platform application interface designed for data integration and advanced data quality.
  2. To perform a wide variety of data quality operations, users leverage an extensive library of data quality rules and algorithms, referred to as the Quality Knowledge Base, as well as third-party reference data packs.
  3. These operations include standardization, entity resolution, address verification, and more.
  4. DataFlux Data Management Studio also has built-in tools to profile data and build business rules, enabling data quality stewards to identify and remedy issues in their data.
  5. Users can design automated processes to assess data for specific data quality issues and generate alerts when such issues arise.
25
Q

SAS Data Loader for Hadoop

A

SAS Data Loader for Hadoop is a web-based, non-programmatic way for users to interact with data in Hadoop. It can be used to move data in and out of Hadoop; interrogate and profile data for quality issues; transform, transpose, and join data; and more.

26
Q

SAS Federation Server

A
  1. A platform application interface (accessed through a web browser) that makes it easier for business users to access secure data for reporting and analysis. It is used to apply data quality functions, improve data access performance, and maintain, configure, and monitor data.
27
Q

SAS Event Stream Processing Studio

A
  1. Graphical and code-based interface
  2. Ingest, filter, join and aggregate event streams
  3. Execute external routines against event streams
  4. Detect patterns in event streams
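
The operations listed above (filter, aggregate, detect patterns) can be illustrated generically with Python generators. This is not the SAS Event Stream Processing API; the event layout (`sensor`, `value`) and all function names are assumptions made for the sketch.

```python
from collections import defaultdict


def filter_events(events, min_value):
    """Pass through only events whose value is at least min_value."""
    for event in events:
        if event["value"] >= min_value:
            yield event


def aggregate_by_sensor(events):
    """Sum event values per sensor."""
    totals = defaultdict(float)
    for event in events:
        totals[event["sensor"]] += event["value"]
    return dict(totals)


def detect_rising_runs(events, run_length=3):
    """Yield a sensor ID when its streak of strictly rising values first reaches run_length."""
    last = {}
    streak = {}
    for event in events:
        sensor, value = event["sensor"], event["value"]
        if sensor in last and value > last[sensor]:
            streak[sensor] = streak.get(sensor, 1) + 1
        else:
            streak[sensor] = 1
        last[sensor] = value
        if streak[sensor] == run_length:
            yield sensor


# Hypothetical stream of events from two sensors.
stream = [
    {"sensor": "a", "value": 1.0},
    {"sensor": "b", "value": 5.0},
    {"sensor": "a", "value": 2.0},
    {"sensor": "a", "value": 3.0},
    {"sensor": "b", "value": 4.0},
    {"sensor": "a", "value": 4.0},
]

print(aggregate_by_sensor(filter_events(stream, 3.0)))
print(list(detect_rising_runs(stream)))
```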
28
Q

SAS QKB

A

The SAS QKB is a collection of files and algorithms that store data and logic for defining data management operations such as data cleansing and standardization.

29
Q

Metadata on SAS platform

A

Metadata is stored information about the characteristics of another object, such as source data, target data, jobs, users, user permissions, and more. Metadata is shared between SAS Platform tools, making it easy to track where objects have been used, and how they were used.

30
Q

A few algorithms of DataFlux Data Management Studio

A
  1. build data jobs
  2. identification analysis
  3. gender analysis
  4. data parsing
  5. entity resolution
  6. address verification and enrichment
31
Q

Data Management Studio and the QKB?

A
  1. understand QKB definitions
  2. change QKB definitions
  3. configure SAS to work with QKB definitions in SAS code