SAS Data Curation Professional Flashcards
Data Curation?
The process of preparing data for analytics is referred to as data curation.
Data Scientist?
Data scientists take an enormous mass of messy data points (unstructured and structured) and use their formidable skills in math, statistics, and programming to clean, manage, and organize them. They then apply their analytic powers (industry knowledge, contextual understanding, and skepticism of existing assumptions) to uncover hidden solutions to business challenges.
Data curation lifecycle?
Data curation refers to the process of finding, exploring, structuring, cleansing, updating, and eventually archiving data. This process can be viewed as the data curation life cycle.
SAS client Applications?
On the SAS Platform, users with different roles each use specialized client applications designed to accomplish specific types of tasks. With these client applications, users access application servers and data sources in order to execute processes. The applications offer programming or point-and-click interfaces.
SAS administrator job?
Users with an administrative role use client applications to define the application servers and data source connections. The administrators also define user and group identities, logins, and permissions in the metadata to control access to application servers and data sources.
Components of a computing environment?
The components include the processors (also referred to as central processing units, or CPUs), memory, storage, and network.
Different data storage methods
Relational database management systems (RDBMS), Hadoop, data lakes, and cloud storage.
RDBMS
Structured data
Predefined Schemas
SQL programming language
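Example (a minimal sketch, not taken from the SAS material): the three RDBMS traits above can be seen with Python's built-in sqlite3 module. The table name, column names, and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the "customers" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")

# Predefined schema: column names, types, and constraints are declared up front.
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "balance REAL NOT NULL)"
)

# Structured data: every row supplies the same set of columns.
conn.executemany(
    "INSERT INTO customers (id, name, balance) VALUES (?, ?, ?)",
    [(1, "Ada", 120.5), (2, "Grace", 80.0)],
)

# SQL is used to query the data.
rows = conn.execute(
    "SELECT name FROM customers WHERE balance > 100 ORDER BY name"
).fetchall()

# A row that omits a required column is rejected by the schema.
try:
    conn.execute("INSERT INTO customers (id, name) VALUES (3, 'Linus')")
    schema_enforced = False
except sqlite3.IntegrityError:
    schema_enforced = True
```

Here rows holds only the customer whose balance exceeds 100, and the incomplete insert fails because the schema requires a balance.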
Hadoop
Open Source Software
Computer cluster
Distributed Storage
Parallel processing
Data Lakes
Unstructured and structured data
Large variety and volume of data
Cloud storage
Method for storing data off site
Allows for scalability depending on the amount of storage a company needs
Data is often stored across machines in the cloud
Parallel processing
The concept of breaking jobs into tasks that run simultaneously is referred to as parallel processing. Parallel processing, or parallel computing, lets a job finish faster because its smaller tasks are processed at the same time rather than one after another.
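Example (a minimal sketch under stated assumptions): one job, summing the squares of a list of numbers, is broken into chunks that are submitted as separate tasks and then combined. The function names are hypothetical. A thread pool is used here to keep the sketch simple; in CPython, CPU-bound work would typically use concurrent.futures.ProcessPoolExecutor instead to run on several cores.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, n_chunks):
    """Break one job's input into up to n_chunks smaller pieces."""
    if not values:
        return []
    size = -(-len(values) // n_chunks)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def sum_of_squares(chunk):
    """The task that runs on each piece."""
    return sum(v * v for v in chunk)


def parallel_sum_of_squares(values, n_workers=4):
    """Run the tasks concurrently, then combine the partial results."""
    chunks = split_into_chunks(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


total = parallel_sum_of_squares(list(range(1, 101)))
print(total)  # 338350
```

The split, run, and combine steps mirror the definition above: the job is divided into tasks, the tasks run simultaneously, and their results are merged into one answer.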
Grid computing
Grid computing systems link computer resources together in a way that lets someone use one computer to access and leverage the collected power of all the computers in the system. To the individual user, it’s as if the user’s computer has transformed into a supercomputer.
Cloud computing
Cloud computing is a broad term that refers to immediate access to computing resources hosted over the internet. These resources can include software, data storage, processing power, and more. Amazon Web Services defines cloud computing as follows: “Cloud computing is the on-demand delivery of computer power, database, storage, applications, and other IT resources via the internet with pay-as-you-go pricing.”
IaaS
Providers of Infrastructure as a Service supply the infrastructure, which includes the basic computing resources and storage, and the users then build everything else that they need. When companies rely on IaaS providers, it can be thought of as renting servers, and their users can install operating systems and programs on the servers.