Data Analysis Tools Flashcards
Purpose and Outputs of data integration activities
Integration begins with the ingestion process, and includes steps such as cleansing, ETL mapping and transformation.
Data integration ultimately enables analytics tools to produce effective, actionable business intelligence.
Functional Requirements
Define a system or its components
Describe the functions a software must perform
A function is described by its inputs, its behaviour and its outputs
E.g. a calculation, data manipulation, business process or user interaction
Help you capture the intended behaviour of the system
Behaviour may be expressed as functions, services or tasks
Non-functional requirement
Define the quality attributes of a software system
They represent a set of standards used to judge the specific operation of a system
E.g. how fast does the website load
Essential to ensure the usability and effectiveness of the entire software system
Allow you to impose constraints or restrictions on the design of the system across the various agile backlogs
Speed of Data Integration
Faster for data warehouses than transactional relational databases because warehouse access is read-only (analytical queries rather than transactional writes).
Data warehouses can handle large analytical queries, which avoids the problem of running those queries against transactional (OLTP) databases.
Loading data into warehouses (ETL) is usually carried out in batches, and the warehouse may be unavailable while data is being loaded.
Having data integrated into a single source saves time (it doesn't take as long to prepare and analyse the data).
Data Integration: Structure and Rules
- Security policies
- Access Layers
- Should be immutable (the content of the integrated data destination cannot be changed)
- Validation checks should be carried out during ETL
- Validate the source and target table structure and data types
- Validate the column names against a mapping doc
- Verification done using ETL testing (see the sketch below)
- Verify that the data is accurate
- Verify the data is the right data required to be in the data warehouse
- Verify that the data has no duplicates in the data warehouse
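A minimal sketch of such validation and verification checks, assuming hypothetical source and warehouse tables src_orders and dw_orders with an order_id key (names are illustrative only); information_schema is available in most relational databases:

-- Compare row counts between source and target
SELECT (SELECT COUNT(*) FROM src_orders) AS source_rows,
       (SELECT COUNT(*) FROM dw_orders)  AS target_rows;

-- Check target column names and data types against the mapping doc
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'dw_orders';

-- Verify the warehouse holds no duplicates
SELECT order_id, COUNT(*) AS copies
FROM dw_orders
GROUP BY order_id
HAVING COUNT(*) > 1;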
Rationale for using and integrating data from multiple sources
- Consistency
- Identifying missing data and gaps that need to be filled, through ETL
Benefits of integrating data from multiple sources
- Increased collaboration across teams
- Better business intelligence and insights
- Data availability to all stakeholders
To consider: Data in a business context
- Variability: illustrates how things differ, and by how much
- Uncertainty: Good visualisation practices frame uncertainty that arises from variation in data
- Context: Meaningful context helps us frame uncertainty against underlying variation in data
Descriptive Analysis
Looks at past data and tells what happened. Often used when tracking KPIs, revenue and sales
Diagnostic Analysis
Aims to determine why something happened
Predictive Analysis
predicts what is likely to happen in the future
Trends are derived from past data and used to create predictions
Prescriptive Analysis
Combines the information found from the previous three types of data analysis and forms a plan of action for the organisation to address the issue or decision
Data Warehouse
An industry-standard tool that allows the collection and organisation of data originating from various sources is the data warehouse.
Reasons for using data from multiple sources
- allows the analyst to pull together data to resolve issues
- perform data analysis
- obtain a 360 view of a problem
- assist with decision making, among other benefits
Quality of Data Sources - things that could go wrong
Data Truncation - precision is lost / wrong data types
Data Corruption - commas in the wrong place
Data Missing - only a portion of the data set is uploaded
Data Typing - wrong data type uploaded into field
Data Translation - Wrong encoding/symbols
To ensure better quality of data (illustrated below):
- Checksums
- Spot checks (eyeballing)
- Min / Max / Aggregates
- Counts
- Export comparison
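A minimal sketch of the count and min/max/aggregate checks, assuming a hypothetical sales table with an amount column; the results are compared against the same figures taken from the source export:

-- Row count to compare against the source export
SELECT COUNT(*) AS row_count FROM sales;

-- Min / Max / Aggregates to spot truncation, corruption or missing data
SELECT MIN(amount) AS min_amount,
       MAX(amount) AS max_amount,
       SUM(amount) AS total_amount,
       AVG(amount) AS avg_amount
FROM sales;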
Data Integration
the process of combining data from different sources to help data managers and executives analyse it and make smarter business decisions
Data Integration Process involves:
A person or system locating, retrieving, cleaning, and presenting the data
Manual Data Integration
Occurs when a data manager oversees all aspects of the integration, usually by writing custom code, without automation
Best for one-time instances; untenable for complex or recurring integrations because it is a tedious, manual process.
Manual Data Integration: Benefits
Reduced cost: requires little maintenance and typically only integrates a small number of data sources
Greater Freedom: user has total control over the integration
Manual Data Integration: Limitations
Less access: A developer or manager must manually orchestrate each integration
Difficulty scaling: Scaling for larger projects requires manually changing the code of each integration, and that takes time
Greater room for error: A manager/analyst must handle the data at each stage
Middleware data integration
Software that connects applications and transfers data between them and databases. Middleware can act as an interpreter between old and new systems
Mostly it is a communication tool and has limited capabilities for data analytics
Middleware data integration: Benefits
Better data streaming: the software conducts the integration automatically
Easier access between systems: software is coded to facilitate communication between the systems in a network
Middleware data integration: Limitations
Less access: middleware needs to be developed and maintained by a developer with technical knowledge
Limited functionality: can only work with certain systems
Application-based integration
Software applications do all the work: locate, retrieve, clean, integrate data from disparate sources. Easy for data to move from one source to the other
Application-based integration: Benefits
Simplified processes: One application does all the work automatically
Easier information exchange: the application allows systems and departments to transfer information seamlessly
Fewer resources are used: because much of the process is automated, analysts can pursue other projects
Application-based integration: Limitations
Limited access: requires technical knowledge and an analyst to oversee application maintenance/deployment
Inconsistent results: the approach is unstandardised and varies between businesses offering this as a service
Complicated set-up: designing requires developers/analysts etc… with technical knowledge
Difficult data management: accessing different systems can lead to compromised data integrity
Uniform access integration
Accesses data from even more disparate sets and presents it uniformly, while allowing the data to stay in its original location
Uniform access integration: Benefits
Lower storage requirements
Easier data access (works well with multiple data sources/systems)
Simplified view of data: uniform appearance of data for the end user
Uniform access integration: Limitations
Data integrity challenges: lots of sources can lead to compromising data integrity
Strained systems: Data host systems are not usually designed to handle the amount/frequency of data requests
Common storage integration: (data warehousing)
Similar to uniform access, but involves creating and storing a copy of the data in a data warehouse, which offers more versatility
Common storage integration (data warehousing): Benefits
Reduced burden: host system isn’t constantly handling data queries
Increased data version management control: one source = better data integrity
Cleaner data appearance
Enhanced data analytics: maintaining a stored copy allows more sophisticated queries
Common storage integration (data warehousing): Limitations
Increased storage costs: creating a copy = paying for storage
Higher maintenance costs: the stored copy and the ETL processes that feed it must be maintained and kept in sync with the sources
WHEN TO USE IT:
Manual Data Integration
Merge data for basic analysis between a small amount of data sources
WHEN TO USE IT:
Middleware Data Integration
Automate and translate communication between legacy and modernised systems
WHEN TO USE IT:
Application-based Data Integration
Automate and translate communication between systems and allow for more complicated data analysis
WHEN TO USE IT:
Uniform-based Data Integration
Automate and translate communication between systems and present the data uniformly to allow for complicated data analysis
WHEN TO USE IT:
Common storage Data Integration
Present the data uniformly, create and store a copy and perform the most sophisticated data analysis
Common user interface (or Virtual Integration)
Leaves the data in the source systems and defines a set of views that provide a unified view to the consumer across the whole enterprise
pros: nearly zero latency of the data updates
cons: limited possibility of data history and version management; similar limitations to uniform access integration apply
Physical Data Integration: ETL
Extract:
The process of reading data from multiple data sources into the Data Warehouse; the data is COPIED (not moved).
Data validation occurs during this stage; you must have the correct:
- structure
- format
- permissions
Method of extraction - for each data source, define whether the extraction process is manual or tool based.
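A minimal sketch of an extract step written in SQL, assuming a hypothetical source table crm_customers that is reachable from the warehouse and a staging table stg_customers (both names are illustrative):

-- Copy (not move) the source rows into a staging area
INSERT INTO stg_customers (customer_id, customer_name, signup_date)
SELECT customer_id, customer_name, signup_date
FROM crm_customers;      -- the source table is left untouched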
Physical Data Integration: ETL
Transform:
The process of combining the data tables or linking them using relational databases.
The data in the source systems is not changed in any way.
Data engineers must consider the efficiency of the databases as well as ensuring that all necessary data can be accessed.
Before moving the data into a single point of reference we need to (see the sketch after this list):
- remove inconsistencies
- Standardise various data elements
- Make sure of the meanings of the data names in each file
- Deduplication
- Deriving new calculated values
- Data validation
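A sketch of typical transform steps, assuming hypothetical staging tables stg_order_lines and stg_customers; the standardisation rules and the derived value are illustrative only:

-- Standardise elements and derive a new calculated value
SELECT UPPER(TRIM(country_code))  AS country_code,   -- standardise format
       CAST(order_date AS DATE)   AS order_date,     -- consistent data type
       quantity * unit_price      AS line_total      -- derived calculated value
FROM stg_order_lines;

-- Deduplication: keep one row per customer
SELECT DISTINCT customer_id, customer_name, country_code
FROM stg_customers;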
Physical Data Integration: ETL
Load:
The process of writing the data to the target database
Due to the nature of Big Data, it is often necessary to use parallel processing to manage the volume of data being written to the system.
Data Verification is undertaken post-loading to ensure the data is accurate.
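A minimal sketch of the load step followed by a post-load verification, assuming hypothetical staging and warehouse tables stg_orders_clean and dw_orders:

-- Write the transformed rows to the target warehouse table
INSERT INTO dw_orders (order_id, customer_id, order_date, line_total)
SELECT order_id, customer_id, order_date, line_total
FROM stg_orders_clean;

-- Post-load verification: the loaded row count should match the staging count
SELECT (SELECT COUNT(*) FROM stg_orders_clean) AS staged_rows,
       (SELECT COUNT(*) FROM dw_orders)        AS loaded_rows;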
Sequence
It is the order in which we want the computer to execute the instructions we provide as programmers.
Selection
Selecting which path of an algorithm to execute depending on some criteria.
Iteration
Refers to looping or repeating procedures.
SIMPLE QUERY
SELECT FROM WHERE ORDER BY HAVING LIMIT DISTINCT
SELECT column-names = the columns to return
FROM table-name = the table to query
WHERE condition = filters on rows
ORDER BY sort-order = sorts the result
HAVING = filters groups produced by GROUP BY
LIMIT = restricts the number of rows
DISTINCT = brings back unique values
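A worked example putting the clauses together (written order is SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY ... LIMIT), assuming a hypothetical orders table and a MySQL-style dialect for LIMIT:

SELECT customer_id,
       COUNT(*)   AS order_count,
       SUM(total) AS total_spent
FROM orders                          -- table to query
WHERE order_date >= '2023-01-01'     -- filter on rows
GROUP BY customer_id                 -- group rows per customer
HAVING SUM(total) > 100              -- filter on the groups
ORDER BY total_spent DESC            -- sort order
LIMIT 10;                            -- restrict number of rows

SELECT DISTINCT customer_id FROM orders;   -- DISTINCT brings back unique values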
Union Rules
UNION = combines the results of multiple SELECT statements (often from multiple tables) into a single query result
- Must match the number of columns, with compatible data types
- Can only have one ORDER BY, at the bottom of the full select statement
- UNION removes exact duplicates, whereas UNION ALL allows duplicates
- Conditions between the UNION'd SELECT statements should match (example below)
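A minimal sketch, assuming hypothetical online_orders and store_orders tables with the same number of columns and compatible data types:

SELECT order_id, order_date, total FROM online_orders
WHERE order_date >= '2023-01-01'
UNION                                 -- removes exact duplicates (UNION ALL would keep them)
SELECT order_id, order_date, total FROM store_orders
WHERE order_date >= '2023-01-01'
ORDER BY order_date;                  -- the single ORDER BY sits at the bottom of the full statement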
SQL Expressions: CASE
Returns different values depending on conditions; often used to group data into categories or classifications
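A minimal sketch, assuming a hypothetical customers table with a total_spent column:

SELECT customer_id,
       CASE
           WHEN total_spent >= 1000 THEN 'gold'
           WHEN total_spent >= 100  THEN 'silver'
           ELSE 'bronze'
       END AS customer_tier           -- groups values into categories
FROM customers;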
SQL Expressions: DATETIME
type of variable that allows storing a date/time value in the database YYYY-MM-DD HH:MI:SS
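A minimal sketch, assuming a hypothetical events table whose created_at column is a DATETIME:

-- Store and then filter on a DATETIME value (YYYY-MM-DD HH:MI:SS)
INSERT INTO events (event_name, created_at)
VALUES ('login', '2023-06-01 14:30:00');

SELECT event_name, created_at
FROM events
WHERE created_at >= '2023-06-01 00:00:00';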
SQL Expressions: Compound
mathematical and related operations within the relational database: arithmetic, comparison, logical, string
Arithmetic
+ - * / (addition, subtraction, multiplication, division); exponentiation ("to the power of") is usually written as POWER(x, y), as ** is not supported in most SQL dialects
Comparison
= Equal to
!= or <> Not equal to
< Less than
<= or !> Less than or equal to (not greater than)
> Greater than
>= or !< Greater than or equal to (not less than)
Logical
ALL / AND Does the value meet ALL criteria
ANY/ OR Does the value meet ANY criteria
BETWEEN Is the value BETWEEN listed values
EXISTS Does a row meeting the criteria Exist
IN Is the value found in the listed literal values
LIKE Compares the value to listed values using wildcard
NOT reverses the meaning of a logical operator
IS NULL checks if value is null
UNIQUE searches for duplicates
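A few of these operators in use, assuming a hypothetical products table (column names and values are illustrative only):

SELECT product_name, price
FROM products
WHERE price BETWEEN 10 AND 50           -- value between listed values
  AND category IN ('books', 'games')    -- value found in listed literals
  AND product_name LIKE 'A%'            -- wildcard comparison
  AND discount IS NOT NULL;             -- NOT reverses IS NULL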
String
CHAR(n) returns the character whose ASCII code is n
CONCAT concatenates items (puts strings together)
FORMAT(x, n) returns the number x formatted to n decimal places
LOWER() returns the argument in lowercase
UPPER() returns the argument in uppercase
REPEAT() repeats the string a certain number of times
TRIM() removes leading and trailing spaces
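A minimal sketch of the string functions, assuming a MySQL-style dialect and a hypothetical customers table (CHAR, FORMAT and REPEAT vary between dialects):

SELECT CONCAT(first_name, ' ', last_name) AS full_name,      -- join strings together
       UPPER(last_name)                   AS surname_upper,  -- uppercase
       LOWER(email)                       AS email_lower,    -- lowercase
       TRIM('  padded  ')                 AS trimmed,        -- strip leading/trailing spaces
       REPEAT('ab', 3)                    AS repeated,       -- 'ababab'
       FORMAT(1234.5678, 2)               AS formatted,      -- '1,234.57'
       CHAR(77)                           AS letter_m        -- character for ASCII code 77
FROM customers;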
Functions
AVG COUNT MAX MIN GROUP BY ROUND CAST CONVERT ISNULL
AVG = average value
COUNT = number of rows
MAX = largest value
MIN = smallest value
GROUP BY = indicates the dimensions to group data by when aggregating
ROUND = specifies the number of decimal places
CAST = changes the data type of an expression (ANSI-standard syntax)
CONVERT = converts the data type of a value (dialect-specific; SQL Server's version adds a style argument)
ISNULL = returns a specified value if the expression is null; if not null, returns the expression
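A minimal sketch of the aggregates and conversions, assuming a hypothetical sales table; ISNULL is SQL Server syntax, so the portable COALESCE is used here instead:

SELECT region,
       COUNT(*)              AS sale_count,     -- number of rows per group
       ROUND(AVG(amount), 2) AS avg_amount,     -- average to 2 decimal places
       MAX(amount)           AS biggest_sale,
       MIN(amount)           AS smallest_sale,
       CAST(SUM(amount) AS DECIMAL(12, 2)) AS total_amount,   -- change the expression's data type
       COALESCE(MAX(discount), 0)          AS max_discount    -- fallback value when NULL
FROM sales
GROUP BY region;              -- the dimension to group the aggregates by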
JOINS
INNER, OUTER, FULL (OUTER), RIGHT, LEFT, UNION, SELECT INTO, SUBQUERIES, EXCEPT, CROSS
INNER = joins only what matches in both tables
OUTER = any join that also keeps non-matching rows (LEFT, RIGHT or FULL)
FULL (OUTER) = combination of a left join and a right join: everything from both tables, matched where possible
RIGHT = includes everything from the right table and anything that matches in the left
LEFT = includes everything from the left table and anything that matches in the right
UNION = two SELECT queries whose rows are combined into one result
SELECT INTO = copies data from one table into a new table
SUBQUERIES = queries within another SQL query, embedded within the WHERE clause as a condition to further restrict the rows
EXCEPT (and outer joins) help to identify missing data between tables
CROSS JOIN = cartesian product (usually not desired; slow performance)
(Examples after this list.)
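A few of these joins sketched out, assuming hypothetical customers and orders tables linked by customer_id (SELECT INTO is SQL Server syntax; MySQL uses CREATE TABLE ... AS SELECT instead):

-- INNER: only customers with matching orders
SELECT c.customer_name, o.order_id
FROM customers c
INNER JOIN orders o ON o.customer_id = c.customer_id;

-- LEFT: every customer, plus matching orders where they exist (NULLs otherwise)
SELECT c.customer_name, o.order_id
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id;

-- SELECT INTO: copy rows into a new table
SELECT customer_id, customer_name
INTO customers_backup
FROM customers;

-- Subquery embedded in the WHERE clause to further restrict the rows
SELECT customer_name
FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders WHERE total > 500);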
Implicit Data Conversion
The server automatically converts the data from one type to another during query processing
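A small illustration, assuming a hypothetical orders table whose order_id column is an integer; the string literal is converted automatically, which can hurt index use and occasionally gives surprising results:

-- order_id is an INT but the literal is a string;
-- the server implicitly converts one side so the comparison can run
SELECT order_id, total
FROM orders
WHERE order_id = '42';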
Data Profiling
Review to determine
Accuracy
Completeness
Validity