Full Study Flashcards
What is Data Analytics?
The process of extracting useful insights from raw data
What is Data Analysis?
Data analysis refers to the process of compiling and analysing data to support decision making.
How do Data Analytics and Data Analysis compare?
Data analytics is the broader term; data analysis is a subcomponent of it.
Data analytics also includes the tools and techniques used to carry out the analysis.
What is Big Data?
Data characterized by high volume, velocity, variety, and veracity, coming from multiple sources. Unlocking the value of Big Data allows businesses to better sense and respond to their environment, which is key to creating competitive advantage in a complex and rapidly changing market. Governments are also taking notice of the Big Data phenomenon. Traditional processing and analysis of structured data using RDBMS and data warehousing no longer meet the challenges of Big Data. Data is created constantly and at an ever-increasing rate: mobile phones, social media, and imaging technologies used for medical diagnosis all create new data, which must be stored somewhere for some purpose.
What are the Technology Trends in Big Data?
Open-source software
Commodity servers
Massively parallel, distributed processing platforms
What are the challenges of Big Data?
Data at Rest – Terabytes to exabytes of existing data to process.
Data in Motion – Streaming data, requiring seconds to respond.
Data in Many Forms – Structured, Unstructured, Text, Multimedia.
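The contrast between data at rest and data in motion can be illustrated with a minimal Python sketch. The function names and the sensor readings are made up for illustration; a real system would read from storage or a message stream instead of a list.

```python
from typing import Iterable, Iterator


def batch_mean(values: list[float]) -> float:
    """Data at rest: the full dataset already exists, so process it in one pass."""
    return sum(values) / len(values)


def streaming_mean(stream: Iterable[float]) -> Iterator[float]:
    """Data in motion: update the answer as each new reading arrives."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count  # an up-to-date result after every reading


readings = [10.0, 20.0, 30.0, 40.0]  # hypothetical sensor readings
print(batch_mean(readings))            # 25.0
print(list(streaming_mean(readings)))  # [10.0, 15.0, 20.0, 25.0]
```

The batch version needs all the data before it can answer; the streaming version produces a result after every reading, which is what a response time of seconds requires.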
What are the Characteristics of Big Data?
Volume: a huge volume of data.
Variety: Big Data reflects the variety of new data sources, formats, and structures.
Velocity of new data creation and growth: Big Data can describe high-velocity data, with rapid data ingestion and near-real-time analysis.
Where does Big Data come from?
Mobile Devices
Social Media and Networks
Scientific Instruments
Sensor Technology and Networks
What are the Drivers of Big Data?
Medical information, such as genomic sequencing and diagnostic imaging.
Photos and video footage uploaded to the World Wide Web.
Video surveillance, such as the thousands of video cameras spread across a city.
Mobile devices, which provide geospatial location data of the users, as well as metadata about text messages, phone calls, and application usage on smartphones.
Smart devices, which provide sensor-based collection of information from smart electric grids, smart buildings, and many other public and industry infrastructures.
Non-traditional IT devices, including the use of radio-frequency identification (RFID) readers, GPS navigation systems, and seismic processing.
What is Data?
The volume and rate of data produced in any particular discipline now exceed our ability to effectively process and analyze it.
What are some Data Sources?
Digital instruments, high-resolution cameras, medical scanners, simulations, transactional data, social media
What are the main players in the Big Data ecosystem?
Data devices and the “Sensor Net” gather data from multiple locations and continuously generate new data about this data.
Data collectors include sample entities that collect data from devices and users.
Data aggregators make sense of the data collected from the various entities in the “Sensor Net” or the “Internet of Things.” These organizations compile data from the devices and usage patterns collected by government agencies, retail stores, and websites. In turn, they can choose to transform and package the data as products to sell to list brokers, who may want to generate marketing lists of people who may be good targets for specific ad campaigns.
Data users and buyers directly benefit from the data collected and aggregated by others within the data value chain.
What are the Key Roles in the Big Data ecosystem?
Deep Analytical Talent - technically savvy, with strong analytical skills. This group has advanced training in quantitative disciplines, such as mathematics, statistics, and machine learning. To do their jobs, members need access to a robust analytic sandbox or workspace where they can perform large-scale analytical data experiments.
Data Savvy Professionals - This group has less technical depth but has a basic knowledge of statistics or machine learning and can define key questions that can be answered using advanced analytics. These people tend to have a base knowledge of working with data, or an appreciation for some of the work being performed by data scientists and others with deep analytical talent.
Technology and Data Enablers - This group represents people providing technical expertise to support analytical projects, such as provisioning and administrating analytical sandboxes, and managing large-scale data architectures that enable widespread analytics within companies and other organizations.
This role requires skills related to computer engineering, programming, and database administration.
What are the Activities performed by Data Scientists?
Reframe business challenges as analytics challenges - Diagnose business problems, consider the core of a given problem, and determine which kinds of candidate analytical methods can be applied to solve it.
Design, implement, and deploy statistical models and data mining techniques on Big Data - Applying complex or advanced analytical methods to a variety of business problems using data.
Develop insights that lead to actionable recommendations - Draw insights out of the data and communicate them effectively.
What are the skills and behavioral characteristics of a data scientist?
Quantitative skill: such as mathematics or statistics
Technical aptitude: namely, software engineering, machine learning, and programming skills
Skeptical mind-set and critical thinking: It’s important that data scientists can examine their work critically rather than in a one-sided way.
Curious and creative: Data scientists are passionate about data and finding creative ways to solve problems and portray information.
Communicative and collaborative: Data scientists must be able to articulate the business value in a clear way and collaboratively work with other groups, including project sponsors and key stakeholders.
What is the profile of a data scientist?
1) Quantitative
2) Curious and Creative
3) Skeptical
4) Technical
5) Communicative and Collaborative
What are the types of Data Structures?
Structured data: Data containing a defined data type, format, and structure.
Semi-structured data: Textual data files with a discernible pattern that enables parsing.
Quasi-structured data: Textual data with erratic data formats that can be formatted with effort and tools.
Unstructured data: Data that has no inherent structure.
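As a rough illustration of the four types, the Python sketch below handles one small made-up sample of each. The sample values, the URL, and the choice of CSV, JSON, and a regular expression as stand-ins are assumptions made for the example, not part of the definitions above.

```python
import csv
import io
import json
import re

# Structured: defined data type, format, and structure (here, a CSV table).
csv_text = "id,amount\n1,9.50\n2,12.00\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total = sum(float(row["amount"]) for row in rows)

# Semi-structured: text with a discernible pattern that enables parsing (here, JSON).
record = json.loads('{"user": "ana", "tags": ["new", "mobile"]}')

# Quasi-structured: erratic format that takes effort and tools to format
# (here, a hypothetical clickstream URL picked apart with a regular expression).
url = "http://example.com/search?q=big+data&page=2"
match = re.search(r"[?&]q=([^&]+)", url)
query = match.group(1).replace("+", " ") if match else None

# Unstructured: no inherent structure, so only generic text operations apply.
note = "Customer called twice about a late delivery."
word_count = len(note.split())

print(total, record["tags"], query, word_count)
```

The amount of custom code needed grows from top to bottom: the CSV and JSON samples are read by standard parsers, the URL needs a hand-written pattern, and the free text offers nothing to parse against.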
What are the Disadvantages of EDW?
EDW - Enterprise Data Warehouse
EDW and BI systems tend to restrict the flexibility needed to perform robust or exploratory data analysis.
With the EDW model, data is managed and controlled by IT groups and database administrators (DBAs), and data analysts must depend on IT for access and changes to the data schemas, which imposes major lead times.
EDW rules may restrict analysts from building datasets.
EDW and BI introduce new problems related to flexibility and agility, which were less pronounced when dealing with spreadsheets.
A solution to this problem is the analytic sandbox, which attempts to resolve the conflict for analysts and data scientists with EDW and more formally managed corporate data.
What is a Sandbox?
Sandboxes, often referred to as workspaces, are designed to enable teams to explore many datasets in a controlled fashion and are not typically used for enterprise-level financial reporting and sales dashboards. Analytic sandboxes often enable high-performance computing using in-database processing, in which the analytics occur within the database itself.
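The idea of in-database processing can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the sales table and its values are made up, and an in-memory SQLite database stands in for a real analytic sandbox.

```python
import sqlite3

# An in-memory SQLite database stands in for the sandbox (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("north", 50.0), ("south", 70.0)],  # made-up rows
)

# In-database processing: the aggregation runs inside the database engine,
# and only the small result set leaves the database.
totals_in_db = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

# Contrast: pull every row out and aggregate in application code.
totals_in_app = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    totals_in_app[region] = totals_in_app.get(region, 0.0) + amount

conn.close()
print(totals_in_db)
print(totals_in_db == totals_in_app)
```

Both approaches give the same totals, but the first moves only one row per region out of the database, which matters once the table holds far more rows than fit comfortably in application memory.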
What are types of Data Repositories?
Spreadsheets and data marts (“spreadmarts”): Spreadsheets and low-volume databases for record keeping.
Data Warehouses: Centralized data containers in a purpose-built space. Supports BI and reporting but restricts robust analyses.