G.1 General Flashcards
- Define the notion of “Datafication”. In what way is it a revolution with respect to smart environments?
Data is everything and everything is data. Datafication means rendering into data all aspects of the world, even those that have never been quantified. It is revolutionary because, for the first time, it is open to individuals and companies, making most of the crucial algorithms and evidence-based decision-making available to them.
- Define the characteristics of data-centric sciences. What is the role of data for them? What are the two components that make them a new generation of experimental sciences?
Data management, greedy algorithms and programming models to be deployed in different target computer architectures.
Data collections serve as the backbone for conducting experiments, driving hypotheses and leading to “valid” conclusions, models, simulations and understanding.
The two components: Big Data handling and architecture.
- Define the notion of Big Data. In your opinion, how does this notion open new challenges for data management?
“Big Data” refers to datasets that are so large, complex, and dynamic that traditional data processing systems have difficulty efficiently handling and analyzing them. That’s why we refer to the V’s. There are three aspects to consider when talking about big data and those are:
* Data collection – characteristics difficult to process on single machines or traditional DBs.
* New generations of tools, methods and technologies to collect, process and analyze massive data collections.
* Tools imposing the use of parallel processing and distributed storage.
Give 5 properties that characterise Big Data. Explain in what way they are challenging for managing data.
Properties - Challenges:
1. Volume – quantity of collected and generated data
2. Veracity – quality and reliability of the data, i.e. whether it really reflects the real-world state.
3. Variety – diverse types of data and formats
4. Variability – changes in the structure and format of data
5. Value – that it can be turned into valuable insights.
6. Velocity – data production rate
These challenges require a combination of technological solutions, organizational strategies, and data management best practices.
In the case of your domain of expertise, how does Big Data open novel possibilities or problems/challenges?
Big Data opens exciting possibilities for advancing natural language processing and machine learning capabilities, but it also brings challenges related to data quality, bias, computational resources, privacy, interpretability, and ethical considerations.
What type of questions can data analytics answer that cannot be answered by classic factual querying?
Data analytics can answer questions of a predictive or classification nature, which classic factual querying cannot.
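As a minimal sketch of this difference, the snippet below contrasts a factual lookup with a prediction. The sales figures, the function names and the choice of a least-squares line are assumptions made for illustration only; they are not part of the course material.

```python
# Hypothetical monthly sales figures, invented for illustration only.
sales = {1: 10.0, 2: 12.0, 3: 14.0, 4: 16.0}


def sales_in_month(month):
    """Factual query: the answer is already stored in the data."""
    return sales[month]


def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Analytics: month 5 is not in the data, so its value has to be estimated.
slope, intercept = fit_line(list(sales.items()))
predicted_month_5 = slope * 5 + intercept
```

The lookup can only return values that were recorded, whereas the fitted model gives an estimate for a month that has not been observed.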
Programming language used to manage and manipulate data stored in Relational Database Management Systems (RDBMS). It allows operations such as creating and modifying tables, inserting, updating, and deleting data, as well as retrieving information through queries.
SQL (Structured Query Language)
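The operations listed on this card can be sketched with Python's built-in `sqlite3` module and an in-memory database. The `sensor` table, its columns and the values are hypothetical, chosen only to show each kind of statement once.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table, then modify it by adding a column.
cur.execute("CREATE TABLE sensor (id INTEGER PRIMARY KEY, room TEXT, temp REAL)")
cur.execute("ALTER TABLE sensor ADD COLUMN unit TEXT DEFAULT 'C'")

# Insert rows (hypothetical readings).
cur.executemany(
    "INSERT INTO sensor (room, temp) VALUES (?, ?)",
    [("kitchen", 21.5), ("garage", 12.0), ("office", 23.0)],
)

# Update one row and delete another.
cur.execute("UPDATE sensor SET temp = ? WHERE room = ?", (22.0, "kitchen"))
cur.execute("DELETE FROM sensor WHERE room = ?", ("garage",))

# Retrieve information through a query.
cur.execute("SELECT room, temp FROM sensor WHERE temp > ? ORDER BY room", (20.0,))
rows = cur.fetchall()
conn.close()
```

Each statement names what is wanted (which rows, which columns) and the database engine decides how to retrieve it.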
Refers to a set of database technologies that depart from the relational model. … These databases are designed to handle large volumes of unstructured or semi-structured data and provide flexibility in the data schema.
NoSQL (Not Only SQL)
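Schema flexibility can be illustrated with a small sketch in which plain Python dictionaries stand in for documents. This is not the API of any real NoSQL product; `collection` and `find` are hypothetical names used only for illustration.

```python
import json

# A "collection" of documents; no fixed schema is declared anywhere,
# so each document may carry different fields.
collection = [
    {"_id": 1, "type": "tweet", "text": "hello", "tags": ["greeting"]},
    {"_id": 2, "type": "sensor", "reading": {"temp": 21.5, "unit": "C"}},
    {"_id": 3, "type": "tweet", "text": "big data", "lang": "en"},
]


def find(docs, **criteria):
    """Return the documents whose top-level fields match all criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


tweets = find(collection, type="tweet")

# Documents serialise naturally to JSON, a common semi-structured format.
serialised = json.dumps(collection[1])
```

In a relational table, the nested `reading` field and the optional `lang` field would each require a schema change; here they simply appear in the documents that need them.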
In terms of Big Data Volume, which is the basic unit of measure?
1B = 1 Byte
In terms of Big Data Volume, what does a Gigabyte represent?
1 × 10^9 Bytes
In terms of Big Data Volume, which is the highest capacity achieved lately?
1 × 10^24 Bytes = 1 Yottabyte. The informal Brontobyte and Geopbyte follow, each a factor of 1000 larger than the previous unit.
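The factor-of-1000 ladder from the byte up to the yottabyte can be sketched as a small conversion helper. The function name `human_readable` and the unit symbols are assumptions for illustration; the informal units beyond the yottabyte are left out.

```python
# Decimal prefixes: each unit is 1000 times the previous one.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    index = 0
    # Integer comparisons avoid floating-point rounding at the unit boundaries.
    while index < len(UNITS) - 1 and num_bytes >= 1000 ** (index + 1):
        index += 1
    return f"{num_bytes / 1000 ** index:g} {UNITS[index]}"


gigabyte = human_readable(10 ** 9)
yottabyte = human_readable(10 ** 24)
```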
Social Data, Network Science, Digital Humanities and Computational Science are…
Data-centric sciences that emerged and developed new methodologies to exploit data.
They develop methodologies weaving together data management, greedy algorithms and programming models that must be tuned to be deployed in different target computer architectures…
Data-centric science
Its hypothesis is that any complex system can be represented as a network, so that operations can be applied to it and, hence, it can be modeled. Graph theory is applied.
Network science
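A minimal sketch of this idea: a system is stored as a graph (here an adjacency list) and graph-theory operations are applied to it. The friendship network and the function names are hypothetical, chosen only for illustration.

```python
from collections import deque

# Hypothetical undirected friendship network as an adjacency list.
network = {
    "ana": ["ben", "carl"],
    "ben": ["ana", "dora"],
    "carl": ["ana", "dora"],
    "dora": ["ben", "carl", "eve"],
    "eve": ["dora"],
}


def degree(graph, node):
    """Number of direct connections of a node."""
    return len(graph[node])


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


hops = shortest_path_length(network, "ana", "eve")
```

Once the system is a graph, the same two operations apply whether the nodes are people, routers or proteins.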
Nowadays we have many algorithms and mathematical models, but we need new ones. Numerical models are currently less costly than the newer solutions (AI), but when they provide no solution we should turn to the second option. Here we are talking about
Computation
When it comes to ………… we don’t work in a lab; we work on computers. That’s why we need to clearly define the architecture and computing environment (cycles) used to carry it out.
Experiment setting
Which are the Big Data properties?
Volume, Velocity, Variety, Variability, Veracity, Value
The Big Data property that refers to QUANTITY / SIZE
Volume
The Big Data property that refers to PRODUCTION RATE
Velocity
The Big Data property that refers to DIVERSITY OF DATA TYPE AND FORMAT (heterogeneity)
Variety
The Big Data property that refers to the QUALITY OF THE DATA
Veracity
The Big Data property that refers to CHANGES IN THE STRUCTURE AND FORMAT OF DATA
Variability
The Big Data property that refers to the ABILITY TO TURN DATA INTO USEFUL INSIGHTS
Value
What are three aspects that we can state about Big Data?
- Data collections with characteristics difficult to process on single machines or traditional databases.
- A new generation of tools, methods, and technologies to collect, process and analyze massive data collections.
- Tools imposing the use of parallel processing and distributed storage.
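The third aspect, parallel processing over distributed storage, can be sketched as a map-and-reduce word count. Here a list of partitions stands in for data spread across machines and a thread pool stands in for a cluster; both are simplifying assumptions, and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Each inner list plays the role of a data partition stored on a separate node.
partitions = [
    ["big data volume", "data velocity"],
    ["data variety", "big value"],
]


def count_words(lines):
    """Map step: count the words of one partition independently."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def parallel_word_count(parts):
    """Run the map step on every partition in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(count_words, parts))
    total = Counter()
    for partial in partials:  # Reduce step: combine the partial counts.
        total.update(partial)
    return total


totals = parallel_word_count(partitions)
```

Because each partition is counted independently, the work scales by adding partitions and workers rather than by using a bigger single machine.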