WEEK 2, ch4 Flashcards
data
raw facts with no context or intent; on their own they are not useful
quantitative data
numeric, the result of a measurement, count or some mathematical calculation.
qualitative data
descriptive
information
is PROCESSED DATA that possesses context, relevance, and purpose
Knowledge
in a given area, human beliefs or perceptions about the relationships among facts or concepts relevant to that area
explicit knowledge
knowledge that can be expressed in words or numbers
tacit knowledge
insights and intuitions that are difficult to transfer to another person through simple communication (e.g. riding a bicycle, skiing)
wisdom
reached when a person can combine knowledge and experience to produce a deeper understanding of a topic
Database (talk about the relation of the data)
organized collection of related information. It is organized because all data in a database is described and associated with other data, so all of the information should be related.
several defects can arise when data resources are managed improperly. The three common defects:
- Uncontrolled redundant data: unnecessary duplication
- Violation of data integrity: data loses its accuracy, consistency, or reliability at some point in its lifecycle
- Relying on human memory to store and search for needed data: important business information is not properly recorded or stored in a system, forcing employees to depend on their own memory to recall and retrieve it
Relational database
is one in which data is organized into one or more tables. Each table has a set of fields which define the nature of the data stored in the table
record
is one instance of a set of fields in a table, which can also be thought of as the rows in a table
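A minimal sketch of a table, its fields, and its records, using Python's built-in sqlite3 module. The table name `students` and its fields are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the table and field names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (student_id INTEGER, name TEXT, year_of_birth INTEGER)"
)

# Each inserted tuple becomes one record (one row) in the table.
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Ana", 1992), (2, "Ben", 1995)],
)

# The fields are the columns; the records are the rows.
field_names = [col[1] for col in conn.execute("PRAGMA table_info(students)")]
records = conn.execute("SELECT * FROM students ORDER BY student_id").fetchall()

print(field_names)
print(records)
```

Here each tuple in `records` is one record, and `field_names` lists the fields that every record shares.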
designing a database (4 steps)
- design team determines which tables to create
- define specific info that each table will hold
- make sure that every table has one field in common with at least one other table (they should have a relationship). For this, a PRIMARY KEY must be selected for each table: a unique identifier that cannot change.
- identify and create relationships between the tables so that you can pull the data together in meaningful ways.
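The four design steps can be sketched with sqlite3 as follows. The `customers` and `orders` tables are hypothetical; the shared field `customer_id` is what lets the two tables be pulled together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce relationships

# Steps 1 and 2: decide on the tables and what each one holds (hypothetical).
# Step 3: each table gets a primary key; orders shares customer_id with customers.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE orders (
           order_id INTEGER PRIMARY KEY,
           customer_id INTEGER REFERENCES customers(customer_id),
           item TEXT)"""
)
conn.execute("INSERT INTO customers VALUES (1, 'Ana')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, "book"), (11, 1, "pen")]
)

# Step 4: use the relationship to pull data together.
rows = conn.execute(
    """SELECT customers.name, orders.item
       FROM orders JOIN customers ON orders.customer_id = customers.customer_id
       ORDER BY orders.order_id"""
).fetchall()
print(rows)

# A primary key must be unique, so a second customer with id 1 is rejected.
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```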
what does normalizing a database mean
reducing duplication of data between tables and giving the tables as much flexibility as possible
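A rough illustration of the idea, using plain Python structures rather than a real database. The sample rows are made up: the flat list repeats each customer's city on every order, and the normalized version stores it once.

```python
# Hypothetical flat data: the customer's city is duplicated on every order.
flat_orders = [
    {"order_id": 10, "customer": "Ana", "city": "Lisbon", "item": "book"},
    {"order_id": 11, "customer": "Ana", "city": "Lisbon", "item": "pen"},
    {"order_id": 12, "customer": "Ben", "city": "Oslo", "item": "lamp"},
]

customers = {}  # customer name -> id and city, stored only once
orders = []     # orders refer to customers by id instead of repeating details

for row in flat_orders:
    if row["customer"] not in customers:
        customers[row["customer"]] = {
            "customer_id": len(customers) + 1,
            "city": row["city"],
        }
    orders.append(
        {
            "order_id": row["order_id"],
            "customer_id": customers[row["customer"]]["customer_id"],
            "item": row["item"],
        }
    )

print(customers)
print(orders)
```

If Ana moves to another city, only one entry in `customers` has to change, which is the flexibility that normalization is after.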
name the datatypes
text, number, yes/no, date/time, currency, paragraph text (for entries longer than a text field allows), object
spreadsheet
allows you to define what kinds of values can be entered into its cells; an ideal tool for analyzing the data stored in a database
Structured Query Language (SQL)
programming language that works with relational databases. It can be used to analyze and manipulate relational data.
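A small sketch of both uses, run through Python's sqlite3 module. The `sales` table and its values are hypothetical: the first statement analyzes the data (totals per region) and the second manipulates it (an update).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("north", 50.0), ("south", 70.0)],
)

# Analyze: total sales per region.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)

# Manipulate: double every amount recorded for the south region.
conn.execute("UPDATE sales SET amount = amount * 2 WHERE region = 'south'")
south_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'south'"
).fetchone()[0]
print(south_total)
```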
record locking
For a relational database to work properly, only one person should be able to manipulate a piece of data at a time. This is hard to enforce in large-scale databases, which is where NoSQL comes in handy.
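The idea can be illustrated in ordinary Python with one lock per record. This is only an in-process analogy, not how any particular database implements locking; the record id `acct-1` and the `deposit` function are hypothetical.

```python
import threading

# One hypothetical record and one lock that guards it.
balances = {"acct-1": 0}
record_locks = {"acct-1": threading.Lock()}

def deposit(record_id, amount, times):
    for _ in range(times):
        # Only one thread at a time may change this record.
        with record_locks[record_id]:
            balances[record_id] += amount

# Four "users" update the same record concurrently.
threads = [
    threading.Thread(target=deposit, args=("acct-1", 1, 1000)) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balances["acct-1"])
```

Because every update waits its turn, no deposit is lost. The waiting is also the cost: the more users there are, the longer the queue for a popular record.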
Database Management Systems
software used to create a database, change its structure, or simply analyze its data
enterprise database
is a LARGE-scale database system designed to store, manage, and process vast amounts of data for businesses and organizations
an enterprise database can be set up in two ways
1. a small business might run its database on one computer for a few employees.
2. a big company like Amazon needs its database on many servers worldwide so that millions of customers can shop at the same time without slowdowns.
what's the main challenge with relational databases?
scalability
Metadata
can be understood as data about data, meaning it provides information about the structure and properties of data.
example of metadata
If a database has a field “Year of Birth” with a value 1992, the metadata includes:
* Field name: “Year of Birth”
* Data type: Integer
* Last updated: Date and time of last modification
Data dictionary
Special part of a database that stores metadata, defining the structure of tables, fields, and data types used in the database.
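As one concrete illustration, SQLite keeps this kind of metadata and exposes it through `PRAGMA table_info` and the `sqlite_master` table. The `people` table below is hypothetical and mirrors the “Year of Birth” example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, year_of_birth INTEGER)")
conn.execute("INSERT INTO people VALUES ('Ana', 1992)")

# The data itself: the value 1992.
data = conn.execute("SELECT year_of_birth FROM people").fetchone()[0]

# Data about the data: which tables exist, and each field's name and type.
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
metadata = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(people)")]

print(data)
print(table_names)
print(metadata)
```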
Data warehouses
system that collects, stores, and organizes data from multiple databases for analysis and reporting
a data warehouse should be designed to meet the following criteria
- Non-operational data: Stores a copy of data from active business databases, updated on a schedule.
- Time-variant: Data is recorded with timestamps, allowing for historical comparisons.
- Standardized: Data from different sources is converted into a uniform format using the ETL process (Extraction-Transformation-Load).
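A minimal ETL sketch touching all three criteria. The two source systems, their field names, and the load timestamp are made up for illustration: the sources use different date formats and currency units, and the warehouse receives a timestamped copy in one uniform format.

```python
from datetime import datetime

# Two hypothetical operational sources with different formats.
source_a = [{"sold_on": "2024-03-01", "amount_usd": 12.5}]
source_b = [{"date": "01/03/2024", "amount_cents": 990}]  # day/month/year

def extract():
    # Extraction: copy the data out of the operational systems.
    return source_a, source_b

def transform(rows_a, rows_b):
    # Transformation: convert both sources to one uniform format.
    uniform = []
    for r in rows_a:
        uniform.append({"date": r["sold_on"], "amount_usd": r["amount_usd"]})
    for r in rows_b:
        iso_date = datetime.strptime(r["date"], "%d/%m/%Y").strftime("%Y-%m-%d")
        uniform.append({"date": iso_date, "amount_usd": r["amount_cents"] / 100})
    return uniform

def load(warehouse, rows, loaded_at):
    # Load: store each row with a timestamp so history can be compared later.
    for r in rows:
        warehouse.append({**r, "loaded_at": loaded_at})

warehouse = []
load(warehouse, transform(*extract()), loaded_at="2024-03-02T00:00:00")
print(warehouse)
```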
there are 2 ways to design a data warehouse
- bottom up approach: Starts with smaller data marts (mini data warehouses) that solve specific business problems. These data marts are later integrated into a larger data warehouse
- top down approach: Begins with building a large enterprise-wide data warehouse. Smaller data marts are created from it as specific business needs arise
what are the 5 benefits of data warehouse?
- Better understanding of data
- Historical data tracking
- Centralized data view
- Data consistency
- Advanced data analysis
data mining
the process of analyzing data to find previously unknown trends, patterns, and associations that can support decisions.
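A toy example of finding an association: counting which pairs of items are bought together most often. The shopping baskets are invented, and real data mining tools use far more sophisticated methods than this pair count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase data: each set is one customer's basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

# Count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

most_common_pair, count = pair_counts.most_common(1)[0]
print(most_common_pair, count)
```

The pair that appears together most often is the kind of association a business might act on, for example by placing the two products near each other.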
what is the problem that arises with data mining
raises privacy concerns as more personal and corporate data is collected and analyzed.
data brokers
collect publicly available and government data, combine it with other sources, and sell it, raising ethical concerns.
Business intelligence
refers to the process of collecting and analyzing data to gain a competitive advantage.
Business analytics
focuses on using internal company data to optimize processes and improve decision-making.
Knowledge management (KM)
is the practice of capturing, organizing, and storing company knowledge. This ensures that valuable insights and expertise are documented rather than being lost when employees leave.
tuples
cannot be changed once created (immutable) and are written with ()
What is shared and who gets privacy
the ledger is shared, but participants keep their privacy
can people see transactions with blockchain
no
advantages of proof of work (PoW)
- strong competition: increases security
- miners get rewards
- decentralized
- high security: difficult to hack or cheat
disadvantages of proof of work (PoW)
- expensive equipment needed
- high energy usage
- slow transactions
- high fees
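The trade-offs above come from the puzzle that miners compete to solve. A simplified sketch, assuming a SHA-256 hash and a "leading zeros" difficulty rule; the `mine` function and the block text are hypothetical, and real networks use much higher difficulty.

```python
import hashlib

def mine(block_data, difficulty):
    """Try nonces until the hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

# Low difficulty so the example finishes quickly.
nonce, digest = mine("hypothetical block", difficulty=3)
print(nonce, digest)

# Checking the answer takes one hash; finding it took many attempts.
check = hashlib.sha256(f"hypothetical block{nonce}".encode()).hexdigest()
print(check == digest)
```

Each extra required zero multiplies the expected number of attempts by 16, which is why mining needs powerful equipment and a lot of energy, and also why cheating is hard.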
advantages of proof of stake (PoS)
- no expensive equipment
- faster transactions
- energy efficient
disadvantages of proof of stake (PoS)
- coin hoarding (the rich have more say)
- not fully tested at large scale
- big investors have more control
- requires a big initial investment
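A simplified sketch of why large holders have more control, assuming the next validator is picked at random with probability proportional to stake. The participant names and stake sizes are invented, and real proof-of-stake protocols add many more rules.

```python
import random
from collections import Counter

# Hypothetical stakes: how many coins each participant has locked up.
stakes = {"alice": 80, "bob": 15, "carol": 5}

rng = random.Random(42)  # fixed seed so the run is repeatable

# Pick a validator for each of 1000 blocks, weighted by stake.
names = list(stakes)
weights = [stakes[name] for name in names]
wins = Counter(rng.choices(names, weights=weights, k=1000))

print(wins)
```

No puzzle has to be solved, so no special hardware or heavy energy use is needed, but the participant with the largest stake validates most of the blocks.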
when is blockchain NOT a good choice for a business
- not fast enough for systems that need transactions to be completed in milliseconds
- useful for big networks, not small businesses
- is NOT the same as a traditional database
- NOT meant for messaging or communication
- not a better version of traditional payment systems (traditional payment processing, like credit card systems, is faster and more efficient than blockchain for everyday transactions)