Lesson 3 Flashcards
You want to store the entire works of William Shakespeare (about 836’000
words) as a text file. How large is it going to be?
Assuming about 6 characters per word (including the space) and 1 Byte per character:
836000 * 6 * 1 = 5016000 Bytes ~ 5MB
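The estimate can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the constant names are illustrative, and reading the factors as 6 characters per word and 1 Byte per character is an assumption.

```python
# Assumptions (illustrative): ~6 characters per word including the space,
# and 1 Byte per character (plain ASCII text).
WORDS = 836_000
CHARS_PER_WORD = 6
BYTES_PER_CHAR = 1

size_bytes = WORDS * CHARS_PER_WORD * BYTES_PER_CHAR
size_mb = size_bytes / 1_000_000  # decimal megabytes

print(size_bytes, "Bytes, about", round(size_mb), "MB")
```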
Decode the “secret message” from standard ASCII.
87 69 76 76 32 68 79 78 69
WELL DONE
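A minimal sketch of the decoding in Python, using the built-in chr() to map each code to its character. The variable names are illustrative.

```python
# The decimal ASCII codes from the card
codes = "87 69 76 76 32 68 79 78 69"

# chr() turns a code point into its character; 32 is the space
message = "".join(chr(int(code)) for code in codes.split())
print(message)

# The reverse direction: ord() turns each character back into its code
encoded = " ".join(str(ord(ch)) for ch in message)
```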
You want to encode the variable DAY OF WEEK (Monday, Tuesday, . . . ,
Sunday) as efficiently as possible. How many Bits do you need for this variable?
7 days of the week -> log_2(7) = ln(7) / ln(2) ≈ 2.8 -> rounded up to 3 Bits
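The rounding-up step can be checked in Python. The helper bits_needed is a hypothetical name; it uses integer arithmetic, which gives the same result as rounding log_2 up but avoids floating-point rounding.

```python
import math

def bits_needed(n_values: int) -> int:
    """Smallest number of bits that can distinguish n_values different values."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1
    return (n_values - 1).bit_length()

days = 7
print(math.log2(days))    # about 2.8
print(bits_needed(days))  # rounded up to whole bits
```

With 3 Bits there are 2^3 = 8 possible codes, so one code stays unused.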
You obtain a German text file. The first line reads like this:
What probably went wrong and how can you fix it?
(1) The file was opened with the wrong character encoding (the wrong extended-ASCII table)
(2) Reopen it using the correct encoding -> convert to Unicode
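The effect can be reproduced in Python. This is a sketch under assumptions: the word "für" is a made-up example, and the direction of the mix-up (UTF-8 bytes read as Latin-1) is only one possible case; the card does not say which encodings were involved.

```python
original = "für"  # hypothetical German word

# The text is written to bytes as UTF-8 ...
raw = original.encode("utf-8")

# ... but read back with the wrong table (Latin-1): the umlaut is garbled
garbled = raw.decode("latin-1")
print(garbled)

# Fix: decode the same bytes with the correct encoding
fixed = raw.decode("utf-8")
print(fixed)
```

For a file on disk the same idea applies through the encoding argument of open(), for example open(path, encoding="latin-1"), followed by saving the text again as UTF-8.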
The Nikon D7500 digital camera has a sensor that captures approximately
6000 x 4000 pixels.
(a) What is the size in Kilobytes of an uncompressed photo of the Nikon D7500?
(b) In what efficient file format would you store the photo?
(c) What size can you expect the file to be in the efficient format?
a) 6000 * 4000 pixels * 3 Bytes per pixel (RGB) = 72000000 Bytes = 72000 Kilobytes ~ 72MB
b) JPEG
c) 72MB / 10 = 7.2MB (assuming a compression ratio of about 10:1)
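The same calculation as a short Python sketch. The 3 Bytes per pixel (one Byte each for red, green and blue) and the 10:1 compression ratio are assumptions; the real ratio depends on the image and the quality setting.

```python
WIDTH, HEIGHT = 6000, 4000
BYTES_PER_PIXEL = 3   # assumption: 8 bits each for red, green, blue
JPEG_RATIO = 10       # assumption: roughly 10:1 compression

raw_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
raw_kb = raw_bytes / 1000        # decimal Kilobytes
raw_mb = raw_bytes / 1_000_000   # decimal Megabytes
jpeg_mb = raw_mb / JPEG_RATIO

print(raw_kb, "KB uncompressed, about", jpeg_mb, "MB as JPEG")
```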
You design a database of the OHLC (open/high/low/close) prices of all stocks
that are traded in the US.
(a) Write down all fields and SQL data types.
(b) Calculate the size in Bytes of one record.
(c) Estimate the total size of your database, making reasonable assumptions.
Note: clearly state your assumptions. You don’t have to justify them.
(d) How can you make the database more efficient and/or use less storage? State two
possible measures.
a)
Field            SQL data type    Size
open             DOUBLE           8 Bytes
high             DOUBLE           8 Bytes
low              DOUBLE           8 Bytes
close            DOUBLE           8 Bytes
ISIN (symbol)    CHAR(12)         12 Bytes
date             DATE             3 Bytes
b)
Total: 4 * 8 + 12 + 3 = 47 Bytes
c)
number of stocks traded: 4500
number of trading days in a year: 252
number of years in the database: 20
47 * 4500 * 252 * 20 = 1 065 960 000 Bytes ~ 1GB
d)
DOUBLE -> FLOAT (4 instead of 8 Bytes per price)
only store “close”
transparent compression
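Parts b) to d) can be checked with a short Python sketch. The field sizes and the assumptions are the ones from the card; the variable names are illustrative, and the FLOAT variant assumes 4 Bytes per price.

```python
# Field sizes in Bytes as listed on the card (DATE = 3 Bytes as stated there)
FIELD_BYTES = {"open": 8, "high": 8, "low": 8, "close": 8, "isin": 12, "date": 3}

# Assumptions from the card
N_STOCKS = 4500
TRADING_DAYS_PER_YEAR = 252
N_YEARS = 20

record_bytes = sum(FIELD_BYTES.values())
n_records = N_STOCKS * TRADING_DAYS_PER_YEAR * N_YEARS
total_bytes = record_bytes * n_records

# Measure from d): FLOAT (4 Bytes) instead of DOUBLE (8 Bytes) for the four prices
float_record_bytes = record_bytes - 4 * (8 - 4)
float_total_bytes = float_record_bytes * n_records

print(record_bytes, "Bytes per record,", total_bytes / 1e9, "GB in total")
print(float_record_bytes, "Bytes per record with FLOAT,", float_total_bytes / 1e9, "GB in total")
```

Under these assumptions, switching to FLOAT reduces the record from 47 to 31 Bytes, at the cost of lower precision in the stored prices.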
Examples of slowly-changing data?
Stock variables, contracts, industry association
Examples of fast-changing data?
Flow variables, prices, assets
What are derived quantities? (+ examples)
Anything that is calculated from other quantities (such as P- and Q-quantities).
Mostly (but not always) quotients
Examples:
GDP per capita
GDP per capita in USD
What are Q-quantities? (+ examples)
Q-quantities are priced measures:
anything that prices future utility
Examples:
- Bonds (time value of money + inflation premium)
- Stocks (time value of money + equity premium)
- Derivatives
What are P-quantities? How do stocks and flows differ? (+ examples)
P-quantities: anything that is countable or (physically) measurable
STOCKS (storage of things): number of objects or quantity of material
Examples: number of employees, number of clients, size of a plot of land, debt
FLOWS: number/quantity per unit (of time)
Examples: GDP, deficit, turnover, trade volume, volatility (as quantity of risk), energy consumption (per year), products (e.g. cars) produced (per year)
Features of a database (name at least 3)
– Everything in one place
– Obvious structure for most data
– Central pillar of data workflow
– Easily share data and collaborate
– Easily create subsets
SQL commands
SELECT <fields>
FROM <table>
WHERE <conditions>
ORDER BY <field>
SELECT COUNT(*) FROM opt;               -- a function: number of records
SELECT MIN(date), MAX(date) FROM opt;   -- first and last date
SELECT DISTINCT date FROM opt;          -- list of all different dates
SELECT COUNT(DISTINCT date) FROM opt;   -- how many different trade dates?
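INSERT (add new records to a table)
SHOW (list databases or tables, e.g. in MySQL)
USE (select the database to work with)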
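The SELECT queries above can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the column layout of the table opt and the sample rows are invented for illustration (only the date field is implied by the card), and SQLite is used as a stand-in for the course's database server, so SHOW and USE are not covered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
cur = conn.cursor()

# Hypothetical layout for the table "opt"; only "date" is implied by the card
cur.execute("CREATE TABLE opt (date TEXT, strike REAL, price REAL)")
rows = [
    ("2020-01-02", 100.0, 5.2),
    ("2020-01-02", 110.0, 2.1),
    ("2020-01-03", 100.0, 5.6),
    ("2020-01-06", 100.0, 4.9),
]
cur.executemany("INSERT INTO opt VALUES (?, ?, ?)", rows)

# The queries from the card
n_records = cur.execute("SELECT COUNT(*) FROM opt").fetchone()[0]
first_date, last_date = cur.execute("SELECT MIN(date), MAX(date) FROM opt").fetchone()
dates = [r[0] for r in cur.execute("SELECT DISTINCT date FROM opt ORDER BY date")]
n_dates = cur.execute("SELECT COUNT(DISTINCT date) FROM opt").fetchone()[0]

# SELECT ... FROM ... WHERE ... ORDER BY in one statement
selected = cur.execute(
    "SELECT date, price FROM opt WHERE strike = 100.0 ORDER BY price"
).fetchall()
conn.close()

print(n_records, first_date, last_date, dates, n_dates, selected)
```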
What is a database made of?
Database
– TABLE
  Possibly several, can be linked
  ⊲ FIELD
    – One variable → distinct type
    – “Column”
  ⊲ RECORD
    – One observation (individual)
    – “Row”
Goals of a relational database
□ Avoid duplication
□ Avoid inconsistency
□ (Increase efficiency)
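A small sketch of how linked tables serve these goals, again using Python's sqlite3 module. The schema, the ISIN and the company name are all hypothetical; the point is only that the name is stored once and referenced through a key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: company details are stored once in their own table ...
cur.execute("CREATE TABLE stocks (isin TEXT PRIMARY KEY, name TEXT)")
# ... and daily prices refer to them via isin instead of repeating the name
cur.execute(
    "CREATE TABLE prices (isin TEXT REFERENCES stocks(isin), date TEXT, close REAL)"
)

cur.execute("INSERT INTO stocks VALUES ('XX0000000001', 'Example Corp')")
cur.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [("XX0000000001", "2020-01-02", 10.0), ("XX0000000001", "2020-01-03", 10.5)],
)

# Renaming the company is a single update: no copies that could become inconsistent
cur.execute("UPDATE stocks SET name = 'Example Inc' WHERE isin = 'XX0000000001'")

# Linking the two tables when the combined view is needed
joined = cur.execute(
    "SELECT s.name, p.date, p.close "
    "FROM prices p JOIN stocks s ON s.isin = p.isin ORDER BY p.date"
).fetchall()
conn.close()

print(joined)
```

Had the name been stored in every price record, the rename would have required touching every row, and a missed row would leave the data inconsistent.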