Lecture 1 - Software for Statistics Flashcards
Central Processing Unit
Controls the computer and executes instructions
Different types of memory
Registers - within the CPU, superfast
Random Access Memory - slower
Mass storage - hard disk drives etc, much slower
How are programming languages classified?
Level of abstraction (low/high)
Speed/Efficiency
Generality
Generations
What is the difference between a low-level and a high-level language?
Low level - close to specific hardware
High level - far from the hardware, closer to natural language
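As a rough illustration of the two levels, the Python sketch below computes the same mean twice: once with explicit counter and accumulator bookkeeping, and once with a single library call. The names `data` and `mean_explicit` are made up for this example. Python is itself a high-level language, so the first version only imitates the low-level style.

```python
import statistics

data = [2.0, 4.0, 6.0, 8.0]  # made-up sample values


def mean_explicit(values):
    """Mean with manual bookkeeping, imitating a lower-level style."""
    total = 0.0
    i = 0
    while i < len(values):  # the programmer manages the index and the sum
        total += values[i]
        i += 1
    return total / len(values)


low_style = mean_explicit(data)
high_style = statistics.mean(data)  # one call hides the loop entirely
print(low_style, high_style)
```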
What type of language (lower/higher) is faster?
Lower-level languages tend to be faster, but this is not always true.
1st Generation
Machine language - CPU specific set of instructions.
Relatively few instructions, and many of these are required to achieve anything useful.
2nd Generation
Assembly language
Human readable version of machine language
3rd Generation
More human friendly, CPU independent language with variables, data, and code structures.
Also includes object-oriented languages such as C++ and Java.
4th Generation
Language designed with a specific application in mind.
Lots of built in capabilities for that application.
Database query languages (SQL), graphical user interface creators, mathematics languages, statistics languages.
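To show what "built in capabilities" can look like, here is a minimal sketch that runs an SQL query from Python using the standard-library `sqlite3` module. The table `scores`, its columns, and the values are invented for the example. Grouping and averaging are expressed in one SQL statement rather than written out as loops.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE scores (grp TEXT, value REAL)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0)],  # made-up data
)

# The query states what is wanted (mean per group); SQL supplies the how.
rows = conn.execute(
    "SELECT grp, AVG(value) FROM scores GROUP BY grp ORDER BY grp"
).fetchall()
conn.close()
print(rows)
```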
5th Generation
Language based on solving problems given the problem specifications.
User does not need to explicitly write the algorithm for solving the problem
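A toy sketch of the idea, assuming a very small problem: the user lists only the constraints, and a generic `solve` function (hypothetical, written for this example) searches for values that satisfy them. Real 5th-generation systems use far more sophisticated search than this brute-force loop.

```python
from itertools import product


def solve(variables, domain, constraints):
    """Generic brute-force search: first assignment meeting every constraint."""
    for values in product(domain, repeat=len(variables)):
        candidate = dict(zip(variables, values))
        if all(check(candidate) for check in constraints):
            return candidate
    return None  # no assignment satisfies the specification


# Problem specification only: no solving steps are written by the user.
constraints = [
    lambda v: v["x"] + v["y"] == 10,
    lambda v: v["x"] - v["y"] == 4,
]
solution = solve(["x", "y"], range(10), constraints)
print(solution)
```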
Interpreted Language
An interpreter turns each line into machine code as it is entered and runs it straight away.
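The line-at-a-time behaviour can be sketched with a toy loop in Python. The helper `run_line_by_line` and the sample lines are invented for this example, and CPython's `exec` translates each line to bytecode for its virtual machine rather than to native machine code. Each line runs as soon as it is seen, so earlier lines have already taken effect when a later line turns out to be faulty.

```python
def run_line_by_line(lines):
    """Translate and run one line at a time, stopping at the first error."""
    namespace = {}
    log = []
    for line in lines:
        try:
            exec(line, namespace)  # translate this line, then run it at once
        except Exception as err:   # SyntaxError is a subclass of Exception
            log.append((line, "error: " + type(err).__name__))
            break
        log.append((line, "ok"))
    return namespace, log


session_lines = ["x = 2", "y = x * 21", "y = = 3", "z = 0"]
namespace, log = run_line_by_line(session_lines)
print(namespace["y"], log[-1])
```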
Compiled Language
Once the code is written and saved to a file, a compiler turns it all into machine code in one go. Then it can be run.
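A loose analogy in Python, using the built-in `compile` function: the whole source is translated once, and the translated program can then be run repeatedly. This is only a sketch, since CPython produces bytecode rather than native machine code, and the source strings here are made up. It also shows that a faulty program is rejected as a whole before any of it runs.

```python
good_source = "result = x * 21\n"
bad_source = "x = 2\ny = = 3\n"  # syntax error on the second line

program = compile(good_source, "<example>", "exec")  # translate once

results = []
for x in (1, 2, 3):  # run the already-translated program several times
    namespace = {"x": x}
    exec(program, namespace)
    results.append(namespace["result"])

try:
    compile(bad_source, "<example>", "exec")
    bad_status = "compiled"
except SyntaxError:
    bad_status = "rejected before any line ran"

print(results, bad_status)
```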
Difference between Interpreted and Compiled coding languages
Interpreted code provides instant feedback - good for short, run-once jobs. Tends to be used by 4GLs.
Compiled code runs faster - good for jobs that will run many times. Tends to be used by 3GLs.
What programming language for stats?
For simple run-once stuff - favourite stats package. (CON: no reproducible trail)
More complex stuff to be run once or a few times - 4GL with an interpreter.
Production software or where efficiency is important - write in a 3GL and compile, or prototype in a 4GL and then rewrite the slow bits in a 3GL.
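Before rewriting anything, it helps to measure which step of the prototype is actually slow. The sketch below times each step of a made-up two-step pipeline with `time.perf_counter`. The functions `load_data`, `resample_means`, and `timed` are hypothetical stand-ins, and the measured times will differ from machine to machine.

```python
import time


def load_data():
    return list(range(1000))  # made-up data set


def resample_means(data, reps=200):
    """Deterministic stand-in for a resampling loop, written naively."""
    n = len(data)
    means = []
    for r in range(reps):
        total = 0
        for i in range(n):
            total += data[(i * 7 + r) % n]
        means.append(total / n)
    return means


timings = {}


def timed(name, func, *args):
    """Run func, record its wall-clock time under name, return its result."""
    start = time.perf_counter()
    value = func(*args)
    timings[name] = time.perf_counter() - start
    return value


data = timed("load_data", load_data)
means = timed("resample_means", resample_means, data)

# The slowest step is the first candidate for a rewrite in a compiled language.
slowest = max(timings, key=timings.get)
print(slowest, timings)
```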
Software for enormous data sets
SAS / Microsoft R Open / R with Hadoop