Lecture 1 - Software for Statistics Flashcards
Central Processing Unit
Controls the computer and executes instructions
Different types of memory
Registers - within the CPU, superfast
Random Access Memory - slower
Mass storage - hard disk drives etc, much slower
How are programming languages classified?
Level of abstraction (low/high)
Speed/Efficiency
Generality
Generations
What is the difference between a low-level and a high-level language?
Low level - close to specific hardware
High level - far from the hardware, closer to natural language
Which type of language (low-level/high-level) is faster?
Low-level languages tend to be faster, but this is not always true
1st Generation
Machine language - CPU specific set of instructions.
Relatively few instructions, and many of them are required to achieve anything useful
2nd Generation
Assembly language
Human readable version of machine language
3rd Generation
More human friendly, CPU independent language with variables, data, and code structures.
Also includes object-oriented languages such as C++ and Java.
4th Generation
Language designed with a specific application in mind.
Lots of built in capabilities for that application.
Examples: database query languages (SQL), graphical user interface creators, mathematics languages, statistics languages.
5th Generation
Language based on solving problems given the problem specifications.
User does not need to explicitly write the algorithm for solving the problem
Interpreted Language
An interpreter turns each line into machine code as it is entered and runs it straight away.
Compiled Language
Once the code is written and saved to a file, a compiler turns it all into machine code in one go. It can then be run.
Difference between interpreted and compiled languages
Interpreted code provides instant feedback - good for short, run-once jobs. Tends to be used by 4GLs.
Compiled code runs faster - good for jobs that will run many times. Tends to be used by 3GLs.
What programming language for stats?
For simple run-once tasks - your favourite stats package. (CON: no reproducible trail)
More complex tasks to be run once or a few times - a 4GL with an interpreter
Production software or where efficiency is important - a compiled 3GL, or prototype in a 4GL and then rewrite the slow parts in a 3GL.
Software for enormous data sets
SAS / Microsoft R Open / R with Hadoop
Pros and Cons of R
Pros:
Contains cutting edge methods
Highly extensible
Free
Cons:
Steep learning curve
Less well supported than a commercial package
Greater tendency to ignore backwards compatibility.
What is an algorithm?
An ordered sequence of unambiguous and well-defined instructions for performing some task and halting in finite time
Important features of an algorithm
An ordered sequence
Unambiguous and well-defined instructions: each instruction is clear, doable, and can be done without difficulty
Performs some task: the algorithm needs to be complete, with nothing left out
Halts in finite time
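Example (a minimal sketch, not part of the lecture): Euclid's algorithm for the greatest common divisor is a standard illustration of these four features. It is written in Python here purely for illustration; the function name gcd and the input check are choices made for this sketch.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor of two non-negative integers (Euclid's algorithm)."""
    if a < 0 or b < 0:
        raise ValueError("a and b must be non-negative")
    # Ordered sequence: the steps below always run in the same order.
    while b != 0:
        # Unambiguous: a single, well-defined arithmetic step.
        # Halts: b is replaced by a remainder smaller than b, so it must reach 0.
        a, b = b, a % b
    # Performs the task: the result is returned, nothing is left out.
    return a


print(gcd(48, 18))  # 6
```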
Phase 1 of an R programmer
R as a calculator
Phase 2 of an R programmer
The script
10-15 lines of code strung together to perform a task
Might have a loop, maybe even an if statement.
One or two comments
Not intended for the use of others
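Example (a hedged sketch of what a "phase 2" script might look like, written in Python for illustration rather than R): a few lines strung together, with one loop, one if statement and a couple of comments. The data values are made up for the example.

```python
# Hypothetical data: daily measurements, with None marking a missing value.
measurements = [4.2, 5.1, None, 3.8, 6.0]

total = 0.0
count = 0
for value in measurements:   # the loop
    if value is not None:    # the if statement: skip missing values
        total += value
        count += 1

mean = total / count
print(f"Mean of {count} observed values: {mean:.2f}")
```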
Phase 3 of an R programmer
Modular programming with functions
Undertaking a sufficiently complex analysis that organisation becomes critical
Descriptions of inputs and outputs are critical to ensure reusability
When programming with functions, some elements of the R language become useful (e.g. stop())
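Example (a minimal sketch, assuming Python as a stand-in for R): a small reusable function with its input and output described, and an input check. In R the check would call stop() to signal an error; in this sketch a raised ValueError plays that role. The function name standard_error is hypothetical.

```python
import math
from typing import Sequence


def standard_error(values: Sequence[float]) -> float:
    """Standard error of the mean.

    Input:  values - a sequence of at least two numbers.
    Output: the sample standard deviation divided by sqrt(n).
    """
    n = len(values)
    if n < 2:
        # Plays the role that stop("...") would play in an R function.
        raise ValueError("need at least two values")
    mean = sum(values) / n
    # Sample variance uses the n - 1 denominator.
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance / n)
```

Because the inputs and output are documented and bad input fails early with a clear message, the function can be reused in a larger analysis without rereading its body.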