MSCDSA01 - Foundation Flashcards
Foundational Data Science - Inc lots of Python
What is Tkinter and what is it used for?
A python module, for building a graphical interface
What is SQLite and what is it used for?
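A lightweight SQL database engine that keeps a whole database in a single file; Python uses it through the built-in sqlite3 module (an alternative to writing to a text file)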
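A minimal sketch of SQLite from Python, assuming the standard library sqlite3 module and an in-memory database. The table and column names are made up for illustration:

```python
import sqlite3

# ":memory:" keeps the database in RAM; a file name would save it to disk instead
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
connection.execute("INSERT INTO scores VALUES (?, ?)", ("John", 42))
connection.commit()

# Read the rows back as a list of tuples
rows = connection.execute("SELECT name, score FROM scores").fetchall()
print(rows)  # [('John', 42)]
connection.close()
```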
What is ‘pdb’ and what is it used for?
The Python debugger, a standard library module which can be used to step through a program and find elusive logic errors
What are the string type identifiers in Python?
you can use single or double quotes, either is fine - ‘ or “
What does the '\' character inside a string do?
The \ tells Python to expect a character or number afterwards, which together indicate a special character (an escape sequence):
\ at the very end of a line = the backslash and the newline are both ignored
\\ = \
\' = '
\" = "
\a = ASCII Bell (BEL)
\b = ASCII Backspace (BS)
\f = ASCII Formfeed (FF)
\n = ASCII Linefeed (LF) (newline)
..etc
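A small sketch of a few escape sequences in use (the variable names are illustrative, not from the cards). Each escape sequence counts as one character in the string:

```python
twoLines = "first\nsecond"   # \n = linefeed (newline)
columns = "name\tscore"      # \t = horizontal tab
backslash = "C:\\temp"       # \\ = one literal backslash
quoted = 'it\'s'             # \' = a single quote inside single quotes

print(twoLines)
print(columns)
print(backslash)             # C:\temp
print(quoted)                # it's
print(len(backslash))        # 7
```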
How do you ‘print’ to the console in python?
print ( )
eg:
print ('Hello World')
——
or print ("Hello World"), but NOT Print ('Hello World'): the capital P would not be recognised, because Python is case sensitive. Whitespace at the start of a line (indentation) also means something and is not ignored.
What is ‘Spyder’ and what is it used for?
Spyder is an open source integrated development environment (IDE) for scientific programming in the Python language
What does ‘Syntax’ mean in programming?
The rules (like grammar in written language) that define which combinations of symbols make a correctly structured expression.
--------
This applies both to programming languages (source code) and to markup languages, where the document represents data.
What is a ‘markup language’?
A system of tags for annotating a document (HTML and XML are examples). The name comes from the “marking up” of paper manuscripts; digitally this is replaced by tags, indicating what the elements of text ARE, rather than how they might be shown on a display
Is python a ‘compiled’ or an ‘interpreted’ language, and what is the difference?
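It is usually described as interpreted, but it has elements of a ‘compiled’ language: the standard interpreter first compiles the source to bytecode and then runs that on a virtual machine, and separate compilers are also available. A compiled language is translated in full before it runs; an interpreted language is translated and executed as the program runs.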
What are the two modes of entering and running programs in Python?
Interactive mode = type instructions and python responds immediately
Script mode = type in and SAVE the program as a file, then run the whole file
In ‘Jupyter’ what is the key combination to run a python program?
shift+enter (runs the cell and moves to the next one), or alt+enter or ctrl+enter or cmd+enter
In language theory and programming what is the term for joining character strings end-to-end?
Concatenation. For example, the concatenation of “snow” and “ball” is “snowball”
How would you put single or double quotes inside a string in python?
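Define the string with the other kind of quote, e.g. if you want a “ inside the string, then encapsulate (define) the string using the ‘ (single quote) character. Alternatively, escape the quote with a backslash (\' or \").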
What symbol is used to concatenate strings in Python
+
so “A” + “B” would produce AB
What are the numeric data types in Python?
int = integer (a whole number)
float = floating point (a number with a decimal point)
what are the standard arithmetic operators in python?
+ (add), - (subtract), * (multiply) and / (divide)
** for exponentiation (e.g. 2 ** 3 = 8 = 2^3)
% (mod) returns the remainder when an integer is divided by another (11 % 5 = 1)
// (div) performs integer division (11 // 5 = 2)
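In Python when does numerical division result in a floating point number?
In Python 3, always when you use ‘/’, even if both numbers are integers and the answer is whole (10 / 5 = 2.0)
use ‘//’ to get an integer without remainder or decimal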
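A quick sketch of the arithmetic operators, assuming Python 3 (the variable names are illustrative):

```python
total = 7 + 3          # 10
difference = 7 - 3     # 4
product = 7 * 3        # 21
power = 2 ** 3         # 8
remainder = 11 % 5     # 1
wholePart = 11 // 5    # 2   (integer division, remainder discarded)
quotient = 11 / 5      # 2.2
exact = 10 / 5         # 2.0 (still a float, even though the answer is whole)

print(type(exact))     # <class 'float'>
print(type(wholePart)) # <class 'int'>
```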
In Python which is the divide symbol, ‘ \ ‘ or ‘ / ‘?
‘ / ‘ (the forward slash).
Remember your Christmas tree: it was the right hand slope ‘ \ ‘ that gave the problem, because it starts an escape sequence and you needed ‘ \\ ‘ to get \
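In Python why does 1.62+0.53 = 2.1500000000000004, and how do you avoid it?
Floats are stored in binary, and most decimal fractions cannot be stored exactly, so tiny rounding errors show up in the result.
To be certain of the output use the round() function:
round(1.62+0.53,2)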
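A short sketch of the same effect using the well-known 0.1 + 0.2 case, followed by the sum from the card (the variable names are illustrative):

```python
classic = 0.1 + 0.2
print(classic)                    # 0.30000000000000004
print(classic == 0.3)             # False
print(round(classic, 2) == 0.3)   # True

# The sum from the card, rounded to 2 decimal places
cardSum = round(1.62 + 0.53, 2)
print(cardSum)                    # 2.15
```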
What is the TRUE/FALSE data type called?
Boolean
What are the true false answers to the following: 9 == 3*3 9 != 3*3 9 == 4*4 9 != 4*4 (5 > 1) and (7 > 1) (1 > 5) and (7 > 1) (1 > 5) or (7 > 1)
TRUE FALSE FALSE TRUE TRUE FALSE TRUE
In Python what would the following evaluate to:
9 != 4*4
TRUE
In Python what would the following evaluate to:
(1 > 5) and (7 > 1)
FALSE
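The comparisons on these cards can be checked directly; a small sketch that collects the results in a list (the list name is illustrative). Note that Python spells the values True and False:

```python
answers = [
    9 == 3 * 3,           # equal to
    9 != 3 * 3,           # not equal to
    9 == 4 * 4,
    9 != 4 * 4,
    (5 > 1) and (7 > 1),  # both must be true
    (1 > 5) and (7 > 1),
    (1 > 5) or (7 > 1),   # either may be true
]
print(answers)  # [True, False, False, True, True, False, True]
```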
In Python what are the 3 standard conventions for naming variables?
1) make the names meaningful.
2) start with a lower case letter
3) use “camel caps” to separate parts of a name
eg. highScore
In Python what is the standard convention for naming constants?
1) make the name meaningful.
2) use ALLCAPS
eg. VATRATE
What is an assignment operator in Python?
It is how you assign a value to an object, e.g. a variable.
This is done using the ‘=’ operator.
length = 4 means the variable name (left) has the value (right) assigned to it
In Python what is an ‘augmented assignment operator’?
A way to update a variable e.g.
score += 1
is equivalent to
score = score + 1
You can NOT use one if the variable has not yet been defined
what do the following ‘augmented assignment operators’ do?
A) +=
B) -=
C) *=
D) /=
E) %=
F) //=
variableA ‘+=’ rhValue
A) add the right hand value or variable to variableA
B) subtract the right hand value or variable from variableA
C) multiply variableA by the right hand value or variable and replace variableA with the answer
D) divide variableA by the right hand value or variable and replace variableA with the answer
E) mod (divide) variableA by the right hand value or variable and replace variableA with the REMAINDER
F) div (integer divide) variableA by the right hand value or variable and replace variableA with the answer (which discards the remainder)
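A sketch that applies each augmented assignment operator in turn to one variable (the starting value is arbitrary):

```python
score = 10
score += 5    # 15
score -= 3    # 12
score *= 2    # 24
score /= 4    # 6.0  (true division always gives a float)
score %= 4    # 2.0  (remainder of 6.0 divided by 4)
score //= 2   # 1.0  (integer division, but still a float because score was a float)
print(score)  # 1.0
```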
In Python’s print statement, how do you combine strings and variables into a single print line?
Separate the items with commas inside the brackets (print puts a space between them), e.g:
print ("price is £", totalPrice, " exactly")
in Python IDEs is the £ sign always handled correctly?
In some IDEs it is, and in others it is not recognised in a string and will cause the program to crash
In Python print command, can you use a ‘ + ‘ to join strings or variables?
Yes, but ‘ + ‘ only joins strings to strings,
so you first have to convert any numbers to strings using the str(n) function
in python, when separating strings using ‘ , ‘ how can you eliminate the extra space that appears between a string and a number e.g when you get “.. £ 15.00” but you want “.. £15.00”?
Use the ‘ + ‘ and convert the number to a string using the str(n) function
In Python what function converts a number to a string?
str(n)
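A sketch comparing the comma and the + approaches. The variable names are illustrative, and the use of format() to force two decimal places is an addition that the cards cover separately:

```python
totalPrice = 15.0

# Commas: print converts each item itself and puts a space between items
print("price is £", totalPrice)             # price is £ 15.0

# Plus: only strings can be joined, so convert the number first
message = "price is £" + str(totalPrice)
print(message)                              # price is £15.0

# format() gives control over the number of decimal places
twoPlaces = "price is £" + format(totalPrice, ".2f")
print(twoPlaces)                            # price is £15.00

# Without str() the + raises a TypeError
try:
    broken = "price is £" + totalPrice
except TypeError:
    print("cannot join a string and a float with +")
```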
What is the name for the method(s) that allow you to put special characters into strings?
Escape sequences
e.g \n skips to a new line
How do you insert a ‘tab’ into a string (to form columns for instance)?
use the escape sequence ‘ \t ‘
In Python, what allows two print statements on different lines to be printed on the same line?
add ‘ , end = " " ‘ to the end of the print function (inside the brackets), e.g:
print ("this all prints", end = " ")
print ("on one line")
In Python, what does triple quote (single or double quotes) do? - ‘’’ or “””
retains the formatting of the string, e.g. over several lines:
print ("""John Smith
23 Elm Street
Andover""")
would retain the 3 lines as 3 distinct lines:
John Smith
23 Elm Street
Andover
In Python describe a way to take a value from the program user and assign it to a variable in the program
Use the input() function. It can contain the on screen prompt, or that can be provided by a print statement, e.g:
firstName = input("Please enter your first name: ")
print("Please enter your surname: ")
surname = input()
print ("Your full name is", firstName, surname)
In Python, what is the file extension for a standard python script source file (as used in script mode)
.py
When commenting a program at the top, what are the 5 elements it is good practice to add, and what symbol is used to note a comment?
1) Program Name
2) Program Purpose
3) Programmer's Name
4) Date the program was written
5) The folder / location the source code is saved in.
- ————
Comments are marked by prefixing with a # (alt 3 on the mac keyboard)
In Python, how would you keep the console open (for debugging etc) when the program has completed, when the program has been launched by double-clicking the file in finder/explorer
If you add an input right at the end of the program it will wait for user input before exiting. e.g:
input("\nPress Enter to exit: ")
In Python, what is an Object?
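Almost everything in Python is an object: a value that bundles data (attributes) together with the functions that act on it (methods). Python is an object oriented programming language.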
In Python, what is a class?
A Class is like an object constructor, or a “blueprint” for creating objects.
in Python what string methods would perform the following:
A) convert a string to UPPER case
B) convert a string to lower case
C) find the first occurrence of a sub string inside a string or an ERROR if not found
D) find the first occurrence of a sub string inside a string or a ‘ -1 ‘ if not found
E) replace all occurrences of a sub string with another, within a string
A) myString.upper()
B) myString.lower()
C) myString.index('a..')
D) myString.find('a..')
E) myString.replace('a..', 'z..')
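A sketch of the five string methods on one example string (the string and variable names are illustrative). None of them change myString itself; each returns a new value:

```python
myString = "Data Science"

upperCase = myString.upper()            # "DATA SCIENCE"
lowerCase = myString.lower()            # "data science"
position = myString.index("Sci")        # 5 (positions count from 0)
missing = myString.find("xyz")          # -1 because it is not found
replaced = myString.replace("a", "o")   # "Doto Science"

# index() raises an error where find() returns -1
try:
    myString.index("xyz")
except ValueError:
    print("index() raised ValueError")
```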
In Python, when using the variable = input() assignment function, if the user enters 1.00 what data type is this stored as?
A string. The input function only deals in strings. You need to use float(x) or int(x) to convert the string into a number.
In Python, how can you use input() to assign the value as a float or int?
encapsulate the input() function within the float() or int() function e.g:
variable1 = float(input("give me a number"))
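A sketch of the conversion step on its own. To keep it runnable without a keyboard, a plain string stands in for what input() would return, and toFloat is a hypothetical helper, not a built-in:

```python
typed = "1.00"             # stands in for the string returned by input()
asFloat = float(typed)     # 1.0
asInt = int("42")          # 42

def toFloat(text):
    """Return text converted to a float, or None if it is not a number."""
    try:
        return float(text)
    except ValueError:
        return None

print(toFloat("2.5"))      # 2.5
print(toFloat("abc"))      # None
```

Note that int("1.00") raises a ValueError, because int() only accepts strings that look like whole numbers.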
In Python, what is the difference between calling / using a method and a function?
A method is written using ‘ . ‘ ‘dot’ notation, with the name of the object followed by a dot and then the method name e.g:
positionOfA = myVariable.find("A")
A function has the function name first, followed by the name of the object in parentheses. e.g. the ‘len’ function:
lengthOfString = len(myVariable)
What are the three basic programming constructs for controlling the order in which programming statements are carried out?
Sequential (B follows A etc)
Selective (If A then X if B then Y)
Iteration (repeat A a set number of times, or while a condition holds)
In Python, what clause is used when there are multiple routes through the program flow
’ elif ‘ in place of the else, except the final else.
In Python are all elifs tested?
No. If an if or elif is true then all subsequent tests are skipped
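A sketch of an if / elif / else chain. The function name and the grade boundaries are made up for illustration:

```python
def describe(mark):
    # Only the first branch whose test is true runs; the rest are skipped
    if mark >= 70:
        result = "distinction"
    elif mark >= 50:
        result = "pass"
    else:
        result = "fail"
    return result

print(describe(85))  # distinction (85 >= 50 is also true, but is never tested)
print(describe(55))  # pass
print(describe(20))  # fail
```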
What is a way of testing multiple conditions? (in if / else)
By using nested if statements:
if a == True:
    if b == True:
        print("true")
    else:
        print("false")
else:
    print("try again")
The three boolean operators are..? (written in lower case in Python: and, or, not)
AND (true if both conditions are true)
OR (true if either or both conditions are true)
NOT (true expression becomes false and vice versa)
In Python, how do you use a function contained in a standard library module?
load the module by using the ‘ import ‘ statement, then call the function with dot notation: moduleName.functionName()
In Python, how do you obtain a random integer?
1) import random
2) use the randint function of the random module: random.randint
3) give the lower and upper limits for the integer (both limits are included): myRandom = random.randint(n,n1)
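A sketch that simulates dice rolls with random.randint; the limits 1 and 6 are just an example. Both limits can be returned:

```python
import random

rolls = []
for _ in range(1000):
    rolls.append(random.randint(1, 6))   # any integer from 1 to 6 inclusive

print(rolls[:10])                # the first ten rolls (different on every run)
print(min(rolls), max(rolls))    # smallest and largest values rolled
```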
Define the difference between data and information?
Data is NOT information
’ Name John ‘ is Data
‘ His Name is John ‘ is Information
Provide a definition for information theory?
The reduction of uncertainty
What do we use, in order to understand data and create information?
Applying analysis produces value from the data
Data can be large and small
What are the 4 v’s of Big Data?
1) Volume
2) Variety
3) Velocity
4a) Value
4b) Veracity
- ——
Volume refers to the amount of data, variety refers to the number of types of data and velocity refers to the speed of data processing
What are the 3 types of business Analytics?
1) Descriptive Analytics= ‘What Happened?’
2) Predictive Analytics = ‘What is LIKELY to happen?’
3) Prescriptive Analytics = ‘The best possible decision is..?’
- ———–
1) Descriptive = usually data visualisation and descriptive stats
2) Predictive Analytics =
a. Data Mining
b. Text / Web mining
c. Forecasting (i.e. Time series)
3) Prescriptive Analytics =
a. Optimisation - global maxima or minima
b. Simulation
c. Multi-Criteria Decision Modelling
d. Heuristic Programming
What is the definition of a ‘ system ‘?
Input >> Processing >> Output >>
^^ << Feedback <<
What is a definition for Data Science
Science that creates data products, using:
1) computer science,
2) statistics,
3) machine learning,
4) visualisation
5) human-computer interaction (HCI)
- ———
1) collects data,
2) cleans data,
3) integrates data,
4) analyses data,
5) visualises data,
6) interacts (HCI) with data
What is Business Intelligence?
What are three aspects of Business Intelligence?
1) Data - lots of data and sources. get it all together to analyse
2) Analyse - track benchmarks to measure performance. Effective visualisation is key
3) Insight - trends, correlations and comparisons.
1) Data Science
2) Big Data
3) Data Analytics.
Characterise the key purpose of each
1) Data Science = Mining structured and unstructured data
2) Big Data = Humongous volumes of data.
3) Data Analytics = Analysis to draw conclusions and solve problems
- —–
Data Science = Mining to identify patterns. Includes programming, statistics, machine learning and algorithms
Big Data = Capture, store, share and query data
Data Analytics = Process and perform statistical analysis on data. Discover how data can be used to draw conclusions and solve problems
What does a Data Scientist do?
3 Core things:
1) Predicts the future based on past patterns
2) Explores data from multiple disconnected sources
3) Develops new analytical methods and ML models
What does a Big Data Professional do?
3 Core things:
1) Builds large scale data processing systems
2) Designs scalable distributed systems
3) Analyses system Bottlenecks
What does a Data Analyst do?
3 Core things:
1) Gets, processes and summarises data
2) Packages the data for insight
3) Designs and creates data reports
What does ANOVA mean, and what is it used for?
Analysis of Variance (ANOVA)
Looks for statistically significant differences between the means of three or more independent (unrelated) groups.
In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means.
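A rough sketch of the F statistic behind a one-way ANOVA, written in plain Python on made-up data. oneWayF is a hypothetical helper; turning F into a p-value needs the F distribution, which a statistics library would normally provide and which is not shown here:

```python
def oneWayF(groups):
    """Return the one-way ANOVA F statistic for a list of groups of numbers."""
    allValues = [value for group in groups for value in group]
    grandMean = sum(allValues) / len(allValues)

    ssBetween = 0.0   # variation of the group means around the grand mean
    ssWithin = 0.0    # variation of the values around their own group mean
    for group in groups:
        groupMean = sum(group) / len(group)
        ssBetween += len(group) * (groupMean - grandMean) ** 2
        for value in group:
            ssWithin += (value - groupMean) ** 2

    dfBetween = len(groups) - 1
    dfWithin = len(allValues) - len(groups)
    return (ssBetween / dfBetween) / (ssWithin / dfWithin)

fStatistic = oneWayF([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(fStatistic, 2))  # 13.0
```

A large F means the group means differ by more than the spread inside the groups would suggest.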
In “The Unreasonable Effectiveness of Data” Peter Norvig showed what?
That simple models with LOTS of data outperformed complex models
How is value discovered from data?
By applying Analytics
What is a Tuple?
A tuple is an ordered sequence of values. Tuples cannot be changed (unlike lists) and tuples use parentheses, whereas lists use square brackets.
Creating a tuple is as simple as writing comma-separated values; optionally you can put these comma-separated values between parentheses.
A tuple lets us “chunk” together related information and use it as a single thing.
There is no description of what each field means, but we can guess.
Tuples are indexed like other sequences (lists, vectors, arrays), but where those usually hold one type of information, a tuple typically holds values of several types.
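A sketch of creating, reading and (unsuccessfully) changing a tuple. The field values are made up for illustration:

```python
person = ("John Smith", 23, 1.82)   # name, age, height: mixed types
bare = 1, 2, 3                      # the parentheses are optional

name = person[0]                    # indexed from 0, like a list
fullName, age, height = person      # unpack into separate variables

# Tuples are immutable, so assigning to an item raises a TypeError
try:
    person[1] = 24
    changed = True
except TypeError:
    changed = False

print(name, age, changed)           # John Smith 23 False
```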
In Python how do you access time functions?
load the module ‘ time ‘ by using:
import time
access time by using dot notation:
myCurrentTimeVariable = time.time() # which would produce seconds since January 1st 1970
In Python how do you produce a nicely formatted date and time output
By using the ctime function of the time module
niceFormatTimeVar = time.ctime(myCurrentTimeVariable)
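Putting the two time cards together as a runnable sketch; the printed values depend on when and where it is run:

```python
import time

myCurrentTimeVariable = time.time()                     # seconds since 1 January 1970, as a float
niceFormatTimeVar = time.ctime(myCurrentTimeVariable)   # readable local date and time

print(myCurrentTimeVariable)
print(niceFormatTimeVar)   # day name, month, day, time and year, in local time
```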
In Python how can you print a float to a certain number of decimal places?
by using a format specifier in the string, followed by the % operator:
"… %.2f" % myFloatVariable
e.g. print("rounded to 2 %.2f" % myFloatVariable)
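A sketch of the % style from the card next to an f-string, which is an alternative not mentioned on the cards (Python 3.6 or later is assumed for it):

```python
myFloatVariable = 3.14159

oldStyle = "rounded to 2 %.2f" % myFloatVariable
fString = f"rounded to 2 {myFloatVariable:.2f}"

print(oldStyle)   # rounded to 2 3.14
print(fString)    # rounded to 2 3.14
```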
In Programming name two types of loops
for loops: (definite iteration)
while loops: (indefinite iteration)
In a Python ‘ for ‘ loop you set a start and finish range; how many numbers would the following iterate? (1,6)
1, 2, 3, 4, 5
the range will start with the first number and iterate up to BUT NOT INCLUDING the second number
In a Python ‘ for ‘ loop how can you increment the counter by any integer value? e.g (1,16)
by adding a third value (the step) into the range function, e.g. range(1, 16, 3) would count in 3’s:
1 4 7 10 13
(16 is the stop value, so it is NOT included)
range(10, 0, -2) would count down from 10 in 2’s: 10 8 6 4 2
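The ranges on these cards can be checked by turning each one into a list; a small sketch (the variable names are illustrative):

```python
upToSix = list(range(1, 6))          # stop value 6 is not included
inThrees = list(range(1, 16, 3))     # third value is the step
countDown = list(range(10, 0, -2))   # a negative step counts down

print(upToSix)     # [1, 2, 3, 4, 5]
print(inThrees)    # [1, 4, 7, 10, 13]
print(countDown)   # [10, 8, 6, 4, 2]

# The same range driving a for loop
for number in range(1, 6):
    print(number)
```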
In Python how can you pause a program for 1 second?
using the sleep function from the time module:
import time
time.sleep(1)
How is a while loop controlled?
it will loop whilst a boolean condition is TRUE
If the condition is FALSE when it is first tested, the loop will be skipped altogether
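A sketch of a while loop that runs three times, followed by one whose condition is false from the start and so never runs (the names are illustrative):

```python
count = 3
seen = []
while count > 0:          # tested before every pass
    seen.append(count)
    count -= 1            # without this the loop would never end
print(seen)               # [3, 2, 1]

skipped = []
while count > 0:          # count is now 0, so the body is skipped
    skipped.append(count)
print(skipped)            # []
```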
In Python, what would be difference in creation of:
myVariable = [1,2,3,4]
or
myVariable = (1,2,3,4)
square brackets tell python you are creating a LIST
round brackets tell python you are creating a TUPLE
In Python, what collection type do the following braces represent:
( parentheses )
[ square brackets ]
{ curly braces }
( p ) is a tuple:
[ sq ] is a list:
{ crl } is a dict
( p ) is a tuple: An immutable collection of values, usually (but not necessarily) of different types.
[ sq ] is a list: A mutable collection of values, usually (but not necessarily) of the same type.
{ crl } is a dict: Use a dictionary for key value pairs.
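A sketch of the three bracket types side by side (the contents are made up for illustration):

```python
myTuple = (1, "two", 3.0)                 # parentheses: tuple, cannot be changed
myList = [1, 2, 3, 4]                     # square brackets: list, can be changed
myDict = {"name": "John", "age": 23}      # curly braces: dict of key: value pairs

myList.append(5)        # lists can grow
myDict["age"] = 24      # dict values are looked up and set by key

print(type(myTuple), type(myList), type(myDict))
print(myList)           # [1, 2, 3, 4, 5]
print(myDict["age"])    # 24
```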
What measurement is used to calculate the degree to which points cluster along a straight line (linear regression)?
Pearson Correlation Coefficient - r
If the correlation of N zx and zy scores is perfect, what will the sum of the N products zx*zy be..?
If the correlation is perfect then each zx * zy is equal to zx squared (or zy squared), and the sum of those products will equal N (for z-scores based on the population standard deviation). Hence the sum of the products / N shows the strength of the correlation (1 = perfect positive, 0 = none, -1 = perfect negative; > 0.5 is a strong positive correlation but not perfect).
This is the Pearson Correlation coefficient r
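A sketch of r computed exactly as described, as the sum of zx*zy divided by N. It assumes z-scores based on the population standard deviation (statistics.pstdev); pearsonR and the data are illustrative:

```python
import statistics

def pearsonR(xs, ys):
    """Return Pearson's r as the mean of the products of the z-scores."""
    n = len(xs)
    meanX, meanY = statistics.mean(xs), statistics.mean(ys)
    sdX, sdY = statistics.pstdev(xs), statistics.pstdev(ys)   # population SDs
    zx = [(x - meanX) / sdX for x in xs]
    zy = [(y - meanY) / sdY for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / n

xs = [1, 2, 3, 4, 5]
perfect = pearsonR(xs, [2, 4, 6, 8, 10])    # points lie exactly on a rising line
inverse = pearsonR(xs, [5, 4, 3, 2, 1])     # points lie exactly on a falling line
print(round(perfect, 6), round(inverse, 6)) # 1.0 -1.0
```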
What is the ‘end’ parameter of print for? What will the following code produce:
print('one', end=' ')
print('two', end=' * ')
print('three')
one two * three
end replaces the newline that print normally adds, so the print statements join on one line and the * will be displayed
What is the ‘sep’ parameter of print for? What will the following code produce:
print('one','two','three', sep='*')
one*two*three
sep will separate the strings using whatever is in the sep quotes
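A sketch that runs the end and sep examples together. To make the result easy to inspect, the output is sent to an in-memory text buffer through print's file parameter; that is an addition for illustration and is not needed in normal use:

```python
import io

buffer = io.StringIO()   # stands in for the console

# end replaces the newline that print normally adds
print("one", end=" ", file=buffer)
print("two", end=" * ", file=buffer)
print("three", file=buffer)

# sep replaces the space that print normally puts between items
print("one", "two", "three", sep="*", file=buffer)

output = buffer.getvalue()
print(output, end="")
# one two * three
# one*two*three
```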
what would the following produce:
myVar = 1234.12345
format(myVar, '.2f')
1234.12
what would the following produce:
myVar = 1234.12345
format(myVar, '.3e')
1.234e+03
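A sketch of the two format() calls above, plus a thousands-separator variant that is not on the cards. In every case the result is a string, not a number:

```python
myVar = 1234.12345

fixed = format(myVar, ".2f")        # 2 decimal places
scientific = format(myVar, ".3e")   # scientific notation, 3 decimal places
withCommas = format(myVar, ",.2f")  # comma as thousands separator

print(fixed)        # 1234.12
print(scientific)   # 1.234e+03
print(withCommas)   # 1,234.12
print(type(fixed))  # <class 'str'>
```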
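When opening a file in Python, what are the single character options for the mode (shown as ‘x’ below) and what do they do?
myFile = open("filename.txt" , "x" )
They open the file in different ‘modes’:
"r" for Read
"w" for Write (creates the file, or overwrites an existing one)
"a" for Append
("x" is itself also a valid mode: create a new file, and fail if it already exists)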
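A sketch of the three modes in sequence. It writes to a temporary folder so that nothing is left behind in the working directory; the tempfile and os calls and the with statement (which closes the file automatically) are additions for illustration:

```python
import os
import tempfile

folder = tempfile.mkdtemp()                  # a fresh, empty temporary folder
path = os.path.join(folder, "filename.txt")

with open(path, "w") as myFile:              # "w" creates or overwrites the file
    myFile.write("first line\n")

with open(path, "a") as myFile:              # "a" adds to the end of the file
    myFile.write("second line\n")

with open(path, "r") as myFile:              # "r" reads the file
    contents = myFile.read()

print(contents, end="")
# first line
# second line
```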
How many variables can be compared by human visualisation abilities?
Tops out at 3 according to Field Cady