Test 1 Flashcards
What is scientific computing
coding for the purpose of science
consider the Venn diagram
We want to use coding to:
-Manage large amounts of data
-Apply some math/algorithms (compute something from data)
-Visualize our data
-Answer really cool questions
What Scientific computing is not
We are not trying to be:
-Computer scientists- solving complex problems with math and computation. Eg cryptocurrency developers
-Software engineers- design and develop user-friendly software. Eg developers of platforms for cryptocurrency
Why can’t we just use excel?
-Highly inefficient for large amounts of data
-Greater chance of introducing errors
-No permanent record (Code: raw data- update processing- output)
-Not reproducible
How to pick a language
depends on your background, task, goal, etc.
big 3: R, Python, MATLAB
R
-Developed for statistical computing
-many statistical libraries
-used primarily by researchers
-free to use
Python
-general purpose language
-larger user base- lots of examples and Q/As available
-variety of libraries for data
-free to use
-widespread use across industry
Matlab
-Heavily used in engineering
-Excellent for signal processing
-Standard toolboxes (ie. Libraries)
-Requires a license ($$$)
Integrated Development Environments (IDEs)
Python can be used in a variety of IDEs
-IDEs allow us to write, edit, and execute Python code
-IDEs have many features that make it easier to code (eg. automatic formatting, syntax highlighting, debugging support, etc.). Examples: Spyder and Jupyter
Modular programming in python
Breaking up large programming tasks into smaller, more manageable ones
3 levels: functions, modules, packages/libraries
Modular programming in python: functions
-Block of code that only runs when you call it
-smallest unit of coding
eg a function calculating the average in excel
-“scatter” is a function for creating a 2D scatter plot
Modular programming in python: Modules
-Grouping of functions for similar tasks
-eg average and standard deviation
-pyplot is one module for interactive plotting
Modular programming in python: Packages/Libraries
-Grouping of modules for similar projects
-“matplotlib” is a library for plotting data
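A small sketch of the function/module idea, using Python's built-in statistics module as a stand-in (it is not named in these notes). The sample heights are made up for illustration.

```python
# "statistics" is a module: a group of related functions.
import statistics

heights_cm = [150.0, 160.0, 170.0]  # made-up sample values

# mean() and stdev() are two functions that live in the same module.
average_height = statistics.mean(heights_cm)
spread = statistics.stdev(heights_cm)  # sample standard deviation

print(average_height)  # 160.0
print(spread)          # 10.0
```

A package works the same way one level up: matplotlib is the package, pyplot is a module inside it, and scatter is a function inside pyplot.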
Anaconda
-A distribution of Python (and R) for scientific computing and data science that aims to simplify package management and deployment
–One download of 1500+ packages for scientific computing
–User friendly interface to manage it all
–Free to use, with a large user base
Jupyter notebook
-one of the primary IDEs for python programming
-included in the anaconda download
-allows for interactive text, coding, and plotting in an easy-to-use format
Google colab
-A cloud-based platform for writing and executing python code
-hosted on Google, requiring only a google account
-Provides free access to computational resources
Strings
Variables that contain numbers, letters, or other characters, but cannot be used in computations
Numbers
Only contains numbers and can be used in computations
Using the stored variables
Stored variables can be used in basic math equations (eg computation of BMI)
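A minimal sketch of strings vs numbers and of the BMI computation. The variable names and values are made up for illustration.

```python
name = "Alex"      # string: text, not used in math
mass_kg = 70.0     # number
height_m = 1.75    # number

# BMI = mass / height squared
bmi = mass_kg / height_m ** 2
print(name, round(bmi, 1))  # Alex 22.9

# Digits stored as strings are joined, not added.
joined = "70" + "1"
print(joined)  # 701
```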
Types of numbers: float
Numbers that carry decimal places; these are what we would most commonly use for storing data. Sometimes storing only whole numbers without decimals is useful too, in which case convert to an integer.
Types of numbers: integer
A whole number. Can be converted from a float using the int() function, which drops the decimal part.
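A short sketch of converting a float to an integer (the value is made up). Note that int() cuts off the decimals rather than rounding.

```python
height_cm = 175.8               # float
print(type(height_cm))          # <class 'float'>

whole_cm = int(height_cm)       # drops the decimal part
rounded_cm = round(height_cm)   # rounds to the nearest whole number

print(whole_cm)    # 175
print(rounded_cm)  # 176
```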
Boolean
Represents only one of two values: True or False (ie 1 vs 0). Can be used to evaluate whether some content is true. Most single items evaluate to True; the exceptions include False, 0, None, and anything empty (eg a blank string). This can be useful for determining whether data were left blank. Booleans are also useful for comparing two variables or for finding (ie indexing) specific rows of data.
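A quick sketch of how Booleans behave. The variable names are made up for illustration.

```python
has_text = bool("hello")  # non-empty string
is_blank = bool("")       # empty string (data left blank)
is_zero = bool(0)

print(has_text)  # True
print(is_blank)  # False
print(is_zero)   # False

# Comparing two variables also gives a Boolean.
age = 25
minimum_age = 18
is_adult = age >= minimum_age
print(is_adult)  # True
```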
Lists
Lists allow us to store multiple items in a single variable. This can be useful if you have a lot of things to store and don’t want a bunch of separate variables. Also, info is stored in a specific order so that it is easy to find, and this order (or index) always starts at 0. Lists are defined with [].
Finding Items in Lists
Once we have a set of data, it is important to be able to find the index (position of an item) in the list. We can add “.index()” after the name of the list, with the item we are looking for inside the (). We can enter a specific string to find a match in the list. Recall the index always starts at 0.
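A minimal sketch of making a list and finding an item in it. The list contents are made up for illustration.

```python
fruits = ["apple", "banana", "cherry"]

# Indexing starts at 0, so the first item is at position 0.
first = fruits[0]
print(first)  # apple

# index() returns the position of the first match.
position = fruits.index("banana")
print(position)  # 1
```

If the item is not in the list, index() raises a ValueError rather than returning a number.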
Dictionaries
Similar to lists in that they can store multiple items in a single variable, but each item is stored as a pair: a “key” and its “value”. Must specify pairs of “keys” and “values”. Defined with {}.
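A small sketch of a dictionary. The keys and values are made up for illustration.

```python
participant = {"name": "Alex", "age": 25, "height_m": 1.75}

# Look up a value by its key instead of by position.
print(participant["age"])  # 25

# Add a new key/value pair.
participant["mass_kg"] = 70.0
print(len(participant))  # 4
```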
Tips on naming variables
-Applies to strings, integers, lists, dictionaries, etc.
-Cannot start with a number
-Cannot use a reserved word as a name (eg. and, def, else, if, while, return)
-Cannot use special symbols (the underscore is the exception)
-Names are case-sensitive: upper and lowercase characters are read as different
Best practices when naming variables:
- Never use the letters I or O as single letter names (easily mistaken for 1 and 0)
-Don’t make them too general, but also not so descriptive that they become wordy
-Keep them short, but long enough to be descriptive
-Variable names should be lowercase with underscores used as a separator
accel_x = 1.06
force_z = 10.25
A constant (value that will remain unchanged) can be fully capitalized
GRAVITY = 9.81
What are “control structures”
Blocks of code that analyze variables and decide the direction the program takes. When developing control structures, we need to have an understanding of what is coming in and what is going out.
Preconditions
State of the variables entering the control structure
Control structure
The program or algorithm that runs based on the preconditions
Postconditions
State of the variables after the control structure has run
Sequence
Simply run one line after another, like a recipe. The most basic form of control structure. Requires you to define the step-by-step process needed to complete a task.
Selection
Allows decisions and branching to occur within the block of code (eg if/else statements, the most common). A block of code used to determine which steps should be completed based on the input variable/precondition. By doing this we enable the program to make a decision and alter the flow of control. Can be binary selections (true (1) vs false (0)) or multi-way selections. * Be conscious of indenting.
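A minimal sketch of a multi-way selection. The temperature value, thresholds, and labels are made up for illustration.

```python
temperature_c = 30.0  # precondition: the value coming in

# Only one branch runs; the indented lines belong to that branch.
if temperature_c > 25.0:
    label = "warm"
elif temperature_c > 10.0:
    label = "mild"
else:
    label = "cold"

print(label)  # warm  (postcondition: label is now set)
```

Dropping the elif branch would leave a binary (two-way) selection.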
Iteration
Used for repeating (iterating or looping over) lines of code (eg. a for loop). A block of code that can be repeated (iterated) a certain number of times (or until a specific condition is met). The most common form is the for loop. In Python, a for loop iterates over the items in a variable, or over a sequence of numbers generated with the “range” function. “range” can be used in place of a previously defined list and lets us define the set of numbers we want the loop to iterate over, so we can control the specific values we loop over. A variable defined above the loop (eg “message”) lets us build up a result iteratively. Inside the loop we can create a temporary variable (eg “temp”) and combine string info using “+”, then save the new string into “message” using the “.append” method. Without .append, the temp string would be overwritten on each iteration of the loop.
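A sketch of what such a loop might look like. “message” and “temp” follow the names used above; the names list and greeting text are made up.

```python
names = ["Ana", "Ben", "Cy"]
message = []  # defined above the loop so results can accumulate

# range(len(names)) generates 0, 1, 2
for i in range(len(names)):
    temp = "Hello, " + names[i]  # temporary string, rebuilt each pass
    message.append(temp)         # keep it before temp is overwritten

print(message)  # ['Hello, Ana', 'Hello, Ben', 'Hello, Cy']

# range(start, stop) picks the numbers to loop over; stop is excluded.
squares = []
for n in range(1, 4):
    squares.append(n ** 2)

print(squares)  # [1, 4, 9]
```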
What are functions?
A block of organized and reusable code that will only run when you call it. Can send it some data (or parameters) and it will run the code, then output something in return. Many functions are built into python.
-abs()- returns the absolute value of a number
-len()- returns the length of an object
-max()- returns the largest item in an object
-min()- returns the smallest item in an object
-round()- rounds a number
-sum()- sums the items of a list
Some require only one input and return only one output; many times you need to enter multiple inputs and/or the function can return multiple outputs.
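A quick sketch of the built-ins listed above, run on made-up values. divmod() is not in the list; it is included only as an example of a built-in that returns two outputs.

```python
values = [3, -7, 5, 2]

print(abs(-7))       # 7
print(len(values))   # 4
print(max(values))   # 5
print(min(values))   # -7
print(sum(values))   # 3

# Two inputs: the number and how many decimals to keep.
rounded = round(2.567, 1)
print(rounded)  # 2.6

# Two outputs: the whole-number quotient and the remainder.
quotient, remainder = divmod(17, 5)
print(quotient, remainder)  # 3 2
```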
Create your own function
- Type “def” to tell python we want to create a definition of a function
- Type the name you want the function to be called- be strategic in your choice, and avoid reusing the name of a built-in (doing so would override the built-in)
- Use () to define the input for the function (general name to be used within the function). Depending on the function may have no input, 1 input, or many inputs
- Close the definitions line with a colon
Can now define the body of the function that will do the work on the input variable when it is called. Note that this entire body of the function must be indented in python code
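The steps above can be sketched as follows. The function name compute_bmi and its inputs are made up for illustration.

```python
# def + name + (inputs) + colon, then an indented body
def compute_bmi(mass_kg, height_m):
    bmi = mass_kg / height_m ** 2
    return bmi  # send the result back to the caller


# Nothing runs until the function is called.
result = compute_bmi(70.0, 1.75)
print(round(result, 1))  # 22.9
```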
Commenting in your functions and code
-use # to comment code
General tips for commenting
-Write meaningful comments that summarize a block of code, overall processes, or areas that may be unclear or confusing.
-Keep them short and to the point
-Do not simply reiterate the code.
-We can directly reference processing methods or code as a citation or external link in a comment.
-Comments can also be used to list author, date, and other information on the code.
-Documentation strings (docstrings) are specialized forms of comments that use three double quotes to open and close the comment and are used to describe the function.
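A small sketch showing a # comment and a docstring together. The function name weight_newtons is made up for illustration.

```python
GRAVITY = 9.81  # constant, in m/s^2


def weight_newtons(mass_kg):
    """Return the weight in newtons of a mass given in kilograms."""
    # weight = mass * gravitational acceleration
    return mass_kg * GRAVITY


print(weight_newtons(10.0))
print(weight_newtons.__doc__)  # the docstring is stored with the function
```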
Data
Any information that has been translated into something that we can move, process, visualize, etc. “Data” is plural. Storing info in lists and dictionaries can be a good way to manage certain types of data; however, these variables can be limiting when looking to examine larger sets of data.
Arrays
Used to store larger data sets. More efficient storage than a list variable and allows more efficient operations/processing. 2D arrays allow storage of larger sets of data in a variable with rows and columns.
NumPy (Numerical Python)
Large open-source library for working with large arrays and matrices. Allows the creation of array objects, rather than simple arrays. These objects are easier to use and manipulate, and faster computationally. Foundational library on which almost all other data science libraries and functions are built. Does not come standard with Python but is included in Anaconda.
“import numpy as np” is the usual way to import it; NumPy is primarily used for arrays.
Note: use () as the base definition for functions (eg np.array()), but will often use square brackets inside to group the numbers.
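A minimal sketch of creating a 2D array. The numbers are made up for illustration.

```python
import numpy as np

# () belongs to the function call; [] groups the numbers into rows.
A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(A.shape)  # (2, 3) -> 2 rows, 3 columns
print(A.ndim)   # 2
```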
Working with NumPy- Basic operations
Can do many things with arrays of numbers including math (+, -, *, /)
NumPy is primarily used for arrays of numbers but can handle other types of data (eg strings, Booleans, etc.) as long as they are not mixed in the same array
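A short sketch of math on arrays. The operations apply element by element; the values are made up.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

total = a + b    # element-wise addition
doubled = a * 2  # every element times 2
ratio = b / a    # element-wise division

print(total)    # [11. 22. 33.]
print(doubled)  # [2. 4. 6.]
print(ratio)    # [10. 10. 10.]
```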
Working with NumPy- Indexing
An important skill when working with arrays is being able to index specific elements of data and/or find certain values in an array. Important: when indexing in Python use [], not (), and always make sure you are using the correct brackets.
To index data, identify the specific element in the array we want, using the [row, col] notation.
-Additionally, if we want all values in specific row or column, we can do so by using : rather than a number to signify all numbers.
-Could also put a series of numbers to represent a subset of data for a row or column.
-Could also index data in an array using comparators or Booleans: generate a Boolean array separately and use it to choose specific indices in the data, or build the comparison right into the index itself. Eg print(A[A > 1]) would only print elements whose value is greater than 1.
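The indexing options above can be sketched as follows, on a made-up 3 x 3 array.

```python
import numpy as np

A = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]])

single = A[1, 2]     # row 1, column 2
row = A[0, :]        # all columns of row 0
col = A[:, 1]        # all rows of column 1
subset = A[0:2, 1]   # rows 0 and 1 of column 1 (stop is excluded)

mask = A > 1         # Boolean array made separately
big = A[mask]        # same result as A[A > 1]

print(single)  # 5
print(row)     # [0 1 2]
print(col)     # [1 4 7]
print(subset)  # [1 4]
print(big)     # [2 3 4 5 6 7 8]
```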
Pandas (Panel data/ python data analysis)
A data analysis library in Python that builds additional functionality on top of the NumPy library.
Pandas allows us to combine numbers, strings, Booleans, etc. all in the same table (a DataFrame) if we want. We can also have headers for our columns to make the data easier to read. Pandas gives a cleaner approach when we have a variety of data to work with.
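A minimal sketch of a DataFrame that mixes strings, numbers, and Booleans under column headers. All names and values are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],      # strings
    "age": [25, 31, 22],               # numbers
    "enrolled": [True, False, True],   # Booleans
})

print(df)
print(df.shape)          # (3, 3)
print(df["age"].mean())  # 26.0
```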
Indexing data or slicing data into subsets (pandas)
Can index using the “loc” and “iloc” indexers (used with square brackets). “loc” selects data by index label, while “iloc” selects by integer position (starting at 0). Most of the time the two give the same rows, since the default index labels are simply 0, 1, 2, ... in order.
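A small sketch of the difference between “loc” and “iloc”. The DataFrame is made up and uses letter labels so the two are easy to tell apart.

```python
import pandas as pd

df = pd.DataFrame({"score": [90, 85, 70]}, index=["a", "b", "c"])

by_label = df.loc["b", "score"]  # row labelled "b"
by_position = df.iloc[1, 0]      # second row, first column

print(by_label)     # 85
print(by_position)  # 85

# Slicing a subset: the first two rows by position.
top_two = df.iloc[0:2]
print(len(top_two))  # 2
```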
Adding more rows, columns, or combining DataFrames using pandas
Adding rows used to be done with the “append” feature; now use “concat”. This function combines two (or more) DataFrames (concatenate: joining together in series) and returns a new DataFrame.
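A minimal sketch of combining two DataFrames with pd.concat. The data are made up; ignore_index=True is an optional choice here so the combined rows are renumbered 0, 1, 2.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "score": [90, 85]})
df2 = pd.DataFrame({"id": [3], "score": [70]})

# concat takes a list of DataFrames and returns a new one;
# df1 and df2 themselves are left unchanged.
combined = pd.concat([df1, df2], ignore_index=True)

print(combined)
print(combined.shape)  # (3, 2)
```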
Importing and working with data using pandas
When working with data, we normally expect each row to be an observation and each column to be a variable.
To import data from a .csv file we simply need to use the function pd.read_csv with the name of the file.
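A sketch of pd.read_csv. So the example runs on its own, a block of text wrapped in io.StringIO stands in for a real file; with an actual file you would pass its name instead (eg pd.read_csv("trial_data.csv"), a hypothetical file name). The column names and values are made up.

```python
import io

import pandas as pd

# Stand-in for the contents of a .csv file: header row, then one
# observation per row and one variable per column.
csv_text = io.StringIO(
    "time_s,force_n\n"
    "0,0.0\n"
    "1,2.5\n"
    "2,5.1\n"
)

df = pd.read_csv(csv_text)

print(df)
print(df.shape)  # (3, 2)
```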
Plotting data from pandas using matplotlib
Matplotlib offers many different kinds of plots. Use the “.plot” command with x and y data. Define X and Y variables as well as axis titles.
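A minimal sketch of plotting two DataFrame columns. The data and labels are made up. The matplotlib.use("Agg") line is only an assumption to let the script run without a display; in Jupyter or Colab it can be left out.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: no display available; omit in a notebook

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "time_s": [0, 1, 2, 3],
    "force_n": [0.0, 2.5, 5.1, 7.4],
})

fig, ax = plt.subplots()
ax.plot(df["time_s"], df["force_n"])  # x data first, then y data
ax.set_xlabel("Time (s)")             # axis titles
ax.set_ylabel("Force (N)")
ax.set_title("Force over time")
```

In a notebook the figure is shown under the cell; plt.plot(x, y) with plt.xlabel() and plt.ylabel() is an equivalent shorter form.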