Exam I Flashcards
bioinformatics
sequence analysis (DNA/RNA)
Stats, bio, cs
Ex/ forcing yeast to evolve drug resistance and sequencing genome to find mutated protein (yeast is eukaryotic)
computational structural biology
protein/ligand structures
Physics, bio, cs, pharmacology
Ex/ drug discovery, predicting structure from sequence (homology modeling)
systems biology
models complex biological networks (metabolic or cell-signaling networks)
Math, stats, bio, cs
Ex/ building mathematical model of the whole cell using protein interactions
epidemiology
disease transmission/patterns/outbreaks
Stats, sociology, bio
Ex/ statistically significant correlation between tick bites and lyme disease
computational neuroscience
simulates brain function and cognition by modeling the nervous system (NS)
Neurology, bio, cognitive science, physics, cs
Ex/ how to model neurons to gain insights into brain function
computational ecology
model ecosystem dynamics/disease spread
Ecology, bio, cs
Ex/ create a computer model of disease outbreak within a population
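One way to picture the "computer model of disease outbreak" example is a discrete-time SIR (susceptible / infected / recovered) model. This is only a minimal sketch: the function name `simulate_sir` and the values for `beta` (transmission rate) and `gamma` (recovery rate) are illustrative assumptions, not anything from the course material.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Discrete-time SIR model; returns one (S, I, R) tuple per day."""
    s = float(population - initial_infected)  # susceptible
    i = float(initial_infected)               # infected
    r = 0.0                                   # recovered
    history = [(s, i, r)]
    for _ in range(days):
        # New infections scale with contacts between S and I individuals.
        new_infections = beta * s * i / population
        # A fixed fraction of infected individuals recovers each day.
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    history = simulate_sir(population=1000, initial_infected=1,
                           beta=0.3, gamma=0.1, days=160)
    peak_day, peak = max(enumerate(history), key=lambda item: item[1][1])
    print(f"Peak infections: {peak[1]:.0f} on day {peak_day}")
```

The total S + I + R stays equal to the population at every step, which is a quick sanity check on a model like this.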
UNIX meaning
Uniplexed Information and Computing Service (originally spelled UNICS, a pun on the earlier Multics system)
unix impact
Foundation of many modern operating systems
Heavy-duty comp bio calculations run on UNIX
HUGE IMPACT → Linux, macOS, Android, and Chrome OS are all Unix-based or Unix-like
Most remote servers and supercomputers run Unix/Linux
HISTORICAL → popularized hierarchical (directory-nested) file system
Launched the free software movement
unix historical impact
HISTORICAL → popularized hierarchical (directory-nested) file system
Launched the free software movement
Linux
Unix-like operating system (not a GUI; modeled on Unix)
Open-source operating system used on computers
Used everywhere [OS, Top500 supercomputers, ~2% of desktop computers]
Popular distributions: CentOS, Fedora, Linux Mint, Ubuntu
BASH
Bourne-again shell
Used to interact with Linux/Unix/macOS w/o the GUI
Default shell on most Linux distributions (macOS now defaults to zsh)
File system commands streamline directory and file management
– Files are organized into directories (folders)
– Enable file viewing and manipulation
BASH features
Entirely text based (user input commands faster than clicking through GUI)
Able to navigate/view files/directories (files stored in directories (folders))
Able to run executable programs
***speed/control
high performance computing
***specialized systems designed to handle large-scale computational tasks
– Different architecture and capabilities
***parallel processing with multiple nodes working in concert (100s/1000s connected together via high-speed networks)
Why build an HPC system? + examples
For when you need A LOT of computation:
- Weather forecasting/climate modeling
- Protein folding
- Simulating galaxies
- Simulating molecules/proteins
- High-throughput virtual screening (drug discovery)
- Machine learning
Fundamental Architecture of HPCs
HPC systems rely on nodes for login, data transfer, and computation
Each node in the system has a specific purpose/function that affects how you interact with the system and structure jobs
login node
login via this node (gateway into HPC system)
DO NOT run calculations here!
Shared resource meant for light tasks
data transfer node
copy data to/from this node
Efficiently moves data in and out of the system
compute node
calculations take place here
Connected via a fast interconnect: high-speed network allowing for rapid communication between nodes
May be grouped into clusters: optimized for different functions, but data is all still accessible
SLURM meaning
Simple Linux Utility for Resource Management
HPC job schedulers: SLURM
Job schedulers optimize resource allocation and queue management
How many processors/resources for how long?
Manages multiple users, complex jobs, and limited resources (interface between user and computer)
Waiting queue until resources are available
Sophisticated algorithms are used to balance system demands
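The queueing idea can be sketched with a toy first-in, first-out scheduler. This is a deliberately simplified illustration, not how SLURM itself is implemented: the function `schedule_fifo`, the job names, and the node counts are all hypothetical.

```python
import heapq
from collections import deque


def schedule_fifo(jobs, total_nodes):
    """Toy FIFO scheduler.

    jobs: list of (name, nodes_needed, runtime_hours) in submission order.
    Returns a dict mapping each job name to its start time in hours.
    """
    queue = deque(jobs)
    running = []          # min-heap of (end_time, nodes_in_use)
    free = total_nodes
    now = 0.0
    start_times = {}
    while queue:
        name, nodes, hours = queue[0]
        if nodes > total_nodes:
            raise ValueError(f"{name} requests more nodes than the cluster has")
        if nodes <= free:
            # Enough free nodes: start the job at the head of the queue.
            queue.popleft()
            start_times[name] = now
            free -= nodes
            heapq.heappush(running, (now + hours, nodes))
        else:
            # Otherwise wait until the next running job finishes.
            end_time, released = heapq.heappop(running)
            now = end_time
            free += released
    return start_times


if __name__ == "__main__":
    # Hypothetical jobs on a hypothetical 4-node cluster.
    jobs = [("align", 2, 3), ("fold", 4, 2), ("screen", 1, 1)]
    print(schedule_fifo(jobs, total_nodes=4))
```

In this sketch the small `screen` job waits behind `fold` even though a node is free at time 0. Avoiding that kind of idle time is one reason real schedulers use priorities and backfilling rather than strict FIFO.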
PBS
Portable Batch System
(job scheduler)
OGE
Oracle Grid Engine (Open Grid Scheduler, OGS, is its open-source fork)
(job scheduler)
Top500 Supercomputers
Ranked twice a year: June (announced in Europe) and November (announced in the US)
Reflects global competition in supercomputing excellence
Balance of performance and power consumption
Rmax = maximum performance actually measured
Rpeak = theoretical peak performance
Power = power consumption
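The three numbers combine into two ratios that capture the performance/power balance: Rmax / Rpeak (how much of the theoretical peak is actually reached) and Rmax / Power (performance per watt). The sketch below assumes Rmax and Rpeak are given in TFLOP/s and power in kW. The function names and the sample machine are made up for illustration.

```python
def peak_fraction(rmax_tflops, rpeak_tflops):
    """Fraction of theoretical peak performance actually measured."""
    return rmax_tflops / rpeak_tflops


def gflops_per_watt(rmax_tflops, power_kw):
    """Measured performance per unit of power, in GFLOP/s per watt."""
    rmax_gflops = rmax_tflops * 1000   # 1 TFLOP/s = 1000 GFLOP/s
    power_watts = power_kw * 1000      # 1 kW = 1000 W
    return rmax_gflops / power_watts


if __name__ == "__main__":
    # Hypothetical machine: Rmax = 750 TFLOP/s, Rpeak = 1000 TFLOP/s, 500 kW.
    print(peak_fraction(750, 1000))    # 0.75
    print(gflops_per_watt(750, 500))   # 1.5
```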
serial programming
Tasks executed sequentially (one after another)
Straightforward, but doesn’t take advantage of supercomputer abilities
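A minimal sketch of serial execution: each task must finish before the next one starts, so only one processor is ever busy. The GC-content task and the sample sequences are hypothetical stand-ins for a real workload.

```python
def gc_content(sequence):
    """Fraction of bases in a DNA sequence that are G or C."""
    if not sequence:
        return 0.0
    gc = sum(1 for base in sequence.upper() if base in "GC")
    return gc / len(sequence)


def run_serial(sequences):
    """Process sequences one at a time, in order."""
    results = []
    for seq in sequences:          # the next task waits for this one to finish
        results.append(gc_content(seq))
    return results


if __name__ == "__main__":
    print(run_serial(["ATGC", "GGCC", "ATAT"]))   # [0.5, 1.0, 0.0]
```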