Computer Science Theory Flashcards
Pre-defined data types
are data types which have already been defined by the language in which you are coding, e.g. Boolean and integer. This means there is a pre-defined range of possible values.
User-defined data types
are data types whose nature is defined by the user (the programmer), meaning that there is a specific range of values unique to that program.
Composite data types
are user-defined data types that are built from other, already defined data types (such as strings and integers). Examples include sets, classes, arrays, etc. Think of the word composite (joined, more than one, as in a composite function).
Non-composite data types
are user-defined data types that do not reference any other data types
Sets
are collections of unordered elements on which set-theory operations such as intersection and union can be used. They are defined in a similar way to arrays, except that they use normal (round) parentheses and can hold both strings and numbers.
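A minimal sketch of set operations, shown here in Python rather than exam pseudocode; the variable names are illustrative only:

```python
# Two sets of elements; order is not significant and duplicates are ignored.
evens = {2, 4, 6, 8}
primes = {2, 3, 5, 7}

# Set-theory operations supported directly by the language.
print(evens & primes)   # intersection -> {2}
print(evens | primes)   # union        -> {2, 3, 4, 5, 6, 7, 8}
print(evens - primes)   # difference   -> {4, 6, 8}
```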
Classes
are composite data types that include variables of given data types and methods (a method is a code routine that can be run by an object of that class).
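A minimal Python sketch of a class combining variables (attributes) with a method; the class and attribute names are invented for illustration:

```python
class BankAccount:
    """A composite data type: variables (attributes) plus methods."""

    def __init__(self, owner: str, balance: float = 0.0):
        self.owner = owner        # variable of type string
        self.balance = balance    # variable of type float

    def deposit(self, amount: float) -> None:
        """A method: a code routine an object of this class can run."""
        self.balance += amount


account = BankAccount("Alice")
account.deposit(50.0)
print(account.balance)  # 50.0
```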
Enumerated data type
a user-defined, non-composite data type that is ordinal, meaning that its values have an implied order, e.g. the months of the year.
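A short sketch using Python's enum module to show the implied ordering (only four months listed for brevity):

```python
from enum import IntEnum

class Month(IntEnum):
    """Enumerated type: a fixed list of named values with an implied order."""
    JANUARY = 1
    FEBRUARY = 2
    MARCH = 3
    APRIL = 4

# The implied order makes comparisons meaningful.
print(Month.JANUARY < Month.MARCH)  # True
print(list(Month))                  # the members in their defined order
```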
Pointer data type
references a memory location and needs to have information about the type of data that will be stored at that location. It is still non-composite, because its definition does not combine any other data types.
Text file
contains data stored according to a defined character code such as ASCII or Unicode. This kind of file can be created using a text editor.
Binary file
is a file designed for storing data to be used by a computer program (1s and 0s). It stores data in its internal representation and organizes the file into records (a record is a collection of fields containing data values). Hierarchy: file → records → fields → values.
Serial file organization
is a method used to organize records within a file. Records are stored in chronological order, meaning that they are stored in the order in which they were added, with new records being appended to the end of the file.
Sequential file organization
stores records in an order that is usually based on each record's unique key field, since the key field is essentially the record's unique identifier, e.g. a file containing data on the passwords of users.
Direct/Random-access method
Records appear randomly ordered: each record's location is determined by the result of a computation carried out by a hashing algorithm (usually on the record's key field).
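A toy sketch of the idea, using an invented hashing function over a fixed-size table of slots (collisions are ignored for brevity):

```python
# Toy illustration (not a real file system): the record's key is hashed
# to compute a "home" location in a fixed-size table of record slots.
TABLE_SIZE = 11
table = [None] * TABLE_SIZE

def location(key: str) -> int:
    """Hashing algorithm: map a key to a slot number."""
    return sum(ord(ch) for ch in key) % TABLE_SIZE

def store(key: str, record: dict) -> None:
    table[location(key)] = (key, record)   # ignores collisions for simplicity

def fetch(key: str):
    slot = table[location(key)]
    return slot[1] if slot and slot[0] == key else None

store("user42", {"name": "example"})
print(fetch("user42"))   # found directly, without reading the whole file
```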
Normalization
is the reduction of wasted bits by representing a number with the maximum possible precision (there are no wasted leading bits in the mantissa). In a normalized representation the two most significant bits of the mantissa are different. This reduction in wasted bits gives greater precision and eliminates ambiguity, especially regarding the sign of the number.
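A short worked example, assuming a two's-complement format with an 8-bit mantissa (sign bit plus seven fraction bits) and a 4-bit exponent; exact field sizes vary between questions:

```latex
% Unnormalized: mantissa 0.0011000, exponent 0100 (= +4)
%   value = 0.0011_2 \times 2^{4} = 11_2 = 3
% Shift the mantissa two places left and subtract 2 from the exponent:
% Normalized:   mantissa 0.1100000, exponent 0010 (= +2)
%   value = 0.11_2 \times 2^{2} = 11_2 = 3  (same value; the two MSBs now differ)
0.0011000_2 \times 2^{4} \;=\; 0.1100000_2 \times 2^{2} \;=\; 3
```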
Rounding errors
the difference between the result produced by an algorithm and the exact mathematical value. This can be attributed to the limited precision previously mentioned. The error becomes quite significant if calculations are repeated enough times. The only way to reduce it is to increase the number of bits used for the mantissa.
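A quick Python illustration of how a tiny representation error accumulates when a calculation is repeated many times:

```python
# 0.1 has no exact binary representation, so each addition introduces a
# tiny rounding error that accumulates over many repetitions.
total = 0.0
for _ in range(1000):
    total += 0.1

print(total)           # prints something like 99.9999999999986, not 100
print(total == 100.0)  # False
```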
Overflow
occurs when the largest number that a register can hold is exceeded. This can often occur when an already large number in the mantissa is multiplied by another large number.
Underflow
occurs when a value becomes smaller than the smallest number a register can represent. This can often happen when an already small number is divided by a large number.
- Low-Level programming paradigm:
uses programming instructions drawn from the computer's basic instruction set, e.g. assembly language.
- Imperative programming paradigm
involves writing a program as a sequence of explicit steps that are executed by the processor.
- Object-Oriented programming paradigm
is based on objects interacting with one another. These objects are data structures with associated methods.
- Declarative programming paradigm
where a program describes what the desired result should be, rather than specifying the steps for how to compute it.
Record
a collection of fields containing data values. The fields are of previously defined data types, and therefore a record is a composite data type.
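A minimal sketch of a record as a Python dataclass; the field names are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class StudentRecord:
    """A record: a collection of fields of previously defined data types."""
    student_id: int      # field of type integer
    name: str            # field of type string
    average_mark: float  # field of type real/float

rec = StudentRecord(student_id=1001, name="Alice", average_mark=72.5)
print(rec.name)  # access an individual field's value
```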
Circuit switching
is a method of transmission in which data travels along a dedicated communication path (circuit/channel). This channel lasts for the duration of the communication, during which unimpeded (unobstructed) transmission takes place.
Packet switching
is a method of transmission where no circuit is established. Rather, the data is broken down into equal-sized packets, each consisting of a header (which includes instructions for delivery) and the data body itself, and the packets are sent down independent routes. The route chosen for each packet will be the optimum one at that moment, which largely depends on the traffic on each route.
- Connectionless service (packet switching)
where packets are dispatched with no knowledge of whether the receiver is ready to accept them, and the sender has no way of finding out whether a transmission has succeeded.
- Connection-oriented service (packet switching)
where the first packet is sent, including a request for an acknowledgement. If the acknowledgment is received, the sender transmits the rest of the packets; otherwise, the sender tries sending the first packet again.
The routing table
is a data table stored in a router that lists the routes to a particular network destination and the metrics associated with those routes. This information is necessary to forward a packet along the best path toward its destination.
Protocol
a set of rules agreed by both sender and recipient.
Protocol suites
contain more than one individual protocol; the complexity of networking requires the use of more than one protocol.
- HTTP
hypertext transfer protocol: responsible for the correct transfer of the files that make up web pages on the WWW
- SMTP
simple mail transfer protocol: handles the sending of emails
- POP3/4
post office protocol: handles the receiving of emails
- DNS
Domain Name Service: protocol used to find the IP address corresponding to a domain name
- FTP
file transfer protocol: used when transferring files between computers
TCP
stands for transmission control protocol. It receives data from the application layer and splits it into equal-sized packets. It also adds a header to each packet; if the packet is one of a sequence, a sequence number is included to ensure eventual correct reassembly of the user data. Error-checking information, for the receiver to use, is also added to the header. The packet is then passed to the network/internet layer.
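A highly simplified sketch of the packet-splitting idea; real TCP headers contain many more fields, and the checksum shown here is an invented stand-in:

```python
# Toy illustration of splitting application data into fixed-size packets,
# each carrying a sequence number and a simple checksum in its header.
PACKET_SIZE = 8  # bytes of user data per packet (illustrative only)

def make_packets(data: bytes) -> list[dict]:
    packets = []
    for seq, start in enumerate(range(0, len(data), PACKET_SIZE)):
        body = data[start:start + PACKET_SIZE]
        header = {"sequence": seq, "checksum": sum(body) % 256}
        packets.append({"header": header, "body": body})
    return packets

def reassemble(packets: list[dict]) -> bytes:
    # Sequence numbers allow correct reassembly even if packets arrive
    # out of order after travelling different routes.
    ordered = sorted(packets, key=lambda p: p["header"]["sequence"])
    return b"".join(p["body"] for p in ordered)

pkts = make_packets(b"hello from the application layer")
print(reassemble(pkts))
```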
Handshake
is a term used to describe the process of one computer establishing a connection with another computer or device. The steps include verifying the connection and authorizing the computer connection.
Ethernet
is a system that connects several computers together to form a LAN. It is a local protocol and does not provide a means to communicate with external devices. The Ethernet system uses protocols to control the movement of frames between devices and to avoid simultaneous transmission by two or more devices.
Wireless (WiFi) protocols
Wireless LANs (WiFi) use a protocol called CSMA/CA (carrier sense multiple access with collision avoidance). This is not to be confused with CSMA/CD (carrier sense multiple access with collision detection): CSMA/CA is used to avoid collisions, whilst CSMA/CD is used to resolve/handle collisions.
CSMA/CA
ensures that a device only transmits when there is a free channel available. The protocol checks whether the channel is free; if it is not, the device holds off any further transmission until the channel becomes free, and only then does the transmission take place.
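A rough Python sketch of the listen-then-transmit idea; the channel check and back-off times below are invented placeholders:

```python
import random
import time

# Simplified CSMA/CA idea: listen before transmitting, and back off for a
# random time if the channel is busy (to avoid repeated clashes).
def channel_is_free() -> bool:
    # Placeholder: a real device would sense the carrier on the medium.
    return random.random() > 0.3

def send_frame(frame: str) -> None:
    while not channel_is_free():
        time.sleep(random.uniform(0.01, 0.05))  # random back-off
    print(f"channel free, transmitting: {frame}")

send_frame("example frame")
```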
BitTorrent
is a protocol used in the peer-to-peer network/file-sharing community. The main difference between a normal peer-to-peer file-sharing protocol and the BitTorrent protocol is that BitTorrent allows MANY computers (acting as peers) to share files. Note that in peer-to-peer sharing (including BitTorrent) files are shared directly between peers, not through servers.
Torrent
is a small file describing the content being shared on the peer-to-peer network. It does not contain the data you are looking for; it just tells your computer what to look for.
Tracker
is the central server that stores details about other computers that make up the swarm, such as the IP addresses of the peers, and what peer has what piece of the data.
CISC
(Complex Instruction Set Computer) is a processor architecture/design with the main goal of carrying out a given task with as few lines of assembly code as possible. This ultimately makes the code comparatively shorter, and therefore, reduces the memory (RAM) requirements.
RISC
(Reduced Instruction Set Computer) is a processor architecture/design with the main goal of using less complex/ more optimized sets of instructions. This is done by breaking up the assembly code instructions into several, simpler, single-cycle instructions. This ultimately leads to faster processor performance (processing time) of instructions, as less work must be done by the processor regarding breaking down the commands.
Pipelining
is a form of computer organization that allows several instructions to be processed simultaneously without having to wait for previous instructions to finish.
Interrupts
are signals emitted by hardware or software when a process or an event needs immediate attention. It alerts the processor to a high-priority process requiring interruption of the current working process.
Parallel Processing
is an operation which allows a process to be split up and for each part to be executed by a different processor at the same time. Essentially, it is the simultaneous processing of data. There are many ways for parallel processing to be carried out
Massively parallel computers
form a genuinely parallel system: the linking of several computers, effectively creating one machine with thousands of processors.
Bottleneck
is a situation wherein too many demands on a physical CPU degrade system performance or make the system run slower.
Core
is essentially a processing unit within the Central Processing Unit (CPU). Today, CPUs are made up of numerous such processing units. This allows for more efficient and faster processing, as instructions can be passed to different cores, allowing numerous instructions to be processed simultaneously.
- Combination circuits
(where the outputs solely depend on the input values). All logic gate circuits we have looked at so far have been combination circuits.
- Sequential circuits
(where the output depends on the input value produced from a previous output value). Examples of sequential circuits include flip-flops. We will be considering the SR flip-flop and JK flip-flop.
Interpreters
translate code into machine code, line/instruction by line/instruction. The CPU executes each instruction before the interpreter moves on to translate the next instruction.
Compilers
Compilers translate high-level language into machine code, outputting that machine/object code. It is important to note, though, that a compiler does not execute the code: the output is strictly the object code and/or the error messages, which are reported at the end of compilation.
Syntax Analysis
is the process of checking the code for grammar mistakes (syntax errors). The syntax of a language is often set out in BNF notation. If any errors are found, they will be output. The code generation process will not begin until the errors are resolved.
Code generation
is the process by which the object program or, at times, intermediate code is generated.
Lexical analysis
is the process of converting a sequence of characters into a sequence of tokens (strings of characters with an identified meaning).
Optimization
is the final stage in the translation process and is where the code is edited to make improvements in efficiency. This allows for greater efficiency in time and memory allocation.
Syntax diagram
is a graphical method of defining and showing the grammatical rules of a programming language.
Backus-Naur form (BNF)
is a formal method for describing the syntax of a programming language.
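For example, the rules for an unsigned integer could be written in BNF like this (an illustrative grammar, not taken from any particular language specification):

```
<digit>   ::= 0|1|2|3|4|5|6|7|8|9
<integer> ::= <digit> | <digit><integer>
```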
Eavesdropper
is a person who intercepts data being transmitted
Encryption
is the process of converting the original representation of data, known as plaintext, into an alternative form known as ciphertext. Only the intended recipient can decipher the ciphertext back to plaintext and therefore gain access to the original information.
Symmetric encryption
is an encryption type in which the same key is used to both encrypt and decrypt messages. This key is called the secret key.
Block cipher (symmetric encryption)
encrypts data in fixed-size blocks/groups (e.g. 128 bits) in one go rather than one bit at a time (which would otherwise be called a stream cipher). Here, the plaintext is first passed into a block cipher encryption, which uses the secret key to encrypt the data; the product is ciphertext. The first block of ciphertext is then XOR-ed with the next block of plaintext to be encrypted, thus preventing identical plaintext blocks from producing identical ciphertext. This process is known as block chaining.
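A toy sketch of the block-chaining idea; the "encrypt" step below is a placeholder, not a real block cipher such as AES:

```python
# Toy sketch of cipher block chaining: each plaintext block is XOR-ed with the
# previous ciphertext block before being encrypted, so identical plaintext
# blocks do not produce identical ciphertext.
BLOCK = 4  # bytes per block (real block ciphers use e.g. 16 bytes / 128 bits)

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    return xor_bytes(block, key)  # placeholder for a real block cipher

def chain_encrypt(plaintext: bytes, key: bytes, iv: bytes) -> bytes:
    previous = iv
    ciphertext = b""
    for start in range(0, len(plaintext), BLOCK):
        block = plaintext[start:start + BLOCK].ljust(BLOCK, b"\0")  # naive padding
        previous = toy_encrypt_block(xor_bytes(block, previous), key)
        ciphertext += previous
    return ciphertext

print(chain_encrypt(b"ATTACK AT DAWN", key=b"K3Y!", iv=b"\x01\x02\x03\x04").hex())
```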
Stream cipher
is the encryption of bits in sequence as they arrive at the encryption algorithm. Although it also XORs the plaintext (with a keystream), you do not need to know the specifics.
Key distribution problem
is a security problem inherent in symmetric encryption: the secret key must be sent to the recipient, so there is a risk that it is intercepted by an eavesdropper/hacker in transit.
Asymmetric encryption
a form of encryption that uses two keys: a public key and a private key.
Public key
is an encryption/decryption key known to all users (one-way function)
Private key
is an encryption/decryption key which is known only to a single user/computer (one way function)
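A toy illustration of the two-key idea using the classic small RSA textbook numbers; real keys are thousands of bits long:

```python
# Toy RSA with tiny textbook numbers (p = 61, q = 53); NOT secure in practice.
p, q = 61, 53
n = p * q                      # 3233, part of both keys
e = 17                         # public exponent  -> public key  (e, n)
d = 2753                       # private exponent -> private key (d, n)

message = 65                   # a message encoded as a number smaller than n
ciphertext = pow(message, e, n)     # encrypt with the PUBLIC key
recovered  = pow(ciphertext, d, n)  # decrypt with the PRIVATE key

print(ciphertext)  # 2790
print(recovered)   # 65 -- only the private-key holder can recover it
```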
Quantum Key Distribution (QKD)
is a protocol which uses quantum mechanics to securely send encryption keys over fiber optic cables.
Cryptography
is the practice and study of techniques for secure communication
Secure Sockets Layer
is a security protocol used when sending data over the internet.
Transport Layer Security
is also a security protocol used when sending data over the internet; however, it is a more up-to-date version of SSL.
Public key infrastructure (PKI)
is a term used to refer to everything used to establish and manage public key encryption and identification. It is, therefore, a set of protocols, standards, and services that allow users to authenticate each other using digital certificates issued by a CA.
- Handshake protocol (layer of the TLS)
permits the web server and client to authenticate one another and make use of encryption algorithms (it establishes a secure session between client and server)
- Record protocols (layer of the TLS)
contains the data being transmitted over the network (this can be used with or without encryption)
Digital signature
is an electronic way of validating the authenticity of digital documents [that is, making sure they have not been tampered with during transmission] and proof that a document was sent by a known user.
Hashing algorithm
a one-way function which converts data strings into a numeric string which is used in cryptography. This numeric string is called a digest.
Digest
is a fixed-size numeric representation of the contents of a message produced from a hashing algorithm. This can often be encrypted to form a digital signature.
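A quick Python illustration using the standard hashlib library:

```python
import hashlib

# Hashing is a one-way function: the digest is fixed-size and cannot be
# reversed to recover the original message.
digest_1 = hashlib.sha256(b"Pay Bob 10 pounds").hexdigest()
digest_2 = hashlib.sha256(b"Pay Bob 100 pounds").hexdigest()

print(len(digest_1))          # 64 hex characters, whatever the message length
print(digest_1 == digest_2)   # False: even a small change alters the digest
```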
Digital certificate
is an electronic document used to prove the identity of a website or individual. It contains a public key and information identifying the website owner or individual. The certificate is issued by a CA (Certificate Authority), which independently validates the identity of the certificate owner.
Artificial Intelligence
can be thought of as a machine with cognitive abilities such as problem solving and learning from examples. It is the ability of a computer to perform tasks commonly associated with human intelligence
- Narrow AI
is when a machine has superior performance to a human when doing one specific task (such as the Chess engine)
- General AI
is when a machine is similar in its performance to a human in any intellectual task
- Strong AI
is when a machine has superior performance to a human in many tasks
Machine learning
is a branch of AI and refers to algorithms that enable software to improve its performance over time as it obtains more data. This is programming by input-output examples rather than just coding; basically, systems that learn without being explicitly programmed to learn.
Labeled data
is data that comes with a tag, such as a name, type, or a number. Basically input data that has already been supplied with the target answer/output of what it is.
Unlabeled data
is data that does not have a tag. It is the raw form of data. It is data where objects are undefined and need to be manually recognized.
Supervised learning
is a learning approach in which the computer algorithm is trained on labelled input data. The model is trained until it can detect the underlying patterns and relationships between the input data and the output labels, enabling it to yield accurate results when presented with never-before-seen, unlabeled data.
Unsupervised learning
is a system which can identify hidden patterns, similarities, and anomalies in input data by making the data more readable and organized through the use of density estimation and k-means clustering (you do not need to know how those analytical methods work, just that machine learning makes use of them).
Reinforcement learning
is a learning method where agents are trained on a reward and punishment mechanism. The agent is rewarded for correct moves and punished for the wrong ones. In doing so, the agent tries to minimize wrong moves and maximize the right ones, in a particular environment (In other words, it uses trial and error in algorithms to determine which action gives the highest/ optimal outcome).
Semi-supervised (action) learning
is a learning method utilized in machine learning that makes use of both labelled and unlabeled data to train algorithms. Essentially, the labeled data establishes base labels and categories that are used as a starting point for the algorithm to process related unlabeled data. This approach is often necessary when it is considered too time-consuming and expensive to collect large amounts of labeled data.
Deep learning
is a type of machine learning that is inspired by the structure of the human brain, allowing machines to handle huge amounts of data and think in a way similar to the human brain. The machine adopts a structure called an artificial neural network.
Artificial neural networks
are networks of interconnected nodes (connection point in a communication network) based on the interconnections between neurons in the human brain. The system is able to think like a human thanks to the neural network, and its performance improves with more data.
Hidden layer
is located between the input and output layers. The hidden layer identifies features from the input data and uses them to correlate a given input with the correct output, e.g. in facial recognition, or in the example previously given: if it is 35°C, not raining, and a Friday, then the number of swimmers is 80% likely to be 66.
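A minimal sketch of one forward pass through a single hidden layer; the weights below are randomly invented for illustration, whereas a real network learns them from data:

```python
import numpy as np

# Inputs (made-up example): [temperature, raining (0/1), is_friday (0/1)]
x = np.array([35.0, 0.0, 1.0])

# Invented weights and biases; in practice they are learned during training.
W_hidden = np.random.default_rng(0).normal(size=(4, 3))  # 4 hidden nodes
b_hidden = np.zeros(4)
W_out = np.random.default_rng(1).normal(size=(1, 4))
b_out = np.zeros(1)

hidden = np.tanh(W_hidden @ x + b_hidden)   # hidden layer extracts features
prediction = W_out @ hidden + b_out         # output, e.g. number of swimmers

print(prediction)
```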
Regression
is a statistical technique used to make predictions from data by finding relationships between the inputs and the outputs. It is basically a line of best fit that minimizes the differences between predicted and actual values.
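A small sketch of fitting a line of best fit by least squares; the data values are made up:

```python
import numpy as np

# Simple linear regression: fit a line through (temperature, swimmers) pairs.
temperature = np.array([20, 25, 30, 35], dtype=float)   # inputs
swimmers    = np.array([20, 35, 50, 66], dtype=float)   # outputs (made-up data)

slope, intercept = np.polyfit(temperature, swimmers, deg=1)

# Use the fitted relationship to predict the output for a new input.
print(slope * 28 + intercept)
```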