Version control systems Flashcards

1
Q

Version

A

Snapshot

2
Q

Branch

A

(temporarily) independent development line

3
Q

Merge

A

Integration of a branch into another development line

4
Q

Merge conflict

A

A problem that occurs when attempting a merge with changes that cannot be integrated easily
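A minimal sketch of how a merge conflict arises, assuming a simplified line-by-line three-way merge. The function name `three_way_merge` and the equal-length restriction are assumptions made for illustration; real tools such as SVN and Git first align lines with a diff algorithm.

```python
from typing import List, Tuple


def three_way_merge(
    base: List[str], ours: List[str], theirs: List[str]
) -> Tuple[List[str], List[int]]:
    """Merge two edited copies of `base`, comparing lines by position.

    Simplifying assumption: all three versions have the same number of lines.
    Returns the merged lines and the indices of conflicting lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles versions of equal length")

    merged: List[str] = []
    conflicts: List[int] = []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:      # both sides agree (same change, or no change)
            merged.append(o)
        elif o == b:    # only "theirs" changed this line
            merged.append(t)
        elif t == b:    # only "ours" changed this line
            merged.append(o)
        else:           # both changed the same line differently: conflict
            merged.append(f"<<<<<<< {o} ======= {t} >>>>>>>")
            conflicts.append(i)
    return merged, conflicts


base = ["title", "print(1)", "end"]
ours = ["title", "print(2)", "end"]
theirs = ["TITLE", "print(3)", "end"]
merged, conflicts = three_way_merge(base, ours, theirs)
print(conflicts)  # [1]: both sides edited line 1 in different ways
```

The change to line 0 is integrated automatically because only one side touched it; line 1 needs a human decision.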

5
Q

Development line

A

A logically sequential line of versions

6
Q

Tag

A

Named version in the repository

7
Q

SVN stands for

A

Subversion

8
Q

What does it mean that SVN is a client-server version control system?

A

SVN operates on a client-server model, which means that there are two main components: the client and the server.
Developers have a local working copy on their machine (but not a full repo with complete history).
The server is a centralised repo that stores the complete version history of the project.

9
Q

Definition SVN

A

Subversion is a centralised version control system used for tracking changes in files and directories.

10
Q

SVN repository

A

SVN uses a centralised repository to store the complete version history of a project.

11
Q

SVN branching and merging

A

SVN supports branching and merging, but both tend to be more manual and less intuitive than in Git.

12
Q

True or false:
SVN allows checking out parts of the repository (directories/files).

A

True

13
Q

SVN import

A

Initialise repository from a local directory.
svn import C:\repository https://www.example.com/svn

From computer to repository.
Put a directory under version control by sending its data from the local computer to the remote repository. This is done only once per directory, to initialise the repository with the respective data. The import does not turn the local directory into a working copy; that requires a subsequent checkout.

14
Q

SVN Checkout

A

Initialise working copy from repository.
svn checkout https://www.example.com/svn C:\repository

Initialise a local working copy with the data from the repository. This is only done once for a directory from the repository when copying to the local computer.

15
Q

SVN commit

A

Send changes from working copy to repository.
svn commit -m "Completed printing feature."

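Send changes to the repository, which then tries to integrate them into its current state (even if that state has changed since the last update). If a file being committed has itself changed in the repository since the last update, SVN rejects the commit and the working copy must be updated first.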

16
Q

SVN update

A

Retrieve changes from repository to working copy.
svn update

Retrieve changed data from the repository and integrate it into the current state of the working copy.

17
Q

SVN add

A

Put new file(s) under version control.
svn add Example1.java

The add command tells SVN to keep the specified files under version control, i.e., when committing to the repository, it checks whether there were changes and, if so, stores them as a new version.

18
Q

Git definition

A

Git is a distributed version control system that allows multiple developers to work on a project independently of one another.

19
Q

Git repository

A

Each developer has a complete copy of the repository, including the full version history, making it a distributed system.

20
Q

Git branching and merging

A

Branching and merging are fundamental and easy to use.

21
Q

Git Workflow

A

Follows a distributed model, enabling developers to work offline and commit changes locally before pushing to a central repository.
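The commit-locally-then-push idea can be sketched with a toy model. This is not how Git is implemented: the classes `Repository` and `LocalClone` are hypothetical, a repository is reduced to a list of commit messages, and the sketch assumes the remote has not changed since the clone.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Repository:
    """A repository reduced to an ordered list of commit messages."""
    history: List[str] = field(default_factory=list)


@dataclass
class LocalClone:
    """A developer's clone: a full local repository plus a link to the remote."""
    remote: Repository
    local: Repository = field(default_factory=Repository)

    def __post_init__(self) -> None:
        # Cloning copies the complete history, not just the latest snapshot.
        self.local.history = list(self.remote.history)

    def commit(self, message: str) -> None:
        # Works offline: only the local repository changes.
        self.local.history.append(message)

    def push(self) -> None:
        # Send the commits the remote does not have yet.
        # Assumption: the remote has not moved on since the clone.
        new_commits = self.local.history[len(self.remote.history):]
        self.remote.history.extend(new_commits)


central = Repository(["Initial commit"])
dev = LocalClone(central)
dev.commit("Completed printing feature.")
print(central.history)  # ['Initial commit'] (nothing shared yet)
dev.push()
print(central.history)  # ['Initial commit', 'Completed printing feature.']
```

In SVN, by contrast, a commit goes straight to the central repository, so there is no separate push step.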

22
Q

Git init

A

Initialise local repository from working copy.
git init

Put a directory under version control by creating a local repository for it. The new repository starts out empty; the files still have to be added and committed.
This is done only once for a directory to initialise the repository. The data is, at this point, not yet available in the remote repository and needs to be pushed.

22
Q

Git clone

A

Initialise local repository from remote repository.
git clone https://github.com/chseidl/sdse_students.git

Initialise a local repository and local working copy with the data from the remote repository. This is only done once for the remote repository when copying to the local computer.

22
Q

Git pull

A

Retrieve changes from remote repository to local repository and working copy.
git pull

Transfer the data from the remote repository to the local repository and integrate it into the current state of the working copy.

22
Q

Git commit

A

Send changes from working copy to local repository.
git commit -m "Completed printing feature."

Send changed data to the local repository, which then makes an effort to integrate it into the current state of the repository (even if it may have changed since the last update). The data is, at this point, not yet available in the remote repository and needs to be pushed.

22
Q

Git push

A

Send changes from local to remote repository.
git push

Send the data from the local repository to the remote one.

23
Q

Git add

A

Put new/changed file(s) under version control.
git add Example1.java

The add command tells Git to keep the specified file(s) under version control, i.e., when committing to the local repository and pushing to the remote repository, it checks whether there were changes and, if so, stores them as a new version.

23
Q

Create a branch in Git

A

git checkout -b BRANCHNAME

The checkout command switches to another branch of development. When used with -b, a new branch is created before immediately switching to it.

24
Q

Merge branch with master branch in Git

A

git checkout -b printing
# development
git commit -m "Realized new feature."
git checkout master
git merge printing

The merge command merges the changes of the specified branch into the currently active branch, i.e. when wanting to merge into master (the default) one has to switch to it first.

24
Q

Stable release

A

The current version of the software as the majority of people would use it.

24
Q

Pre release (alpha/beta/cutting edge/nightly build/etc.)

A

The version of the software that contains the newest features but has not yet been tested thoroughly enough for a general release.

25
Q

Long term support version

A

For late adopters that cannot update frequently. Supposed to still receive critical updates over long periods of time, but no new functionality.

26
Q

Semantic versioning scheme for releases

A

Major.minor.patch

Major: significant changes to program functionality, typically not compatible with earlier versions
Minor: new program functionality that is compatible with old functionality
Patch: bug fixes and minor internal changes.
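A short sketch of how the Major.minor.patch scheme can be handled in code. The helper names `parse_version` and `kind_of_change` are made up for this example, and the sketch assumes plain numeric versions without pre-release suffixes such as `-beta`.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Split 'major.minor.patch' into three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def kind_of_change(old: str, new: str) -> str:
    """Name the most significant component that differs between two versions."""
    o, n = parse_version(old), parse_version(new)
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    if n[2] != o[2]:
        return "patch"
    return "none"


print(kind_of_change("1.4.2", "1.4.3"))  # patch
print(kind_of_change("1.4.2", "2.0.0"))  # major

# Comparing as integer tuples avoids the string-comparison trap "1.10.0" < "1.9.3".
print(parse_version("1.10.0") > parse_version("1.9.3"))  # True
```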

27
Q

What are three typical challenges in big data / data science?

A

1. Data: large, sparse, replicated data
2. Coordination: communication, querying data, similar setup
3. Calculation: scaling, parallelism, distribution

28
Q

Hadoop

A

Solves massive data problems with distributed computers:
- storage and processing of big data
- compensates for hardware failures

Core components: MapReduce (processing model) and HDFS (distributed file system spanning many cheap computers).
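The MapReduce idea can be illustrated with a word count in plain Python. This is a single-process sketch, not the Hadoop API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are assumptions, and on a real cluster the map and reduce steps would run in parallel on different machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents: List[str]) -> Dict[str, int]:
    # Each document could be mapped on a different machine;
    # here the mapping simply happens one document after another.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


print(word_count(["big data big", "data science"]))
# {'big': 2, 'data': 2, 'science': 1}
```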

29
Q

Spark

A

Distributed computing system.
Especially supports iterative and interactive/exploratory programming models as, e.g. needed by training algorithms for machine learning.
Apache Spark is designed for in-memory data processing, which can make it much faster than disk-based frameworks such as Hadoop MapReduce, particularly for iterative workloads.

30
Q

Apache HBase

A

Data storage tool.
Distributed, fault tolerant, column oriented non-relational database on top of HDFS.
Logo is an orca.

31
Q

Apache Phoenix

A

Data storage tool
Distributed relational database engine with SQL support using HBase.

32
Q

Apache Druid

A

Data storage tool
Distributed column-oriented data store for real time analytics

33
Q

Apache Cassandra

A

Data storage tool
Distributed wide-column data store for big data

34
Q

Apache Hive

A

Data storage tool
Data warehouse for simplified/unified data query and analysis

35
Q

Apache Drill

A

Data storage tool
Standard SQL queries on Hadoop for big data

36
Q

Apache Zookeeper

A

Coordination tool
Centralised service for distributed access to a hierarchical key-value store.
Apache ZooKeeper is an open-source distributed coordination service. It manages and synchronises configuration information, naming, and other shared services across a large distributed system.

37
Q

Apache Pig

A

Calculation tool
High-level platform for creating programs that run on Hadoop.
Pig’s intent is to make the development of applications for Hadoop easier.

38
Q

Apache Kafka

A

Calculation tool
Collect and distribute data streams in real-time from/to interested clients

39
Q

Apache Samza

A

Calculation tool
Develop applications that process streaming data, e.g. from Kafka

40
Q

Apache Mahout

A

ML tool
Collection of distributed, scalable machine learning algorithms
Distributed linear algebra framework

41
Q

ml4j

A

ML tool
ML library for Java

42
Q

DL4J

A

ML tool
Distributed deep learning library for Java

43
Q

WEKA

A

ML tool
Data mining through machine learning

44
Q

For one of the core libraries of your system, a colleague recommends using a “pre-release” version. Argue for or against that idea with at least two benefits/drawbacks that this may entail.

A

Arguing for using a “pre-release” version:

1. Access to new features and improvements
Benefit: Pre-release versions often include the latest features, enhancements, and bug fixes. Using one gives early access to this new functionality and to any optimisations.

2. Early testing and feedback
Benefit: Adopting a pre-release version lets you take part in early testing and give feedback to the developers. This helps identify and address issues before the stable release, which can make the final release more robust and reliable.

Arguing against using a “pre-release” version:

1. Stability and reliability concerns
Drawback: Pre-release versions are typically less stable than stable releases. They may contain bugs or incomplete features, or undergo significant changes that affect the reliability of your system. For a core library, this risk carries over to everything that depends on it.

2. Compatibility issues
Drawback: Pre-release versions may not be backward compatible with the stable releases or with other libraries/tools in your ecosystem. This can cause integration problems and make development and deployment more complex. A stable release tends to provide a more predictable and compatible environment.

45
Q

Contrast client-server version control systems and distributed version control systems using the examples of SVN and Git. Specifically mention, for each type of system, which repositories exist, where they are stored and how communication between them works. For describing the communication, use the terms of the respective commands in SVN and Git.

A

Look at slides
