Version Control - CL2 Flashcards
Importing sources into repository
Initializing a Repository in an Existing Directory
If you’re starting to track an existing project in Git, you need to go to the project’s directory and type:
$ git init
This creates a new subdirectory named .git that contains all of your necessary repository files: a Git repository skeleton. At this point, nothing in your project is tracked yet.
If you want to start version-controlling existing files (as opposed to an empty directory), you should probably begin tracking those files and do an initial commit. You can accomplish that with a few git add commands that specify the files you want to track, followed by a commit:
$ git add *.c
$ git add README
$ git commit -m 'initial project version'
We’ll go over what these commands do in just a minute. At this point, you have a Git repository with tracked files and an initial commit.
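The sequence above can be tried end to end in a scratch directory. This is a minimal sketch: the file names, the placeholder identity, and the temporary directory are assumptions made for the demo, not part of any real project.

```shell
set -eu
work=$(mktemp -d)                  # throwaway directory standing in for an existing project
cd "$work"
printf 'int main(void) { return 0; }\n' > main.c
printf 'Demo project\n' > README

git init -q                        # creates the .git skeleton; nothing is tracked yet
git config user.name "Example User"        # placeholder identity, local to this repository
git config user.email "user@example.com"

git add *.c                        # the shell expands *.c to main.c
git add README
git commit -q -m 'initial project version'

tracked=$(git ls-files | wc -l | tr -d ' ')   # number of tracked files
commits=$(git rev-list --count HEAD)          # number of commits reachable from HEAD
echo "tracked files: $tracked, commits: $commits"
```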
Exporting sources from repository
Exporting Your Repository
Git attribute data also allows you to do some interesting things when exporting an archive of your project.
export-ignore
You can tell Git not to export certain files or directories when generating an archive. If there is a subdirectory or file that you don’t want to include in your archive file but that you do want checked into your project, you can determine those files via the export-ignore attribute.
For example, say you have some test files in a test/ subdirectory, and it doesn’t make sense to include them in the tarball export of your project. You can add the following line to your Git attributes file:
test/ export-ignore
Now, when you run git archive to create a tarball of your project, that directory won’t be included in the archive.
export-subst
Another thing you can do for your archives is some simple keyword substitution. Git lets you put the string $Format:$ in any file with any of the --pretty=format formatting shortcodes. For instance, if you want to include a file named LAST_COMMIT in your project and have the last commit date automatically injected into it when git archive runs, you can set up the file like this:
$ echo 'Last commit date: $Format:%cd$' > LAST_COMMIT
$ echo "LAST_COMMIT export-subst" >> .gitattributes
$ git add LAST_COMMIT .gitattributes
$ git commit -am 'adding LAST_COMMIT file for archives'
When you run git archive, the contents of that file when people open the archive file will look like this:
$ cat LAST_COMMIT
Last commit date: Tue Apr 21 08:38:48 2009 -0700
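Both attributes can be checked together by building a small archive and inspecting it. The sketch below assumes a throwaway repository with made-up file names and that a tar command is available; note that git archive reads the attributes from the committed .gitattributes file, so it has to be committed first.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo && cd demo
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

mkdir test
echo 'placeholder test' > test/check.txt
echo 'project file' > README
echo 'Last commit date: $Format:%cd$' > LAST_COMMIT
printf 'test/ export-ignore\nLAST_COMMIT export-subst\n' > .gitattributes
git add .
git commit -q -m 'add files and archive attributes'

git archive --format=tar HEAD > ../project.tar
listing=$(tar -tf ../project.tar)              # names stored in the archive
stamp=$(tar -xOf ../project.tar LAST_COMMIT)   # expanded contents of LAST_COMMIT
echo "$listing"
echo "$stamp"
```

The listing should contain README and LAST_COMMIT but nothing under test/, and the placeholder in LAST_COMMIT should have been replaced by the commit date.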
Comparing repository revisions
Determining What Is Introduced
It’s often helpful to get a review of all the commits that are in this branch but that aren’t in your master branch. You can exclude commits in the master branch by adding the --not option before the branch name. For example, if your contributor sends you two patches and you create a branch called contrib and apply those patches there, you can run this:
$ git log contrib --not master
commit 5b6235bd297351589efc4d73316f0a68d484f118
Author: Scott Chacon
To see what changes each commit introduces, remember that you can pass the -p option to git log and it will append the diff introduced to each commit.
To see a full diff of what would happen if you were to merge this topic branch with another branch, you may have to use a weird trick to get the correct results. You may think to run this:
$ git diff master
This command gives you a diff, but it may be misleading. If your master branch has moved forward since you created the topic branch from it, then you’ll get seemingly strange results. This happens because Git directly compares the snapshots of the last commit of the topic branch you’re on and the snapshot of the last commit on the master branch. For example, if you’ve added a line in a file on the master branch, a direct comparison of the snapshots will look like the topic branch is going to remove that line.
If master is a direct ancestor of your topic branch, this isn’t a problem; but if the two histories have diverged, the diff will look like you’re adding all the new stuff in your topic branch and removing everything unique to the master branch.
What you really want to see are the changes added to the topic branch — the work you’ll introduce if you merge this branch with master. You do that by having Git compare the last commit on your topic branch with the first common ancestor it has with the master branch.
Technically, you can do that by explicitly figuring out the common ancestor and then running your diff on it:
$ git merge-base contrib master
36c7dba2c95e6bbb78dfa822519ecfec6e1ca649
$ git diff 36c7db
However, that isn’t convenient, so Git provides another shorthand for doing the same thing: the triple-dot syntax. In the context of the diff command, you can put three periods after another branch to do a diff between the last commit of the branch you’re on and its common ancestor with another branch:
$ git diff master...contrib
This command shows you only the work your current topic branch has introduced since its common ancestor with master. That is a very useful syntax to remember.
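The equivalence between the explicit merge-base diff and the triple-dot shorthand can be checked in a scratch repository. This is a sketch with invented file names; because the script is sitting on master when it diffs, it names contrib explicitly as the second argument instead of relying on the checked-out branch.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is named master
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo base > file.txt
git add file.txt && git commit -q -m 'base'

git checkout -q -b contrib
echo 'contrib work' > contrib.txt
git add contrib.txt && git commit -q -m 'work on contrib'

git checkout -q master
echo 'master work' > master.txt
git add master.txt && git commit -q -m 'work on master'   # histories have now diverged

base=$(git merge-base contrib master)
explicit=$(git diff "$base" contrib)       # diff from the common ancestor
shorthand=$(git diff master...contrib)     # triple-dot form of the same comparison
direct=$(git diff master contrib)          # tip-to-tip comparison, the misleading one

git diff --stat master...contrib           # only contrib.txt is added
git diff --stat master contrib             # also shows master.txt as removed
```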
Creating a branch
What a Branch Is
A branch in Git is simply a lightweight movable pointer to one of these commits.
The default branch name in Git is master. As you initially make commits, you’re given a master branch that points to the last commit you made. Every time you commit, it moves forward automatically.
What happens if you create a new branch? Well, doing so creates a new pointer for you to move around. Let’s say you create a new branch called testing. You do this with the git branch command:
$ git branch testing
How does Git know what branch you’re currently on? It keeps a special pointer called HEAD. Note that this is a lot different than the concept of HEAD in other VCSs you may be used to, such as Subversion or CVS. In Git, this is a pointer to the local branch you’re currently on. In this case, you’re still on master.
To switch to an existing branch, you run the git checkout command. Let’s switch to the new testing branch:
$ git checkout testing
Because a branch in Git is in actuality a simple file that contains the 40-character SHA-1 checksum of the commit it points to, branches are cheap to create and destroy.
Creating a new branch is as quick and simple as writing 41 bytes to a file (40 characters and a newline).
This is in sharp contrast to the way most VCS tools branch, which involves copying all of the project’s files into a second directory. This can take several seconds or even minutes, depending on the size of the project, whereas in Git the process is always instantaneous. Also, because we’re recording the parents when we commit, finding a proper merge base for merging is automatically done for us and is generally very easy to do. These features help encourage developers to create and use branches often.
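The pointer behaviour described above can be observed directly. The following sketch uses a throwaway repository with a placeholder identity; it asks Git where each branch and HEAD point rather than reading files under .git, since the on-disk storage of refs is an implementation detail.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is named master
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo one > file.txt
git add file.txt && git commit -q -m 'first commit'

git branch testing                         # new pointer to the current commit
master_tip=$(git rev-parse master)
testing_tip=$(git rev-parse testing)
before=$(git symbolic-ref --short HEAD)    # creating a branch does not switch to it

git checkout -q testing                    # only HEAD moves; no files are copied
after=$(git symbolic-ref --short HEAD)

echo "master  -> $master_tip"
echo "testing -> $testing_tip"
echo "HEAD was on $before, now on $after"
```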
Working with branch
Branch Management
The git branch command does more than just create and delete branches. If you run it with no arguments, you get a simple listing of your current branches:
$ git branch
  iss53
* master
  testing
Notice the * character that prefixes the master branch: it indicates the branch that you currently have checked out. This means that if you commit at this point, the master branch will be moved forward with your new work. To see the last commit on each branch, you can run git branch -v:
$ git branch -v
  iss53   93b412c fix javascript issue
* master  7a98805 Merge branch 'iss53'
  testing 782fd34 add scott to the author list in the readmes
Another useful option to figure out what state your branches are in is to filter this list to branches that you have or have not yet merged into the branch you’re currently on. The useful --merged and --no-merged options have been available in Git since version 1.5.6 for this purpose. To see which branches are already merged into the branch you’re on, you can run git branch --merged:
$ git branch --merged
iss53
* master
Because you already merged in iss53 earlier, you see it in your list. Branches on this list without the * in front of them are generally fine to delete with git branch -d; you’ve already incorporated their work into another branch, so you’re not going to lose anything.
To see all the branches that contain work you haven’t yet merged in, you can run git branch --no-merged:
$ git branch --no-merged
  testing
This shows your other branch. Because it contains work that isn’t merged in yet, trying to delete it with git branch -d will fail:
$ git branch -d testing
error: The branch 'testing' is not an ancestor of your current HEAD.
If you are sure you want to delete it, run 'git branch -D testing'.
If you really do want to delete the branch and lose that work, you can force it with -D, as the helpful message points out.
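The filtering and the safe-delete behaviour can be reproduced in a scratch repository. This sketch reuses the branch names iss53 and testing from the text; the files and the placeholder identity are invented for the demo.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is named master
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo base > file.txt
git add file.txt && git commit -q -m 'base'

git checkout -q -b iss53
echo fix > fix.txt
git add fix.txt && git commit -q -m 'fix issue 53'
git checkout -q master
git merge -q --no-edit iss53               # iss53 is now merged into master

git checkout -q -b testing
echo experiment > exp.txt
git add exp.txt && git commit -q -m 'unmerged experiment'
git checkout -q master

merged=$(git branch --merged)
unmerged=$(git branch --no-merged)
echo "merged:";   echo "$merged"
echo "unmerged:"; echo "$unmerged"

# -d refuses to delete a branch whose work is not merged yet
if git branch -d testing 2>/dev/null; then refused=no; else refused=yes; fi
git branch -d iss53 > /dev/null            # safe: its work is already in master
echo "deletion of testing refused: $refused"
```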
Traversing branches
Topic Branches
Topic branches, however, are useful in projects of any size. A topic branch is a short-lived branch that you create and use for a single particular feature or related work. This is something you’ve likely never done with a VCS before because it’s generally too expensive to create and merge branches. But in Git it’s common to create, work on, merge, and delete branches several times a day.
Consider an example of doing some work (on master), branching off for an issue (iss91), working on it for a bit, branching off the second branch to try another way of handling the same thing (iss91v2), going back to your master branch and working there for a while, and then branching off there to do some work that you’re not sure is a good idea (dumbidea branch).
Now, let’s say you decide you like the second solution to your issue best (iss91v2); and you showed the dumbidea branch to your coworkers, and it turns out to be genius. You can throw away the original iss91 branch (losing commits C5 and C6) and merge in the other two.
It’s important to remember when you’re doing all this that these branches are completely local. When you’re branching and merging, everything is being done only in your Git repository; no server communication is happening.
Common branching patterns (rebasing in Git)
Rebasing
In Git, there are two main ways to integrate changes from one branch into another: the merge and the rebase.
The Basic Rebase
If you go back to an earlier example from the Merge section (see Figure 3.27), you can see that you diverged your work and made commits on two different branches.
The easiest way to integrate the branches, as we’ve already covered, is the merge command.
However, there is another way: you can take the patch of the change that was introduced in C3 and reapply it on top of C4. In Git, this is called rebasing. With the rebase command, you can take all the changes that were committed on one branch and replay them on another one.
In this example, you’d run the following:
$ git checkout experiment
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: added staged command
It works by going to the common ancestor of the two branches (the one you’re on and the one you’re rebasing onto), getting the diff introduced by each commit of the branch you’re on, saving those diffs to temporary files, resetting the current branch to the same commit as the branch you are rebasing onto, and finally applying each change in turn.
Often, you’ll do this to make sure your commits apply cleanly on a remote branch, perhaps in a project to which you’re trying to contribute but that you don’t maintain. In this case, you’d do your work in a branch and then rebase your work onto origin/master when you were ready to submit your patches to the main project. That way, the maintainer doesn’t have to do any integration work: just a fast-forward or a clean apply.
Note that the snapshot pointed to by the final commit you end up with, whether it’s the last of the rebased commits for a rebase or the final merge commit after a merge, is the same snapshot — it’s only the history that is different. Rebasing replays changes from one line of work onto another in the order they were introduced, whereas merging takes the endpoints and merges them together.
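The claim that a rebase and a merge end at the same snapshot can be checked by comparing tree hashes. The sketch below assumes two branches that touch different files, so neither operation conflicts; the merged-copy branch is a hypothetical helper used only to record what a merge would have produced.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is named master
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo base > base.txt
git add base.txt && git commit -q -m 'common ancestor'

git checkout -q -b experiment
echo experiment > experiment.txt
git add experiment.txt && git commit -q -m 'work on experiment'

git checkout -q master
echo master > master.txt
git add master.txt && git commit -q -m 'work on master'

# Record the snapshot a merge would produce, on a separate helper branch
git checkout -q -b merged-copy experiment
git merge -q --no-edit master
merge_tree=$(git rev-parse 'HEAD^{tree}')

# Now rebase the real experiment branch onto master
git checkout -q experiment
git rebase -q master
rebase_tree=$(git rev-parse 'HEAD^{tree}')

base=$(git merge-base master experiment)   # after the rebase this is master itself
master_tip=$(git rev-parse master)
echo "merge tree:  $merge_tree"
echo "rebase tree: $rebase_tree"
```

After the rebase, master is a direct ancestor of experiment, so merging experiment into master would be a fast-forward.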
Merging branches
Basic Merging
Suppose you’ve decided that your issue #53 work is complete and ready to be merged into your master branch. In order to do that, you’ll merge in your iss53 branch, much like you merged in your hotfix branch earlier. All you have to do is check out the branch you wish to merge into and then run the git merge command:
$ git checkout master
$ git merge iss53
Merge made by recursive.
 README |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
This looks a bit different than the hotfix merge you did earlier. In this case, your development history has diverged from some older point. Because the commit on the branch you’re on isn’t a direct ancestor of the branch you’re merging in, Git has to do some work. In this case, Git does a simple three-way merge, using the two snapshots pointed to by the branch tips and the common ancestor of the two. Figure 3.16 highlights the three snapshots that Git uses to do its merge in this case.
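The three inputs of such a merge, and the two-parent commit it produces, can be inspected in a scratch repository. This is a sketch with invented file contents; the branch name iss53 follows the text.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is named master
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo readme > README
git add README && git commit -q -m 'base'
base_commit=$(git rev-parse HEAD)

git checkout -q -b iss53
echo 'issue 53 notes' >> README
git commit -q -am 'finish issue 53'

git checkout -q master
echo hotfix > hotfix.txt
git add hotfix.txt && git commit -q -m 'hotfix on master'   # master has diverged from iss53

ancestor=$(git merge-base master iss53)    # third input of the three-way merge
git merge -q --no-edit iss53               # not a fast-forward: creates a merge commit
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
echo "common ancestor: $ancestor"
echo "parents of merge commit: $parent_count"
```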
Tagging (labeling in TFS)
Tagging Your Releases
When you’ve decided to cut a release, you’ll probably want to drop a tag so you can re-create that release at any point going forward. If you decide to sign the tag as the maintainer, the tagging may look something like this:
$ git tag -s v1.5 -m 'my signed 1.5 tag'
You need a passphrase to unlock the secret key for
user: “Scott Chacon “
1024-bit DSA key, ID F721C45A, created 2009-02-09
If you do sign your tags, you may have the problem of distributing the public PGP key used to sign your tags. The maintainer of the Git project has solved this issue by including their public key as a blob in the repository and then adding a tag that points directly to that content. To do this, you can figure out which key you want by running gpg --list-keys:
$ gpg --list-keys
/Users/schacon/.gnupg/pubring.gpg
Then, you can directly import the key into the Git database by exporting it and piping that through git hash-object, which writes a new blob with those contents into Git and gives you back the SHA-1 of the blob:
$ gpg -a --export F721C45A | git hash-object -w --stdin
659ef797d181633c87ec71ac3f9ba29fe5775b92
Now that you have the contents of your key in Git, you can create a tag that points directly to it by specifying the new SHA-1 value that the hash-object command gave you:
$ git tag -a maintainer-pgp-pub 659ef797d181633c87ec71ac3f9ba29fe5775b92
If you run git push --tags, the maintainer-pgp-pub tag will be shared with everyone. If anyone wants to verify a tag, they can directly import your PGP key by pulling the blob directly out of the database and importing it into GPG:
$ git show maintainer-pgp-pub | gpg --import
They can use that key to verify all your signed tags. Also, if you include instructions in the tag message, running git show will let you give the end user more specific instructions about tag verification.
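The blob-plus-tag mechanism can be tried without GPG installed. In this sketch a plain text file stands in for the exported key (an assumption made so the example runs anywhere); in real use the input would be the output of gpg -a --export for your key ID.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo && cd demo
git config user.name "Example User"        # placeholder identity, needed for the tagger field
git config user.email "user@example.com"

# Stand-in for the exported public key (not a real key)
printf 'placeholder public key material\n' > key.asc

blob=$(git hash-object -w --stdin < key.asc)   # store the contents as a blob
git tag -a -m 'public key for verifying signed tags' maintainer-pgp-pub "$blob"

kind=$(git cat-file -t "$blob")                          # type of the stored object
recovered=$(git cat-file -p 'maintainer-pgp-pub^{}')     # peel the tag to the blob it points to
echo "object type: $kind"
echo "recovered:   $recovered"
```

Peeling the tag with ^{} returns only the blob contents, without the tag header that git show prints first.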
Creating and applying patches (shelvesets in TFS)
Staging Patches
It’s also possible for Git to stage certain parts of files and not the rest. For example, if you make two changes to your simplegit.rb file and want to stage one of them and not the other, doing so is very easy in Git. From the interactive prompt, type 5 or p (for patch). Git will ask you which files you would like to partially stage; then, for each section of the selected files, it will display hunks of the file diff and ask if you would like to stage them, one by one.
The status of the simplegit.rb file is interesting. It shows you that a couple of lines are staged and a couple are unstaged. You’ve partially staged this file. At this point, you can exit the interactive adding script and run git commit to commit the partially staged files.
Finally, you don’t need to be in interactive add mode to do the partial-file staging; you can start the same script by using git add -p or git add --patch on the command line.
Revision specifiers
http://schacon.github.io/git/git-rev-parse.html#_specifying_revisions
Data recovery
Data Recovery
At some point in your Git journey, you may accidentally lose a commit. Generally, this happens because you force-delete a branch that had work on it, and it turns out you wanted the branch after all; or you hard-reset a branch, thus abandoning commits that you wanted something from. Assuming this happens, how can you get your commits back?
Here’s an example that hard-resets the master branch in your test repository to an older commit and then recovers the lost commits. First, let’s review where your repository is at this point.
Now, move the master branch back to the middle commit:
$ git reset --hard 1a410efbd13591db07496601ebc7a059dd55cfe9
You’ve effectively lost the top two commits — you have no branch from which those commits are reachable. You need to find the latest commit SHA and then add a branch that points to it. The trick is finding that latest commit SHA — it’s not like you’ve memorized it, right?
Often, the quickest way is to use a tool called git reflog. As you’re working, Git silently records what your HEAD is every time you change it. Each time you commit or change branches, the reflog is updated. The reflog is also updated by the git update-ref command, which is another reason to use it instead of just writing the SHA value to your ref files, as we covered in the “Git References” section of this chapter earlier. You can see where you’ve been at any time by running git reflog:
$ git reflog
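A full lose-and-recover cycle might look like the sketch below. The repository, the three commits, and the branch name recover-branch are made up for the demo; it relies on HEAD@{1} naming the position HEAD had one move ago.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo && cd demo
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

for n in 1 2 3; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done
lost=$(git rev-parse HEAD)                 # tip that is about to become unreachable

git reset -q --hard HEAD~2                 # abandon the top two commits
git reflog                                 # shows the reset and the earlier commits

found=$(git rev-parse 'HEAD@{1}')          # where HEAD pointed before the reset
git branch recover-branch "$found"         # make the lost commits reachable again
recovered=$(git rev-parse recover-branch)
echo "recovered tip: $recovered"
```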
Using external editors and merge tools
External Merge and Diff Tools
Although Git has an internal implementation of diff, which is what you’ve been using, you can set up an external tool instead. You can also set up a graphical merge conflict-resolution tool instead of having to resolve conflicts manually. I’ll demonstrate setting up the Perforce Visual Merge Tool (P4Merge) to do your diffs and merge resolutions, because it’s a nice graphical tool and it’s free.
If you want to try this out, P4Merge works on all major platforms, so you should be able to do so. I’ll use path names in the examples that work on Mac and Linux systems; for Windows, you’ll have to change /usr/local/bin to an executable path in your environment.
You can download P4Merge here:
http://www.perforce.com/perforce/downloads/component.html
To begin, you’ll set up external wrapper scripts to run your commands. I’ll use the Mac path for the executable; in other systems, it will be where your p4merge binary is installed. Set up a merge wrapper script named extMerge that calls your binary with all the arguments provided:
$ cat /usr/local/bin/extMerge
#!/bin/sh
/Applications/p4merge.app/Contents/MacOS/p4merge "$@"
The diff wrapper checks to make sure seven arguments are provided and passes two of them to your merge script. By default, Git passes the following arguments to the diff program:
path old-file old-hex old-mode new-file new-hex new-mode
Because you only want the old-file and new-file arguments, you use the wrapper script to pass the ones you need.
$ cat /usr/local/bin/extDiff
#!/bin/sh
[ $# -eq 7 ] && /usr/local/bin/extMerge "$2" "$5"
You also need to make sure these tools are executable:
$ sudo chmod +x /usr/local/bin/extMerge
$ sudo chmod +x /usr/local/bin/extDiff
Now you can set up your config file to use your custom merge resolution and diff tools. This takes a number of custom settings: merge.tool to tell Git what strategy to use, mergetool.*.cmd to specify how to run the command, mergetool.trustExitCode to tell Git if the exit code of that program indicates a successful merge resolution or not, and diff.external to tell Git what command to run for diffs. So, you can run four config commands:
$ git config --global merge.tool extMerge
$ git config --global mergetool.extMerge.cmd \
    'extMerge "$BASE" "$LOCAL" "$REMOTE" "$MERGED"'
$ git config --global mergetool.trustExitCode false
$ git config --global diff.external extDiff
Git comes preset to use a number of other merge-resolution tools without your having to set up the cmd configuration. You can set your merge tool to kdiff3, opendiff, tkdiff, meld, xxdiff, emerge, vimdiff, or gvimdiff. If you’re not interested in using KDiff3 for diff but rather want to use it just for merge resolution, and the kdiff3 command is in your path, then you can run
$ git config --global merge.tool kdiff3
If you run this instead of setting up the extMerge and extDiff files, Git will use KDiff3 for merge resolution and the normal Git diff tool for diffs.
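If you would rather experiment without touching your global configuration, the same settings can be written to a single repository and read back. This sketch drops the --global flag, which is a deliberate deviation from the commands in the text; it only stores and verifies the values and never invokes extMerge or extDiff, so those scripts do not need to exist.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo && cd demo

# Without --global these land in .git/config of this repository only
git config merge.tool extMerge
git config mergetool.extMerge.cmd 'extMerge "$BASE" "$LOCAL" "$REMOTE" "$MERGED"'
git config mergetool.trustExitCode false
git config diff.external extDiff

tool=$(git config --get merge.tool)
cmd=$(git config --get mergetool.extMerge.cmd)
trust=$(git config --get mergetool.trustExitCode)
echo "merge.tool = $tool"
echo "mergetool.extMerge.cmd = $cmd"
echo "mergetool.trustExitCode = $trust"
```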
Distributed VCS (if supported)
- Workflows
- Checking out remote branches
- Integrating contributed work
Distributed Workflows
Unlike Centralized Version Control Systems (CVCSs), the distributed nature of Git allows you to be far more flexible in how developers collaborate on projects. In centralized systems, every developer is a node working more or less equally on a central hub. In Git, however, every developer is potentially both a node and a hub — that is, every developer can both contribute code to other repositories and maintain a public repository on which others can base their work and which they can contribute to. This opens a vast range of workflow possibilities for your project and/or your team, so I’ll cover a few common paradigms that take advantage of this flexibility. I’ll go over the strengths and possible weaknesses of each design; you can choose a single one to use, or you can mix and match features from each.
Integration-Manager Workflow
Because Git allows you to have multiple remote repositories, it’s possible to have a workflow where each developer has write access to their own public repository and read access to everyone else’s. This scenario often includes a canonical repository that represents the “official” project. To contribute to that project, you create your own public clone of the project and push your changes to it. Then, you can send a request to the maintainer of the main project to pull in your changes. They can add your repository as a remote, test your changes locally, merge them into their branch, and push back to their repository. The process works as follows (see Figure 5.2):
1. The project maintainer pushes to their public repository.
2. A contributor clones that repository and makes changes.
3. The contributor pushes to their own public copy.
4. The contributor sends the maintainer an e-mail asking them to pull changes.
5. The maintainer adds the contributor’s repo as a remote and merges locally.
6. The maintainer pushes merged changes to the main repository.
This is a very common workflow with sites like GitHub, where it’s easy to fork a project and push your changes into your fork for everyone to see. One of the main advantages of this approach is that you can continue to work, and the maintainer of the main repository can pull in your changes at any time. Contributors don’t have to wait for the project to incorporate their changes — each party can work at their own pace.
Dictator and Lieutenants Workflow
This is a variant of a multiple-repository workflow. It’s generally used by huge projects with hundreds of collaborators; one famous example is the Linux kernel. Various integration managers are in charge of certain parts of the repository; they’re called lieutenants. All the lieutenants have one integration manager known as the benevolent dictator. The benevolent dictator’s repository serves as the reference repository from which all the collaborators need to pull. The process works like this (see Figure 5.3):
1. Regular developers work on their topic branch and rebase their work on top of master. The master branch is that of the dictator.
2. Lieutenants merge the developers’ topic branches into their master branch.
3. The dictator merges the lieutenants’ master branches into the dictator’s master branch.
4. The dictator pushes their master to the reference repository so the other developers can rebase on it.
Maintaining a Project
In addition to knowing how to effectively contribute to a project, you’ll likely need to know how to maintain one. This can consist of accepting and applying patches generated via format-patch and e-mailed to you, or integrating changes in remote branches for repositories you’ve added as remotes to your project. Whether you maintain a canonical repository or want to help by verifying or approving patches, you need to know how to accept work in a way that is clearest for other contributors and sustainable by you over the long run.
Working in Topic Branches
When you’re thinking of integrating new work, it’s generally a good idea to try it out in a topic branch: a temporary branch specifically made to try out that new work. This way, it’s easy to tweak a patch individually and leave it if it’s not working until you have time to come back to it. If you create a simple branch name based on the theme of the work you’re going to try, such as ruby_client or something similarly descriptive, you can easily remember it if you have to abandon it for a while and come back later. The maintainer of the Git project tends to namespace these branches as well, such as sc/ruby_client, where sc is short for the person who contributed the work. As you’ll remember, you can create the branch based off your master branch like this:
$ git branch sc/ruby_client master
Or, if you want to also switch to it immediately, you can use the checkout -b option:
$ git checkout -b sc/ruby_client master
Now you’re ready to add your contributed work into this topic branch and determine if you want to merge it into your longer-term branches.
Applying Patches from E-mail
If you receive a patch over e-mail that you need to integrate into your project, you need to apply the patch in your topic branch to evaluate it. There are two ways to apply an e-mailed patch: with git apply or with git am.
Applying a Patch with apply
If you received the patch from someone who generated it with the git diff or a Unix diff command, you can apply it with the git apply command. Assuming you saved the patch at /tmp/patch-ruby-client.patch , you can apply the patch like this:
$ git apply /tmp/patch-ruby-client.patch
This modifies the files in your working directory. It’s almost identical to running a patch -p1 command to apply the patch, although it’s more paranoid and accepts fewer fuzzy matches than patch. It also handles file adds, deletes, and renames if they’re described in the git diff format, which patch won’t do. Finally, git apply is an “apply all or abort all” model where either everything is applied or nothing is, whereas patch can partially apply patchfiles, leaving your working directory in a weird state.
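A round trip with git apply can be sketched as follows. The file client.rb and the patch path are invented for the demo; the --check option performs a dry run that reports whether the patch would apply, without changing any files.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo && cd demo
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

printf 'line one\nline two\n' > client.rb
git add client.rb && git commit -q -m 'initial client'

# Produce a patch in git diff format, as a contributor might
printf 'line one\nline two\nline three\n' > client.rb
git diff > ../change.patch
git checkout -q -- client.rb               # discard the change locally
before=$(wc -l < client.rb | tr -d ' ')

git apply --check ../change.patch          # dry run: exits non-zero if it would not apply
git apply ../change.patch                  # modifies the working directory only
after=$(wc -l < client.rb | tr -d ' ')
echo "lines before: $before, after: $after"
```

Because git apply only touches the working directory, the change still has to be staged and committed afterwards.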
Checking Out Remote Branches
If your contribution came from a Git user who set up their own repository, pushed a number of changes into it, and then sent you the URL to the repository and the name of the remote branch the changes are in, you can add them as a remote and do merges locally.
For instance, if Jessica sends you an e-mail saying that she has a great new feature in the ruby-client branch of her repository, you can test it by adding the remote and checking out that branch locally:
$ git remote add jessica git://github.com/jessica/myproject.git
$ git fetch jessica
$ git checkout -b rubyclient jessica/ruby-client
If she e-mails you again later with another branch containing another great feature, you can fetch and check out because you already have the remote set up.
This is most useful if you’re working with a person consistently. If someone only has a single patch to contribute once in a while, then accepting it over e-mail may be less time consuming than requiring everyone to run their own server and having to continually add and remove remotes to get a few patches. You’re also unlikely to want to have hundreds of remotes, each for someone who contributes only a patch or two.
However, scripts and hosted services may make this easier — it depends largely on how you develop and how your contributors develop.
The other advantage of this approach is that you get the history of the commits as well. Although you may have legitimate merge issues, you know where in your history their work is based; a proper three-way merge is the default rather than having to supply a -3 and hope the patch was generated off a public commit to which you have access.
If you aren’t working with a person consistently but still want to pull from them in this way, you can provide the URL of the remote repository to the git pull command. This does a one-time pull and doesn’t save the URL as a remote reference:
$ git pull git://github.com/onetimeguy/project.git
From git://github.com/onetimeguy/project
 * branch            HEAD       -> FETCH_HEAD
Merge made by recursive.
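The one-time pull can be simulated locally, with a directory path standing in for the contributor’s URL. In this sketch the repository names maintainer and contributor are hypothetical, and the contributor’s work sits directly on top of the maintainer’s history, so the pull is a fast-forward rather than the recursive merge shown above.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

git init -q maintainer
cd maintainer
git config user.name "Maintainer"          # placeholder identity for the demo
git config user.email "maintainer@example.com"
echo base > file.txt
git add file.txt && git commit -q -m 'base'
cd ..

git clone -q maintainer contributor        # stand-in for the contributor's public repository
cd contributor
git config user.name "One Time Contributor"
git config user.email "contributor@example.com"
echo feature > feature.txt
git add feature.txt && git commit -q -m 'add feature'
their_tip=$(git rev-parse HEAD)
cd ../maintainer

git pull -q "$work/contributor"            # pull by location; nothing is saved as a remote
remotes=$(git remote)                      # still empty afterwards
our_tip=$(git rev-parse HEAD)
echo "configured remotes: '$remotes'"
echo "maintainer tip: $our_tip"
```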