Git Basics

Terminal
Published: May 13, 2012
Modified: July 18, 2023

Version Control System (VCS), also known as source control or revision control

Why Use Version Control?

  • Prevents accidental deletion or loss of files
  • Allows reverting changes to files
  • Enables reviewing the history of files
  • Allows in-depth comparison between different versions of files

A repository (a database of changes) is the data structure that stores files with history and metadata.

Version control systems differ in where the repository lives:

  • Distributed (e.g. Git, Mercurial)
    • File history in a hidden repository folder inside the working copy
    • Checkouts, commits interact with the local repository folder
    • Different copies of the repository synchronized by the version control software
    • Typically synchronized with multiple public/private remote repositories
  • Centralized (e.g. CVS, Subversion)
    • Dedicated central server, stores files’ history and controls access
    • Separate local working copy from the “master copy” on the server
    • Working copy only stores the current versions (history in the server repository)
    • Checkouts, commits require connection to the server

Why Use Git?

  • Free and open source distributed version control system (no central server required)
  • Fast, since most operations are performed locally
  • Implicit backup since multiple copies are stored in distributed locations
  • All data is checksummed with cryptographic hashes (tamper-evident)

Public Repository

Non-profit…

Commercial… (most popular)

  • github.com
    • …by GitHub, Inc …subsidiary of Microsoft (since 2018)
    • …free of charge …purchasable optional additional features, services
    • terms of service
  • gitlab.com
    • …by GitLab Inc.
    • …free of charge …purchasable optional additional features, services
    • terms of use

Comparison of source-code-hosting facilities, Wikipedia

Commits

A commit adds the staged changes to the repository:

  • A commit includes…
    • ID of the previous commit(s)
    • Content (a snapshot of the tree), commit date & message
    • Author and committer name and email address
  • The commit ID (SHA-1 hash) cryptographically certifies the integrity of the entire history of the repository up to that commit
  • Commits are immutable; amending the HEAD commit actually replaces it with a new commit (new ID)
  • Commits point to 0..N parent commits (the root commit has none, a merge commit typically has 2)
  • HEAD revision is the active commit, parent of the next commit
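To see the fields listed above in a real commit, git cat-file -p prints the raw commit object. A minimal sketch, assuming a POSIX shell and Git 2.x; it works in a throwaway repository created with mktemp -d, and the identity (Example Author, author@example.com) and the file name notes.txt are placeholders.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git config user.name "Example Author"      # placeholder identity
git config user.email author@example.com
echo "first" > notes.txt
git add notes.txt
git commit -q -m "Add notes"               # root commit: no parent line
echo "second" >> notes.txt
git commit -q -am "Extend notes"           # child commit: one parent line
git cat-file -p HEAD                       # tree, parent, author, committer, message
git rev-parse HEAD                         # the commit ID
```

The root commit has no parent line, the second commit exactly one.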

File states include the following three:

  • Modified - Changed file(s) in working copy, not committed to repository
  • Staged - Modified file(s) marked to go into the next commit in their current version
  • Committed - Data is safely stored in your local repository

Each state corresponds to one of the following three storage locations:

  • The working copy (checkout) contains editable files (a copy of the repository data)
  • The staging area (index) holds all marked changes ready to commit
  • The Git repository .git/ stores all files and metadata
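One file can be followed through the three states with git status --short, where the first column shows the index (staged) and the second the working tree (modified). A sketch in a throwaway repository; the identity and the file name a.txt are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"      # placeholder identity
git config user.email author@example.com
echo "v1" > a.txt
git add a.txt
git commit -q -m "Add a.txt"               # committed
echo "v2" > a.txt                          # modified in the working copy
git status --short                         # " M a.txt"  (not staged)
git add a.txt                              # staged in the index
git status --short                         # "M  a.txt"  (staged)
git commit -q -m "Update a.txt"            # committed to .git/
git status --short                         # no output: nothing modified or staged
```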

Construct Commits

The basic workflow:

  1. Modify a file in the working copy, check with git status
  2. Accept a change into the staging area by adding the file with git add
  3. Perform a git commit that permanently stores the staged files in the repository
git status                               # show files changed in the working tree and index
git add <file>                           # add/update file from the working tree into
                                         # the index
git rm <file>                            # delete file from the index and working tree
git mv <sfile> <dfile>                   # move a file in the working tree, plus appropriate
                                         # additions/removals in the index
git commit                               # make a commit out of the current index
git add .                                # add all changes in the current directory (and below) to the index
git commit -m '<message>'                # commit files in staging area
git commit -am '<message>'               # stage all changes to tracked files, then commit
git commit --amend                       # replace the last commit (the commit ID changes)

# Set author and committer name & email for a single commit
GIT_COMMITTER_NAME='<name>' GIT_COMMITTER_EMAIL='<mail>' git commit --author 'name <mail>'

git clean -f                             # remove untracked files (add -d for untracked directories)
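The three workflow steps, together with mv, --amend, a separate author/committer and clean, can be run end to end in a throwaway repository. A sketch, assuming Git 2.x; all names, email addresses and file names are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"      # placeholder identity
git config user.email author@example.com
printf 'hello\n' > README
git status --short                         # "?? README" (untracked)
git add README                             # stage the new file
git commit -q -m 'Add README'              # commit the index
git mv README README.txt                   # rename, staged automatically
git commit -q -m 'Rename README'
printf 'typo fix\n' >> README.txt
git commit -q --amend -am 'Rename README and fix typo'   # replaces the last commit
printf 'more\n' >> README.txt
# author and committer differ for this one commit (placeholder identities)
GIT_COMMITTER_NAME='Committer' GIT_COMMITTER_EMAIL='committer@example.com' \
  git commit -q -am 'Extend README' --author 'Someone Else <someone@example.com>'
git log --format='%an / %cn: %s'           # author / committer: subject
touch scratch.tmp                          # untracked file
git clean -n                               # dry run: lists what -f would remove
git clean -f                               # removes scratch.tmp
```

Because --amend replaced the rename commit, the history ends up with three commits rather than four.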

References

A reference (ref) is a named, mutable pointer to an object (usually a commit)

  • Git knows different types of references:
    • heads refer to local branches
    • remotes refer to remote-tracking branches, local copies of the branches in a remote repository
    • stash refers to the latest stash entry, changes saved outside of any branch
    • tags refer to a fixed object, typically a release commit
  • Commits form a Directed Acyclic Graph (DAG) of objects; refs point into this graph

Referring to objects:

  • Use its full SHA-1 commit ID e.g. 66f67970e73b5ad213d9bc69f7e6497b6bfc1b75
  • A truncated commit ID, as long as it is unambiguous, e.g. 66f6797
  • You can refer to a branch or tag by name
  • Append a ^ to get the (first) parent, ^2 second parent, etc.
  • Append :<path> for a file or directory inside commit’s tree
  • Cf. git help rev-parse
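git rev-parse turns each of these notations into an object ID, which makes them easy to try out. A sketch in a throwaway repository; the directory docs/ and its file are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"      # placeholder identity
git config user.email author@example.com
mkdir docs
echo one > docs/a.txt; git add docs; git commit -q -m one
echo two > docs/a.txt; git commit -q -am two
full=$(git rev-parse HEAD)                 # full commit ID
short=$(git rev-parse --short HEAD)        # abbreviated, still unambiguous
git rev-parse "$short"                     # expands back to the full ID
git rev-parse HEAD^                        # first parent of HEAD
git show HEAD^:docs/a.txt                  # file content in the parent's tree: "one"
git cat-file -t HEAD:docs                  # "tree": a directory inside the commit
```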

show & diff

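git show $commit prints a commit's message and changes; for a merge commit it shows a combined diff, listing only files that differ from all parents

git diff $commit^ $commit shows the difference between a (merge) commit's first parent and the commit itself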

git diff                                 # show difference between working tree and index
git diff --cached                        # show difference between HEAD and index (staged changes)
git diff $commit                         # show difference between commit and the working tree
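The three diff variants compare different pairs of the working tree, the index and HEAD. A sketch with one staged and one unstaged line in a throwaway repository (file name f.txt is a placeholder); --stat only shortens the output.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"      # placeholder identity
git config user.email author@example.com
echo a > f.txt; git add f.txt; git commit -q -m init
echo b >> f.txt; git add f.txt             # staged change
echo c >> f.txt                            # further, unstaged change
git diff --stat                            # working tree vs index: only the "c" line
git diff --cached --stat                   # index vs HEAD: only the "b" line
git diff HEAD --stat                       # working tree vs HEAD: both lines
```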

log & grep

git log lists commits

  • search & filter commit history
  • …output is highly customizable in content and representation
git log -1 ...                            # show last commit
git log -p                                # with changes inline
git log --decorate --oneline --graph      # prettier graph-like structure
git log --stat                            # list changed files

Search on the current branch…

# search string in commit messages
git log --grep=$string

# find commits that add or remove occurrences of a string
git log -S $string

# search the content of all commits for a string, show file name and line number
git grep -n $string $(git rev-list --all)
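The difference between searching messages (--grep) and searching content changes (-S) shows up in a short history. A sketch in a throwaway repository; app.conf and its settings are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"      # placeholder identity
git config user.email author@example.com
echo 'timeout = 10' > app.conf; git add app.conf; git commit -q -m 'Add config'
echo 'retries = 3' >> app.conf; git commit -q -am 'Add retry setting'
git rm -q app.conf; git commit -q -m 'Remove config'
git log --oneline --grep=retry             # message search: 'Add retry setting'
git log --oneline -S timeout               # 'Add config' and 'Remove config'
git grep -n timeout $(git rev-list --all)  # <commit>:app.conf:1:timeout = 10
```

The middle commit does not show up for -S timeout: it leaves the number of occurrences of "timeout" unchanged.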

Repository

A repository includes four kinds of objects:

  • A blob (binary large object) is the content of a file
  • A tree object is the equivalent of a directory (cf. Merkle tree)
  • A commit object links tree objects together into a history
  • A tag object (annotated tag) refers to another object and adds a tagger, date and message
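git cat-file -t reports the type of any object, so all four kinds can be located in a small repository. A sketch, assuming Git 2.x; the file src/main.py and the tag v1.0 are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"      # placeholder identity
git config user.email author@example.com
mkdir src; echo 'print("hi")' > src/main.py
git add src; git commit -q -m 'Add main'
git tag -a v1.0 -m 'First release'         # annotated tag: creates a tag object
git cat-file -t HEAD                       # commit
git cat-file -t 'HEAD^{tree}'              # tree (root directory)
git cat-file -t HEAD:src/main.py           # blob (file content)
git cat-file -t v1.0                       # tag
git hash-object src/main.py                # blob ID depends only on the content
```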

Every working copy has its own Git repository in the .git subdirectory:

  • With arbitrarily many branches and tags
  • Most important ref is HEAD (which refers to the current branch)
  • Stores the index (staging area) for changes on top of HEAD that will become part of the next commit
  • Files outside of .git are called the working tree
.git/                                    # git repository directory
.git/config                              # configuration of the repository
.gitignore                               # files to ignore in the working tree
# list ignored files
git status --ignored
git clean -Xn                            # display a list of ignored files
git clean -Xf                            # remove the previously displayed files
git check-ignore -v <file>               # check if file is ignored
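The ignore commands can be tried with a single pattern. A sketch in a throwaway repository; the pattern *.log and the file names are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
printf '*.log\n' > .gitignore
touch app.log main.c
git status --short --ignored               # "!!" marks ignored paths, "??" untracked ones
git check-ignore -v app.log                # .gitignore:1:*.log<TAB>app.log
git clean -Xn                              # dry run: lists only ignored files
git clean -Xf                              # removes app.log, keeps main.c
```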

init

Create, init (initialize) a new repository in .git/:

  • Create an empty repository in the current working directory
  • The initial branch is named master by default (configurable with init.defaultBranch); it comes into existence with the first commit
# initialize a new repository
git init

Repositories shared via clone, push and pull are usually bare repositories:

  • A bare repository (by definition) has no working tree attached
  • It’s conventional to give bare repositories the extension (suffix) .git (instead of project/.git)
  • Update a bare repository by pushing to it (using git push) from another repository
# initialize a new repository without working tree
git init --bare /path/to/project.git
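A bare repository is updated only by pushing to it from a clone. A sketch using local paths under mktemp -d in place of a server; the branch name is read from the clone because it may be master or main, depending on init.defaultBranch.

```shell
set -e
base=$(mktemp -d)
git init -q --bare "$base/project.git"             # no working tree
git clone -q "$base/project.git" "$base/work" 2>/dev/null   # warns: empty repository
cd "$base/work"
git config user.name "Example Author"              # placeholder identity
git config user.email author@example.com
echo hi > README; git add README; git commit -q -m 'Initial commit'
branch=$(git rev-parse --abbrev-ref HEAD)          # master or main
git push -q origin "$branch"                       # update the bare repository
git --git-dir="$base/project.git" log --oneline    # the history now lives there too
```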

clone & remote

Git allows bidirectional synchronisation between any number of repositories:

  • A Git repository can be configured with references to any number of remotes
  • Supports many protocols: SSH, HTTPS, DAV, Git protocol, Rsync, and a path to a local repository
  • Allows centralized and/or distributed development models

Copy, clone a repository from another location:

# clone a remote repository and create a working copy, optionally provide the target directory
git clone <url> [<path>] 

The following syntax refers to remote URLs and local paths:

# remote
ssh://[user@]host.xz[:port]/path/to/repo.git/
git://host.xz[:port]/path/to/repo.git/
http[s]://host.xz[:port]/path/to/repo.git/
[user@]host.xz:/~[user]/path/to/repo.git/
# local
file:///path/to/repo.git/
/path/to/repo.git/

Remote repositories are configured in .git/config (cf. git help git-config):

  • A freshly cloned repository has…
    • One reference to the origin remote repository (default source to pull/push)
    • A local branch (the remote's default branch, e.g. master) that tracks origin/master
  • Checkout of a local branch from a remote branch automatically creates a tracking branch
# clone a remote repository and checkout a specific branch
git clone -b <branch> <url> [<path>]                

Modify references to other repositories:

git remote -v                               # list references to remote repos (including URLs)
git remote add <remote_name> <remote_url>   # add a reference to a remote repository
git remote show <remote_name>               # inspect a remote repository
git remote rename <old_name> <new_name>     # rename a reference to a remote repository
git remote rm <remote_name>                 # delete a reference to a remote repository
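The remote sub-commands only edit the [remote "…"] sections of .git/config, so they can be tried against local bare repositories. A sketch; the paths and the remote names backup and mirror are placeholders.

```shell
set -e
base=$(mktemp -d)
git init -q --bare "$base/upstream.git"
git init -q --bare "$base/backup.git"
git clone -q "$base/upstream.git" "$base/work" 2>/dev/null
cd "$base/work"
git remote add backup "$base/backup.git"   # second remote next to origin
git remote -v                              # fetch and push URL of each remote
git remote rename backup mirror
git remote get-url mirror                  # .../backup.git
git remote rm mirror
git remote                                 # only "origin" is left
```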

Multiple Repositories

git-repos helps to solve the following three use-cases:

  1. Maintains a list of Git remote repositories associated with a local directory tree.
  2. Indicates the local status for a list of repositories.
  3. Indicates the state of remotes for a list of local repositories.
>>> git repos status -v       
Reading configuration from ~/.gitrepos
Git in ~/projects/dummy
?? path/to/new/file
Git in ~/projects/scripts
↑1 backup/master
↑3 github/master
 M git-repos
Git in ~/projects/site
AM posts/git_repos.markdown
  • init creates missing directories defined in the repository list
  • status runs git status -s
    • …on all repositories and prints the output
    • …checks if the local repositories are ahead of their remotes with git rev-list

The repository list is read from $PWD/.gitrepos, ~/.gitrepos or the file given with --config PATH, in the following format…

/path/to/the/repository
  origin git://host.org/project.git
  backup ssh://user@host.org/project.git
/path/to/another/repository
  name ~/existing/repo
~/path/to/yet/another/repo
  foobar ssh://user@host.org/foobar.git
relative/path/to/repository
  deploy git://fqdn.com/name.git
  • …each directory is followed by a list of remotes using the notation of git remote add
  • …first the name of the remote, second the URI to the remote repository.
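The status check described above can be approximated with plain Git. The following hypothetical shell function repos_status is not git-repos itself (its code is not shown here); it only imitates the output format of the sample, running git status -s and counting commits ahead of the upstream with git rev-list. The demo repository and its paths are placeholders.

```shell
set -e
# Hypothetical helper, not part of git-repos: short status plus "ahead" count per repository.
repos_status() {
  for dir in "$@"; do
    echo "Git in $dir"
    git -C "$dir" status -s
    # compare only if the current branch has an upstream (@{u}) configured
    if upstream=$(git -C "$dir" rev-parse --abbrev-ref '@{u}' 2>/dev/null); then
      ahead=$(git -C "$dir" rev-list --count '@{u}..HEAD')
      if [ "$ahead" -gt 0 ]; then
        echo "↑$ahead $upstream"
      fi
    fi
  done
}

# demo: one clone with an unpushed commit and an untracked file
base=$(mktemp -d)
git init -q --bare "$base/remote.git"
git clone -q "$base/remote.git" "$base/dummy" 2>/dev/null
cd "$base/dummy"
git config user.name "Example Author"      # placeholder identity
git config user.email author@example.com
git commit -q --allow-empty -m first
git push -q -u origin HEAD                 # publish and set the upstream
git commit -q --allow-empty -m second      # one commit ahead of origin
touch new-file
repos_status "$base/dummy"                 # "?? new-file" and "↑1 origin/<branch>"
```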

Synchronize Repositories

Git repositories are not automatically synchronised…

  • …users need to push/pull to synchronise with remotes
  • stash helps to provide a clean working directory before pull

pull & push Sub-Commands

  • pull [<remote_name> [<branch_name>]] copies changes from a remote repository…
    • …to the local repository…short for fetch and merge
    • pull --all fetches from all remotes (only the current branch is merged)
    • pull --no-ff always creates a merge commit (no fast-forward)
    • pull --rebase short for fetch and rebase
  • push [<remote_name> [<branch_name>]] copies local changes to a remote repository
    • push -u sets the pushed remote branch as upstream (tracking branch) of the current branch
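A push/pull round trip between two clones of one bare repository illustrates these sub-commands. A sketch, assuming Git 2.x and local paths in place of a server; the users alice and bob are placeholders.

```shell
set -e
base=$(mktemp -d)
git init -q --bare "$base/shared.git"
git clone -q "$base/shared.git" "$base/alice" 2>/dev/null
cd "$base/alice"
git config user.name Alice; git config user.email alice@example.com
echo v1 > file.txt; git add file.txt; git commit -q -m 'Alice: add file'
git push -q -u origin HEAD                 # publish and track the remote branch
git clone -q "$base/shared.git" "$base/bob"   # Bob's branch tracks origin/<branch>
cd "$base/bob"
git config user.name Bob; git config user.email bob@example.com
echo v2 >> file.txt; git commit -q -am 'Bob: extend file'
git push -q                                # push to the tracked remote branch
cd "$base/alice"
git pull -q --rebase                       # fetch + rebase (a fast-forward here)
cat file.txt                               # v1, v2
```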

stash Sub-Command

git pull does not overwrite uncommitted local changes; it aborts if incoming changes touch modified files

  • stash stores a snapshot of your changes without committing them to a branch
  • Kept separately from the working directory, the staging area and the branch history

Basic workflow example:

git stash                                # stash the changes in working tree
git pull                                 # pull commits from remote
git stash pop                            # apply changes on the current working tree
  • stash save [<message>] saves changes and reverts the working directory (deprecated in favor of stash push -m <message>)
  • stash list prints all stash entries…
    • stash@{0} number in the curly braces {} is the index
    • stash show -p <index> shows the stashed changes as a diff-style patch
  • stash apply <index> applies the changes and leaves a copy in the stash
  • stash pop [<index>] applies the changes and removes the entry from the stash
  • stash drop <index> removes stashed changes without applying them
  • stash clear removes all stash entries
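A stash round trip in a throwaway repository; it uses stash push -m, the current form of stash save. The file name f.txt and the message are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"      # placeholder identity
git config user.email author@example.com
echo base > f.txt; git add f.txt; git commit -q -m init
echo wip >> f.txt                          # uncommitted change
git stash push -q -m 'work in progress'    # save it, clean working tree
git status --short                         # no output
git stash list                             # stash@{0}: On <branch>: work in progress
git stash show -p 'stash@{0}'              # the stashed change as a patch
git stash pop -q                           # re-apply and drop the entry
cat f.txt                                  # base, wip
```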

Branches

Commits are made on the currently “checked out” branch…

  • git status shows the checked out branch
  • …stored as files in .git/refs/heads (local), .git/refs/remotes (remote), or packed in .git/packed-refs

List branches…

# list branches in repository (* marks the current branch)
git branch

# list remote-tracking branches
git branch -r

# list available local and remote branches
git branch -a

Create and check out a branch…

# create a new branch
git branch $name

# create new branch at commit (defaults to HEAD), and switch to it
git checkout -b $branch [$commit]

# switch to branch (update HEAD, index, and working tree)
git checkout $branch

# create a local branch that tracks a remote branch, and switch to it
git checkout -b $branch $remote/$branch

Delete a local branch…

  • -d …only deletes the branch if fully merged in its upstream branch (or in HEAD, if no upstream is set)
  • -D …deletes the branch irrespective of its merged status
# delete branch
git branch -d $branch

# delete local copy of a remote branch
git branch -dr $remote/$branch

Delete remote branch…

git push $remote -d $branch             # delete remote branch
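The difference between -d and -D shows up as soon as a branch has unmerged work. A sketch in a throwaway repository; feature is a placeholder branch name, and the main branch name is read at runtime (master or main).

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"      # placeholder identity
git config user.email author@example.com
git commit -q --allow-empty -m init
main=$(git rev-parse --abbrev-ref HEAD)    # master or main
git checkout -q -b feature                 # create and switch
git commit -q --allow-empty -m 'feature work'
git checkout -q "$main"
git branch                                 # "*" marks the current branch
git branch -d feature 2>/dev/null || echo 'not fully merged: -d refuses'
git merge -q feature                       # fast-forward
git branch -d feature                      # now merged, deletion succeeds
```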

Tags

# list tags of remote repository
git ls-remote --tags $remote 

# fetch all remote tags (plain git fetch only gets tags pointing into fetched history)
git fetch --tags $remote

# create a new lightweight tag at commit (defaults to HEAD)
git tag $name [$commit]

# delete tag
git tag -d $name

# list local tags
git tag -l

# list tags matching a glob pattern, e.g. 'v1.*'
git tag -l $pattern

# list local tags with commit message
git tag -n1 -l

# create a new annotated tag (with a message) at HEAD
git tag -a $name -m $message

# tag specific commit
git tag -a $name $commit

# push local tag to remote repository
git push $remote $tag

# push all local tags to remote repository
git push $remote --tags

# delete all local tags
git tag -l | xargs git tag -d
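Lightweight and annotated tags, listing with a pattern and pushing can be tried against a local bare repository. A sketch; the tag names v0.1/v0.2 and paths are placeholders.

```shell
set -e
base=$(mktemp -d)
git init -q --bare "$base/remote.git"
git clone -q "$base/remote.git" "$base/work" 2>/dev/null
cd "$base/work"
git config user.name "Example Author"      # placeholder identity
git config user.email author@example.com
git commit -q --allow-empty -m 'first'
git tag v0.1                               # lightweight tag on HEAD
git commit -q --allow-empty -m 'second'
git tag -a v0.2 -m 'Second release'        # annotated tag with a message
git tag -l 'v0.*'                          # glob pattern, not a regex
git tag -n1 -l                             # tags with their first message line
git push -q origin HEAD --tags             # push the branch and all tags
git ls-remote --tags origin                # v0.1, v0.2 and the peeled v0.2^{}
git tag -d v0.1                            # deletes the local tag only
```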

Merge vs. Rebase

  • A merge combines another branch (e.g. the remote branch) into the current branch
    • Default merge behaviour is a fast-forward if the current branch has no commits of its own (the branch pointer just moves)
    • If both branches have new commits, a merge commit with two parents is created
    • Conflicts must be resolved before the merge commit is recorded
    • Disable fast-forward with --no-ff to force every merge to produce a merge commit
  • rebase applies commits from current branch onto the head of the specified branch
    • “replaying” changes with new commits (hashes/timestamps)
    • Conflict resolutions are absorbed into the new commits (no merge commit)
# merge into current HEAD
git merge $branch

# rebase HEAD onto branch
git rebase $branch

# merge without fast-forward and without committing (result left in index and working tree)
git merge --no-commit --no-ff $branch

# examine the staged changes
git diff --cached

# undo the merge
git merge --abort
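The two strategies leave different shapes in the history: --no-ff records a merge commit with two parents, while rebase re-creates commits on top of the target so the history stays linear. A sketch in a throwaway repository; the branches topic and other and the file names are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"      # placeholder identity
git config user.email author@example.com
echo base > base.txt; git add base.txt; git commit -q -m init
main=$(git rev-parse --abbrev-ref HEAD)    # master or main
git checkout -q -b topic
echo t > topic.txt; git add topic.txt; git commit -q -m 'topic work'
git checkout -q "$main"
git merge -q --no-ff -m 'Merge topic' topic    # merge commit even though a fast-forward was possible
git checkout -q -b other HEAD^             # start from the commit before the merge
echo o > other.txt; git add other.txt; git commit -q -m 'other work'
git rebase -q "$main"                      # replay 'other work' on top of the merge
git log --oneline --graph --all
```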

Configuration

Customize the user configuration:

~/.gitconfig                                 # user configuration file
~/.gitignore_global                          # rules for ignoring files in every Git repository
git help config                              # configuration documentation
git config --list                            # dump configuration
git config --global <key> <value>            # set configuration
git config --global alias.<abr> '<command>'  # set command alias
git config --global http.proxy $proxy        # use a network proxy (also applies to HTTPS URLs)
git config --global --unset http.proxy       # disable network proxy

User & Mail

Set user name and email address for all repositories (in ~/.gitconfig):

git config --global user.name "Your Name"
git config --global user.email mail@example.com

Repository specific (in .git/config):

# from the working tree
git config user.name "Your Name"
git config user.email mail@example.com
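Repository settings in .git/config take precedence over the global ones. A sketch that points HOME at a temporary directory, an assumption made only so the real ~/.gitconfig is not touched; the names and addresses are placeholders.

```shell
set -e
export HOME="$(mktemp -d)"                 # sandbox: keep the real ~/.gitconfig untouched
unset XDG_CONFIG_HOME
git config --global user.name "Your Name"
git config --global user.email mail@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email work@example.com     # repository-specific, in .git/config
git config user.email                      # work@example.com (repository wins)
git config --global user.email             # mail@example.com
git config --show-origin user.name         # file:<HOME>/.gitconfig, then Your Name
```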

Aliases

git ls-files -t --exclude-per-directory=.gitignore --exclude-from=.git/info/exclude
                                          # list files
git log --pretty=format:"%C(yellow)%h%Cred%d %Creset%s%Cblue (%cn)" --decorate --numstat
                                          # show commits with a list of changed files
git log --pretty=format:"%C(yellow dim)%h%Creset %C(white dim)%cr%Creset ─ %s %C(blue dim)(%cn)%Creset"
                                          # list commit messages one per line
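Commands like these are usually stored with git config --global alias.<abr>. A sketch, again with HOME pointed at a temporary directory as an assumption; the alias names st and lg are hypothetical choices.

```shell
set -e
export HOME="$(mktemp -d)"; unset XDG_CONFIG_HOME    # sandboxed ~/.gitconfig
git config --global alias.st 'status --short'
git config --global alias.lg "log --pretty=format:'%C(yellow)%h%Creset %s %C(blue)(%cn)%Creset'"
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=Ex -c user.email=ex@example.com commit -q --allow-empty -m 'first'
git st                                     # same as: git status --short
git lg                                     # same as: git log --pretty=format:...
git config --global --get-regexp '^alias\.'   # list all defined aliases
```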

Usage

checkout

Undo modifications to file in the working…

# ...tree (by reading it back from the index)
git checkout -- path/to/file

Recover a deleted file…

# ...find the right commit ...check the history for the deleted file
git log -- path/to/file

# ...use the last commit that still had the file (the parent of the commit that deleted it)
git checkout $hash -- path/to/file
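Both recoveries in one throwaway repository. Here git rev-list -n 1 HEAD -- <path> finds the last commit that touched the file (the deletion), and its parent ($hash^) still contains it; notes.txt is a placeholder.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"      # placeholder identity
git config user.email author@example.com
echo original > notes.txt; git add notes.txt; git commit -q -m 'Add notes'
echo scribble > notes.txt                  # unwanted local edit
git checkout -- notes.txt                  # restore it from the index
git rm -q notes.txt; git commit -q -m 'Delete notes'
hash=$(git rev-list -n 1 HEAD -- notes.txt)   # the commit that deleted the file
git checkout "$hash^" -- notes.txt         # restore from its parent
cat notes.txt                              # original (also staged again)
```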

reset

Unstage changes to file in the index (without touching the working tree)…

git reset path/to/file
  • git reset HEAD …unstage all changes (working tree untouched)
  • git reset --hard …discard all uncommitted changes in index and working tree
  • git reset --hard $hash …move the branch to the specified commit, discarding later commits and uncommitted changes
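The reset variants act on progressively more: the index only, then the index and working tree, then the branch itself. A sketch in a throwaway repository; f.txt is a placeholder.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"      # placeholder identity
git config user.email author@example.com
echo 1 > f.txt; git add f.txt; git commit -q -m one
echo 2 > f.txt; git commit -q -am two
echo 3 > f.txt; git add f.txt
git reset -q f.txt                         # unstage; the working tree keeps "3"
git status --short                         # " M f.txt"
git reset -q --hard                        # discard all uncommitted changes
git reset -q --hard HEAD^                  # move the branch back, dropping commit "two"
cat f.txt                                  # 1
```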

Clean History

Create a new orphan branch…

  • …first commit made on this new branch will have no parents
  • …it will be the root of a new history
git checkout --orphan orphan
git add -A
git commit -am "Commit history removed"

Delete the original master branch and rename the orphaned branch:

git branch -D master
git branch -m master

Update the remote repository …option -f required…

git push -f origin master
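The whole procedure can be rehearsed against a local bare repository before touching a real remote. A sketch; the branch name is read at runtime instead of assuming master, and the paths are placeholders.

```shell
set -e
base=$(mktemp -d)
git init -q --bare "$base/remote.git"
git clone -q "$base/remote.git" "$base/work" 2>/dev/null
cd "$base/work"
git config user.name "Example Author"      # placeholder identity
git config user.email author@example.com
for i in 1 2 3; do echo "$i" > f.txt; git add f.txt; git commit -q -m "commit $i"; done
main=$(git rev-parse --abbrev-ref HEAD)    # master or main
git push -q -u origin "$main"
git checkout -q --orphan orphan            # no parents; current files stay staged
git add -A
git commit -q -m 'Commit history removed'
git branch -D "$main"                      # delete the old branch
git branch -m "$main"                      # rename the orphan branch
git push -q -f origin "$main"              # force-overwrite the remote history
git log --oneline                          # a single commit
```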

Network Proxies

…protocols for client-server communication…

Protocol   Example Connection Address
https      https://example.com/repository.git
ssh        git@example.com:repository.git
git        git://example.com/repository.git

…each protocol is proxied with a different method…

Proxy the HTTP protocol…

  • …set the environment variables HTTPS_PROXY and http_proxy (libcurl only honors the lowercase form for HTTP)
  • …to use an available HTTP proxy server

Proxy an SSH connection with…

  • …a custom GIT_SSH_COMMAND
  • …using the SSH option -J to configure a jump host
GIT_SSH_COMMAND="ssh -J PROXY_FQDN " git ...

Proxy the git:// protocol…

  • …over an SSH connection…
  • …using netcat on proxy node
# create a helper script with the proxy command
cat > gitproxy <<'EOF'
#!/bin/bash
exec ssh node.example.org nc "$@"
EOF
# make sure that it is executable
chmod +x gitproxy
# set an environment variable to use a proxy command
export GIT_PROXY_COMMAND=$PWD/gitproxy

git-annex

Why git-annex

  • backups
  • location tracking
  • data preservation

git annex add adds a file to the repository:

  • Checksums the file, the resulting hash is used as the new name
  • Moves the original file into the git repository .git/annex/objects using the hash-name
  • Replaces it in the git working tree with a symlink of the original name pointing to that object

Git Annex Future Proofing
http://git-annex.branchable.com/future_proofing

Git Annex Special Remotes
http://git-annex.branchable.com/special_remotes
