Version control with Git - Part 1
by Martin Hronec
What is version control?
- version control is a method for tracking changes to source code
- from Wikipedia: "the management of changes to documents, computer programs, large web sites, and other collections of information."
- it enables effective collaboration
- who wrote this part of code?
- when was this part of code last edited?
- why/when did it all stop working?
- where do we see version control?
What is Git?
- in XKCD words:
- we will try to do it differently!
- "understanding" instead of just memorizing
- we will approach Git bottom-up, i.e. starting with its data model
- learning Git top-down (starting with its interface / command-line interface) can lead to a lot of confusion
Git
- a revision control system, a program to manage your source code history
- standard for version control
- can be used for:
- small projects with one developer
- huge projects with thousands of contributors
- e.g. working remotely by hosting code on public servers like GitHub, GitLab, etc.
- a free, open-source system
- available on Unix-based platforms, Windows and macOS
- has a distributed architecture
- each person contributing to a repository has a full copy of the repository
- the majority of state-of-the-art software projects are under Git!
- explore directly on GitHub here
- stable-diffusion
- Python's pandas
- tensorflow
- editor Visual Studio Code
- kubernetes
- and much more!
- note: what is GitHub?
- provides hosting for software development and version control using Git
- from a YouTube comment on one of the Git tutorials: "Git is related to GitHub the same way porn is related to Pornhub."
- created in 2005 by Linus Torvalds to help manage the task of developing the Linux kernel (a cool Git origin story)
Git installation & setup
- follow instructions
Command line interface
- a shell is a user interface for access to an operating system's services
- it is named a shell because it is the outermost layer around the operating system
- shells use
- command-line interface (CLI)
- knowledge of commands (and syntax) is necessary
- shell-specific scripting language (like bash)
- bash is a popular default shell on Linux (on macOS the default has been zsh since Catalina, but bash is still available)
- bash is an acronym for Bourne Again Shell
- graphical user interface (GUI)
- easy to use, with certain disadvantages
- at its core, Git is a set of command-line utility programs that are designed to execute in a Unix-style command-line environment
- operating systems like Linux and macOS both include built-in Unix command-line terminals
- Git Bash is an application for Microsoft Windows environments which provides an emulation layer for a Git command-line experience
- Git Bash is a package that installs Bash, some common bash utilities and Git on a Windows machine
- it has the same operations as a standard Bash experience
- Git Bash comes included as part of the Git for Windows package
Bash basics
- `echo` returns whatever you type at the shell prompt
- `pwd` prints the current working directory
- the current working directory is the default place where the shell commands will look for data files
- the UNIX file hierarchy has a tree structure
- to reach a particular folder or file, we need to traverse certain paths within this tree structure
- paths separate every node of the above structure with the help of a slash character: `/`
- `ls <directory>` to list contents of a directory
- `ls <directory> -a` to list all files, including hidden ones
- `cd <directory>` changes the active directory to the path specified
- `cd ..` changes to the parent directory
- `cd` changes to the HOME directory
- `mkdir` creates a folder
- `touch` creates an empty file
- `rm` deletes a file
- `rm -r <directory>` deletes a folder with all its contents (careful: `rm -r *` deletes everything in the current directory)
- the redirection operator `>` will redirect the normal output initially intended to be on stdout and print it directly into the file
- `echo "Git is awesome." > file.txt`
- `cat` reads a file and outputs its content
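The commands above can be tried out together in a throwaway directory. This is only a sketch: the directory created by `mktemp -d` and the names `notes` and `empty.txt` are made up for illustration.

```shell
# Work in a scratch directory so nothing important is touched.
playground="$(mktemp -d)"           # hypothetical scratch location
cd "$playground"
pwd                                 # print the current working directory

mkdir notes                         # create a folder
touch notes/empty.txt               # create an empty file
ls -a notes                         # list everything, including hidden entries

echo "Git is awesome." > file.txt   # redirect stdout into a file
cat file.txt                        # print the file's content

cd notes                            # enter the folder
cd ..                               # go back to the parent directory
rm -r notes                         # delete the folder with all its contents
```

Note that `rm` does not ask for confirmation and there is no trash bin, which is one more reason to experiment in a scratch directory.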
IDEs
- integrated development environment
- use some!
- standard features:
- text editor
- code completion
- linting
- debugger
- lots of options, but the best (IMHO) is hands down Visual Studio Code
Git's data model
gitGraph
commit id: "1"
commit id: "2"
branch nice_feature
checkout nice_feature
commit id: "3"
checkout main
commit id: "4"
checkout nice_feature
branch very_nice_feature
checkout very_nice_feature
commit id: "5"
checkout main
commit id: "6"
checkout nice_feature
commit id: "7"
checkout main
merge nice_feature id: "customID" tag: "customTag" type: REVERSE
checkout very_nice_feature
commit id: "8"
checkout main
commit id: "9"
- version control is a method for tracking changes to source code
- modelling the history of a collection of files and folders within some top-level directory through related snapshots
- to be a Git repository, we need objects and a system for naming those objects
- refs is the system for naming those objects
- a history is a directed acyclic graph (DAG) of snapshots
- each snapshot refers to a set of "parents" (previous snapshots)
- why a set of "parent" snapshots instead of just one "parent" snapshot?
- in Git's terms:
- these snapshots are called commits
- a file is called a blob (a bunch of bytes)
- a directory is called a tree
- maps names to blobs or trees (directories can contain other directories)
- a "snapshot" is basically the top-level tree in your project
-
ASCII art for pedagogical purposes:
root_folder (tree)
|
+- folder (tree)
| |
| + file.txt (blob, contents = "A barber learns to shave by shaving fools.")
|
+- another_file.txt (blob, contents = "I don't give a git.")
Objects and content-addressing
- an "object" is a blob, a tree, or a commit
- in Git's data store, all objects are content-addressed by their SHA-1 hash
- Git is moving to SHA-256 (see the SHA-1 collision)
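To make "content-addressed" concrete, the sketch below recomputes a blob's ID by hand and compares it with what Git reports. It assumes a repository format that uses SHA-1 (the current default) and that the GNU `sha1sum` tool is available (on macOS `shasum` plays the same role); the file name is only an example.

```shell
work="$(mktemp -d)"                 # hypothetical scratch directory
cd "$work"
echo "Git is awesome." > file.txt

# A blob's ID is the SHA-1 of the header "blob <size>\0" followed by the content.
size=$(( $(wc -c < file.txt) ))     # size in bytes; the arithmetic strips any padding
manual="$( { printf 'blob %s\0' "$size"; cat file.txt; } | sha1sum | cut -d' ' -f1 )"

# Ask Git for the ID of the same content (without -w nothing is stored).
from_git="$(git hash-object file.txt)"

echo "manual:   $manual"
echo "from git: $from_git"
```

Both lines should print the same 40 hexadecimal characters: the key depends only on the content, not on the file name or location.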
References
- all snapshots can be identified by their SHA-1 hash
- inconvenient - try to remember strings of 40 hexadecimal characters
- Git's solution: readable names for SHA-1 hashes - references
- references are pointers to commits
- objects are immutable (building blocks)
- references are mutable (can be updated to point to a new commit)
- e.g. the `master` reference - typically pointing to the main "codebase"
- we want a notion of "where we currently are" (e.g. in the history of the project)
- when we take a new snapshot, we know what it is related to
- "where we are/what are we looking at" is a special reference called HEAD
Git repository
- a Git repository is the data objects and references
- all Git stores are objects and references
- easy
- all Git commands map to some manipulation of the DAG which captures the relation between snapshots (history and development)
- e.g. by adding objects and adding/updating references
- whenever you're typing in any command, think about what manipulation the command is making to the underlying graph data structure
- similarly, if you're trying to make a particular kind of change to the DAG, there is probably a command corresponding to it
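A small sketch of how references and HEAD look on disk, and how a commit updates them. It assumes Git's default file-based reference storage with unpacked refs; the branch name `main`, the identity and the file name are placeholders chosen for the example.

```shell
repo="$(mktemp -d)"                      # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # call the first branch "main" on any Git version
git config user.name "Demo User"         # placeholder identity
git config user.email "demo@example.com"

echo "first" > file.txt
git add file.txt
git commit -q -m "First snapshot"

cat .git/HEAD                 # symbolic reference to the current branch
cat .git/refs/heads/main      # the branch: a file holding a 40-character commit hash
git rev-parse HEAD            # resolves HEAD -> branch -> commit hash

echo "second" >> file.txt
git commit -q -a -m "Second snapshot"
cat .git/refs/heads/main      # same file, new hash: the reference moved
git cat-file -t HEAD          # type of the object HEAD resolves to
```

The commit objects themselves never change; only the small reference file is rewritten to point at the newest commit.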
Initializing repositories
- 2 main options to create a Git repository:
- `git init`
- `git clone https://github.com/martinhronec/version_control_IES.git`
- we will talk about this option around Git remotes
- `git init` creates a new subdirectory named .git that contains all information for version control
- converts an existing (unversioned) project
- initiates a new empty repository
- we will look inside later when we are more familiar with working concepts
Lifecycle of a file/blob
- each file in the working directory can be in one of two states:
- tracked
- files that were in the last snapshot - can be unmodified, modified or staged
- untracked
- any files that weren't in the last snapshot and are not in the staging area (explained later)
- a newly created file stays untracked until you add it
- when you clone a repository (instead of `git init`-ing it), all of your files will be tracked and unmodified
Status of the files/repository
- `git status` shows which files are in which state
- create a new file, e.g. hello_world.txt
- hello_world.txt is untracked
- Git won't start tracking it until we explicitly tell it to
- to start tracking the hello_world.txt file, use `git add hello_world.txt`
- run `git status` again to see that the file is staged now
- to see what is staged (what will be committed/snapshotted next), use `git diff --cached` or `git diff --staged`
- `git diff` shows you the exact lines added and removed
- can be used between different objects, e.g. branches as we will see later on
- `git diff` by itself doesn't show all changes made since your last commit, only changes that are still unstaged
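The walkthrough above can be replayed in a scratch repository. This is a sketch only: the temporary directory and the file contents are invented, and `--short` is used to keep the status output compact.

```shell
repo="$(mktemp -d)"                      # hypothetical scratch repository
cd "$repo"
git init -q

echo "Hello, world" > hello_world.txt
git status --short            # ?? hello_world.txt  (untracked)

git add hello_world.txt
git status --short            # A  hello_world.txt  (staged)

echo "A second line" >> hello_world.txt
git status --short            # AM hello_world.txt  (staged, then modified again)

git diff --staged             # what the next commit would contain: the first line
git diff                      # what is still unstaged: the second line
```

The same file shows up in both diffs because the staging area holds the version from the moment of `git add`, not the file's latest state.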
Main sections of a Git project
Committing the changes
- `git commit` as equivalent of "saving"
- only staged things go to the commit
- typing only `git commit` will launch a selected (in config) text editor
- write `git commit <file-to-commit> -m "Commit message"` if you want to skip the detailed message
- the commit shows some self-info
- which branch you committed to (master)
- branch is an "independent" line of development - more later
- what SHA-1 checksum it has
- how many files were changed (and statistics about lines added and removed in the commit)
- only staged files are committed
- anything not staged is just chilling there modified
- every commit (you make) is a snapshot of your project
- you can revert to it later
- if you want to skip the staging area, just use `git commit -a -m "<message>"` (this picks up changes to tracked files only)
- compare what happened to the actions shown on the previous picture "File Status Lifecycle."
The commit history
- `git log`
- with no arguments, `git log` lists the commits made in that repository in reverse chronological order
- it lists each commit with its SHA-1 checksum, the author's name and e-mail, the date written, and the commit message
- plenty of arguments with `git log`
- `git log -p -2` shows the diff introduced in each commit, `-2` limits the output to only the last two entries
- `git log --stat` shows some abbreviated stats for each commit, e.g. list of modified files, how many files were changed and how many lines in those files were added and removed
- `git log --pretty=oneline`, `git log --graph`
- in practice, developers typically use some nice graphing extension/utility, e.g.
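Committing and then reading the history back can be sketched as follows. The repository, identity, file name and commit messages are all placeholders for the example.

```shell
repo="$(mktemp -d)"                      # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Demo User"         # placeholder identity
git config user.email "demo@example.com"

echo "one" > notes.txt
git add notes.txt
git commit -q -m "Add notes"             # only staged content goes into the commit

echo "two" >> notes.txt
git commit -q -a -m "Extend notes"       # -a stages modified tracked files first

git log --pretty=oneline                 # one line per commit, newest first
git log --stat -1                        # files and line counts for the last commit
git log -p -2                            # diffs of the last two commits
```

Passing `-m` skips the editor; without it Git opens the configured editor for the message.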
Three trees of Git
- we've just seen the basic (committing) part of the Git workflow
- try to understand Git's internal state management system first, everything should be easier later
- the working directory is in sync with the local filesystem
- it represents the immediate changes made to content in files and directories
- can be seen using `git status` (or just by looking at the files and folders the way you are used to)
- the staging index is tracking working directory changes that have been promoted with `git add` (to be stored in the next commit)
- a complex internal caching mechanism hidden from the user
- to accurately see the state of the staging index, use `git ls-files -s`
- a debugging tool for inspecting the state of the staging index tree
- `-s` stands for the `--stage` option
- the commit history
- once a modified file is staged, its SHA-1 hash in the staging index is different from the one in the commit history
- `git status` shows differences between the commit history and the staging index
- after a changed file is added to the staging area, the hash recorded for it in the index changes
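A sketch of how the hash in the staging index diverges from the one in the commit history. The `HEAD:file.txt` and `:file.txt` forms are standard revision syntax for "the blob in the last commit" and "the blob in the index"; the repository and file contents are invented for the example.

```shell
repo="$(mktemp -d)"                      # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Demo User"         # placeholder identity
git config user.email "demo@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "v1"
git ls-files -s                          # mode, blob hash, stage number, path
committed="$(git rev-parse HEAD:file.txt)"      # blob recorded in the last commit

echo "v2" > file.txt                     # only the working directory changes
in_index_before="$(git rev-parse :file.txt)"    # index still holds the committed blob

git add file.txt                         # promote the change to the staging index
staged="$(git rev-parse :file.txt)"      # index now holds a new blob
git ls-files -s

echo "commit history: $committed"
echo "staging index:  $staged"
```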
Example of Git's data model
- after the first commit:
- Git checksums each subdirectory (only the root project directory in this case)
- stores objects (one file in our case) in the Git repository
- then it creates a commit object that has the metadata and a pointer to the root project tree so it can re-create that snapshot when needed
- repository now contains 3 objects:
- one blob for the content of your file,
- one tree that lists the contents of the directory and specifies which file names are stored as which blobs
- one commit with the pointer to that root tree and all the commit metadata
- we will use git-graph to quickly show the underlying data model of our repository
- see the github page for the color reference
- e.g.
- note that this graph does not correspond to the above one (different hashes)
2-commits repository representation
- if you make some changes and commit again, the next commit stores a pointer to the commit that came immediately before it
3-commits repository representation
Git - plumbing and porcelain
- originally Git was a toolkit for a version control system rather than a "user-friendly" VCS
- => it has a number of "low-level" subcommands
- low-level commands are generally called "plumbing" commands
- more "user-friendly" commands are called "porcelain" commands
- many of the plumbing commands aren't meant to be used manually on the command line
- rather used as building blocks for new tools and custom scripts
- when you create a new Git repository, Git creates a .git directory
- this is where almost everything that Git stores and manipulates resides
- to back up or clone your repository, copy this single directory
- now we will talk about what you can see in a repository
Git objects
- Git is a content-addressable filesystem
- Git is a simple key-value data store
- contents of the .git folder:
- config file contains your project-specific configuration options
- info directory keeps a global exclude file for ignored patterns that you don't want to track in a .gitignore file
- hooks directory contains your client-side or server-side hook scripts
- HEAD points to the branch you currently have checked out
- objects directory stores all the content for your database
- refs directory stores pointers into commit objects
- index file is where Git stores your staging area
Hashing
- `git hash-object` takes some data and gives you back the unique key that refers to that data object; with the `-w` option it also stores the object in your .git/objects directory (the object database)
- with `git init`, Git initializes the objects directory and creates pack and info subdirectories in it, but there are no regular files
- the output from the above command is a 40-character checksum hash
- this is a SHA-1 hash - a checksum of the content you're storing plus a header
- in your objects directory, you can see that it now contains a file for that new content
- Git stores data as a single file per piece of content
- once you have content in your object database, you can examine that content with the `git cat-file` command
- a Swiss army knife command for inspecting Git objects
- `-p`: first figure out the type of content, then display it appropriately
- `-t`: tells you the object type of any object in Git given its SHA-1 key
- writing into file, saving it, hashing it and then saving new content and hashing it leads to object database having both versions of this new file
- we can delete local copy of the created file
- and use Git to retrieve it from the objects database
- remembering the SHA-1 key for each version of your file isn't practical
- plus you are not storing the filename in your system, just the content
- this object type is called a blob
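The plumbing steps in this section can be sketched like this. The two content strings and the file name `test.txt` are examples only; the commands are run in a fresh scratch repository.

```shell
repo="$(mktemp -d)"                      # hypothetical scratch repository
cd "$repo"
git init -q

# Store two versions of some content directly in the object database.
v1="$(echo "version 1" | git hash-object -w --stdin)"
v2="$(echo "version 2" | git hash-object -w --stdin)"

find .git/objects -type f                # one file per stored object

git cat-file -t "$v1"                    # object type: blob
git cat-file -p "$v2"                    # pretty-print the content

# No file exists in the working directory yet; recreate one from the database.
git cat-file -p "$v1" > test.txt
cat test.txt
```

Only the content is stored here, not a file name, which is why the key is all there is to retrieve it by.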
Removing files
- not a "plumbing command"
- to remove a file from Git, remove it from your tracked files
- more accurately: remove it from the staging area and commit
- `git rm <file>` stages the file's removal
- next time you commit, the file will be gone and no longer tracked
- another use case:
- keep the file in the working tree, but remove it from the staging area (i.e. making it untracked)
- `git rm --cached <file>`
- there is no option to remove a file from only the working directory
- use `git rm -n` for a "dry run" - shows which files would be removed
- files, directories, and file-glob patterns can be passed to the `git rm` command
- to remove a directory, use `git rm -r` which stands for "recursive"
- `git rm` does not remove branches (more when talking about branches)
- why not simply delete the file/folder on the file-system level?
- you can, and then stage the change and commit
- `git rm` is a convenience
Moving files
- Git doesn't explicitly track file movement
- if you rename a file in Git, no metadata is stored in Git that tells it you renamed the file
- to rename a file you can run `git mv <file_from> <file_to>`
- you will see in `git status` that Git considers it a renamed file
- it is equivalent to running something like this:
mv <file_from> <file_to>
git rm <file_from>
git add <file_to>
Ignoring files - .gitignore
- when I have shown you git-graphs, you might have noticed the untracked files in my repository
- these were precisely the produced graphs we might wish to ignore
- a class of files we do not want to add but also do not want to see listed as untracked
- e.g. data, log files, files from the build
- set up a `.gitignore` file before you start your committing spree
- some rules:
- blank lines or lines starting with # are ignored
- standard glob patterns work
- you can end patterns with a forward slash (/) to specify a directory
- you can negate a pattern by starting it with an exclamation point (!)
- glob patterns are like simplified regular expressions that shells use
- an asterisk (*) matches zero or more characters
- [abc] matches any character inside the brackets (i.e. a, b, or c)
- a question mark (?) matches a single character
- a range in brackets such as [0-9] matches any character between the start and the end (here any digit)
- there are templates of `.gitignore` files available for most languages and project types, see the option on GitHub when creating a repository
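A minimal sketch of these rules in action. The file and directory names (`debug.log`, `important.log`, `data/`) are made up; `git check-ignore` is used to ask Git which pattern, if any, matches a path.

```shell
repo="$(mktemp -d)"                      # hypothetical scratch repository
cd "$repo"
git init -q

mkdir data
touch data/raw.csv debug.log important.log

cat > .gitignore <<'EOF'
# log files, except one we want to keep
*.log
!important.log
# everything inside the data directory
data/
EOF

git status --short                           # only .gitignore and important.log are listed
git check-ignore -v debug.log data/raw.csv   # shows the matching line of .gitignore
```

A `.gitignore` only affects untracked files; a file that is already tracked keeps being tracked until it is removed from the index.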
Undoing changes
- `git log` to show history, `git log --graph`
- let's discuss the available 'undo' Git strategies and commands
- `git reset`
- `git revert`
- `git checkout`
- and more
- Git does not have a traditional 'undo' system
- careful, you can't always undo some of the undos!
Checkout
- we've looked at the history and we've taken the file and modified it
- we can use the history for more than reading the messages
- `git checkout`
- allows you to do many things
- one thing it allows you to do is to move in your version history
- run `git checkout <commit id>` (you can provide just the first few characters of the id)
- it will change the state of the current working directory to how it was at that commit
- i.e. it shows the state of the repository at that specific commit
- to go back to the "main" line of development: `git checkout main`
Undoing a commit using git checkout
- the idea behind any VCS: store "safe" copies of a project
- to see the snapshot of repository in some previous time, use that snapshot's/commit's reference to visit it
git checkout <commit_id>
- ! when you check out a previous commit, HEAD no longer points to a branch (e.g. master) - it points directly to a commit
- this is called a detached HEAD state
- in a detached state, any new commits you make will be orphaned when you move HEAD back to master
- orphaned commits are up for deletion by Git's garbage collector
- garbage collector runs on a configured interval and permanently destroys orphaned commits
- from the detached HEAD state, you can create a new branch using `git checkout -b <branch_name>`
- ! unfortunately, if you need the previous state of master, this strategy is not a very good idea
- checking out an old version of a single file does not move the HEAD pointer (avoiding the detached HEAD state)
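The detached HEAD situation and the branch-based rescue can be sketched as below. The branch name `rescue`, the identity and the file names are placeholders; `git symbolic-ref -q HEAD` is used as the check because it fails when HEAD does not point to a branch.

```shell
repo="$(mktemp -d)"                      # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # call the first branch "main" on any Git version
git config user.name "Demo User"         # placeholder identity
git config user.email "demo@example.com"

echo "v1" > file.txt; git add file.txt; git commit -q -m "v1"
echo "v2" > file.txt; git commit -q -a -m "v2"

first="$(git rev-parse HEAD~1)"
git checkout -q "$first"                 # HEAD now points at a commit, not a branch
git symbolic-ref -q HEAD || echo "detached HEAD"

echo "experiment" > experiment.txt
git add experiment.txt
git commit -q -m "Experiment from an old snapshot"

git checkout -q -b rescue                # name the new commit so it is not orphaned
git symbolic-ref HEAD                    # refs/heads/rescue
git checkout -q main                     # back to the main line of development
```

Without the `rescue` branch, the experiment commit would become unreachable as soon as HEAD moved back to `main`.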
Reverting and resetting
- `git revert`
- used for undoing changes to a repository's commit history
- instead of removing the commit from the history, it figures out how to invert the changes introduced by the commit and appends a new commit with the resulting inverse content
- prevents Git from losing history
- other 'undo' commands (e.g. `git checkout` and `git reset`) move the HEAD and branch ref pointers to a specified commit
- `git revert` also takes a specified commit, then:
- inverts the changes from that commit,
- creates a new "revert commit"
- the ref pointers are updated to point at the new revert commit
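A sketch of a revert in a scratch repository. The commit messages and file contents are invented; `--no-edit` accepts the default revert message instead of opening an editor.

```shell
repo="$(mktemp -d)"                      # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Demo User"         # placeholder identity
git config user.email "demo@example.com"

echo "good" > file.txt
git add file.txt
git commit -q -m "Good state"

echo "bad" > file.txt
git commit -q -a -m "Bad change"

git revert --no-edit HEAD > /dev/null    # append a commit that inverts "Bad change"

git log --pretty=oneline                 # three commits: nothing was removed
cat file.txt                             # content is back to the good state
```

Because history only grows, this is the safe way to undo commits that other people may already have pulled.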
- `git reset`
- a complex and versatile tool for undoing changes
- 3 primary forms of invocation: `--soft`, `--mixed`, `--hard`
- these 3 forms correspond to Git's internal state management mechanisms
- HEAD
- the staging index
- the working directory
- at the surface level, the behaviour of `git reset` is similar to `git checkout`
- `git checkout` operates solely on the HEAD ref pointer
- `git reset` will move the HEAD ref pointer and the current branch ref pointer
- the pointer to the commit history is always modified; the arguments `--soft`, `--mixed` and `--hard` correspond to the ways we modify the staging index and working directory trees
- `git reset --hard`
- the most direct and DANGEROUS option
- the commit history ref pointers are updated to the specified commit
- the staging index and the working directory are reset to match that of the specified commit
- any previously pending changes to the staging index and the working directory get reset to match the state of the commit tree
- any pending work not committed will be lost
- `git reset HEAD~2 --hard` to effectively remove the last two commits from the project history
- NEVER PERFORM THE ABOVE OPERATION ON ALREADY PUBLISHED COMMITS (TO A SHARED REPOSITORY)
- this data loss cannot be undone!
- if you look at the objects database, removed commits will still be there, but only until the next garbage collection by Git!
- `git reset --mixed`
- the default option
- it is the same as executing `git reset`
- the ref pointers are updated
- the staging index is reset to the state of the specified commit
- any changes that have been undone from the staging index are moved to the working directory
- compare this to the --hard reset case where we've lost these updates
- `git reset HEAD <file>` is sometimes used for unstaging a file, but it is safer to follow the Git recommendation `git restore --staged <staged file>` to remove it from the staging area
- `git reset --soft`
- the ref pointers are updated and the reset stops there
- the staging index and the working directory are left untouched
- this behaviour can be hard to clearly demonstrate
- if you execute a soft reset with the default target commit (which is HEAD), nothing will change since the commit history was already sitting on HEAD
- we need to provide it with some historical commit id
- possible use cases:
- useful if you committed to the wrong branch: all changes go back to the staging area and move with you while you check out the right branch
- the easiest way to squash the last few commits is to do `git reset --soft <commit>` and then commit
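The three reset modes can be compared in one scratch repository. This is a sketch under invented names and contents; the status codes in the comments are those of `git status --short`.

```shell
repo="$(mktemp -d)"                      # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Demo User"         # placeholder identity
git config user.email "demo@example.com"

for n in 1 2 3; do
  echo "line $n" >> file.txt
  git add file.txt
  git commit -q -m "Commit $n"
done

# --soft: move the branch back two commits, keep index and working directory.
git reset --soft HEAD~2
git status --short                       # "M  file.txt": the undone changes are still staged
git commit -q -m "Commits 2 and 3 squashed"

# --mixed (the default): unstage, but keep the edit in the working directory.
echo "scratch" >> file.txt
git add file.txt
git reset -q
git status --short                       # " M file.txt": modified, not staged

# --hard: throw the edit away as well.
git reset -q --hard
git status --short                       # prints nothing: everything matches HEAD
```

Only the last step destroys work, and only uncommitted work; the squash in the first step rewrites history, so it belongs on commits that have not been shared yet.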
Don't reset public history
- don't use `git reset` when any snapshots have been pushed to a public repository
- you have to assume that other developers are reliant upon it
Branches
- branching means:
- diverging from the main line of development
- and continuing to do work without affecting the main (stable/production) development line
- developing multiple things in parallel
- in other VCS tools, this requires a lot of resources
- e.g. to create a new copy of your source code directory
- not in Git! - very lightweight, killer feature
- when you commit in Git, Git stores a commit object that contains a pointer to the snapshot of the content you staged, together with:
- the author and message metadata
- and zero or more pointers to the commit or commits that were the direct parents of this commit:
- zero parents for the first commit
- one parent for a normal commit
- and multiple parents for a commit that results from a merge of two or more branches
- you can view all commits across branches using `git log --branches=*`
Creating a new branch
- a branch in Git is simply a lightweight movable pointer to one of the commits
- the default branch name in Git is master or main
- originally master, see the story of why it changed
- as you initially make commits, you're given a main branch that points to the last commit you made and every time you commit, it moves forward automatically
- what happens when you create a new branch?
- it creates a new pointer for you to move around!
- `git branch <branch_name>` to create a new branch
- look at the following 2 graphs to see that indeed, a new branch is just another pointer pointing to the last commit
- added a new branch called "brave" with `git branch brave`
- HEAD is still pointing to master
- if we added another commit now, we would be contributing to the main branch
HEAD
- how does Git know what branch you're currently on?
- it keeps a special pointer called HEAD
- a pointer to the local branch you're currently on
- you can see it in the file .git/HEAD
- for now we are still on the "master" branch, we have not yet switched to the "new branch"
- to switch to another branch, you can use `git checkout <new_branch>`
- `git status` will confirm that you are indeed looking at it
- `git log` will show HEAD pointing to the "new branch"
- if you change a file, stage it and commit it, your project history will diverge
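A sketch of a branch being nothing more than a movable pointer. It assumes file-based reference storage and SHA-1 object names (the defaults); the branch name `brave` follows the slides, everything else is a placeholder.

```shell
repo="$(mktemp -d)"                      # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # call the first branch "main" on any Git version
git config user.name "Demo User"         # placeholder identity
git config user.email "demo@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "Base commit"

git branch brave                         # new pointer to the current commit
git rev-parse main brave                 # the same hash twice
wc -c < .git/refs/heads/brave            # 41 bytes: 40 hex characters and a newline
cat .git/HEAD                            # still "ref: refs/heads/main"

git checkout -q brave                    # HEAD now points to brave
echo "brave work" >> file.txt
git commit -q -a -m "Work on brave"      # only the brave pointer moves forward

git log --oneline --graph --all          # main stays one commit behind brave
```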
Branching and merging
- you've decided that you're going to work on issue #12
- issues are generally the way code development is organized and managed, see the GitHub manual
- `git checkout -b iss12`
- (look at the GitGraph)
- now on the new branch
- create some new file or make some changes
- commit it and look at the graph again
- your project history has diverged
(graph after working on new branch)
- a branch in Git is actually a simple file that contains the 40-character SHA-1 checksum of the commit it points to
- => branches are cheap to create and destroy
- creating a new branch is as quick and simple as writing 41 bytes to a file (40 characters and a newline)
- now imagine you need to make a hotfix; to do this, you need to move away from iss12 back to the main line of development
- to switch from the current branch to another, you need to commit your changes
- or stash them, which we will be talking about later
- you switch back to master
- you create a new branch named "hotfix"
- you hotfix what you need
- then you need to merge it back to master (after some tests)
(look at GitGraph after hotfixing)
Merging - fastforward
- branching is useful only if we can combine things back in the end
- `git merge <branch_to_be_merged_into_where_HEAD_is_pointing>`
- because the commit pointed to by the branch you merged in was directly ahead of the commit you're on, Git moves the pointer forward
- when you try to merge one commit with a commit that can be reached by following the first commit's history, Git simplifies things by moving the pointer forward because there is no divergent work to merge together; this is called a fast-forward
- notice that Git itself tells us this was a fast-forward type of merge
- Git does its best to merge things automatically; if that is not possible, you need to do it manually or use `git mergetool`
(GitGraph after merging the hotfix)
- `git branch -d hotfix` deletes the hotfix branch once it has been merged
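The hotfix scenario can be replayed as a sketch. Names and contents are placeholders; the point is that a fast-forward creates no new commit and only moves the `main` pointer.

```shell
repo="$(mktemp -d)"                      # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # call the first branch "main" on any Git version
git config user.name "Demo User"         # placeholder identity
git config user.email "demo@example.com"

echo "stable" > app.txt
git add app.txt
git commit -q -m "Stable release"

git checkout -q -b hotfix                # branch off and fix the problem
echo "fixed" >> app.txt
git commit -q -a -m "Hotfix"
fix="$(git rev-parse HEAD)"

git checkout -q main
git merge hotfix                         # Git reports a fast-forward
git rev-parse main                       # same hash as the hotfix commit

git branch -d hotfix                     # safe: the branch is fully merged
```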
Merging - recursive (three-way merge)
- your development history has diverged from some older point
- because the commit on the branch you're on isn't a direct ancestor of the branch you're merging in, Git has to do some work
- in this case, Git does a simple three-way merge, using the two snapshots pointed to by the branch tips and the common ancestor of the two
- `git checkout master` and `git merge iss12`
- TODO: chart (three snapshots used in a typical merge)
- instead of just moving the branch pointer forward, Git creates a new snapshot that results from this three-way merge and automatically creates a new commit that points to it
- this is referred to as a merge commit
- it is special by the fact that it has more than one parent
- if the merge stops on a conflict, resolve it, run `git add <changed file>` and then `git merge --continue`
- if something goes wrong and you want to give up on the merge, use `git merge --abort`
- look at the git log to see the branched development
Merge conflicts - diverged paths
- if you changed the same part of the same file differently in the two branches you're merging together, Git won't be able to merge them cleanly
- this is what your repository looks like
Merge conflicts - resolving
- if we run `git merge branch` when there is a conflict, we will get (something like) the following message:
CONFLICT (content): Merge conflict in index.html
Automatic merge failed; fix conflicts and then commit the result.
- Git hasn't automatically created a new merge commit
- it has paused the process while you resolve the conflict
- `git status` to see which files are unmerged at any point after a merge conflict
- open the conflicting file
<<<<<<< HEAD
......one version
=======
...... second version
>>>>>>> <conflicting branch name>
- resolve each of the conflicted sections in the conflicted file
- run `git add` on each file
- staging the file marks it as resolved in Git
- run `git commit` to finalize the merge commit
- see the resulting graph on the next slide
- after resolving:
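A sketch that provokes a conflict and resolves it by hand. The branch name `iss12` and the file `index.html` follow the slides; the contents and the way the conflict is resolved (overwriting the file with a combined line) are invented for the example.

```shell
repo="$(mktemp -d)"                      # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # call the first branch "main" on any Git version
git config user.name "Demo User"         # placeholder identity
git config user.email "demo@example.com"

echo "<h1>Title</h1>" > index.html
git add index.html
git commit -q -m "Initial page"

git checkout -q -b iss12
echo "<h1>Title from iss12</h1>" > index.html
git commit -q -a -m "Change title on iss12"

git checkout -q main
echo "<h1>Title from main</h1>" > index.html
git commit -q -a -m "Change title on main"

git merge iss12 || echo "merge paused on a conflict"
git status --short                       # "UU index.html": unmerged on both sides
markers="$(grep -c '^<<<<<<< ' index.html)"   # count conflict sections in the file
cat index.html                           # both versions between the conflict markers

echo "<h1>Title from main and iss12</h1>" > index.html   # hand-made resolution
git add index.html                       # staging marks the file as resolved
git commit -q --no-edit                  # finalize the merge commit

git log --oneline --graph                # the merge commit has two parents
```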
Merging jupyter notebooks
- Jupyter notebooks are technically JSON documents -> merging in the terminal is painful (because of all the metadata)
- you can use jupyterlab-git or a good IDE like VS Code
Branching management
- another useful option to figure out what state your branches are in is to filter the branch list to branches that you have or have not yet merged into the branch you're currently on
- `git branch --merged` and `git branch --no-merged`
- branches on the `--merged` list without the * in front of them are generally fine to delete with `git branch -d`
- if you have a branch with contents that are not merged in another branch yet, standard `git branch -d` will fail (error: The branch '' is not an ancestor of your current HEAD.)
GitHub
- live features walkthrough, head to https://github.com/
Git remotes
- use remotes to collaborate remotely
- remotes are versions of your project that are hosted on the Internet or network somewhere
- .git folder represents an entire copy of the history, objects and the references and all the snapshots
- to cooperate with other people, they also have the entire copy of the project
- your local copy is aware of other linked repositories
- `git remote` lists all remotes that Git is aware of
- `git remote -v` to show the URL
- `git remote add <shortname> <url>`
- you can use `<shortname>` on the command line in lieu of the whole URL
- e.g. `git fetch <shortname>` fetches all the information that you don't yet have in your repository
- it only pulls the data to your local repository - it doesn't automatically merge it with any of your work or modify what you're currently working on
- you have to merge it manually into your work
- if you have a branch set up to track a remote branch, you can use the `git pull` command to automatically fetch and then merge a remote branch into your current branch
- we will talk more about pushing and pulling from the remotes when talking about branches
- `git remote show origin` to see more information about a particular remote
- `git remote rename <old_shortname> <new_shortname>`
- `git remote rm <shortname>`
- adding remote looks like this on a graph:
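Remotes can be tried without any server by letting a bare repository on the local disk stand in for the hosted one. This is a sketch: the directories `remote.git`, `alice` and `bob` and the identity are hypothetical.

```shell
base="$(mktemp -d)"                      # hypothetical scratch location
cd "$base"
git init -q --bare remote.git            # stands in for a hosted repository
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main

git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/main    # call the first branch "main" on any Git version
git config user.name "Alice"             # placeholder identity
git config user.email "alice@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "v1"
git remote add origin "$base/remote.git" # register the remote under a short name
git remote -v                            # short name plus fetch and push URLs
git push -q origin main

cd "$base"
git clone -q remote.git bob              # a second full copy; its remote is named origin

cd "$base/alice"
echo "v2" >> file.txt
git commit -q -a -m "v2"
git push -q origin main

cd "$base/bob"
git fetch -q origin                      # downloads v2 and moves origin/main only
before="$(git rev-parse main)"           # local main is still at v1
git merge -q origin/main                 # integrate the fetched work (a fast-forward)
```

`git pull` would perform the last two steps, fetch and merge, in one go.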
The protocols
- Git can use 4 major protocols to transfer data:
- local
- the "remote" repository is in another directory on the same host
- `git clone <path_to_repository>.git`
- Secure Shell (SSH)
- common transport protocol for Git
- Hypertext Transfer Protocol Secure (HTTPS)
- currently the most used protocol for Git (smart HTTP version)
- the Git protocol
- HTTPS and SSH are the protocols that you should be aware of
The HTTP/S protocol
- "smart" HTTP is able to intelligently negiotiate data transfer (similar to SSH)
- before "dumb" HTTP, which was read only mostly
- smart HTTP operates very similarly to the SSH or Git protocols but runs over standard HTTPS ports
-
can use various HTTP authentication mechanisms
- meaning it’s often easier on the user than something like SSH, since you can use things like username/password authentication rather than having to set up SSH keys.
-
the most popular way to use Git now, since it can be set up to both serve anonymously like the git:// protocol, and can also be pushed over with authentication and encryption like the SSH protocol
- instead of having to set up different URLs for these things, you can now use a single URL for both
- if you try to push and the repository requires authentication (which it normally should), the server can prompt for a username and password
- the same goes for read access
The SSH protocol
- common transport protocol for Git
- SSH gives you authenticated read and write access over the network
- the Git protocol is generally read-only, as was the old "dumb" HTTP; smart HTTP can also be pushed over with authentication (see above)
- to clone a Git repo over SSH you can use git clone ssh://user@server/project.git
- or the shorter scp-like syntax git clone user@server:project.git, for which Git assumes SSH
- set it up to pair your local repos with GitHub by following this tutorial
- PROS:
- use it if you want authenticated write access to your repository over a network
- easy to set up
- secure (all data transfer is encrypted and authenticated)
- efficient (data is made as compact as possible before transfer)
- CONS:
- you can't serve anonymous access to your repository over it
- people must have access to your machine over SSH to access it (even in a read-only capacity)
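The URL you give to a remote decides which protocol Git uses. The sketch below only records URLs and contacts no server; example.com, git, user and project.git are placeholders, not real addresses.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

# Hypothetical HTTPS address; nothing is contacted until you fetch or push
git remote add origin https://example.com/user/project.git
git remote -v

# Switch the same remote to SSH using the scp-like short form
git remote set-url origin git@example.com:user/project.git
scp_like=$(git remote get-url origin)

# Or spell the protocol out explicitly
git remote set-url origin ssh://git@example.com/user/project.git
explicit=$(git remote get-url origin)

echo "$scp_like"
echo "$explicit"
```

Note the difference between the two SSH spellings: the short form separates host and path with a colon, while the ssh:// form uses a slash.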
Working with remote branches
- remote branches are references to the state of branches on your remote repositories
- they're local branches that you can't move yourself
- they are moved automatically whenever you do any network communication
- they act as bookmarks to remind you where the branches on your remote repositories were the last time you connected to them
-
they have the form (remote)/(branch)
-
when you clone from the remote repository, Git automatically:
- names it origin
- pulls down all its data
- creates a pointer to where its master branch is
- names it origin/master locally
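The "bookmark" behaviour can be observed directly. This is a rough sketch, assuming Git is installed; a second local directory plays the role of the server, and the directory names and demo identity are invented for the example.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "Server" repository with one commit on master
git init -q "$work/server"
git -C "$work/server" symbolic-ref HEAD refs/heads/master
git -C "$work/server" commit -q --allow-empty -m "first commit"

# Cloning names the remote origin and creates origin/master
git clone -q "$work/server" "$work/clone"
cd "$work/clone"
git branch -r                              # lists origin/master (and origin/HEAD)
before=$(git rev-parse origin/master)

# Someone else adds a commit on the server
git -C "$work/server" commit -q --allow-empty -m "second commit"
stale=$(git rev-parse origin/master)       # unchanged: the bookmark has not moved

git fetch -q origin                        # network communication moves it
after=$(git rev-parse origin/master)
server=$(git -C "$work/server" rev-parse master)
echo "stale bookmark: $stale"
echo "after fetch:    $after"
```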
Pushing
- if you want to share a branch, you need to push it up to a remote to which you have write access
- local branches aren't automatically synchronized to the remotes
- you have to explicitly push the branches you want to share
- you can use private branches to do work you don't want to share
- git push <remote> <branch>
- this is a shortcut: Git automatically expands the branch name out to refs/heads/<branch>:refs/heads/<branch>
- alternatively, git push origin <branchname>:<branchname>
- the form git push origin <local_branch>:<remote_branch> can be used to name the remote branch differently from the local one
- when you do a git fetch, you don't automatically have local, editable copies of remote branches
- you have only a pointer (which you can't modify)
- to merge remote stuff into your current working branch, you can run git merge origin/<branchname>
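The push and merge steps above can be sketched with two local clones sharing one bare repository. The names alice, bob, feature and shared-feature are hypothetical, as is the demo identity.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare "$work/server.git"
git -C "$work/server.git" symbolic-ref HEAD refs/heads/master

git init -q "$work/alice"
cd "$work/alice"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first commit"
git remote add origin "$work/server.git"
git push -q origin master                  # same as refs/heads/master:refs/heads/master

# Share a topic branch under a different name on the remote
git checkout -q -b feature
git commit -q --allow-empty -m "work on feature"
git push -q origin feature:shared-feature

# A second clone gets only a remote-tracking pointer, not a local branch
git clone -q "$work/server.git" "$work/bob"
cd "$work/bob"
git branch -r                              # includes origin/shared-feature
git merge -q origin/shared-feature         # fast-forwards bob's master
merged=$(git rev-parse HEAD)
shared=$(git rev-parse origin/shared-feature)
echo "bob's master is now at $merged"
```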
Tracking branches
- to see the full list of remote references, use git remote show origin
- git fetch --all: updates your remote-tracking branches
- git branch -vv: lists local branches (with some info) and what each branch is tracking
- if there is a remote branch that you want to track, you can create a local branch and base it off the remote one with git checkout -b <local_branch_name> origin/<remote_branch_name>
- checking out a local branch from a remote branch automatically creates what is called a tracking branch
- a local branch with a direct relationship to a remote branch
- if you're on a tracking branch and type git push, Git automatically knows which server and branch to push to
- the same holds for git pull
- when you clone a repository, it generally automatically creates a master branch that tracks origin/master
- that's why git push and git pull work out of the box with no other arguments
-
to delete a remote branch
git push <remote_name> --delete <remote_branch_name>
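A short sketch of the tracking-branch commands, again using a local bare repository in place of a real server. The names seed, server.git, feature and my-feature are assumptions made for the example.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Server with a master branch and a feature branch
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first commit"
git branch feature
git clone -q --bare "$work/seed" "$work/server.git"

git clone -q "$work/server.git" "$work/clone"
cd "$work/clone"

# Create a local branch that tracks origin/feature
git checkout -q -b my-feature origin/feature
git branch -vv                             # my-feature shows [origin/feature]
upstream=$(git rev-parse --abbrev-ref 'my-feature@{upstream}')

# master tracks origin/master straight after cloning
git checkout -q master
master_upstream=$(git rev-parse --abbrev-ref '@{upstream}')

# Delete the branch on the server; the local branch my-feature stays
git push -q origin --delete feature
remaining=$(git ls-remote --heads origin)
echo "$upstream / $master_upstream"
```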
Pulling
- git fetch: fetches all the changes on the server that you don't have yet; it will not modify your working directory at all
- git pull: is basically git fetch followed immediately by git merge
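To see that the two routes end in the same place, the sketch below updates one clone with fetch plus merge and another with pull. It assumes a simple fast-forward situation; the clone names a and b are arbitrary.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/server"
git -C "$work/server" symbolic-ref HEAD refs/heads/master
git -C "$work/server" commit -q --allow-empty -m "first commit"

git clone -q "$work/server" "$work/a"      # will use fetch + merge
git clone -q "$work/server" "$work/b"      # will use pull

git -C "$work/server" commit -q --allow-empty -m "second commit"

# Two explicit steps
cd "$work/a"
git fetch -q origin
after_fetch=$(git rev-parse HEAD)          # still the first commit
git merge -q origin/master
after_merge=$(git rev-parse HEAD)

# One step
cd "$work/b"
git pull -q
after_pull=$(git rev-parse HEAD)
echo "fetch+merge: $after_merge"
echo "pull:        $after_pull"
```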
Branching Workflows
- merging from one branch into another multiple times over a long period is generally easy to do
- => you can have several branches that are always open and that you use for different stages of your development cycle
- you can merge regularly from some of them into others
- e.g. only entirely stable code in the master branch
- parallel branch named "develop" for development and testing, which is merged into master once it reaches a stable state
- branch "develop" pulls in topics from so-called topic branches (short-lived branches)
-
in reality, we are talking about pointers moving up the line of commits we're making
- the stable branches are further down the line in your commit history
- the bleeding-edge branches are farther up the history
- think about them as work silos, where sets of commits graduate to a more stable silo when they're fully tested
-
now that we know the basics, the building blocks, let's discuss the collaboration workflows
- read this
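The master / develop / topic arrangement described above can be played through in a single repository. This is only one possible sketch of such a workflow; the branch name topic, the commit messages and the demo identity are made up.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "stable release"

git checkout -q -b develop                 # long-running integration branch
git checkout -q -b topic                   # short-lived topic branch
git commit -q --allow-empty -m "experiment"

git checkout -q develop
git merge -q --no-ff --no-edit topic       # topic graduates to develop
git branch -q -d topic                     # the topic branch is no longer needed

# master is an ancestor of develop: it sits further down the history
git merge-base --is-ancestor master develop && echo "master is behind develop"

git checkout -q master
git merge -q --ff-only develop             # develop graduates to master once tested
git log --oneline --graph --all
```

The branches here are just pointers: graduating develop into master moves the master pointer up to where develop already is.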