Version Control System (VCS)

So far there have been three generations of Version Control Systems (VCS):
  1. File Locking Version Control System.
  2. Centralized Version Control System (CVCS).
  3. Distributed Version Control System (DVCS).
Currently we are in the third generation, the Distributed Version Control System. I will explain the concepts behind third-generation VCS later in this writeup.

File Locking Version Control System

This type of VCS is based on a file locking mechanism. It can be seen as a shared file location to which users are given access. Whenever a user wants to work on a file, they receive it with an exclusive lock, which means that no other user can access the same file until the lock has been released. Users must manually merge their changes with other users' updates to the same file.
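
The exclusive lock described above can be sketched in a few lines of Python. This is only a toy model, not how SCCS or RCS are implemented; the class name LockedFileStore, its methods, and the user names are made up for illustration.

```python
class LockedFileStore:
    """Toy model of a lock-based VCS: one editor per file at a time."""

    def __init__(self):
        self.contents = {}  # file name -> latest checked-in text
        self.locks = {}     # file name -> user currently holding the lock

    def checkout(self, name, user):
        """Hand the file to `user` with an exclusive lock, or refuse."""
        holder = self.locks.get(name)
        if holder is not None and holder != user:
            raise PermissionError(f"{name} is locked by {holder}")
        self.locks[name] = user
        return self.contents.get(name, "")

    def checkin(self, name, user, text):
        """Store the new text and release the lock."""
        if self.locks.get(name) != user:
            raise PermissionError(f"{user} does not hold the lock on {name}")
        self.contents[name] = text
        del self.locks[name]


store = LockedFileStore()
store.checkout("main.c", "alice")

# Bob is refused while Alice holds the lock.
try:
    store.checkout("main.c", "bob")
    bob_blocked = False
except PermissionError:
    bob_blocked = True

# After Alice checks in, the lock is free and Bob sees her text.
store.checkin("main.c", "alice", "int main(void) { return 0; }")
bob_copy = store.checkout("main.c", "bob")
```

The point of the sketch is the bottleneck: while one user holds the lock, everyone else has to wait.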


Examples of these systems are:
  • SCCS in 1970s.
  • RCS in 1980s.

Centralized Version Control System (CVCS)


The next wave of VCS was the Centralized Version Control System. This generation solves the exclusive locking problem and allows users to access and update files concurrently on their local machines. The files are kept in a central location, such as a file share or a database. Each user can request their own local copy of the files and update it independently of everyone else. However, users must merge their changes with other users' changes before committing back to the repository, although most of the time the version control system performs these merges automatically. This is popularly known as "Merge Before Commit".
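
"Merge Before Commit" can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the repository tracks a single revision number, merging happens per whole file rather than line by line, and file deletions are ignored. CentralRepo, OutOfDateError, and the file names are hypothetical and do not correspond to any real tool's API.

```python
class OutOfDateError(Exception):
    """Raised when a working copy is older than the repository."""


class CentralRepo:
    """Toy centralized repository with a single revision counter."""

    def __init__(self):
        self.revision = 0
        self.files = {}

    def checkout(self):
        """Return a working copy: base revision, pristine files, editable files."""
        return {"base": self.revision,
                "pristine": dict(self.files),
                "files": dict(self.files)}

    def commit(self, wc):
        """Accept the working copy only if it is based on the latest revision."""
        if wc["base"] != self.revision:
            raise OutOfDateError("update and merge before committing")
        self.files = dict(wc["files"])
        self.revision += 1
        wc["base"] = self.revision
        wc["pristine"] = dict(self.files)

    def update(self, wc):
        """Merge the server state into the working copy, file by file."""
        for name, server_text in self.files.items():
            local = wc["files"].get(name)
            pristine = wc["pristine"].get(name)
            if local == pristine:
                wc["files"][name] = server_text  # untouched locally: take server text
            elif server_text != pristine:
                raise RuntimeError(f"conflict in {name}: resolve manually")
            # otherwise only the local side changed, so the local text is kept
        wc["base"] = self.revision
        wc["pristine"] = dict(self.files)


repo = CentralRepo()
alice = repo.checkout()
bob = repo.checkout()
alice["files"]["a.txt"] = "alice's edit"
bob["files"]["b.txt"] = "bob's edit"

repo.commit(alice)  # the first commit goes straight in

try:
    repo.commit(bob)  # Bob's copy is now out of date
    rejected = False
except OutOfDateError:
    rejected = True

repo.update(bob)  # merge first ...
repo.commit(bob)  # ... then commit
```

Bob's first commit is refused because the repository moved on after his checkout; once he updates, his change and Alice's sit side by side in the next revision.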



Examples of these systems are:
  • CVS in 1990s.
  • SourceSafe in 1990s.
  • Subversion in 2000.
  • TFS in 2005.

Distributed Version Control System (DVCS)

In contrast to CVCS, DVCS introduced the idea of "Commit Before Merge". The characteristics of a DVCS are the following:


You have a copy of not just the latest version but of all versions. Since you have a copy of everything in the repo, you do not depend on a central repository; instead, the repository is distributed to the local device of every team member. However, to make your changes available to others you still need to deliver them to a central or public repository. Since everyone has a complete copy of every version, the history is not lost if the central system is destroyed. This also means that you can code while you are traveling or otherwise not connected to the server.

Examples of these systems are:
  • BitKeeper in late 1990s.
  • Git in 2005.
  • Mercurial in 2005.
  • GitHub.com, started in 2008 (a hosting platform for Git repositories).
To understand how a DVCS works, we first need to understand what a DAG is and how it works.

Directed Acyclic Graph (DAG)

As its name implies, a DAG is a directed graph that contains no cycles. This means that if you follow the directed lines from node to node, you can never get back to the node where you started.
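
The "no way back to where you started" property can be checked mechanically. Below is a minimal Python sketch that uses a depth-first search; the graphs and node names ("a" to "d") are made up for illustration and are not the graph pictured in this post.

```python
def has_cycle(edges):
    """Return True if following directed edges can lead back to a start node.

    `edges` maps each node to the list of nodes its outgoing lines point to.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in edges}

    def visit(node):
        color[node] = GRAY
        for nxt in edges.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return True  # reached a node that is still on the current path
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(edges))


dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
not_a_dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"]}

dag_has_cycle = has_cycle(dag)
other_has_cycle = has_cycle(not_a_dag)
```

Adding the single line from "d" back to "a" is enough to turn the first graph into one that is no longer a DAG.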



It is acyclic: as you can see, following the directed lines from 7 to 5 to 3 is possible, but going from 3 to 7 is not. This implies that if you start traversing the DAG from node 7 upwards, you cannot reach node 7 again. Some important DAG terminology:

Parent Node: A node with a directed line to one or more other nodes.
Child Node: A node that has a parent node.
Leaf/Head Node: A node with no child nodes. Of course, a leaf node can gain child nodes later on.
Root Node: A node without any parent.
Branch Node: A node with more than one child.
Merge Node: A node with more than one parent.
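
These definitions translate directly into code. The sketch below is a rough illustration that assumes, as the definitions do, that lines point from parent to child; the function name classify and the four-node history are invented for the example.

```python
def classify(children):
    """Map each node to the set of roles it plays in the DAG.

    `children` maps a node to the list of its child nodes
    (directed lines run from parent to child).
    """
    parents = {node: [] for node in children}
    for node, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(node)

    roles = {}
    for node in parents:
        kids = children.get(node, [])
        found = set()
        if kids:
            found.add("parent")
        if parents[node]:
            found.add("child")
        if not parents[node]:
            found.add("root")
        if not kids:
            found.add("leaf")
        if len(kids) > 1:
            found.add("branch")
        if len(parents[node]) > 1:
            found.add("merge")
        roles[node] = found
    return roles


# Hypothetical history: 1 branches into 2 and 3, which are merged in 4.
history = {1: [2, 3], 2: [4], 3: [4], 4: []}
roles = classify(history)
```

A node can play several roles at once: here node 1 is both the root and a branch node, and node 4 is both the leaf (head) and a merge node.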

Each node in a DAG represents a single commit from a user. Each node also has a hash associated with it. The hash is calculated from the contents of the node and is used to detect any change to it: if even a single dot or space changes in a node, the result is a new hash, and hence it is considered a new node.
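
The effect of hashing can be shown with Python's standard hashlib module. This is a simplified sketch: real systems such as Git hash a more structured object, so the exact input format used here (parent hashes followed by the content) and the function name node_hash are assumptions made for illustration.

```python
import hashlib


def node_hash(content, parent_hashes):
    """Hash a node from its content and the hashes of its parent nodes."""
    h = hashlib.sha1()
    for parent in parent_hashes:
        h.update(parent.encode("utf-8"))
        h.update(b"\n")
    h.update(content.encode("utf-8"))
    return h.hexdigest()


root = node_hash("hello world", [])
same = node_hash("hello world", [])      # identical input, identical hash
changed = node_hash("hello world.", [])  # one extra dot, different hash

# Because a child's hash includes its parent's hash, changing an old node
# also changes the hash of every node built on top of it.
child = node_hash("second change", [root])
child_of_changed = node_hash("second change", [changed])
```

Feeding the parent hashes into the child's hash is what ties each node to its entire history.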

So how is the DAG maintained? Consider the example below.

Suppose we have a DVCS with a server and two team members working on it, Mr. X and Mr. Z. Currently the server has the following state.

Mr. X and Mr. Z both get a local copy by cloning the repository from the server. Now both of them have the same structure on their local computers, with 1 being the current node.

Mr. X and Mr. Z both make their changes and commit. Remember that in a DVCS a commit does not go to the central location; it goes to the user's local repository. Now the structure looks as follows on their respective machines.




Now Mr. Z delivers his changes to the server first. The server will look like the following.

Later on, when Mr. X decides to deliver his changes to the server, he first needs to download the current state of the server, since it has changed. After getting the latest state from the server, Mr. X's local repository looks like this:

Mr. X now has to merge his changes with node 3 (which holds Mr. Z's changes). After the merge, the DAG on Mr. X's machine will look like the one below, and he can then submit it to the server.
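
The whole exchange can be replayed with a small Python model in which a repository is just a mapping from each node to its parent nodes. This is a sketch under assumptions: node 1 and node 3 come from the text above, while node 2 (Mr. X's commit) and node 4 (the merge node) are assumed numbers. The fetch and push helpers are simplified stand-ins and not real Git behavior.

```python
def commit(repo, node, parents):
    """Record a new node with the given parent nodes in a local repository."""
    repo[node] = list(parents)


def fetch(local, remote):
    """Copy every node the local repository lacks from the remote."""
    for node, parents in remote.items():
        local.setdefault(node, list(parents))


def push(local, remote):
    """Deliver local nodes; refuse if the remote has nodes we have not seen."""
    missing = set(remote) - set(local)
    if missing:
        raise RuntimeError(f"fetch and merge first, missing: {sorted(missing)}")
    for node, parents in local.items():
        remote.setdefault(node, list(parents))


def heads(repo):
    """Return the nodes that are not a parent of any other node."""
    all_parents = {p for parents in repo.values() for p in parents}
    return sorted(n for n in repo if n not in all_parents)


server = {1: []}
mr_x, mr_z = {}, {}
fetch(mr_x, server)  # both start from the same state
fetch(mr_z, server)

commit(mr_x, 2, [1])  # local commits, the server is not involved
commit(mr_z, 3, [1])

push(mr_z, server)  # Mr. Z delivers first

try:
    push(mr_x, server)  # Mr. X is behind the server now
    rejected = False
except RuntimeError:
    rejected = True

fetch(mr_x, server)              # Mr. X downloads node 3
heads_before_merge = heads(mr_x)  # two heads: his node 2 and Mr. Z's node 3
commit(mr_x, 4, [2, 3])          # merge node with two parents
push(mr_x, server)
```

After the final push the server holds all four nodes, and the merge node is its only head.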


From the above you can see that the DAG is maintained both on the server and on each member's own computer, and each member can revert to any version in the history without connecting to the server. When you download the files from the server, you get not just the version you want but the entire history; this is called cloning the repository. In the case of Git, the DAG information is kept inside the .git folder.


Once cloned, you can manipulate and modify the DAG locally. This means that the entire repository of a code base is distributed (cloned) to different locations, so if the server crashes we are still safe. There is also no need to be connected to the server to revert to a previous version or to undo changes you have already committed, since the history is maintained locally. However, in order to collaborate with other users you still have to submit your version to the server, and you might need to merge your code with theirs. Most of the time this merging is done automatically and you only have to resolve conflicts manually.

Below are some important commands for working with a DVCS like Git. For a complete list, visit the Git docs.

  • git init: Creates an empty Git repository.
  • git status: Shows the status of the working tree.
  • git log: Shows commit logs.
  • git add: Adds files to the index.
  • git commit: Records changes to the repository.
  • git fetch: Downloads objects and references from another repository.
  • git pull: Fetches from and integrates with another repository or a local branch.
  • git push: Updates remote references along with associated objects.
  • git checkout: Switches branches or restores working tree files.
  • git clone: Clones a repository into a new directory.
  • git merge: Joins branches together.
  • git branch: Lists, creates, or deletes branches.

I personally recommend the resources below to learn more about Git.



