Git Essentials for Beginners

Janne Kemppainen

Many people have covered this subject before me, but I still decided to write my own take so that the readers of this blog have a reference to come back to. The idea is to get you started with Git quickly, so in this post we'll go through the absolute basics and you'll learn by doing.

What is a version control system?

If you are new to all of this: a version control system (VCS) is, in short, a piece of software that keeps track of changes to files, typically text files such as source code. Binary files such as images are also supported, but storing them inside a Git repository is not optimal if they change often because every version stays in the history and bloats the repository. Git is not the only VCS, but it is the most widely used and the de facto standard.

In Git all the code is tracked in a repository, which is basically a normal directory that also contains some hidden files where Git keeps the information it needs. The repository keeps track of the history of the files inside it.

The same repository can be hosted on a remote server, which is needed when you want to collaborate with other users. Simplified, this means that after someone makes a change in their local repository they push it to the central repository, from which the other collaborators can fetch it.

How do I install Git?

If you are running Linux then the git command is probably already available in the terminal (if not, use your package manager to install it). On a Mac you may also have Git installed with Xcode. You can check this by opening a new terminal window (or Git Bash on Windows) and typing git, which prints the usage information if Git is installed. If you don't have Git or you're on Windows, download the Git installer from the Git homepage and follow the install instructions.
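If you would rather do the check from a script, the same test can be sketched in Python. This is only an illustration: the helper name git_version is made up for this example, and it assumes that a working Git would be reachable through your PATH.

```python
import shutil
import subprocess


def git_version():
    """Return the text printed by `git --version`, or None if Git is not usable."""
    if shutil.which("git") is None:
        return None  # no git executable found on the PATH
    result = subprocess.run(["git", "--version"], capture_output=True, text=True)
    if result.returncode != 0:
        return None  # an executable exists but did not run properly
    return result.stdout.strip()


print(git_version() or "Git does not seem to be installed")
```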

After installation you need to tell Git who you are. This is done with the git config command. You need to set your email address and your name in the Git configuration like this:

>> git config --global user.email "youremail@example.com"
>> git config --global user.name "Your Name"

How to create a new Git repository?

Open a new terminal window (or open Git Bash for Windows) and create a new directory that you want to use for the repository:

>> mkdir myproject
>> cd myproject

A word on the notation that I'm using here: when a line starts with two greater-than signs (>>), you should type the text after them into the terminal. Lines without them are either the expected output of the command or something that you need to type into the text editor.

Right now we just have a normal directory. To make it a Git repository run

>> git init
Initialized empty Git repository in /Users/janne/myproject/.git/

If you list the directory contents you should see that there is now a hidden .git directory inside the project:

>> ls -A
.git

This is where Git keeps track of things. You don't need to know how it works internally but it's good to know that it exists. But that's really it, now you have a local Git repository.
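Git finds the repository by looking for that .git entry in the current directory and then in each parent directory, which is why the commands also work inside subdirectories of the project. A rough Python sketch of the idea follows. The function name find_repo_root is hypothetical, and checking only that .git exists is a simplification of what Git really does.

```python
from pathlib import Path


def find_repo_root(start):
    """Return the closest directory at or above `start` that contains .git, else None."""
    current = Path(start).resolve()
    for candidate in [current, *current.parents]:
        # Simplification: only the existence of the .git entry is checked.
        if (candidate / ".git").exists():
            return candidate
    return None


print(find_repo_root("."))
```

Run it inside myproject (or any directory below it) and it should print the path of the project directory.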

Tracking files with Git

Naturally the first step is to create the files that you want to track. For this example I will be using the vim text editor but you can use whichever text editor you like, such as nano or Notepad (do I even need to say that using Microsoft Word won't work?). Vim and nano are terminal editors, so with them you can do everything on the command line. If you haven't heard of them before you are probably better off with nano as it is easier to learn; just replace vim with nano in the commands.

If you want a text editor with a graphical user interface, or if you don't know how to use nano, I really recommend the free Visual Studio Code.

Using your editor of choice create a new file inside the project directory and add some content to it. For the sake of this example let's create a small Python script (you don't need to know Python, any text content will do):

>> vim hello.py
def main():
    print("Hello World!")

if __name__ == "__main__":
    main()

Now we have an untracked file in our repository: the file exists in the directory but Git is not tracking it yet. To see the current status of the repository use the git status command:

>> git status
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	hello.py

nothing added to commit but untracked files present (use "git add" to track)

If you look closely you'll notice that Git actually tells us how to start tracking the file. So let's do what it asks us to do:

>> git add hello.py

The git add command is used to stage the changes that we want to add to the repository. It accepts file and folder names as parameters and you can put multiple on the same line. You can use git add . to stage all changes in the current directory.

Let's check the status again

>> git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   hello.py

Git now shows that there are changes to be committed.

Our file has not been added to the history yet, so we need to create a new commit. Commits are the steps that make up the history: each one describes what has been added to and what has been removed from the repository. Right now we only want to add a new file.

To create a new commit use the git commit command like this:

>> git commit -m "Add hello.py"
[master (root-commit) 60ee0f1] Add hello.py
 1 file changed, 5 insertions(+)
 create mode 100644 hello.py

The -m parameter is the commit message. Each commit needs to have a commit message which tells how and why the files have been changed.

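If you forgot to give the -m parameter, Git opens a text editor (often vim) for writing the commit message, and you might already be on your way to Google to search for “how to exit vim”, where you'd probably find the answer on Stack Overflow. To spare you the trouble, here are the instructions to exit vim. Quitting without writing a message aborts the commit, so afterwards you can simply run the command again with -m.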

  1. hit the ESC key to make sure that you are in the command mode
  2. type :q and hit enter
  3. if it didn't work because you have accidentally created unsaved changes type :q! to force exit without saving

Let's check the status again:

>> git status
On branch master
nothing to commit, working tree clean  

Congratulations! Now you have successfully added a new file to your repository!

Let's add another file called README.md

>> vim README.md
# MyProject
This is MyProject
>> git add README.md
>> git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	new file:   README.md
>> git commit -m "Add README"
[master 5a1bbe7] Add README
 1 file changed, 2 insertions(+)
 create mode 100644 README.md

To recap how to commit changes:

  1. Add the modified file(s) to be staged with git add <filename> or a directory with git add <dirname> or everything in the current directory with git add .
  2. (Optional) check that the correct files have been staged with git status
  3. Commit the files to the repository with git commit -m "Your commit message"
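If you ever want to script these three steps, they can be driven from Python with the subprocess module. The sketch below is just one way to do it under a few assumptions: Git is installed and on the PATH, the helper names git and stage_and_commit_demo are made up for this example, and everything happens in a throwaway temporary directory so that your real project is not touched.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run a git command in `cwd` and return its standard output as text."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def stage_and_commit_demo():
    """Create a throwaway repository, commit one file, return status and log subjects."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git("init", cwd=repo)
        # Identity is set for this repository only, so the global config stays untouched.
        git("config", "user.email", "youremail@example.com", cwd=repo)
        git("config", "user.name", "Your Name", cwd=repo)
        (repo / "hello.py").write_text('print("Hello World!")\n')

        git("add", "hello.py", cwd=repo)                # 1. stage the file
        status = git("status", "--short", cwd=repo)     # 2. check what is staged
        git("commit", "-m", "Add hello.py", cwd=repo)   # 3. commit
        subjects = git("log", "--format=%s", cwd=repo).splitlines()
        return status, subjects


if shutil.which("git"):
    status, subjects = stage_and_commit_demo()
    print(status, subjects)
```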

See the history of the repository

Now that we have made two commits our repository has some history that we can check. This is done with the git log command.

>> git log
commit 5a1bbe7b1ac8c5872e833ee75afc1c45eea47e6e (HEAD -> master)
Author: Janne Kemppainen <myemail@example.com>
Date:   Sun Mar 10 12:33:04 2019 +0200

    Add README

commit 60ee0f1952b3fbc4e449317fcead3826b0dc4f12
Author: Janne Kemppainen <myemail@example.com>
Date:   Sun Mar 10 12:11:30 2019 +0200

    Add hello.py

The log shows the commits that we have made to our repository. Each entry contains a unique hash so that the commits can be identified later and you can even check in which commit a specific line was changed. Then there are also the author of the commit and the date the commit was created. Finally there is the commit message that should describe what it is about.
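Where do the hashes come from? Git calculates them from the content of the object that is being stored. The Python sketch below shows the calculation for the content of a single file (Git calls this a blob), assuming the default SHA-1 object format. Commit hashes are calculated in the same manner, but over the commit's own data such as the author, date, message and parent commit, which is not reproduced here. The function name git_blob_hash is made up for this example.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 that Git assigns to a file (blob) with this content."""
    # Git hashes a small header ("blob <size>" plus a NUL byte) followed by the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty file)
print(git_blob_hash(b"hello\n"))
```

Because the hash depends only on the content, two files with identical content always get the same hash.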

When the log is longer than one screen Git shows it in a pager. To navigate up and down in the history use j and k or the arrow keys. Quit by pressing q. To search for something in the history type /, then the search term, and press enter. You can jump to the next occurrence with n and to the previous one with N.

Making changes to already existing files

Let's say that instead of saying hello to the whole world we would just like to greet the Internet. Open the hello.py file and change the print line to:

print("Greetings, Internet!")

Let's also edit the README.md file. Change the content to whatever you like, for example:

# MyProject
This is MyProject. There are many like it but this one is mine.

Save the files and check the git status:

>> git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   README.md
	modified:   hello.py

no changes added to commit (use "git add" and/or "git commit -a")

Now Git says that some of the files it is tracking have been modified. But what if you don't remember what you have changed since the last commit? Here comes git diff to the rescue! Check the changes made since the last commit (or since the last time you staged files) with:

>> git diff
diff --git a/README.md b/README.md
index bc64684..d0f4993 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,2 @@
 # MyProject
-This is MyProject
+This is MyProject. There are many like it but this one is mine.
diff --git a/hello.py b/hello.py
index ea7235e..0aa0fc8 100644
--- a/hello.py
+++ b/hello.py
@@ -1,5 +1,5 @@
 def main():
-    print("Hello World!")
+    print("Greetings, Internet!")

 if __name__ == "__main__":
     main()

You can navigate the diff in the same way as the git log with the j and k or the arrow keys but ours is probably so short that it fits without the need to scroll. Again, exit with q.

Every changed line is shown here. The lines marked with a minus sign and red color are the old versions that are removed, and those marked with a plus sign and green color are the lines that replace the old content or are newly added.
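The same unified diff format can be produced with Python's standard difflib module, which may help in reading it. This is only an illustration of the format and not how Git computes its diffs; the two lists below are shortened versions of hello.py before and after the edit.

```python
import difflib

old = ['def main():\n', '    print("Hello World!")\n']
new = ['def main():\n', '    print("Greetings, Internet!")\n']

# Each line of the result starts with " " (unchanged), "-" (removed) or "+" (added).
diff = list(difflib.unified_diff(old, new, fromfile="a/hello.py", tofile="b/hello.py"))
print("".join(diff), end="")
```

The unchanged def main(): line is included as context, just like in the output of git diff.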

The git diff output reflects how we usually think about commits: as the changes made since the previous commit. Internally, however, Git does not store only the changes. Each commit records a snapshot of the tracked files, and Git computes the differences between snapshots when you ask for them. Files that did not change between commits are not stored again, so keeping every state of the repository is cheaper than it sounds.

Try adding both of the files to the staging area with git add . but don't commit yet. Instead call git diff again:

>> git add .
>> git diff

The diff is empty because all the changes have been staged and plain git diff only shows unstaged changes. To see what has been staged, use git diff --staged.

Unstaging files

Hopefully you weren't too eager to commit the changes yet. Let's say that after some thinking we decided that the changes to the two files don't really belong together, so we want two separate commits. But we already added both files to the staging area! How can we unstage one of them?

Checking the Git status helps us here:

>> git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	modified:   README.md
	modified:   hello.py

The status message actually tells us what to do (Git 2.23 and newer suggest git restore --staged <file> instead, which has the same effect here):

>> git reset HEAD README.md
Unstaged changes after reset:
M	README.md

Now the status is:

>> git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        modified:   hello.py

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   README.md

Now Git says clearly that only the hello.py file is to be committed. Create a new commit:

>> git commit -m "Change the greeting"
[master 98dcc14] Change the greeting
 1 file changed, 1 insertion(+), 1 deletion(-)

Next create a commit for the readme file:

>> git add README.md
>> git commit -m "Update README"
[master dfd6995] Update README
 1 file changed, 1 insertion(+), 1 deletion(-)

If you use git log you should now see four commits.

Remote repositories

Now having the content of the repository restricted to your own machine isn't that fun, right? Wouldn't it be nice to be able to collaborate with other people over the internet? Luckily there are many services that let you host your own Git repositories for free.

One of those services is called GitHub and to many it might even be synonymous with Git. Actually the reason that I didn't start by creating a repository on GitHub was to show you that Git and GitHub are separate things. To continue with this tutorial you now need to head over to GitHub and create an account if you haven't done so already.

After logging in to your account create a new repository by clicking the “New” button next to your repositories list on the front page. Give it a descriptive name but don't check the “Initialize this repository with a README” option because we have an existing repository that we want to use. Click “Create repository”.

GitHub now gives you the instructions to push an existing repository (use the url of your own repo):

>> git remote add origin https://github.com/jannekem/myproject.git
>> git push -u origin master
Enumerating objects: 12, done.
Counting objects: 100% (12/12), done.
Delta compression using up to 8 threads
Compressing objects: 100% (10/10), done.
Writing objects: 100% (12/12), 1.20 KiB | 1.20 MiB/s, done.
Total 12 (delta 0), reused 0 (delta 0)
To https://github.com/jannekem/myproject.git
 * [new branch]      master -> master
Branch 'master' set up to track remote branch 'master' from 'origin'.

The first command registers the GitHub repository as a remote called origin, and the second one pushes the content of the master branch over to the GitHub servers. The -u option sets origin/master as the upstream of the local master branch, so later you can run plain git push and git pull without naming the remote and the branch. We will talk more about branches soon.

Git will ask for your GitHub credentials. Note that GitHub no longer accepts the account password for Git operations over HTTPS, so use your username together with a personal access token created in your GitHub account settings. Typing the credentials can get tiresome after a while, so you can also set up SSH keys by following the instructions in the GitHub documentation.

What are branches and how do they work?

So I already mentioned branches. What are they exactly?

So far we have been using only one branch called master. The name is analogous to the master recordings in music production from which the copies were made. This branch is the one that contains the “production” version of whatever we are storing in the repository. (Newer Git setups and repositories created on GitHub often call this branch main instead; the commands work the same way with that name.)

From the master we can branch off development or feature branches, which work in exactly the same way, but we can have many of them simultaneously without them interfering with each other. This lets multiple people collaborate on the same project as they can do the development on their own branch and then finally merge the changes to master when they are done.

Let's say that we want to further develop our Python script by adding the current time to the print message. For this purpose we'll need to create a new branch so as to not interfere with other developers who may be working on the same repository. We know that we are already on the master branch and that it is up to date so we can start by creating a new branch with:

>> git checkout -b add_current_time
Switched to a new branch 'add_current_time'

where add_current_time is the name of the branch.
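Under the hood a branch is little more than a name that points to a commit, and the pointer moves forward whenever you commit on that branch. The toy model below tries to show this in Python. It is not how Git is implemented: the dictionaries and function names are made up for the illustration, the first four ids are the short hashes from earlier in this post, and new-commit is a placeholder.

```python
# Toy model, not Git's real data structures: every name here is made up.
commits = {}    # commit id -> parent commit id (None for the first commit)
branches = {}   # branch name -> id of the commit the branch points to
current = "master"


def commit(commit_id):
    """Add a commit on top of the current branch and move the branch forward."""
    commits[commit_id] = branches.get(current)
    branches[current] = commit_id


def checkout_new_branch(name):
    """Like `git checkout -b`: the new branch starts at the current commit."""
    global current
    branches[name] = branches[current]
    current = name


# The four commits made on master so far.
for commit_id in ["60ee0f1", "5a1bbe7", "98dcc14", "dfd6995"]:
    commit(commit_id)

checkout_new_branch("add_current_time")
commit("new-commit")  # only add_current_time moves, master stays where it was

print(branches)
```

After the last commit master still points to dfd6995 while add_current_time points to the new commit, whose parent is dfd6995.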

Now let's change the contents of hello.py to the following:

import datetime

def main():
    print("Greetings, Internet! The current time is " + str(datetime.datetime.now()))

if __name__ == "__main__":
    main()

Create a new commit normally

>> git add hello.py
>> git commit -m "Add current time to the greeting"
[add_current_time 8d9b6ae] Add current time to the greeting
 1 file changed, 3 insertions(+), 1 deletion(-)

Next we can push our branch to GitHub

>> git push -u origin add_current_time
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 415 bytes | 415.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
remote:
remote: Create a pull request for 'add_current_time' on GitHub by visiting:
remote:      https://github.com/jannekem/myproject/pull/new/add_current_time
remote:
To https://github.com/jannekem/myproject.git
 * [new branch]      add_current_time -> add_current_time
Branch 'add_current_time' set up to track remote branch 'add_current_time' from 'origin'.

If you now go to your repository on GitHub you should see that there are two branches and that GitHub suggests that you create a new pull request. Don't create one just yet.

We made some changes to the code so perhaps it is a good idea to update our README file too. Add something to describe the new functionality:

>> vim README.md
# MyProject
This is MyProject. There are many like it but this one is mine.

## What does it do?
It greets and tells the time

Create another commit

>> git add README.md
>> git commit -m "Update README"

This time we can push without specifying the branch because we already set the branch to track the remote ‘add_current_time’ from ‘origin’.

>> git push
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 395 bytes | 395.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/jannekem/myproject.git
   8d9b6ae..312b98b  add_current_time -> add_current_time

If you now select the add_current_time branch on GitHub you should see that the README file has also changed.

Merging branches

Git wouldn't be that useful if the code that you wrote in another branch couldn't be somehow moved to the master branch. There are two ways we can do this.

Create a pull request on GitHub

This is the recommended option for teams. The workflow with pull requests goes like this:

  1. Create a new branch from master
  2. Do your development on that branch
  3. Create a pull request
  4. Other people give you feedback
  5. Make possible changes if requested
  6. Merge the pull request after it has been approved
  7. Check out the master branch: git checkout master
  8. Pull the changes to the local repository: git pull

Of course if you are working alone then approving your own pull requests might not be that useful. But pull requests are still a good way of recording why and when things were done, as additional information on top of the Git log. They also help to keep track of issues, as you can refer to the issue number in your PR.

Merge on the command line

If you don't care about pull requests you can always do the merge on the command line. The full workflow now goes like this:

  1. Create a new branch from master: git checkout -b add_current_time
  2. Do your development on that branch
  3. Check out the master branch: git checkout master
  4. Pull changes from the remote server: git pull
  5. Merge the feature branch to master: git merge add_current_time
  6. Push changes to the remote server: git push

Now choose one of the methods to merge the ‘add_current_time’ branch to master.

Cloning an existing repository

Quite often you are not the one who created the repository, you want to edit the files on multiple machines, or you already initialized the repository on GitHub. In these cases, instead of calling git init you need to clone the repository to your local machine.

If you go to any public repository on GitHub you can see a big button on the right side that says “Clone or download” (labelled “Code” in the newer GitHub interface). Click it and you'll get the clone URL. After this, cloning is as simple as calling git clone on the command line:

>> git clone https://github.com/jannekem/myproject.git
Cloning into 'myproject'...
remote: Enumerating objects: 18, done.
remote: Counting objects: 100% (18/18), done.
remote: Compressing objects: 100% (16/16), done.
remote: Total 18 (delta 1), reused 17 (delta 0), pack-reused 0
Unpacking objects: 100% (18/18), done.

In this case the content would be cloned to a directory called myproject. After cloning you can start using the directory normally and even push to the origin server if you have the proper rights for the repository.

Summary

In this post we went through the following:

  1. Installation and basic configuration
  2. Initializing a repository with git init
  3. Cloning a repository with git clone
  4. Checking the repository status with git status
  5. Adding files to the staging area with git add
  6. Unstaging files with git reset HEAD <filename>
  7. Seeing the commit history with git log
  8. Committing changes with git commit
  9. Setting up remote repositories
  10. Creating a branch with git checkout -b branch_name
  11. Changing to an existing branch with git checkout branch_name
  12. Merging branches with pull requests or on the command line

This was by no means a be-all and end-all tutorial about Git but I hope that this was enough to get you started using Git. There are many features that I didn't cover here such as rebasing, tags or aliases. If you are hungry for more information I recommend you check out the Pro Git book which is freely available on the Git homepage.
