Git Essentials for Beginners
Janne Kemppainen
This subject has been covered by many people before me, but I still decided to write my own take so that the readers of this blog have a reference. The idea is to get you started with Git quickly, so in this post we'll go through the absolute basics and you'll learn by doing.
What is a version control system?
If you are new to all of this, a version control system (VCS) is, in short, a piece of software that keeps track of changes in files, typically text files such as source code. Binary files such as images are also supported, but storing them inside a Git repository is not ideal if they change often, because each new version adds to the size of the repository. Git is not the only VCS but it is the most widely used and the de facto standard.
In Git all the code is tracked in a repository which is basically a normal directory that contains some hidden files where Git keeps all the required information it needs. The repository keeps track of the history of the files inside it.
The same repository can be hosted on a remote server, which is needed when you want to collaborate with other users. Simplified, this means that after someone makes a change to their local repository they push their changes to the central repository for the other collaborators to fetch.
How do I install Git?
If you are running Linux then the git command is probably already available in the terminal (if not, use your package manager to install it). On Mac you may also have Git installed with Xcode. You can check this by opening a new terminal window (or Git Bash for Windows) and typing git, which will print the usage information if it is installed. If you don't have Git, or you're on Windows, download the Git installer from the Git homepage and follow the install instructions.
After installation you need to tell Git who you are. This is done with the git config command. You need to set your email address and your name in the Git configuration like this:
    >> git config --global user.email "firstname.lastname@example.org"
    >> git config --global user.name "Your Name"
How to create a new Git repository?
Open a new terminal window (or open Git Bash for Windows) and create a new directory that you want to use for the repository:
    >> mkdir myproject
    >> cd myproject
A word on the notation that I'm using here. When there are two greater-than signs (>>) at the start of a line, you should type the text after them into the terminal. Lines without them show the expected output of the command or something that you need to type into the text editor.
Right now we just have a normal directory. To make it a Git repository run
    >> git init
    Initialized empty Git repository in /Users/janne/myproject/.git/
If you list the directory contents you should see that there is now a hidden .git directory inside the project:
    >> ls -A
    .git
This is where Git keeps track of things. You don't need to know how it works internally but it's good to know that it exists. But that's really it, now you have a local Git repository.
Tracking files with Git
Naturally the first step is to create the files that you want to track. For this example I will be using the vim text editor but you can use whichever text editor you like, such as nano or Notepad (do I even need to say that using Microsoft Word won't work?). Vim and Nano are terminal editors so with them you can do everything on the command line. If you haven't heard of them before, you are probably better off with Nano as it is easier to learn; just replace vim with nano in the commands.
If you want a text editor with a graphical user interface, or if you don't know how to use Nano, I really recommend the free Visual Studio Code.
Using your editor of choice create a new file inside the project directory and add some content to it. For the sake of this example let's create a small Python script (you don't need to know Python, any text content will do):
    >> vim hello.py
    def main():
        print("Hello World!")

    if __name__ == "__main__":
        main()
Now we have an untracked file in our repository, which means that Git is not yet tracking it. To see the current status of the repository use the git status command:
    >> git status
    On branch master

    No commits yet

    Untracked files:
      (use "git add <file>..." to include in what will be committed)

            hello.py

    nothing added to commit but untracked files present (use "git add" to track)
If you look closely you'll notice that Git actually tells us how to start tracking the file. So let's do what it asks us to do:
    >> git add hello.py
The git add command is used to stage the changes that we want to add to the repository. It accepts file and directory names as parameters and you can give several on the same line. You can use git add . to stage all changes in the current directory.
Let's check the status again:
    >> git status
    On branch master

    No commits yet

    Changes to be committed:
      (use "git rm --cached <file>..." to unstage)

            new file:   hello.py
Git now shows that there are changes to be committed.
Our file has not been added to the file history yet so we need to create a new commit. Commits are like gradual changes that describe what has been added and what has been removed from the repository. Right now we only want to add a new file.
To create a new commit use the git commit command like this:
    >> git commit -m "Add hello.py"
    [master (root-commit) 60ee0f1] Add hello.py
     1 file changed, 5 insertions(+)
     create mode 100644 hello.py
The -m parameter sets the commit message. Each commit needs to have a commit message which tells how and why the files have been changed.
If you forgot to give the -m parameter you might already be on your way to Google to search for “how to exit vim”, and you'd probably find an answer on Stack Overflow. To spare you the trouble, here are the instructions to exit vim:
- hit the ESC key to make sure that you are in the command mode
- type :q and hit enter
- if it didn't work because you have accidentally created unsaved changes, type :q! to force exit without saving
Git aborts the commit when the message is left empty, so after exiting run the commit command again with -m.
Let's check the status again:
    >> git status
    On branch master
    nothing to commit, working tree clean
Congratulations! Now you have successfully added a new file to your repository!
Let's add another file called README.md:
    >> vim README.md
    # MyProject
    This is MyProject
    >> git add README.md
    >> git status
    On branch master
    Changes to be committed:
      (use "git reset HEAD <file>..." to unstage)

            new file:   README.md

    >> git commit -m "Add README"
    [master 5a1bbe7] Add README
     1 file changed, 2 insertions(+)
     create mode 100644 README.md
To recap how to commit changes:
- Add the modified file(s) to be staged with git add <filename>, or a directory with git add <dirname>, or everything in the current directory with git add .
- (Optional) check that the correct files have been staged with git status
- Commit the files to the repository with git commit -m "Your commit message"
See the history of the repository
Now that we have made two commits our repository has some history that we can check. This is done with the git log command.
    >> git log
    commit 5a1bbe7b1ac8c5872e833ee75afc1c45eea47e6e (HEAD -> master)
    Author: Janne Kemppainen <email@example.com>
    Date:   Sun Mar 10 12:33:04 2019 +0200

        Add README

    commit 60ee0f1952b3fbc4e449317fcead3826b0dc4f12
    Author: Janne Kemppainen <firstname.lastname@example.org>
    Date:   Sun Mar 10 12:11:30 2019 +0200

        Add hello.py
The log shows the commits that we have made to our repository. Each entry contains a unique hash so that the commits can be identified later and you can even check in which commit a specific line was changed. Then there are also the author of the commit and the date the commit was created. Finally there is the commit message that should describe what it is about.
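If you are curious where those hashes come from, here is a minimal Python sketch. By default Git names every object it stores after the SHA-1 hash of the object's content plus a small header. The sketch covers only the simplest object type, a blob (the content of a single file); commit hashes are built on the same principle from the commit's own data, which is more involved. The function name git_blob_hash is my own invention and not part of Git.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git would give to a file with this content."""
    # Git hashes a header of the form "blob <size in bytes>\0" followed by the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Any change to the content, however small, gives a completely different ID.
    print(git_blob_hash(b"Hello World!\n"))
    print(git_blob_hash(b"Hello World?\n"))
```

You can compare the result with what git hash-object <file> prints for a file with the same bytes.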
To navigate up and down in the history use j and k or the arrow keys. Quit by pressing q. To search for something in the history type /, then what you want to search for, and press enter. You can navigate to the next occurrence with n and to the previous one by typing N.
Making changes to already existing files
Let's say that instead of saying hello to the whole world we would just like to greet the Internet. Open the hello.py file and change the print line to:
        print("Greetings, Internet!")
Let's also edit the README.md file. Change the content to whatever you like, for example:
    # MyProject
    This is MyProject. There are many like it but this one is mine.
Save the files and check the status:
    >> git status
    On branch master
    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git checkout -- <file>..." to discard changes in working directory)

            modified:   README.md
            modified:   hello.py

    no changes added to commit (use "git add" and/or "git commit -a")
Now Git says that some files that it is tracking have been modified. But what if you don't remember what you have actually done since the last commit? Here comes git diff to the rescue! Check the changes since the last commit (or since the last time that you staged files) with:
    >> git diff
    diff --git a/README.md b/README.md
    index bc64684..d0f4993 100644
    --- a/README.md
    +++ b/README.md
    @@ -1,2 +1,2 @@
     # MyProject
    -This is MyProject
    +This is MyProject. There are many like it but this one is mine.
    diff --git a/hello.py b/hello.py
    index ea7235e..0aa0fc8 100644
    --- a/hello.py
    +++ b/hello.py
    @@ -1,5 +1,5 @@
     def main():
    -    print("Hello World!")
    +    print("Greetings, Internet!")
     
     if __name__ == "__main__":
         main()
You can navigate the diff in the same way as the git log, with j and k or the arrow keys, but ours is probably so short that it fits without the need to scroll. Again, exit with q.
Every line that has been changed is shown here. The lines marked with a minus sign and red color are the original versions that will be removed, and those with a plus sign and green color are the ones that replace the old content or are newly added.
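To get a feel for the format, here is a small sketch that produces the same kind of unified diff for the two versions of hello.py using Python's standard difflib module. This is only an illustration: Git has its own diff implementation, and the helper name unified is something I made up for this example.

```python
import difflib

# The two versions of hello.py from this tutorial, one list item per line.
OLD = [
    "def main():",
    '    print("Hello World!")',
    "",
    'if __name__ == "__main__":',
    "    main()",
]
NEW = [
    "def main():",
    '    print("Greetings, Internet!")',
    "",
    'if __name__ == "__main__":',
    "    main()",
]


def unified(old_lines, new_lines, name):
    """Return the unified diff between two versions of a file as a list of lines."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="a/" + name,
            tofile="b/" + name,
            lineterm="",  # the input lines carry no trailing newlines
        )
    )


if __name__ == "__main__":
    for line in unified(OLD, NEW, "hello.py"):
        print(line)
```

The removed line starts with a minus sign, the added line with a plus sign, and the unchanged lines around them are kept as context, just like in the output of git diff.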
git diff is a convenient view into the history, but it is worth knowing what happens underneath. Each commit stores a snapshot of all the tracked files together with a pointer to the previous commit, and Git computes the differences between snapshots when you ask for them. Files that did not change are not stored again because identical content is saved only once, so keeping every state of the repository stays cheap.
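Try adding both of the files to the staging area with git add . but don't commit yet. Instead call git diff again:
    >> git add .
    >> git diff
The diff is empty because all the changes have been staged and git diff only shows changes that are not staged yet. Use git diff --staged to see what is in the staging area.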
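If the empty diff feels confusing, it may help to picture three copies of your files: the last commit, the staging area and the working directory. The toy model below is my own simplification (plain dictionaries and made-up function names, nothing that Git actually exposes), but it shows which two copies each command compares.

```python
def changed_files(old: dict, new: dict) -> list:
    """Return the names of the files whose content differs between two states."""
    names = old.keys() | new.keys()
    return sorted(name for name in names if old.get(name) != new.get(name))


def demo():
    """Replay the steps above on a toy model of the three file states."""
    last_commit = {
        "hello.py": 'print("Hello World!")',
        "README.md": "This is MyProject",
    }
    staging_area = dict(last_commit)  # right after a commit the two are identical
    working_dir = {
        "hello.py": 'print("Greetings, Internet!")',
        "README.md": "This is MyProject. There are many like it but this one is mine.",
    }

    # "git diff" compares the staging area with the working directory.
    diff_before_add = changed_files(staging_area, working_dir)

    # "git add ." copies the current file contents into the staging area.
    staging_area.update(working_dir)

    diff_after_add = changed_files(staging_area, working_dir)
    # "git diff --staged" compares the last commit with the staging area.
    staged_diff = changed_files(last_commit, staging_area)
    return diff_before_add, diff_after_add, staged_diff


if __name__ == "__main__":
    print(demo())
```

After the add step the first comparison has nothing left to report, while the staged comparison now lists both files.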
Hopefully you weren't too eager to commit the changes yet. Let's say that after some thinking we thought that the changes that we made to the files aren't actually meaningful together so we want two separate commits. But we already added them to the staging area! How can we unstage the other file?!
Checking the Git status helps us here:
    >> git status
    On branch master
    Changes to be committed:
      (use "git reset HEAD <file>..." to unstage)

            modified:   README.md
            modified:   hello.py
The status message actually tells us what to do:
    >> git reset HEAD README.md
    Unstaged changes after reset:
    M       README.md
Now the status is:
    >> git status
    On branch master
    Changes to be committed:
      (use "git reset HEAD <file>..." to unstage)

            modified:   hello.py

    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git checkout -- <file>..." to discard changes in working directory)

            modified:   README.md
Now Git says clearly that only the hello.py file is to be committed. Create a new commit:
    >> git commit -m "Change the greeting"
    [master 98dcc14] Change the greeting
     1 file changed, 1 insertion(+), 1 deletion(-)
Next create a commit for the readme file:
    >> git add README.md
    >> git commit -m "Update README"
    [master dfd6995] Update README
     1 file changed, 1 insertion(+), 1 deletion(-)
If you use git log you should now see four commits.
Now having the content of the repository restricted to your own machine isn't that fun, right? Wouldn't it be nice to be able to collaborate with other people over the internet? Luckily there are many services that let you host your own Git repositories for free.
One of those services is called GitHub and to many it might even be synonymous with Git. Actually the reason that I didn't start by creating a repository on GitHub was to show to you that Git and GitHub are separate things. To continue with this tutorial now you need to head over to GitHub and create an account if you haven't done so already.
After logging in to your account create a new repository by clicking the “New” button next to your repositories list on the front page. Give it a descriptive name but don't check the “Initialize this repository with a README” because we have an existing repository that we want to use. Click “Create repository”.
GitHub now gives you the instructions to push an existing repository (use the url of your own repo):
    >> git remote add origin https://github.com/jannekem/myproject.git
    >> git push -u origin master
    Enumerating objects: 12, done.
    Counting objects: 100% (12/12), done.
    Delta compression using up to 8 threads
    Compressing objects: 100% (10/10), done.
    Writing objects: 100% (12/12), 1.20 KiB | 1.20 MiB/s, done.
    Total 12 (delta 0), reused 0 (delta 0)
    To https://github.com/jannekem/myproject.git
     * [new branch]      master -> master
    Branch 'master' set up to track remote branch 'master' from 'origin'.
This adds the GitHub repository as a remote called origin and pushes the content over to the GitHub servers. The -u option sets the master branch on origin as the upstream of the local master branch, so later you can run git push and git pull without arguments. Here origin is the name of the remote and master is the name of the branch. We will talk more about branches soon.
Git will ask for your GitHub credentials. Note that GitHub no longer accepts the account password for Git operations over HTTPS, so enter a personal access token in place of the password. Typing the credentials can get tiresome after a while, so you can also set up SSH keys by following the instructions in the GitHub documentation.
What are branches and how do they work?
So I already mentioned branches. What are they exactly?
For the whole time we have been using only one branch called master. This is analogous to the master recordings in music production from which the copies were made. This branch is the one that contains the “production” version of whatever we are storing in the repository.
From the master we can branch off development or feature branches which work exactly in the same way but we can have many of them simultaneously without them interfering with each other. This lets multiple people collaborate on the same project as they can do the development on their own branch and then finally merge the changes to master when they are done.
Let's say that we want to further develop our Python script by adding the current time to the print message. For this purpose we'll need to create a new branch so as to not interfere with other developers who may be working on the same repository. We know that we are already on the master branch and that it is up to date so we can start by creating a new branch with:
    >> git checkout -b add_current_time
    Switched to a new branch 'add_current_time'
Here add_current_time is the name of the branch.
Now let's change the contents of hello.py to the following:
    import datetime

    def main():
        print("Greetings, Internet! The current time is " + str(datetime.datetime.now()))

    if __name__ == "__main__":
        main()
Create a new commit normally:
    >> git add hello.py
    >> git commit -m "Add current time to the greeting"
    [add_current_time 8d9b6ae] Add current time to the greeting
     1 file changed, 3 insertions(+), 1 deletion(-)
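At this point it may help to see what a branch really is: just a name that points to a commit, while every commit points back to its parent. The sketch below models the repository as it stands now with two plain dictionaries, using the short commit hashes from this tutorial. The dictionary layout and the history function are my own simplification and not how Git stores things on disk.

```python
# Each commit points to its parent; the very first commit has none.
PARENTS = {
    "60ee0f1": None,       # Add hello.py
    "5a1bbe7": "60ee0f1",  # Add README
    "98dcc14": "5a1bbe7",  # Change the greeting
    "dfd6995": "98dcc14",  # Update README
    "8d9b6ae": "dfd6995",  # Add current time to the greeting
}

# A branch is only a name for the newest commit on that line of work.
BRANCHES = {
    "master": "dfd6995",
    "add_current_time": "8d9b6ae",
}


def history(branch: str) -> list:
    """Walk the parent links from the tip of a branch, newest commit first."""
    commits = []
    commit = BRANCHES[branch]
    while commit is not None:
        commits.append(commit)
        commit = PARENTS[commit]
    return commits


if __name__ == "__main__":
    print(history("master"))
    print(history("add_current_time"))
```

The new branch shares all four earlier commits with master and adds one of its own on top, which is why creating a branch is so cheap.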
Next we can push our branch to GitHub:
    >> git push -u origin add_current_time
    Enumerating objects: 5, done.
    Counting objects: 100% (5/5), done.
    Delta compression using up to 8 threads
    Compressing objects: 100% (3/3), done.
    Writing objects: 100% (3/3), 415 bytes | 415.00 KiB/s, done.
    Total 3 (delta 0), reused 0 (delta 0)
    remote:
    remote: Create a pull request for 'add_current_time' on GitHub by visiting:
    remote:      https://github.com/jannekem/myproject/pull/new/add_current_time
    remote:
    To https://github.com/jannekem/myproject.git
     * [new branch]      add_current_time -> add_current_time
    Branch 'add_current_time' set up to track remote branch 'add_current_time' from 'origin'.
If you now go to your repository on GitHub you should see that there are two branches and that GitHub suggests creating a new pull request. Don't create one just yet.
We made some changes to the code so perhaps it is a good idea to update our README file too. Add something to describe the new functionality:
    >> vim README.md
    # MyProject
    This is MyProject. There are many like it but this one is mine.

    ## What does it do?
    It greets and tells the time
Create another commit:
    >> git add README.md
    >> git commit -m "Update README"
This time we can push without specifying the branch because we already set the branch to track the remote ‘add_current_time’ from ‘origin’.
    >> git push
    Enumerating objects: 5, done.
    Counting objects: 100% (5/5), done.
    Delta compression using up to 8 threads
    Compressing objects: 100% (3/3), done.
    Writing objects: 100% (3/3), 395 bytes | 395.00 KiB/s, done.
    Total 3 (delta 0), reused 0 (delta 0)
    To https://github.com/jannekem/myproject.git
       8d9b6ae..312b98b  add_current_time -> add_current_time
If you now select the add_current_time branch on GitHub you should see that the README file has also changed.
Git wouldn't be that useful if the code that you wrote in another branch couldn't be somehow moved to the master branch. There are two ways we can do this.
Create a pull request on GitHub
This is the recommended option for teams. The workflow with pull requests goes like this:
- Create a new branch from master
- Do your development on that branch
- Create a pull request
- Other people give you feedback
- Make possible changes if requested
- Merge the pull request after it has been approved
- Check out the master branch: git checkout master
- Pull the changes to the local repository: git pull
Of course if you are working alone then approving your own pull requests might not be that useful. But they are still a good way to record why and when things were done, as additional information on top of the Git log. They also help you keep track of issues because you can link to the issue number from your PR.
Merge on the command line
If you don't care about pull requests you can always do the merge on the command line. The full workflow now goes like this:
- Create a new branch from master: git checkout -b add_current_time
- Do your development on that branch
- Check out the master branch: git checkout master
- Pull changes from the remote server: git pull
- Merge the feature branch to master: git merge add_current_time
- Push changes to the remote server: git push
Now choose one of the methods to merge the ‘add_current_time’ branch to master.
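If you merge on the command line in this tutorial, master has no new commits of its own, so Git can do what is called a fast-forward: it simply moves the master pointer forward to the tip of the feature branch. Here is a rough sketch of that idea using the short commit hashes from this post. The data layout and the function names are hypothetical, made up for this illustration, and the sketch ignores the case where a real merge commit is needed.

```python
# Parent links of the commits created in this tutorial (None marks the first commit).
PARENTS = {
    "60ee0f1": None,
    "5a1bbe7": "60ee0f1",
    "98dcc14": "5a1bbe7",
    "dfd6995": "98dcc14",
    "8d9b6ae": "dfd6995",
    "312b98b": "8d9b6ae",
}


def is_ancestor(ancestor: str, commit: str) -> bool:
    """Return True if 'ancestor' is reachable from 'commit' through parent links."""
    while commit is not None:
        if commit == ancestor:
            return True
        commit = PARENTS[commit]
    return False


def fast_forward(branches: dict, target: str, source: str) -> dict:
    """Return a copy of 'branches' where 'target' points to the tip of 'source'."""
    if not is_ancestor(branches[target], branches[source]):
        # Both branches have commits of their own; Git would create a merge commit here.
        raise ValueError("branches have diverged, fast-forward is not possible")
    updated = dict(branches)
    updated[target] = branches[source]
    return updated


if __name__ == "__main__":
    before = {"master": "dfd6995", "add_current_time": "312b98b"}
    print(fast_forward(before, "master", "add_current_time"))
```

After the fast-forward both branch names point to the same commit, and no new commit had to be created.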
Cloning an existing repository
Quite often you are not the one who creates the repository, or you want to edit the files on multiple machines, or you already initialized the repository on GitHub. In this case, instead of calling git init you need to clone the repository to your local machine.
If you go to any public repository on GitHub you can see on the right side a big button that says “Clone or download”. Click it and you'll get the clone URL. After this, cloning is as simple as calling git clone on the command line:
    >> git clone https://github.com/jannekem/myproject.git
    Cloning into 'myproject'...
    remote: Enumerating objects: 18, done.
    remote: Counting objects: 100% (18/18), done.
    remote: Compressing objects: 100% (16/16), done.
    remote: Total 18 (delta 1), reused 17 (delta 0), pack-reused 0
    Unpacking objects: 100% (18/18), done.
In this case the content would be cloned to a directory called myproject. After cloning you can start using the directory normally and even push to the origin server if you have the proper rights for the repository.
In this post we went through the following:
- Installation and basic configuration
- Initializing a repository with git init
- Cloning a repository with git clone
- Checking the repository status with git status
- Adding files to the staging area with git add
- Unstaging files with git reset HEAD <filename>
- Seeing the commit history with git log
- Committing changes with git commit
- Setting up remote repositories
- Creating a branch with git checkout -b branch_name
- Changing to an existing branch with git checkout branch_name
- Merging branches with pull requests or on the command line
This was by no means a be-all and end-all tutorial about Git but I hope that this was enough to get you started using Git. There are many features that I didn't cover here such as rebasing, tags or aliases. If you are hungry for more information I recommend you check out the Pro Git book which is freely available on the Git homepage.