6 Git and Github
We mentioned the need for version control in our previous module on project management. This is a central topic for any kind of software development. In this lesson we will learn the most popular system for version control, git, and see how we can use it for managing the progress of a project. This simplest way to think of this workflow is that we introduce a kind of “undo”-button for the entire project, by saving the state of our various files onto a timeline that we can navigate as we wish. But version control is so much more: we can use git to work on different branches of ideas, that we can merge into the project in a controlled way, we can formalize collaboration via online repositories, and we will see how open source development works via forks and pull requests.
You will use git and GitHub to complete the assignments in the course, so it is important that you learn this material now!
Please note that we will just briefly review the contents of the video lessons below in the beginning of our lecture. This means that you should work with the video lessons before we start the session at 10:15. We will use the remaining time for further discussion of central topics, including Github Classroom, which is the system that we will use to deliver the assignments for the rest of the semester. Maybe we will have time for a secret, but very fun activity in the end…
Here are some slides that we will use for discussion in class: ban400-git-github.pdf.
We will follow this schedule in our lecture slot:
08:15 - 10:00 | Complete watching the videos if you have not finished by this date. The instructor is available in the auditorium from approximately 09:30 for discussion and support if you have any questions or errors that you would like to resolve. |
10:15 - 11:00 | Brief review of the material. Discussion of workflow in data science projects. |
11:15 - 12:00 | We introduce two important concepts for code contribution to open source projects (the fork and the pull request). We will also introduce Github Classroom which we will use to administer the assignments in the course. |
Our brief treatment of Git and Github is by no means complete. Below follow a few extra sources for further reading:
- Pro Git is a standard reference on the subject.
- Happy Git with R gives an R-specific introduction to Git, with special consideration to the possibility of integrating the basic features if Git directly into RStudio.
- Github Ultimate is a (paid) course at Udemy.com by Jason Taylor, that has served as inspiration for several of the videos for this seminar.
- This lecture at MIT provides a basic introduction to the inner workings of Git as well as many of the commands that we cover in our seminar.
6.1 Setup and basic commands
In the video player below we set up git and learn the basic commands. Note that we have simplified the setup process from earlier years (specifically, we no longer use a third party tool for conflict resolution), which means that some of these videos are somewhat amputated.
6.2 Branching and conflict resolution
We introduce some more advanced concepts for version control on your local computer:
6.3 Online repositories on Github
Finally, we move our repository online using GitHub and see how we can use online version control to complement our workflow with tools for collaboration and project management.