Episode 035: Blueprint Series #5 Revision Control and Git; Branches, Releases, Revisions
In this, the fifth TEN7 Blueprint for Operations episode, we’re going to continue our deep dive into revision control itself and Git more specifically, with a focus on branches, releases and revisions.
Here's what we're discussing in this podcast:
- What Git is about
- Managing revision control with Git
- Live environment
- Staging environment
- Local environment
- The backbone of a good software development practice
- The jug of milk in the fridge
- Branches in Git
- What a branch is
- Moving code from branch to branch
- Tree structure
- Gitflow & GitHub
- Establish a higher level of organization
- Lullabot's Tugboat
- What constitutes a release
- Hot fixes
- Semantic versioning
IVAN STEGIC: Hey Everyone! You’re listening to the TEN7 Podcast, where we get together to talk about technology, business and the humans in it. I’m your host Ivan Stegic. Today we’re diving into how we use revision control, specifically Git, at TEN7. We spent time talking about the basics of revision control and Git and why it’s so important to us in a previous episode of the Podcast. So, if you haven’t listened to the primer, find it on our website at ten7.com/blog. It’s episode 34. As a major component of TEN7’s Blueprint for Operations, Git provides the mechanism of how we not only collaborate on code, but how we review it, promote it and get it live. Joining me to continue our discussion from last time is Tess Flynn, our DevOps Engineer at TEN7. Tess, thanks for joining me one more time.
TESS FLYNN: Hello.
IVAN: So, the title of this Podcast refers to the words environments, branches, releases and versions. I’d like to get to talking about each of these ideas today, but I think we have to start by giving a quick recap of what Git does and why it’s important to us.
TESS: So, Git is a version control system. It isn’t the only version control system out there, but at the time of this recording, it is the most popular one. It is fast, it is decentralized and it is open source. So, you can use it for free on any project you’d like. What Git does is, it allows you to keep a running history of your software project. And, the reason why that’s important is because human beings are forgetful and busy, and we forget when something was changed, who changed it and what that change was. But Git does not forget. It remembers all these things. We can take those changes, and we can even push them to a remote server, where our team members can pull them back down and continue working on their software in parallel, so that people don’t overwrite the same changes by working in the same file consecutively.
IVAN: And why is it important again?
TESS: (laughter) Mostly because it forms the backbone of a good software development practice. It remembers stuff for us and it provides us the ability to distribute and share that code among multiple people and environments.
IVAN: That’s a good reason. (laughter)
TESS: That’s kind of useful, yeah.
IVAN: Yeah, it’s kind of useful. I agree. So, Git is something that we’ve talked about as being used by a developer, by a development team. We’ve talked about it being used on your own computer, something that we usually refer to as your "local environment" or your "local dev environment" and that kind of implies that there’s another environment, perhaps at the very least, a "production environment." We refer to those as "live environment" as well. That’s where we would be deploying our software where a user would be interacting with it. Why is this concept of environment so important?
TESS: It’s important because a lot of people seem to be operating under the notion that a website is a single instance thing. It’s like the jug of milk in your fridge. You don’t have another jug of milk, you can’t just copy it, it is the only jug of milk in your fridge. And, if you remove that jug of milk from your fridge, you are out of milk. That’s not really how websites work. That’s sometimes how software works. Instead, you could have as many jugs of milk as you want. You can take the same one and you could make another copy of it, and make another copy of it, and make another copy of it. Software is very good at being copiable. The problem is that when we’re actually dealing with something like a website, there needs to be a single canonical source of truth for where that website is. The website that all the users go to. The website that has the URL attached to it, that you advertise, all this other stuff. That’s the live site, and it makes perfect sense to have a live site. But then you start wondering, is there more? Would I need to copy this again for whatever reason? Well, the simplest model is to have a local environment and a live environment. A local environment usually refers to the copy of the website that is running on some developer's laptop. The reason why they have a complete copy is so they can do whatever they want to the site. They can completely break it if they have to, in order to build new stuff or fix bugs without affecting the live site. That way you have a natural isolation. Your live site is still up, no one is trying to monkey with it at the same time that someone is trying to write more software for it. But then there’s a third case that often shows up. Another copy. And this is what I usually call a shared environment. A shared environment can be a live site. I sometimes call live sites shared environments, but more often than not a shared environment is also a stage environment. 
It is something that’s used that’s not live and not local. It is used to preview particular changes to software, to see what the next version of the site is going to be like, so that you can be assured that all those changes work on production-like hardware, without having any difficulties.
IVAN: So that’s three environments that you’ve described. Your local development environment, the staging environment, which is where you put your code after you’re done with it for yourself for others to review, but not necessarily live for the public, and then the live environment. There’s this idea of branches in Git, and there’s this relationship between branches and environments. So, I’d like to get to talking about that, but maybe we should talk about what a branch is in Git in the first place, before we can relate those two. So what’s a branch?
TESS: So, one thing that’s kind of a problem with software is, it’s basically just a bunch of text files. Now, have you ever read one of those choose your own adventure books?
IVAN: Oh, I love those. Lone Wolf was my best. I love that.
TESS: If you were to actually go through one of those books and write out the progress of every different path that you can take by choice in that book, you get something that’s analogous to a tree structure, right?
TESS: Now, software kind of needs the same thing. It’s not exactly as linear as that, which is a little bit difficult considering trees don’t seem linear at first. But, they do follow a natural progression from the start to the very leaf tips. Correct?
IVAN: You’ve got it. Yep.
TESS: What’s going on with different branches in Git? Sometimes we need to segment the software code. So, let’s go back to another analogy. Let’s say that you’re a writer and you’re writing a book. You open Scrivener in order to actually start working on your novel, and then you realize “I haven’t had my coffee yet.” You go and have your coffee. You’re thinking about your novel the entire time. You realize that it’s nine in the morning and you haven’t showered yet, so you take your shower. You sit there and think for awhile and then you have a brilliant idea for how you can change the direction of your book, but it’s going to require doing all of these different changes to your book. Changing how this character interacts with this character, and the decision made here and the detail there, and so you have to make tons and tons of changes. And, you want to keep your novel as it is now, but you also want to make a different version of the same novel that has all of these different changes in it, to see that it’s going to turn out the way that you want it to. Because you don’t want to commit to that change yet. You want to play it out and see if it actually works the way you think it does. When you think about these two different versions of basically the same novel, that’s kind of like how Git branches work. You have one line of code history, and we segmented that code history into a new line of code history. And, we could have different changes in each line of history. The nifty part about it is, like with your novel, you might decide “this new version is awesome. I love it. I’m going to overwrite my previous version.” When that happens, you’re going to do a merge, you’re merging in this one version of your novel into the previous one and making all of those changes permanent. The same thing happens with branches in Git. 
You have these two different versions of the software and at some point you might decide this branch has all the changes that we need, it works well, we like it, let’s merge that back into the other branch. Now why would you even have this? Well, it’s the same idea. You would have a primary branch that is considered the real version of your software, and you have a bunch of other stuff which could be different features or different fixes, or different ideas. There’s no real codified standard for what a branch should be, but there are several defined strategies by which you can branch.
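The novel analogy can be sketched as a toy model in Python. This is only an illustration of branching and merging as described above: the `Commit` class, the `history` helper and the branch names are invented for the example and are not how Git stores history internally.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Commit:
    """One saved change; `parents` links it to the history before it."""
    message: str
    parents: Tuple["Commit", ...] = ()


def history(commit: Commit) -> List[str]:
    """Return every commit message reachable from `commit`, oldest first."""
    seen: List[str] = []

    def walk(c: Commit) -> None:
        for parent in c.parents:
            walk(parent)
        if c.message not in seen:
            seen.append(c.message)

    walk(commit)
    return seen


# A branch is just a name pointing at the newest commit on one line of history.
branches: Dict[str, Commit] = {"main-draft": Commit("chapter 1")}

# Branching: a second name starts out pointing at the same commit.
branches["new-direction"] = branches["main-draft"]

# Each line of history now moves on independently.
branches["main-draft"] = Commit("fix typos", (branches["main-draft"],))
branches["new-direction"] = Commit("rewrite the ending", (branches["new-direction"],))

# Merging: a new commit with two parents joins the two histories.
branches["main-draft"] = Commit(
    "merge new-direction",
    (branches["main-draft"], branches["new-direction"]),
)

print(history(branches["new-direction"]))
print(history(branches["main-draft"]))
```

After the merge, the history of `main-draft` contains the changes from both lines, while `new-direction` still contains only its own.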
IVAN: And one of the strategies we use has the names master, develop and feature branches, but there are other strategies as well, and maybe we should talk about those names – master and develop – and how they relate to the book example you just gave.
TESS: So a lot of this gets back to the branching strategy that’s called Gitflow. Now Gitflow usually assumes that there is a primary branch which is usually called the "master" branch. This is going to be whatever the current version of software you’re working on. So, whatever’s on your live site corresponds to the code that’s in the master branch.
IVAN: That would be the latest version, right?
TESS: It would be the latest released version.
IVAN: Good note.
TESS: Because, then we start talking about the develop branch, sometimes called just the Dev branch. The develop branch is used for the next version of development. So, whenever you make new changes, you’re going to put those on develop first, and then at some point in the future, those changes get merged into master, thus overwriting the novel that you were working on before.
IVAN: Of course. Now, there was some confusion when I was talking to a colleague the other day. Why do we call it the develop branch, but yet it’s related to something that’s prior to it being live?
TESS: I think it’s mostly because it’s the one that you’re developing. I’m not exactly sure where that term came from specifically. I haven’t ever bothered to look up the history of that.
IVAN: It would make sense that it would be the one that you’re developing. The one that’s growing. So, yeah, we should look that up. (laughter) Ok, so we’ve got the develop branch and the master branch and this is the strategy that we’ve taken. Let’s talk about how the environments that we were referring to earlier relate to these two branches and why we would relate them even.
TESS: The reason why we would have this kind of a branching strategy is so that we can correspond the master branch to the live environment, the one that users are visiting. And then our stage environment, we typically correspond to the develop branch. And the reason why develop corresponds to a staging environment, instead of the local environment for example, is because we need a shared environment to do reviews across our own teams, as well as reviews with clients.
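The correspondence between branches and environments can be written down as a small lookup. This is a sketch only: the function name `environment_for` is hypothetical, and treating every other branch as "local" is an assumption made for the example.

```python
def environment_for(branch: str) -> str:
    """Map a branch name to the environment it corresponds to."""
    shared = {"master": "live", "develop": "stage"}
    # Anything else (for example a feature branch) stays on a developer's machine.
    return shared.get(branch, "local")


for name in ("master", "develop", "feature/contact-form"):
    print(name, "->", environment_for(name))
```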
IVAN: The reviews that happen with clients can happen in the staging environment, and if I’m working on a project by myself, working on a new feature one at a time in the shared environment develop branch, makes sense. Because I can do that, and I can commit locally, and I can test locally, and if something doesn’t work out very well, I can always revert. But I can continue working on the shared environment develop branch locally for as long as I want, and once I’m at a point where I’m happy, I can send that up to the server, and presumably the client would be able to see that in the shared environment that you referred to. That’s easy, but it’s probably going to be a problem if someone else works on a different feature, or even the same feature, with me. So, I suspect that we should probably talk about how we mitigate those issues.
TESS: The fundamental problem that a lot of developing teams run into with Git is if they only have one branch that they’re all working on, or even two branches, a live environment and a shared environment, master and develop. If everyone is committing work to develop, a lot of developers are going to like to work incrementally. They’ll make one little change and then they’ll commit it. That change might work, might not affect everything. But it could also break the entire site, because work is still in progress. So, the problem is that we need a higher level of organization to group these granular changes in, so that we don’t actually constantly break our shared environment. And this is where feature branches come from in Gitflow. With a feature branch, you basically take a copy of the develop branch at the time that you start working on the feature, you make a new branch, and you only commit your changes to that feature branch. This allows you to do all of the granular, incremental changes that developers like doing. Kind of like saving a text file routinely instead of waiting until you’re done writing the entire essay. Once you’re finished with that work, that branch is also shareable with other developers. They can also check that branch out locally and use that branch and check out what code you’re writing. When that code is ready to be added to develop, we can merge that branch back into develop, and everything’s fine. A nice clean, cohesive unit of work has been added back to the next version without affecting the status of the shared environment. At least that’s how it’s supposed to work on paper. (laughter)
IVAN: Yeah, in theory it sounds like it’s great because you and I have worked together on a feature. You’ve reviewed my code, I’ve looked at your code. We’re happy with the feature branch. We merge it into develop. We push it up to stage, and then the client finds an issue. And then we have to go through that whole process of fixing this issue and rolling back. So, this theoretical issue becomes a problem in reality. And, one of the things that we did at TEN7 to address that, was we created another environment that we called preview. And we had this ephemeral branch called the preview branch into which we would merge feature branches as much and as often as we could, and that was the sanity check before we actually went to the shared environment develop branch and showed that to the client for approval. And, we did that out of necessity, because there really weren’t any other good tools to kind of have an environment for every single feature branch. Because that’s really where you’d want to be I think.
TESS: A lot of other systems actually will leverage each individual feature branch and automatically spit out a completely new environment for each one, segmented from the same time that they’re initially branched off from. And that can be really useful, but it requires a high degree of automation and a degree of technical expertise that might be overkill for smaller organizations, especially freelancers and individuals.
IVAN: And Tugboat from Lullabot comes to mind as one of those tools that does that. And, I think there are numerous other tools that address that issue as well. Ok. So in a perfect world we would have an environment for every feature branch that we’re working on, and we would have an environment for develop and of course the live environment for master. Once we’re done working on the feature that we’re adding, how do we get code from my feature branch into, say, the develop branch?
TESS: So, usually we do something that Gitflow actually provides called a feature finish. So Gitflow is a command line extension for Git, in essence. It also is a literal implementation of this branching strategy. So, when we start a new feature branch we do git flow feature start and then the name of the feature branch. Likewise, when we’re done with a feature and we want to merge it into develop, we’ll do a git flow feature finish, and that will take that branch and merge it back into develop for us, deleting the original feature branch in the process.
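For listeners who want to see what those two commands roughly amount to, here is a sketch that lists the plain Git steps. It only builds and prints the commands and runs nothing. The `feature/` prefix is the extension's default, but the exact steps and the `--no-ff` flag are a simplification assumed for this example, and `contact-form` is a made-up feature name.

```python
from typing import List

Command = List[str]


def feature_start(name: str) -> List[Command]:
    """Plain-Git steps roughly equivalent to `git flow feature start <name>`."""
    # Create the feature branch from the current tip of develop.
    return [["git", "checkout", "-b", f"feature/{name}", "develop"]]


def feature_finish(name: str) -> List[Command]:
    """Plain-Git steps roughly equivalent to `git flow feature finish <name>`."""
    branch = f"feature/{name}"
    return [
        ["git", "checkout", "develop"],
        ["git", "merge", "--no-ff", branch],  # fold the feature into develop
        ["git", "branch", "-d", branch],      # then delete the feature branch
    ]


for command in feature_start("contact-form") + feature_finish("contact-form"):
    print(" ".join(command))
```

Because the merge happens before the delete, the commits from the feature branch remain reachable from develop after the branch name is gone.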
IVAN: I always got nervous when those feature branches would get deleted. (laughter) Is there any reason to be nervous for that?
TESS: Not really. Gitflow is very, very well developed and very solid and I’ve never had a problem where a feature branch just gets completely lost.
IVAN: So, finishing a feature branch using Gitflow which is the extension, deletes the feature branch, no longer to be seen, but all of the code that we wrote is now in the develop branch. Would you say that constitutes a release?
TESS: No, because Gitflow has another concept that’s called a release.
IVAN: Ok. Let’s talk through that.
TESS: So, now that we’ve had all these features merged into develop, we’re ready to release a completely new version of software and put it in front of actual users. Now, what we want to do under the hood is we want to take the changes which are in the develop branch and move them to the master branch. Gitflow provides a mechanism to do this called a release, and it uses the same kind of noun/verb structure, git flow release start. When we start a new release, it takes the current state of develop, and it’s going to branch off from that to create a new release branch internally. We don’t really need to know a lot of these details, but the idea is that we’re going to assign some kind of unique identifier to that particular release, usually a version number is pretty common. When we’re finished with that release, it gets merged into master, and then the original release branch gets deleted. Now we have all the changes that were in develop in master. And now we have all of our code on master.
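The release flow can be sketched with a toy model in which each branch is just a list of change names. The helper names, the change names and the version `1.1.0` are all invented for the example. The real extension also merges the release branch back into develop, which this model skips because no changes are made on the release branch itself.

```python
from typing import Dict, List


def release_start(branches: Dict[str, List[str]], version: str) -> None:
    """Snapshot the current state of develop into a release branch."""
    branches[f"release/{version}"] = list(branches["develop"])


def release_finish(
    branches: Dict[str, List[str]], tags: Dict[str, List[str]], version: str
) -> None:
    """Merge the release branch into master, tag it, and delete the branch."""
    release = branches.pop(f"release/{version}")
    for change in release:
        if change not in branches["master"]:
            branches["master"].append(change)
    tags[version] = list(branches["master"])


branches = {"master": ["launch"], "develop": ["launch", "search", "blog"]}
tags: Dict[str, List[str]] = {}

release_start(branches, "1.1.0")
branches["develop"].append("newsletter")  # work continues on develop
release_finish(branches, tags, "1.1.0")

print(branches["master"])
print(sorted(branches))
```

Work added to develop after the release was started ("newsletter" here) is not part of the release, because the release branch was cut from the state of develop at the time it started.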
IVAN: And that constitutes a release?
TESS: That constitutes a release. There’s also another process called a hot fix, which is really fascinating and doesn’t get enough attention.
IVAN: We should definitely talk about hot fixes.
TESS: Ok. Want to right now?
IVAN: Let’s do it.
TESS: One thing that happens in software is things don’t always go according to plan. (laughter)
IVAN: Sometimes. One thing that happens is it doesn’t go according to plan. Just sometimes. (laughter)
TESS: So, what happens is you might need to do a quick fix. You’ll discover that on your master branch, in your live environment, someone made a typo, and that typo is causing all kinds of problems. Now what we can do, if we had the correct access, is we could go into the live environment with SSH and make that typo change.
IVAN: No, no, no, no, no. Tess.
TESS: No. We don’t want to do that. And the reason why we don’t do that is because that’s called an out of band change. It’s something that’s no longer tracked in our history, so Git doesn’t know that it exists. And that’s not good.
TESS: But we don’t really want to make a completely new release, do we?
TESS: Because there’s another problem. You might think if you discover this hot fix immediately, the master branch and the develop branch have the same content. If you were to make a completely new release, adding it to develop first, and then merging that over to master, things are great right? The problem is that hot fixes don’t happen in a timely fashion. So imagine that you make a new release of software, and two months later, you discover this typo. All the while you have been making tons of changes to the develop branch. It no longer corresponds to what’s in the master branch. But, you still need to make this one change, and you need for that change to show up both in the develop branch and the master branch at the same time. It’s like you need to make a very specific insert that goes to both. This is what git flow hotfix start does. What it does is it branches from the master branch instead of develop. You make your one single change, and then you commit that change and finish the hotfix. What happens is that that hotfix not only creates a new version number, but it also takes that same change, applies it to both master and develop, but it doesn’t merge develop into master. You’re only merging that one tiny change. Hotfixes are really, really useful in a DevOps environment. If you have a DevOps engineer, they’ll spend a lot of their time doing hotfixes, particularly if the software code, the software repository also contains operation specific scripts and configurations. And it often will. And if it does, it makes perfect sense to use hotfixes for those operational changes, because you don’t want to create a new untested major version of your software. You only want to change the things that you specifically need to change.
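The key property of a hotfix, that one change lands on both branches without dragging unreleased work into master, can be shown with the same kind of toy model. Branches are plain lists of change names, and `hotfix_finish` and the change names are hypothetical.

```python
from typing import Dict, List


def hotfix_finish(branches: Dict[str, List[str]], fix: str) -> None:
    """Apply a single fix, made on a branch cut from master, to master and develop."""
    for target in ("master", "develop"):
        branches[target].append(fix)


# Two months after the release, develop has moved well ahead of master.
branches = {
    "master": ["release 1.1.0"],
    "develop": ["release 1.1.0", "new search", "new theme"],
}
hotfix_finish(branches, "fix typo")

print(branches["master"])
print(branches["develop"])
```

Master gains only the typo fix. The unreleased "new search" and "new theme" work stays on develop until the next regular release.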
IVAN: I’ve heard some people refer to hotfixes as not real releases, and I disagree with those people. If you’re one of those people, that’s fine, we can have that discussion. Do you consider a hotfix an actual release?
TESS: It certainly makes my heart pound just as much as doing a regular release. So yes. (laughter)
IVAN: So, maybe this is a good chance to actually talk about how we label releases and these version numbers that we associate with these labels. I know that in the past, Drupal 7 did not use semantic versioning, and somewhere along the way in the development of Drupal 8, we decided as a community that we would use semantic versioning for Drupal 8. We use our own modified version of semantic versioning at TEN7. Let’s talk about these labels of versions, and then maybe a little bit about how semantic versioning falls into the discussion.
TESS: So really we’re talking about version strategy. And, version number strategy is one of those bikeshedding topics that could cover entire years of collegiate education (laughter) and do absolutely nothing.
IVAN: I totally agree. We have to agree on something and move on.
TESS: I don’t really subscribe to the notion that one version numbering strategy is better than another, but semantic versioning has a lot of advantages in the sense that it’s easy, it corresponds mostly to how software currently works, and how software developers like working. So, it tends to work fairly well for those reasons. It’s not universally applied. You could look at any number of different projects, like, the Windows kernel has a completely different versioning strategy, but it works well for them, so, okay, sure.
IVAN: In semantic versioning there are three numbers, right? Number, dot, number, dot, number.
TESS: Sometimes four, but usually three is fairly common.
IVAN: Oh, I haven’t seen the four version.
TESS: Four is usually used for hotfixes I think. I can’t quite remember. But, we tend not to use that at TEN7.
IVAN: No we don’t. Should we just do a quick explanation of how we happen to use SemVer?
TESS: How it usually works is that there are three dot numbers, and you would be completely within your right to initially think these three dot numbers are some kind of weird decimal format, and that it should go one, nine, nine, and then it should go...
IVAN: Two zero zero.
TESS: Two, zero, zero. No, that’s not how SemVer works. It’s a little bit better to think about it a bit more like commas if you’re an American. I don’t know exactly what you would think if you want to think this in European, German number formats, because that still breaks my heart. (laughter) All those commas. (laughter).
IVAN: Yeah right.
TESS: So, you have to really think about it as three separate instances of numbers. And, they have their own independent values from every number above it. It’s more like a pyramid than it is like a single individual decimal number. And when you think about it like that, it tends to make a little bit more sense. So, the first ordinal, that’s the left most one, is going to be the major version, and whatever the major version is, it usually starts at a one and then increments all the way up. All of these numbers tend to start at a one or a zero and start all the way up. The first number always starts as a one and then the other ones start at zero and then move up. Now, the first one is the major version. The next one is going to be, I forget what it was called. Was it minor version? I can’t quite remember off the top of my head. The terminology escapes me, but it’s another feature incompatible version of software, is kind of how that works. If I remember correctly.
IVAN: It's major version, is correct. I think the first number that you said. Then the second one is the minor version, and the last number is patch version. I think that’s the classic description of SemVer. But we’ve made some changes at TEN7 because we don’t typically use it as a patch on the third number. We use that as small enhancements, releases that are style changes or module updates, something that’s maybe a small feature that we’re adding.
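The point that the three numbers are independent rather than one decimal number can be checked in a few lines of Python. This is a minimal sketch: `parse_version` is a made-up helper that handles only the plain three-number form.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split "major.minor.patch" into three separate integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


# Each position is compared as its own number, left to right.
print(parse_version("1.9.9") < parse_version("1.10.0"))

# Comparing the raw strings gets it wrong, because "9" sorts after "1".
print("1.9.9" < "1.10.0")
```

The first comparison prints `True` and the second prints `False`, which is why 1.9.9 is followed by 1.9.10 or 1.10.0 and not by 2.0.0.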
TESS: Or operational change.
IVAN: Or operational change, yeah. We don’t usually constrain them to patches. And, the first version, the major number is what we use when we launch a brand new site. Or, we do a rebuild. It’s usually a very big change that we don’t change very often. And then, I think what I call the minor version, that’s the second ordinal, that’s a little more squishy, and I think we leave it as being squishy, because we want to give ourselves the flexibility to sometimes include a major feature, and sometimes to include a major milestone that might not be a feature, it might be a collection of features. It’s kind of dependent on the client I think.
TESS: Yeah, it’s not nearly as clear cut as say Drupal’s own interpretation of the semantic version standard. There’s a lot more squishiness in TEN7’s version, but as long as it’s used consistently, that’s the important part. And, you’re right, that the major version is usually only used for when we’re making a completely new version of a site, a rebuild of a site, releasing such a catastrophically different change that it might as well be considered a completely different project at that point. And sometimes even when we migrate that site from one set of hardware to another set of hardware, since operational code does exist in our repositories, that also is considered a catastrophic major change.
TESS: After that minor changes get a little bit squishier depending on what gets released and how, and the patch versions depend on what we’ve actually done. Usually it’s small enhancements, small fixes, things like that.
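One way to picture this looser interpretation is a lookup from the kind of change to the number that gets incremented. This is a sketch under stated assumptions: the category labels in `PART_FOR_CHANGE` are invented from the examples mentioned in the conversation, and resetting the lower numbers to zero follows standard semantic versioning rather than anything stated here about TEN7's practice.

```python
from typing import Dict, Tuple

Version = Tuple[int, int, int]


def bump(version: Version, part: str) -> Version:
    """Increment one part of a version and reset the parts to its right."""
    major, minor, patch = version
    if part == "major":
        return (major + 1, 0, 0)
    if part == "minor":
        return (major, minor + 1, 0)
    if part == "patch":
        return (major, minor, patch + 1)
    raise ValueError(f"unknown version part: {part}")


# Hypothetical labels for the kinds of change discussed above.
PART_FOR_CHANGE: Dict[str, str] = {
    "rebuild": "major",
    "hardware migration": "major",
    "major feature": "minor",
    "small enhancement": "patch",
    "operational change": "patch",
}

print(bump((1, 0, 54), PART_FOR_CHANGE["major feature"]))
print(bump((1, 9, 9), PART_FOR_CHANGE["operational change"]))
```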
IVAN: And I think the point, like you alluded to at the beginning of the version numbering discussion, is not to stress over them too much. Choose something, use it, have a good understanding of what they mean and be flexible about it. I think it’s easy for one developer to increment the last part of the triplet, and likely the second part, but there’s usually discussion around the first part and in most cases the second part actually.
TESS: I’m a little bit of a stickler when it comes to this. I tend to like being a little bit more aggressive in version changes, because if I see 1 dot 0 dot 54 (laughter). It’s really easy for a lot of developers to get into that pattern and not want to commit to seeing anything as like a minor change, or even a major change.
IVAN: Yeah, I totally get that. You don’t see it as much anymore I don’t think though, 1.0.57. Coming back to the very beginning of this Podcast we talked about quickly what Git was and reminded you of what the purpose was. We talked a little bit about environments, live environment, staging environment, local environment. We even talked about feature branch environments. We talked about how those are related to branching in Git and we talked about what a branch actually was. And, most recently, we talked about how we get code from one branch into another, and then we had this discussion about versions. I think we’ve covered all of the items for Blueprint Episode 5. With that summary I would love to thank you Tess, for spending your time with me again, and I’m sure you’ll be back on the Podcast to discuss something very soon.
TESS: Hopefully. We’ll see.
IVAN: You’ve been listening to the TEN7 Podcast. Find us online at ten7.com/podcast. And if you have a second, do send us an email, we love hearing from you. Our email address is firstname.lastname@example.org. Until next time, this is Ivan Stegic. Thank you for listening.