Tractorbeam: Our Open Source Multi-Tier Backup Solution
We discuss Tractorbeam, our open source multi-tier website backup solution that backs up the database and files to any S3-compatible storage provider.
Tess Flynn, TEN7's DevOps Engineer
- What is Tractorbeam?
- The beauty of multi-tier rolling backups
- Tractorbeam doesn’t care about what kind of site it’s backing up
- Different ways Tractorbeam can connect to a site
- Tractorbeam supports MySQL and MariaDB
- Why geographic backups are important (because Godzilla!)
- TEN7 Hosting clients get Kubernetes hosting and redundant backups
- Tractorbeam future plans
- How Flightdeck works with Tractorbeam
IVAN STEGIC: Hey Everyone! You’re listening to The TEN7 Podcast, where we get together every fortnight, and sometimes more often, to talk about technology, business, and the humans in it. I’m your host Ivan Stegic. Today I’d like to spend some time talking about Tractorbeam backup, an open source multi-tier website backup solution. It’s something that started out as Ansible back in 2017 when there really wasn’t a way to make a backup of a Drupal 8 site.
At that time, Backup and Migrate was only a Drupal 7 module, and there was a plan and work still being done on its Drupal 8 port. Now, Tractorbeam backup is a standalone Docker container that will make backups of pretty much any website on a regular schedule, and then store those archives in an S3-compatible bucket somewhere else, so that it's geographically disconnected from the site you’re backing up.
We built Tractorbeam because we needed it ourselves, and we use it and maintain it to this very day. It forms a part of our core business offering and powers each of our clients’ backups, both onsite and off-site. Whether our clients are hosted with TEN7 in our Kubernetes cloud at DigitalOcean, or whether they’re a client hosted at Pantheon or even platform.sh, Tractorbeam provides the regular offsite backup service that our clients rely on.
I’d like to cover the status quo for Tractorbeam as of the summer of 2020. We’ll remind everyone what Tractorbeam is, what exactly it does, what it supports, and what we are hoping to do with it moving forward.
And who better to have this discussion with on the podcast than the person who has written the code in Tractorbeam backup, Tess Flynn, who joins me today. Hey Tess, nice to have you on again.
TESS FLYNN: Hello.
IVAN: How are you doing today?
TESS: Hmm, I’m doing.
IVAN: You’re doing, we’re all doing aren’t we?
IVAN: [laughing] Well let’s start with the basics. What does Tractorbeam actually do?
TESS: So, Tractorbeam is a solution where you give it a YAML document that describes which sites you want to back up. The container will read that YAML document, work out which sources of data it needs to back up, grab those sources, and then push them to any number of destinations. Right now, Tractorbeam only supports S3 as a destination. But it’s not just AWS S3, it’s any S3-compatible hosting provider. So, that could even be MinIO if you want to self-host it, or something like DigitalOcean Spaces.
IVAN: So, fundamentally it makes regular backups of your websites and puts them somewhere else?
TESS: Mm hmm.
IVAN: And it’s free and it’s open source and it’s something we’ve supported and created. I guess the only thing you need to pay for then is the server that Tractorbeam runs on.
TESS: Well, also the S3 bucket.
IVAN: And the S3 bucket, and you could probably get away with $5.00 per month for the server at DigitalOcean for example, and $5.00 per month for the Spaces backup.
TESS: One thing that differentiates Tractorbeam from some other backup solutions that have historically been very popular in Drupal, such as NodeSquirrel, which has now been decommissioned, is that it does multi-tier rolling backups, which means there’s a set number of backups that it’s going to keep per timescale. This way you only have a certain amount of space taken.
Something like NodeSquirrel was a rolling backup solution, where it took a new backup and a new backup and a new backup and a new backup. And as a result eventually some years down the line you had a whole year and a half of weekly backups you had to go in and delete, because your account got filled up.
IVAN: And, I mean, that’s what software as a service is about, right? There needs to be some sort of dollars coming in to justify the cost of the architecture that you’re providing to the user. When you said multi-tiered rolling backup, you’re talking about the fact that Tractorbeam by design will do daily, weekly, monthly, but will only keep a certain number of each of them?
TESS: That’s correct. So, for the daily backups we keep seven copies by default, then for the weekly ones we keep four copies by default, and then for the monthly ones we keep 12 copies by default.
IVAN: That sounds like a really sane backup strategy.
TESS: The idea is that if you’re going to experience a fault, the most likely case is that it’s going to be one you’ll notice fairly shortly, within hours or days of the eventual outage or problem that you’re trying to troubleshoot. Anything beyond that is going to be less of a production concern and more of an archeology concern. Something that you might want to investigate for auditing purposes or research purposes, or simple fact-finding purposes. In which case a plain rolling backup solution is way too much data for you. You typically don’t need it because the value of a backup decreases the older it gets. It goes stale.
So, the idea here is that we make different timescales of backup. This backup is for the day, it’s only going to last for seven days before it gets deleted. This backup is going to be for the week, so we only have four of those so that we have on average one month.
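To make the tiering concrete, here is a minimal Python sketch of multi-tier retention using the defaults mentioned above (7 daily, 4 weekly, 12 monthly). It is an illustration under assumptions, not Tractorbeam's actual pruning code: the function names are hypothetical, and it assumes one backup per day with each period represented by its newest backup.

```python
from datetime import date, timedelta

# Copies kept per tier, matching the defaults mentioned in the conversation.
KEEP = {"daily": 7, "weekly": 4, "monthly": 12}


def tier_key(tier: str, day: date):
    """Return the period a backup belongs to for the given tier."""
    if tier == "daily":
        return day
    if tier == "weekly":
        iso = day.isocalendar()
        return (iso[0], iso[1])  # ISO year and ISO week number
    if tier == "monthly":
        return (day.year, day.month)
    raise ValueError(f"unknown tier: {tier}")


def backups_to_keep(days: list[date]) -> set[date]:
    """Pick which backup dates survive pruning under the three tiers."""
    keep: set[date] = set()
    for tier, limit in KEEP.items():
        newest_per_period = {}
        for day in sorted(days):
            # Later dates overwrite earlier ones, so each period keeps its newest backup.
            newest_per_period[tier_key(tier, day)] = day
        # Keep only the most recent `limit` periods for this tier.
        keep.update(sorted(newest_per_period.values(), reverse=True)[:limit])
    return keep


if __name__ == "__main__":
    today = date(2020, 7, 1)
    history = [today - timedelta(days=i) for i in range(400)]
    kept = backups_to_keep(history)
    print(f"{len(kept)} of {len(history)} backups kept")
```

However long the history grows, the number of retained archives never exceeds 7 + 4 + 12 = 23, which is what keeps storage use bounded.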
IVAN: Does Tractorbeam support hourly backups?
TESS: You could configure it that way. The thing with Tractorbeam as a container, that’s kind of confusing for people, is that it doesn’t do the scheduling itself. You need to do the scheduling to run Tractorbeam. So the idea is, that if you have say, a regular cron process on a traditional Unix server, you could have a crontab entry which runs Docker to run the container. And Tractorbeam doesn’t care if that is run hourly, minutely, daily. Ideally daily, ideally, but it can be run at any point.
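The external scheduling Tess describes could look roughly like the following Python sketch, which assembles a crontab entry that runs a container once a day. The image name, config path, and mount point are placeholders chosen for illustration, not values taken from the Tractorbeam documentation.

```python
import shlex

# Assumptions for illustration only: the image name, the config path on the
# host, and the mount point inside the container are placeholders.
IMAGE = "example/tractorbeam:latest"
CONFIG_ON_HOST = "/etc/tractorbeam/config.yml"
CONFIG_IN_CONTAINER = "/config/config.yml"


def docker_command() -> list[str]:
    """Build the `docker run` invocation that performs one backup pass."""
    return [
        "docker", "run", "--rm",  # remove the container once the run finishes
        "--volume", f"{CONFIG_ON_HOST}:{CONFIG_IN_CONTAINER}:ro",  # read-only config
        IMAGE,
    ]


def crontab_line(schedule: str = "0 2 * * *") -> str:
    """Return a crontab entry; the default schedule runs daily at 02:00."""
    return f"{schedule} {shlex.join(docker_command())}"


if __name__ == "__main__":
    print(crontab_line())
```

The container itself stays unaware of time; changing the cron schedule string is all it takes to run it hourly instead of daily.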
IVAN: Great. And Tractorbeam is free and open source. That’s another distinguishing factor. Most of the backup solutions you see online go into some sort of proprietary, locked-in tub of data, bucket of data, or their software is software as a service, so you have to pay for that and you don’t really have access to your backups.
TESS: Tractorbeam makes the exchange that you need to have a bit of technical knowledge in order to actually set it up. You need to know YAML. You need to know how to set up an S3 bucket. You need to know how to set up an S3 bucket key in order to access that bucket remotely. And then, once you have that, you also need to set up the container to run somewhere. Now, if you’re using say, Flight Deck Cluster, that’s actually built in and you can just define your backups right there within a few lines of YAML to back up an entire cluster.
IVAN: That’s amazing. So, we’ve kind of touched on what Tractorbeam is, and a little bit of the underlying technology. I’d like to go a little deeper into the technology and what it supports. What kind of website can I backup? We talked about Drupal and we’re a Drupal company, so it supports Drupal for sure, but it’s not just about the Drupal website.
TESS: Yeah, this is one thing that’s a little bit different about Tractorbeam as well: it doesn’t know about Drupal as a website. So, when you think about something like NodeSquirrel, it was a module and that module eventually became a prerequisite for Backup and Migrate, or a part of Backup and Migrate. So, you install the module, then there you go, there’s your backup solution. Tractorbeam doesn’t run inside of Drupal. It doesn’t even know about Drupal. All it knows about is the infrastructure you’re configuring it to communicate with.
So, when you actually set up a Tractorbeam backup, you’re not telling it where your Drupal site is, you’re telling it where the file directory is. You’re telling it where the database is, and you’re telling it where the S3 bucket is, and that’s it.
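The three pieces of information Tess lists can be modelled in a few lines of Python. This is a hypothetical data model for illustration only; the class and field names are invented here and do not reflect Tractorbeam's real YAML schema. Credentials are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Database:
    host: str
    name: str
    user: str
    port: int = 3306  # default MySQL/MariaDB port


@dataclass(frozen=True)
class Bucket:
    endpoint: str  # any S3-compatible endpoint, not only AWS
    name: str
    prefix: str = ""


@dataclass(frozen=True)
class BackupJob:
    """Everything a backup needs: a files directory, a database, a bucket."""
    files_dir: str
    database: Database
    bucket: Bucket

    def describe(self) -> str:
        return (
            f"files {self.files_dir} + database {self.database.name}@{self.database.host}"
            f" -> s3://{self.bucket.name}/{self.bucket.prefix}"
        )


if __name__ == "__main__":
    job = BackupJob(
        files_dir="/srv/site/files",
        database=Database(host="db.example.com", name="mysite", user="backup"),
        bucket=Bucket(endpoint="https://s3.example.com", name="my-backups", prefix="mysite/"),
    )
    print(job.describe())
```

Nothing in the model mentions a CMS, which is the point: the same description fits a Drupal, WordPress, or Joomla site equally well.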
IVAN: And so, you basically support any website that’s powered by files and any website that’s powered by MySQL or MariaDB?
IVAN: Okay, so this could potentially be used for Joomla, WordPress or ExpressionEngine, or anything that Tractorbeam can get access to?
TESS: That’s correct. It doesn’t care about the kind of site that it is.
IVAN: Okay, so, let’s talk about getting access to. So, Tractorbeam works in its own container which needs to run on a server somewhere, and so it needs access to the website it’s going to backup, and it does this via SSH?
TESS: Yeah, so there are actually several different ways that it accesses different sites, depending on how you’re doing the backup. If we’re going to talk about a traditional single-node server somewhere that’s running an attached disk, which would be like a Linode node or DigitalOcean droplet, or anything like that.
IVAN: Or Bluehost shared account maybe.
TESS: A Bluehost shared account for example, yes. That’s going to be a regular file system connection. It’s going to be over SSH. It will do what’s called an Rsync internally in order to grab all the files, and then it pushes those files up to S3 somewhere. For the database, it’s going to try to form a direct connection via whatever communication protocol it decides to use. For MySQL it will try to use an encrypted MySQL connection. If it’s not encrypted it will use whatever you give it. And then it will take that and it will push that to an S3 bucket. It gets a little bit more complicated, however, when we talk about integrations with other, more dedicated platform-as-a-service providers.
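The two fetch steps for a plain SSH host can be sketched as command lines assembled in Python. This is an assumption-laden illustration rather than what Tractorbeam actually executes: the function names are hypothetical, the password is assumed to come from a client option file, and the upload to S3 is omitted.

```python
def rsync_command(user: str, host: str, remote_dir: str, local_dir: str) -> list[str]:
    """Pull a remote files directory over SSH using rsync."""
    # The trailing slash makes rsync copy the directory's contents.
    source = f"{user}@{host}:{remote_dir.rstrip('/')}/"
    return ["rsync", "--archive", "--compress", "--rsh=ssh", source, local_dir]


def dump_command(host: str, user: str, database: str, out_file: str, port: int = 3306) -> list[str]:
    """Dump one MySQL or MariaDB database to a local SQL file.

    The password is deliberately absent; this sketch assumes it is supplied
    through a client option file rather than on the command line.
    """
    return [
        "mysqldump",
        f"--host={host}",
        f"--port={port}",
        f"--user={user}",
        "--single-transaction",  # consistent snapshot for InnoDB tables
        f"--result-file={out_file}",
        database,
    ]


if __name__ == "__main__":
    print(" ".join(rsync_command("deploy", "example.com", "/var/www/files", "/backup/files")))
    print(" ".join(dump_command("db.example.com", "backup", "mysite", "/backup/mysite.sql")))
```

Each list can be handed to `subprocess.run` as-is, which avoids shell quoting problems with unusual paths or database names.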
IVAN: And by that you mean you’re specifically talking about sites whose business it is to host a particular kind of website, but that might not necessarily provide SSH access, and the two, let’s talk about them, that Tractorbeam supports, right now are Pantheon and platform.sh.
TESS: Mm hmm.
IVAN: So, how does it do the backups in that case?
TESS: So, both for Pantheon and platform.sh, both of those service providers provide their own command line application that allows them access to the underlying environment. This includes the files, the databases and so on. And that’s actually what we use. We use their own tools to access their environments to get this data. In the case of platform.sh, a lot of that ends up being Rsync underneath the covers and SSH. In the case of Pantheon, it’s a little bit more complicated. For Pantheon at least, when we back up a database, what we actually do is tell Pantheon to make the backup, and then we download it from the command line.
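For the Pantheon case, the "ask the platform to make the backup, then download it" flow might be sketched as below. The commands are written from memory of Pantheon's Terminus CLI (`backup:create` and `backup:get` with `--element` and `--to`), so treat the exact flags as an assumption to verify against the Terminus documentation. The function name is hypothetical and this is not Tractorbeam's own code.

```python
def pantheon_database_backup(site: str, env: str, dest: str) -> list[list[str]]:
    """Return the two CLI steps: create a database backup, then download it."""
    site_env = f"{site}.{env}"  # Terminus addresses environments as <site>.<env>
    return [
        # Step 1: ask Pantheon to create a fresh database backup.
        ["terminus", "backup:create", site_env, "--element=db"],
        # Step 2: download the most recent database backup to a local path.
        ["terminus", "backup:get", site_env, "--element=db", f"--to={dest}"],
    ]


if __name__ == "__main__":
    for step in pantheon_database_backup("mysite", "live", "/backup/mysite.sql.gz"):
        print(" ".join(step))
```

The order matters: the download step fetches whatever backup is newest, so it only returns current data if the create step has finished first.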
IVAN: Wow. And all of this is totally abstracted right? So, all you need to do is provide Tractorbeam with the fact that it’s either an SSH connection, or it’s a platform.sh site, or it’s a Pantheon site. You provide it the corresponding credentials and Tractorbeam takes care of everything underneath to do the backup?
IVAN: That’s wonderful. We don’t yet support things like Squarespace and Wix and Weebly and maybe WordPress.com, do we?
TESS: No, we don’t. At the moment I don’t know if those platforms even offer a CLI, and if they do, we certainly haven’t integrated with them.
IVAN: And there’s no reason why we couldn’t integrate with other platform providers, or that someone else couldn’t write the integration and do a pull request on GitHub, as long as there is access, and there’s a way to use an API, or there’s a way to use a proprietary CLI that the platform provider may provide, and at that point we should be able to get to the files and database.
IVAN: Got it. This is great. Okay, let’s talk a little bit more about the details of the database that Tractorbeam can backup. So, right now, we support MySQL and MariaDB, in theory we could probably support any other type of database like PostgreSQL, is that right?
TESS: We could probably support Postgres, we’d have to update the container to add the necessary client in order to do that, but we could do it.
IVAN: And at that point, if we’re doing that then, or if someone else writes that and writes a pull request, we could put it into Tractorbeam. I mean at that point, is there a reason why we couldn’t support Microsoft SQL server?
TESS: That again depends on if we could get the client in order to do that. One thing that’s kind of difficult is that some of these providers are not going to want to provide an open source, repackageable, redistributable client. Oracle, for example, does not like giving out SQL*Plus the last time I looked.
IVAN: [laughing] No, I would imagine that that would not be the case. Pantheon already has backups, and platform.sh has the same thing right? So, why do we need Tractorbeam?
TESS: So, this is something that I learned way back in the day when I was actually doing consulting for enterprises, and I actually had the opportunity to visit an IBM mainframe lab in Texas, where they were actually building mainframes, and are still building mainframes. Mainframes aren’t these old dinosaurs that no one ever uses that only belongs in museums. You have already used one today, if you’ve used any ecommerce site.
TESS: We just don’t know it.
IVAN: [laughing] Yeah.
TESS: And the thing that was really enlightening is that they talked for a minute about their backup strategy. And their backup strategy was, we have hot local backups. We have short-term local backups. But then we also have geographic backups. And I’m like, Why would you need geographic backups? It’s like, well if you imagine any kind of disaster that occurs, a natural disaster, alien invasion, I don’t know, zombies coming back from the dead, who knows.
IVAN: [laughing] Pandemic? [laughing]
TESS: Look, I was trying to keep things light here man.
IVAN: [laughing] Sorry. Sorry.
TESS: As a result you might not be able to get to the facility in which you had your backup stored. For example, let’s just say that you ran a tape backup. Yes, tape backups, they still exist, they’re incredibly stable, they last a long fricking time. Let’s say that you have a tape backup of your server, and you put it in a security box in a bank. That bank just got stomped by Godzilla. Now you have a problem. You don’t have your backup anymore.
So, the idea behind a geographic backup is, it is unlikely that Godzilla is going to be in multiple cities at the same time, unless you’re in Godzilla: Final Wars, but I won’t get into that. With geographic backups, you will have multiple copies of the same disk somewhere else spread around the entire world. As a result, you will still be able to get a backup by going to one of your other geographic failovers, if the one that’s closest to you is no longer available.
IVAN: And so, if something happens to Pantheon’s backups or platform.sh’s backups, and you can’t get to them because you got locked out, or because the data center where your backups were stored went down, and you need your site up, having your site backed up and available in a data center that is geographically different from where Pantheon or platform.sh is, is a smart thing to do.
TESS: Mm hmm.
IVAN: I want to talk a little bit about how our backup system is set up at TEN7 for the clients we have. Some of TEN7’s clients are hosted on the Kubernetes infrastructure that we have at DigitalOcean, as I alluded to in the intro, but we also have clients at platform.sh and also at Pantheon. All of the clients have redundant backups of their sites that are geographically different than where they are hosted, and so Tractorbeam takes care of that. It pushes all the backups to DigitalOcean Spaces as a first tier, and then it also does the same thing with the backup but pushes it to AWS S3. Correct?
TESS: That’s correct. One of the backup types that we actually support is from S3 to S3, so you can actually replicate an entire bucket if you need to.
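Bucket-to-bucket replication boils down to comparing two object listings and copying what differs. The Python sketch below shows that comparison on plain dictionaries mapping object key to ETag. It is a simplified illustration: the function name is hypothetical, whether Tractorbeam deletes extra objects at the destination is an assumption, and ETags are not always content hashes (for example with multipart uploads).

```python
def replication_plan(source: dict[str, str], destination: dict[str, str]) -> dict[str, list[str]]:
    """Compare two bucket listings (object key -> ETag) and plan a one-way sync."""
    # Copy anything missing from the destination or whose ETag differs.
    copy = sorted(key for key, etag in source.items() if destination.get(key) != etag)
    # Assumption: objects that no longer exist in the source are removed.
    delete = sorted(key for key in destination if key not in source)
    return {"copy": copy, "delete": delete}


if __name__ == "__main__":
    primary = {"daily/site.sql.gz": "aaa", "daily/files.tar.gz": "bbb"}
    replica = {"daily/site.sql.gz": "old", "daily/stale.tar.gz": "ccc"}
    print(replication_plan(primary, replica))
```

Running the same plan twice after applying it yields empty lists, which makes the replication safe to repeat on every scheduled run.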
IVAN: Is that what we do?
TESS: Mm hmm.
IVAN: That’s awesome. Okay, and so, in addition to that we also have a third location and a third provider at Google Cloud, and so, not only do we have those backups set up for those two locations at DigitalOcean and AWS, but then we bring the data and the databases over to Google Cloud as well. And that’s actually where developers can get to the archives to restore them if they need a copy of a database, or if they need a copy of a file. That’s where we control the permissions and that makes it really easy for people to get into.
TESS: For better or worse a lot of people have a Google account.
IVAN: Yes, [laughing] that’s definitely the case. What are we planning for the future? I know I have a couple of ideas about what I would really like to do with Tractorbeam and we can kind of get into those. I kind of want to hear what you think. Where would you like to take Tractorbeam to in the future?
TESS: One of the problems with Tractorbeam is that it still requires some technical know-how in order to set it up, and that’s non-trivial. You have to do a bit of fact finding if you want to set up say Pantheon backups, you’re going to have to get an execution key in order to do that. And that requires you to go to Pantheon’s documentation, figure out what the execution key is, what it does, go and generate it, go back to Tractorbeam, figure out where in the YAML files to enter it and so on, and so on and so on. And yeah, you can figure this out.
I’ve tried very hard to give you as much documentation as I could on the GitHub page, but it still requires some technical know-how and some research to make it work, and that’s difficult. It would be really nice if there were a way that we could have this as a service where there would be a UI that you could log into, plug in the necessary credentials, and that would be all you would need to do.
IVAN: Yeah, that would be really slick. A website where you could maybe connect to the S3 bucket that you’re going to put the backups into, determine where your actual website is. If it’s at Pantheon it will be specific to that. If it’s at Bluehost it would be specific to that. Bing, bang, boom, and Tractorbeam is configured in a droplet, or a Linode somewhere, and you’re done. I love that idea. Can you get to it next week? [laughing] Okay, so that’s one idea.
Another idea I had is, I would love to see a DigitalOcean Marketplace app that basically does something, perhaps less technical than the first thing we talked about, but not as user friendly as what you just described. So, a marketplace app on DigitalOcean where you can quickly fire up a preconfigured server that has the Docker containers configured, and that has crontab already preconfigured. And maybe you would need to connect to the server and insert the additional configuration like where your site is, the database credentials, and so on.
TESS: Yeah, that would be useful.
IVAN: And also not expensive, right? Ten dollars a month at DigitalOcean could backup a great number of websites right? I mean, that’s 250 gigabytes of storage plus a terabyte of transfer. I mean, websites are not big, and they don’t have, like even if you multiply it by 30 to get the number of archived backups, you’re not going to exhaust that huge amount of space, are you?
TESS: Probably not. You’re going to have plenty of space left over with that as local storage. You are still replicating that to S3 somewhere though.
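Ivan's back-of-the-envelope estimate can be written out as a short Python calculation. The 250 GB figure and the multiplier of 30 come from the conversation; the per-site archive size is an assumed example value, and compression and growth over time are ignored.

```python
def sites_that_fit(storage_gb: float, site_gb: float, copies: int = 30) -> int:
    """How many sites fit if each keeps `copies` archives of `site_gb` each."""
    return int(storage_gb // (site_gb * copies))


if __name__ == "__main__":
    # Assumed example: each site's archive (database plus files) is 0.5 GB.
    print(sites_that_fit(250, 0.5))             # using Ivan's multiplier of 30
    print(sites_that_fit(250, 0.5, copies=23))  # using 7 + 4 + 12 retained copies
```

With those assumptions, 250 GB holds 16 sites at 30 copies each, or 21 sites at the 23 copies the default retention would keep.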
IVAN: Yes you are. But ultimately it’s not going to cost you hundreds of dollars to do these backups?
IVAN: I would like to add Postgres support to Tractorbeam.
TESS: Oh, if that’s what you want to do we can look at it.
IVAN: Okay. So, I’ll cut a ticket for that. And then, have you ever looked at, or considered using the open source tool Rclone?
TESS: I’ve looked at it. The problem is that Rclone has some other shortcomings that have made it a little bit difficult to integrate into Tractorbeam. It also doesn’t necessarily run easily or install easily on the platforms that we use for our container.
IVAN: Oh, so Rclone actually needs to be at the platform. It’s not just that it’s in the container.
TESS: It’s a little bit more complicated than that.
IVAN: Okay, so Rclone is not something we could very easily add to Tractorbeam, so that we can support any of the multiple different endpoints that Rclone supports? That’s too bad. Can we get backups to a regular FTP or file server?
TESS: So, FTP is possible. We actually don’t support it because in my opinion, if all you have is FTP, you have a bigger problem.
IVAN: Yeah. Maybe I should’ve said SFTP.
TESS: Now, SFTP we do support out of the box.
IVAN: Oh, it does? So you can actually push to an SFTP endpoint instead of an S3 endpoint?
TESS: Right, because if you’re using SFTP you also have SSH which means you could do Rsync.
IVAN: Yeah, okay. So, that’s good to know. So we support S3 and SFTP. Brilliant. What else? I think we’ve basically covered everything haven’t we? We talked about what Tractorbeam is. We’ve talked about the fact that it’s open source, that it does a multi-tier rolling backup solution that is geographically redundant. We talked about the fact that it’s Docker.
Oh, you know what we haven’t really spent a whole lot of time on, is Flight Deck? We kind of mentioned it in passing, right? Tractorbeam started off as a bunch of Ansible, and then eventually we migrated it to be a Docker container, and if you happen to be using Flight Deck, Tractorbeam is a Docker container that’s a part of Flight Deck.
TESS: What we’re really talking about is Flight Deck Cluster. So Flight Deck is a series of containers which is geared towards both local and production Drupal development, and we’ve been running it out in production for, geez, two years now?
IVAN: I think it’s two years now yeah.
TESS: The thing is that Flight Deck Cluster is designed to set up a Kubernetes cluster to run Flight Deck containers. You can have it set up to run only one site, but you can have it set up to run multiple sites if you’d like. There’s a lot of complexity in getting Flight Deck Cluster to work. Once you do have it working, however, you can actually tell it, “Yes, I would like a Tractorbeam container. Yes, I would like it to run at this time every day, week and month.” And then here is the configuration that we’re going to have it backup, be it on the cluster or on a resource that’s externally available off of the cluster.
IVAN: And so, Tractorbeam supports then running as a container using Docker Compose, and it supports running under Kubernetes. What else does it support? I think I was reading that it supports Docker Swarm as well, right?
TESS: It could in theory run in Swarm. There wouldn’t be any reason why it couldn’t. You would have to schedule it as a container using an external scheduler, because I don’t believe that Swarm does cron-like tasks out of the box. It’s been a long time since I’ve looked at Swarm, so I could be mis-remembering.
IVAN: Well, there’s a lot that you could do with Tractorbeam. I hope that this has provided a very short description of why it’s valuable, and I hope that it piqued the interest of our listeners, and hopefully if there are any developers out there that are interested in contributing, they can always find us online. I think we pretty much covered it all then. Don’t you think?
TESS: Yeah, I think so.
IVAN: Well, thank you for joining me on the podcast again Tess. It’s been lovely. Tractorbeam backup is an open source, multi-tier website backup solution. You could always find the latest version which now supports offsite backups from platform.sh and Pantheon at ten7.com/tractorbeam.
You’ve been listening to The TEN7 Podcast. Find us online at ten7.com/podcast. And if you have a second, do send us a message. We love hearing from you. Our email address is email@example.com. Until next time, this is Ivan Stegic. Thanks for listening.