Episode 051: Blueprint Series #6 Backup & Disaster Recovery

In this, the sixth episode of TEN7's Blueprint for Operations series, Ivan is joined by Tess Flynn to discuss backup and disaster recovery. Subscribe to the podcast.

Here's what we're discussing in this episode:

  • Backing up your Website
  • Recovering from that inevitable disaster
  • Protecting your organization's website
  • Why bother to do backups?
  • Life with Mac OS 7.6.3
  • Computer data is non-durable
  • The various levels of potential failure
  • Executing geographical level backups
  • Ya get what ya pay for
  • Backing up the minimum
  • SOLR config
  • Ansible, Puppet, Chef & Saltstack
  • The slick leather jacket wearing spectacled hacker sitting at a keyboard
  • Drupal backup
  • Live, Stage and Test environments
  • Running cron jobs
  • The 3-2-1 Rule for backups
  • Bit rot
  • Multiple backup environments

TRANSCRIPT

IVAN STEGIC: Hey everyone, you're listening to the TEN7 Podcast, where we get together every fortnight, and sometimes more often, to talk about technology, business and the humans in it. I'm your host, Ivan Stegic. As part of a continuation of the TEN7 Blueprint for Operations Series, today we're diving into best practices for backing up your website and recovering from that inevitable disaster. We'll cover how we have things set up at TEN7, which I think works quite well for us, and we'll also talk about the things you can do to protect your organization's website. Joining me in our discussion is Tess Flynn, our DevOps Engineer. Tess, thanks for being on the podcast again.

TESS FLYNN: Always a pleasure.

IVAN: I love talking to you, especially on a beautiful afternoon like today. So, we keep hearing that backups are so important for mobile devices, for photos, for computers, I get it. Let's make copies of all the digital things we have, right? What about backups for websites? Why is it important to have a backup and a disaster recovery plan for your online presence?

TESS: Well, let's take a step back, and let's actually discuss what's really going on when it comes to personal data first, because some people might not necessarily understand the significance of that just yet. When it comes to the whole rigamarole of you should back up stuff, it's generally true, but most people don't understand why that's important. And why that's important is because digital information is inherently non-durable. And that sounds like a very cold and scientific way of describing it, but let's put this a different way. Most phones today, if they're not connected to some online service, are basically a piece of solid computronium. There's no way of cracking them open, getting the data out of them, or extracting anything from them, without a significant amount of forensic effort that is above and beyond what most consumers are capable of doing. So, for all intents and purposes, it's a black box, you can't open it. Now, if you're trying to get your jeep unstuck from the mud and your phone slips out of your pocket, hits the one rock that's in all of the mud around you and smashes into a million pieces and you can't boot it, you're done. There's nothing you can do. That data is gone forever without being able to crack open the case and extract the one IC that has the data on it. And, that's really difficult. Most people will basically just give up on that information. And, that could be very damaging for people. How often do you hear the story of someone's briefcase getting stolen, and they actually don't care about getting their wallet back, getting their money back, getting their cash cards back. They're more concerned about “I need those papers, I need that data", because once that data is gone, I can't get it back. And, this is why a lot of personal devices nowadays have backups built in, because technology is inherently fragile, and the data that's within that technology is inherently non-durable.
So, as a result, you basically have the wisps of fog dancing on the top of your castle of silicon glass, (laughing) and it only takes a little bit of life happenstance to blow that entire castle over, and all of those data wisps are gone forever.

IVAN: (laughing) We're lucky it's easy to make copies of those wisps of clouds.

TESS: (laughing) It is very good that we actually do have that. The fact is that a lot of consumer devices now actually do all the backups for you. Half the reason why I have a different attitude towards backup and disaster recovery is that one of my first computers was actually a Mac Plus. Now that sounds like “wow, that's a really fancy computer to have.” Yes, in 1993, it wasn't necessarily all that fancy. (laughing) And it was definitely not very fancy when you were running Mac OS 7.6.3, because the software had moved so far beyond that particular bit of hardware that it crashed every 15 minutes, easily. And so, I got into this nasty habit of always saving every few minutes, because I never knew when my system was going to crash. And then to make copies of those, because I didn't know if the hard drive was going to go out with it. So I have a unique perspective on computer data being very, very non-durable.

IVAN: Unique is a nice way of saying it. I would say a little paranoid, which I think is a compliment. You know, I think I feel the same way, and we both grew up in the same era of data being on drives, like that's the only place that it is, so make as many copies as you can afford to make.

TESS: Persistent storage devices? What’s that? What year of computer is it? You know, a big heavy keyboard that plugs into a TV. (laughing) You unplug that sucker, it’s all gone. (laughing) So, it's pretty important to have backups because the data is really, really volatile. That's the wonderful thing, and the not so wonderful thing, about computers: the data is inherently very volatile. Now, when it comes to something that's a little bit more business critical like a website, that emphasis becomes much stronger because you do actually need to keep that data, because it's not just “oh well, I lost those videos,” “I lost those photos from my vacation,” “oh, well, I’ll get over it.” This is, "I am losing money, I am losing potential clients, I am losing brand perception", because my site is currently down. And this is doubly so if you are a web agency, because, yea, if you can't keep your own site up. (laughing) So, it's kind of important. Now, when it comes to doing this, the first thing that I usually like describing to people is: how many disasters can you potentially imagine that would take out your server? And, a lot of people will say, “well, someone could push a bad code update. Or a hacker could go in and deface the site.” Sure. That's kind of an inside disaster. And that's the one that most people from enterprise perspectives usually think of: something inside the functioning of the site actually causes the site to no longer work correctly. But, the thing with backups is that there are many different levels of potential failure. That's only the smallest circle of potential failure. The larger circle of failure is that the hardware itself that the site is running on will fail. Beyond that, the data center in which that server runs has some kind of failure. Beyond that, the physical area, the metropolis around which that data center exists, might have some kind of other disaster.
It could be a power outage, for example, that could completely destroy the ability for the data center to continue to function. It could even be worse, though: the entire geographic area around that data center could have a natural disaster or something else that's going to cause that many more problems, and as a result, it won't be able to function from there either. So, there are multiple different levels. It was a revelation to me once when I went to an IBM lab in Dallas, and they were talking about geographic level backups and talking about how they had tapes hermetically sealed in a vault, buried in geological strata that had been certified to be stable. I was like, “whoa.” (laughing) Now, that can be a little bit of overkill for most sites. The way that I usually like talking about backup and disaster recovery for websites is: if you completely lose your hosting provider and everything that's on it, and you can never get it back again, how quickly can you rebuild your site with what you already have? And, that's usually my mental test case for how good your backup is.

IVAN: And what's the answer to that?

TESS: It also depends on the level of investment that you want to spend, because you can do some pretty amazing things with technology, but it can take a lot of work to implement that. I usually have a very pragmatic approach which is, it's best to have enough stuff available that you can rebuild your site on alternate hardware somewhere else within 24 hours. 24 hours is still a very long time for businesses, but it's not weeks, it's not months, it's not any of that. It's something that can, in fact, be done within a human capacity. And that's generally a good break even point depending on the size of your business.

IVAN: That's not bad. What if you have a commerce site? Then you're probably a little more dependent, aren’t you?

TESS: You probably want to do something more like three hours at the most, because three hours is still a long time for a commerce site.

IVAN: And with a backup of a website what are we actually backing up?

TESS: So, this is a fun and nuanced question, because most people will tell you everything, and that's usually when I just have to face palm really, really hard, because everything is a lot of stuff. And the thing is, if you back up everything, that usually implies to a lot of people that you're going to do a server level backup, that's going to back up literally the disk image on which your web server is running. And that's not something that's small. That can be up to 40-100 gig, even more, depending on the size of your file system. Then that's going to be your site files, your site uploads, your database, the entire operating system, any additional utilities necessary to build that operating system. All of that stuff, and it's all a lot of stuff. The thing is that some things are not as valuable to back up. This is kind of why I don't really tend to like consumer level backup mechanisms like Time Machine. I'm actually not a fan of those. A lot of people will say they're great. They back up just about absolutely everything. Yeah, they do, which seems to be a bit too much for me. Mostly because a lot of the everything backup strategy also tends to get you large binary blobs of backups. It's the tape backup problem. If you just need one file out of a backup, but it's on a tape, you have to load the entire tape in order to get it. It's the same with a disk image, you have to load the entire disk image in order to get one file out of it. And depending on the size of the disk image and the nature of the technology used to snap that image, that can be non-trivial. So what do we really need to back up for a website, now that I've yammered on about this for three minutes? The bare minimum that you want to have is the database, especially for Drupal sites. The database contains the most critical nondurable data that goes along with the site. That's your content, your user accounts, your permissions, some site configuration.
All of that stuff is critically important, and if you don't have that you don't have a site. Now, that's good, but if you just have the database, you don't really have a site. So what's the next step beyond that? The next ring around the database is going to be the site code. Now, the site code is going to be anything that can be used to functionally get the site stood up and working again. So, that's usually going to be a Git repository. Somewhere between these is also going to be file uploads, because that's also nondurable data that can't be easily replicated. So you need the database, you need the file uploads, you need your site code. And that's going to be a great deal of stuff, and for Drupal 7 sites, that’s basically it. That's everything that you really do need to back up a site. Drupal 8 sites get a little bit more complicated than that. So with Drupal 8 sites, some of the configuration is no longer only part of the database; it is still actively stored in the database, but it's also staged to a file system. Usually that file system should be under Git control, but here's the thing about that. Sometimes you might have a backup that's not working very well, or you might have an incomplete record of changes, or someone might have gone on to the live site and changed the configuration, and now it's out of step with your Git repository. So, you have a number of different steps where this could potentially be out of sync with what was the last canonical version of the site. So, for Drupal 8 sites, we also have to capture the configuration as a separate backup entity. Then we get into a little bit more advanced things that depend on how quickly you want to recover. Usually, the next step I recommend is, you probably should capture the entire built site, not just the site code that's in your Git repository, but the actual files which are on the web server which constitute the running, working site.
Now, for Drupal 7, it’s like, “isn't that the same thing,” it's like “yes, for the most part,” but not for Drupal 8. Drupal 8 tends to have a lot of additional dependencies, and some Drupal 7 sites have this as well. How about your theme? Your theme might not be completely compiled and stored within your Git repository. You might be using Sass, you might be using continuous integration, you might be building that as part of your deployment process. And now there's a piece of that data which is no longer in your Git repository. And the same goes with Drupal 8, because we get Composer involved. Now, there are Composer dependencies, and those have checksums and versions, and all of this other critical information that we need to capture. So, we need to back that up as well. Usually I like recommending: you have your Git repo, you have your database, you have your file uploads, you have your sync directory, and you have all of the actual site files that were on the web server when the backup was taken. And, we want to keep all of those in separate bins, because if we glom them all together, on some sites you'll have a 100 gigabyte tar.gz file, easily, and it's a nightmare to get anything out of it.
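To make the "separate bins" advice concrete, here is a minimal shell sketch of one backup run. Everything in it is illustrative: the paths, file names, and the commented drush call are assumptions, not TEN7's actual tooling. The sketch builds a throwaway site layout so it can run anywhere.

```shell
#!/bin/sh
# Illustrative sketch of "separate bins": one artifact per backup concern.
# All paths and the site layout here are hypothetical.
set -eu

SITE_ROOT=$(mktemp -d)      # stand-in for the deployed site
BACKUP_DIR=$(mktemp -d)     # stand-in for the backup target
STAMP=$(date +%Y%m%d-%H%M%S)

# Fake site layout so the sketch is runnable end to end.
mkdir -p "$SITE_ROOT/web" "$SITE_ROOT/files" "$SITE_ROOT/config"
echo "<?php // built site code" > "$SITE_ROOT/web/index.php"
echo "user-upload"             > "$SITE_ROOT/files/photo.jpg"
echo "site config export"      > "$SITE_ROOT/config/system.site.yml"

# 1. Database, the most critical artifact. With drush you would run
#    something like `drush sql-dump --gzip --result-file=...` here.
echo "-- stand-in SQL dump" | gzip > "$BACKUP_DIR/db-$STAMP.sql.gz"

# 2. File uploads: non-durable user data, kept in its own archive.
tar -czf "$BACKUP_DIR/files-$STAMP.tar.gz" -C "$SITE_ROOT" files

# 3. The built site code, exactly as it runs on the web server.
tar -czf "$BACKUP_DIR/site-$STAMP.tar.gz" -C "$SITE_ROOT" web

# 4. Drupal 8 config sync directory, in case it drifted from Git.
tar -czf "$BACKUP_DIR/sync-$STAMP.tar.gz" -C "$SITE_ROOT" config

ls "$BACKUP_DIR"
```

Keeping four small artifacts instead of one giant tarball is what makes it cheap to pull a single file back out later.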

IVAN: So, separation is the key between all these different elements that you're backing up. I'm hearing database and files and repo and configuration. What about configuration for the infrastructure that you're running on? Something like, maybe, a SOLR config, or a memcached config. How do you deal with those?

TESS: That depends on how those systems were actually built. In a lot of traditional IT processes, those usually fall under what I would consider an infrastructure backup, which is a whole server backup. When you do a whole server backup, you capture the entire operating system image, the entire disk image, all of the configuration files, all of the scratch data, everything. And that's not bad, but it's not very surgical. And as a result you have very, very big, heavy backups, which take some time to store and deploy and maintain. So, what I usually prefer is that, when you build your servers, you should build them with the idea that they are disposable. We no longer want to have servers which we name with one of those cutesy names. One of the first companies I worked at really liked Babylon 5 and named all of their servers after characters.

IVAN: Very cutesy. So, you never really knew what they did?

TESS: (laughing) You never really knew what they did and these were what I call huggable servers.

IVAN: (laughing) Aren’t all servers huggable, Tess?

TESS: I mean, but you know. You probably shouldn't. There are various different kinds of metaphors for this. Another one that's popular is pets vs. livestock, which is a bit of a problematic metaphor, but it does kind of communicate the idea that a pet has a name, you love it, you care for it, it's very personable, it has a personality, but you don't tend to do the same with livestock. Now, that's not necessarily true, I live in the Midwest, I know better than that. But you tend to systematize how you manage that a lot more. And in larger operations, you don't really give individual names to each individual entity; they have unique identifiers of some sort, some number. And that's kind of how a lot of servers should be treated. We should be treating them as inherently volatile artifacts of some other source of truth. So, usually I use some kind of building mechanism. I like Ansible, but some like Puppet, or Chef, or Saltstack, or whatever different configuration management tool you prefer, that is capable of taking a raw server, configuring it with the data, and then reconfiguring it as necessary. So that SOLR config that you're worried about is actually somewhere in another Git repo, somewhere else, that's part of a job that gets built that configures a server. So, an IT person doesn’t log into the server with SSH, go to the SOLR schema.xml file and make a change; they go to a Git repo, and in that Git repo, they're going to make the change to a line of code there. And then they're going to push that repo up, and then some other autonomous process goes on and builds up the server and updates the configuration for you.
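The "volatile artifact of a source of truth" idea can be sketched without any particular tool. In the hypothetical sketch below, two temp directories stand in for the Git repo and the server, and the apply step plays the role of an Ansible or Puppet run: it is idempotent, so re-running it always converges the server on whatever the repo says.

```shell
#!/bin/sh
# Sketch of config-as-code: the server's SOLR schema is derived from a
# source of truth, never edited in place. All names are hypothetical.
set -eu

REPO=$(mktemp -d)        # stand-in for the Git repo holding the config
SERVER=$(mktemp -d)      # stand-in for the server's config directory

echo '<schema version="1"/>' > "$REPO/schema.xml"

# "Provision" step: an automated job copies the canonical config onto
# the server only when it differs. Idempotent: safe to run repeatedly.
apply_config() {
  cmp -s "$REPO/schema.xml" "$SERVER/schema.xml" 2>/dev/null || \
    cp "$REPO/schema.xml" "$SERVER/schema.xml"
}

apply_config                                       # initial provision
echo '<schema version="2"/>' > "$REPO/schema.xml"  # change lands in "Git"
apply_config                                       # automation re-converges

cat "$SERVER/schema.xml"
```

If the server is lost, nothing of value is lost with it; the same apply step rebuilds its configuration from the repo.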

IVAN: So really what you're saying is, you need to be concerned about the artifacts that are directly connected to your website and less so the infrastructure, because really, you should be portable from a hosting perspective. If you decide you don't want to use a particular host anymore, you should be able to pick up your things and leave, and it should be really easy for you to do that. When you have a good backup, you just simply take the latest snapshot and move to another host. Ok. So we've talked about, kind of, what gets backed up. Should we talk about what we're actually protecting against? When we talk about backups for personal things, we're kind of just talking about the fact that we don't want to lose them because they might be a memory, or some important piece of data. What are we actually protecting against for a website backup?

TESS: So, for a website backup, a lot of the things that we're protecting against are going to be either lost business, or it's going to be a lost business perception. So, if it’s going to be a brochure site or an advertising site, it's going to be something that might not directly be a mechanism in creating a sale, but it might be conducive to creating a sale. So as a result, we'd want to make sure that we maintain that particular operation. For a commerce site, it is directly responsible for making income. So, we want to preserve that as a particular operation of that site. So everything that we do with building the backup revolves around that.

IVAN: What about getting hacked?

TESS: Oh, if you want to talk about how many different ways backups can fail and how sites can fail, we'll be here all day. (laughing) Yes, hacks are definitely one of those, one of those ways, because the internet is a public restroom.

IVAN: Pretty much.

TESS: And, it's best to perceive it that way. I usually like telling people, and I have told people at various camps and cons, that the Hollywood image of some slick leather jacket wearing spectacled hacker sitting at a keyboard, slugging down wine and listening to techno music, while their hands flail against the keyboard, is not really real. That's not really how it works. When we're talking about hacks today, a lot of the time it's going to be foreign governments, it’s going to be cartels, it's going to be other corporations, hostile actors, trolls, various other individuals, who all have their particular motivation, be it monetary or political or whatever. And they don't need to be at the keyboard to actually do this; all they have to do is raise an army of autonomous hacking agents that go out and just see if they can destroy things. And, if they find a vulnerability, they'll keep using that vulnerability to destroy your site again, and again, and again. So, you have to be constantly on guard, because your site will probably just be hacked someday. And as a result, you need to have backups, because if your site does get hacked, your next question is, “okay, when's our last backup? Oh, it was last night before it was hacked, and we haven't updated it since.” “Ok, we’ll just restore the backup. We'll be fine.”

IVAN: It sounds like an ideal world to have that scenario. Ok, so we've talked about what we're backing up, what we're protecting against. I'd like to talk a little bit about how we have it set up at TEN7. I'm a little biased. I think we have a pretty sweet setup and all of that credit goes to you for setting that up. And, we've done, I think, a good job of communicating it and formalizing it. Let's start by talking about the regular backups that occur any time a developer does a push. How is the backup set up in our daily workflow?

TESS: So, we have an Ansible role that's called Drupal backup, and what it does is, it has a bunch of configuration paths to it, but it determines if it needs to back up the database, if it needs to back up the sync directory that contains the configuration, if it needs to back up the actual site files themselves, and if it needs to back up the file uploads directory. Now, for builds, we actually don't back up the file uploads directory. We have that set up in a different way than a typical Drupal site, we have it externalized from the rest of the site, so it persists between different builds, and we just link back to that directory when we're done with it. So, let's put that on the shelf for a later discussion. But, we do actually back up the database and also the site code itself on the web server before every build, and for Drupal 8 sites we also back up the configuration sync directory. So, we back these three things up every single time we do a build, which means we're going to have a directory which contains the database backups, the site files backups and also the configuration sync backups. And, the motivation is, if we ever have a bad build or a feature which causes problems, or any of this, we can immediately fall back to the previous build. But we also have one more step beyond that. The mechanism by which we do a deploy has a little trick up its sleeve. Due to the way that Linux file systems tend to work, renaming a directory is actually a really fast operation. And Apache, the web server, doesn't really care if the directory that you're looking at changes suddenly, as long as it has the same name as before. So, what we actually do is, every time we build a new version of a site, we actually build that in a different directory, and then rename it to the one that Apache expects and rename the old one to a different name. And we keep them both.
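The rename-swap deploy can be sketched in a few lines of shell. The directory names here are illustrative stand-ins, with "current" playing the role of the directory Apache serves.

```shell
#!/bin/sh
# Illustrative sketch of a rename-swap deploy. Names are hypothetical;
# "current" stands in for the directory the web server points at.
set -eu

ROOT=$(mktemp -d)

# Existing live build.
mkdir -p "$ROOT/current"
echo "v1" > "$ROOT/current/index.html"

# Build the new release in a separate directory, never touching "current".
mkdir -p "$ROOT/release-new"
echo "v2" > "$ROOT/release-new/index.html"

# The swap: two renames, each a single fast filesystem operation.
mv "$ROOT/current" "$ROOT/release-old"
mv "$ROOT/release-new" "$ROOT/current"

cat "$ROOT/current/index.html"      # the web server now serves v2
cat "$ROOT/release-old/index.html"  # the old build is kept for rollback
```

Because the swap is two separate renames, there is still a vanishingly small window where "current" does not exist, which is exactly the edge case Ivan raises next.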

IVAN: I've always wondered if there was a risk that there would be a request at the time of that rename, but I've always assumed that it's such a fast operation that the risk is very, very small.

TESS: If it does happen, it's going to happen for a handful of users over the course of a few nanoseconds of a window. So, it can happen, and I've seen it happen, once in a while, but it's so tiny of a window, because that operation is so quick that most people never even notice, they'll just reload the page again.

IVAN: Right. Now you said that these backups happen on every build. Does that mean it happens on every build that gets executed in every branch, for every push that a developer does? Or does that mean something else?

TESS: So that depends. For our standard workflow we have a live environment, a stage environment and a test environment. And, anytime anybody does a push, we automatically back up that entire environment. Arguably, we don't need to do that for, say, the test environment, because that environment is considered volatile by default. It's not a bad idea to back up the stage environment, but we definitely want to back up the live environment.

IVAN: And that happens with hosting that we are providing to ourselves and to our clients, right? What happens with something like Pantheon?

TESS: So, right now, I'm not sure how that's going to look. I'm still looking into how that workflow would apply to Pantheon. I would like to use the same mechanism that we used before but we'll have to see.

IVAN: I think there is an opportunity here for us to do some work with Drew and his team over at Pantheon; maybe there's some collaboration that we can set up. So, we’ve talked about when backups happen in the repos. When do backups happen on the live environment, and how do they happen when we're not doing a push or a build?

TESS: So, when we're not doing a push or a build, the mechanism is actually really kind of banal. We don't have a fancy mechanism that does this. We actually take advantage of a standard Unix operating system facility called cron, and cron lets you run a process on a regular basis according to a set schedule. So every time we do a build, we're also configuring cron, and we tell cron: run a backup every so often according to this particular site's configuration and save it in these directories. And that cron job goes and starts an Ansible script, and that Ansible script runs that same Ansible role that we use during a build, to create a backup.
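A scheduled backup like the one described boils down to a crontab entry. The playbook path, schedule, and log location below are made up for illustration, not TEN7's actual configuration.

```shell
# Hypothetical crontab entry (edit with `crontab -e`): run the backup
# playbook every night at 02:30 and append its output to a log.
# m  h  dom mon dow  command
30   2  *   *   *    ansible-playbook /opt/backups/drupal-backup.yml >> /var/log/drupal-backup.log 2>&1
```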

IVAN: And that backup happens on the server where the live site is on schedule. So, nightly, for example.

TESS: Mm hmm, but we go one more step beyond that. We actually do also push those same files up to a remote storage service.

IVAN: Yeah, let's talk about that. So, we have an open source product called Tractorbeam. Tractorbeam is kind of the culmination of a lot of work that we did trying to standardize our backup systems for Drupal 6, Drupal 7 and Drupal 8. And Tractorbeam is actually the thing that pushes those backups that are run on schedule up to Amazon’s S3. How does that work?

TESS: So, that's also surprisingly banal, because a lot of the AWS command line environment includes a command which just syncs a directory. So, what we do is, we build a directory structure for our backups, and then once we actually have finished making the local backup, we just use that command to sync it up to Amazon and that's it. That's all it does.
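The AWS CLI command in question is `aws s3 sync`, which copies only what has changed. The local path and bucket name below are hypothetical, and running it requires AWS credentials to be configured.

```shell
# Hypothetical: mirror the local backup tree to an S3 bucket. Only new
# or changed files are uploaded on each run.
aws s3 sync /var/backups/example s3://example-site-backups/
```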

IVAN: Let's talk about that backup directory structure, because there's some method to the madness, right? We're making a backup for every build that we push and every release that we push, but even though we're using the same infrastructure to make the backups for the regularly scheduled ones on the live site, we're not keeping every single backup, forever, in perpetuity, right? There's some sort of logic there.

TESS: Correct. So, the Drupal backup Ansible role actually has a parameter in it which tells it how long to keep the backups within the target directory. So, if I tell the role to keep every backup in the database backups folder for seven days, at the beginning of the build, it's going to check to see if there are any backups which are older than seven days, and if there are, it deletes them. Now, when we get to Tractorbeam, we're actually running this role multiple times in order to build a directory structure where we have a builds directory, a database backups directory, a site backups directory and a sync backups directory. Inside of those directories, with the exception of the builds directory, are going to be daily, weekly and monthly subdirectories, and we run those cron jobs on a daily, weekly and monthly basis, and they save to those appropriate folders.
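The retention check is essentially a `find` with an age filter. This is a runnable stand-in for what the role's parameter does, not the role itself; the directory and file names are invented, and the sketch ages one fixture file artificially so the pruning is visible.

```shell
#!/bin/sh
# Sketch of retention pruning: before a new backup, delete anything in
# the target directory older than N days. Fixtures are hypothetical.
set -eu

KEEP_DAYS=7
TARGET=$(mktemp -d)

# Demo fixtures: one fresh backup, one backdated ~30 days (the date
# fallback covers both GNU and BSD userlands).
touch "$TARGET/db-today.sql.gz"
touch -t "$(date -d '30 days ago' +%Y%m%d0000 2>/dev/null || date -v-30d +%Y%m%d0000)" \
      "$TARGET/db-old.sql.gz"

# The pruning rule: remove files whose mtime exceeds the window.
find "$TARGET" -type f -mtime +"$KEEP_DAYS" -delete

ls "$TARGET"
```

Running the same rule with different windows against the daily, weekly and monthly subdirectories gives the tiered retention described above.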

IVAN: That's pretty slick. So then, we have all of these backups that happen on the live server, we have them there, and then we have them duplicated in Amazon's S3. So, if something happened to our hosting in theory, we could go back to the latest snapshot of the database, and the files, and the code repo, and the synced folder and we could stand that same site up within an hour or so, I would guess, of having access to those files.

TESS: Mm hmm. So, there's a few things that we didn't talk about. We didn't talk about the file uploads.

IVAN: We didn't did we? So those file uploads continue to live on the live server. Where else do they live?

TESS: So, file uploads are a little bit tricky to actually back up, because they can be pretty large. Like, for my site, it might be 400 or 500 megabytes; that's a pretty big tar.gz file, but it's fairly easy to construct. But then you have to think, if I'm backing that up 7 days a week, and then 4 times more a month, and then 12 times more a year, times 500 megabytes, that's kind of a lot of space. And that's where it gets a little bit tricky to deal with file uploads. So, we kind of made a compromise with the file uploads directory. Most Drupal sites have a behavior where the file uploads directory is additive, which means every time you get new data in it, old data doesn't necessarily go away. A lot of files aren't deleted very often. So, instead we get new stuff added to it continually. This is a bit different for government sites and school sites, but on a lot of sites you just keep adding more stuff. So, why are we going to take individual snapshots of the entire directory? What we really want is the current state of the entire directory. And it's easier to take that current state. So, what we do is, we just use the Amazon command again and sync the entire file uploads directory to S3, and that acts as the backup.

IVAN: And, so, now we have it on the live site and on S3, but we also have a copy of all of the files that are in S3 over in the Google Cloud as well.

TESS: Mm hmm. We have another integration that works outside of our infrastructure that synchronizes those between those two different service providers.

IVAN: And so, the theory here is loosely related to the personal backup 3-2-1 rule, right? So, the 3-2-1 rule for personal backups is, you should always have 3 copies of your data, 2 of them should be local, and 1 of them should be offsite. And the idea here is that the two copies you have locally are on two different devices. So, if you have a hardware failure locally, well, hopefully the other one is going to be fine. And then the one that's offsite is in a different physical location. So, if there's a tornado or a natural disaster or something happens to your house or whatever. So, that's the 3-2-1 rule for personal backups, but for us we have essentially the same thing, except it's three copies on two different offsites – so Amazon and Google – and the one that's local is the one that's on the website itself.

TESS: Yea, that’s correct.

IVAN: I like that.

TESS: It does give us multiple different levels to which to fall back to, because backups themselves can actually be corrupt or miss data or something else. There's actually something that a lot of people don't really talk about in our industry, but a lot of data archivists know about which is called bit rot.

IVAN: Yes.

TESS: Because every piece of digital data is physically stored on a physical device, those physical devices themselves can wear out. Now, when it comes to tapes and other magnetic media like spinning disc hard drives, you're going to have to maintain a polarized magnetic field in very tiny little cells on a nanoscale on some kind of substrate. And because those are atoms that have a particular electric charge, they can become depolarized after a while, and certain bits might get flipped because some magnetic region on that disc loses its polarity and a 1 gets read back as a 0. And this can actually happen with a number of different devices. With SSD drives there can be micro-failures in the actual silicon which stores it. On CD media, you can have the metallic aluminum layer that constitutes the data layer slowly degrade over time, particularly if exposed to UV radiation. And, don't get me started on cosmic rays, because that's another problem, and that also can cause bit rot.

IVAN: That's why we hope that Google and Amazon are taking careful cognizance of this and making sure that bit rot doesn't happen to our precious data on their cloud infrastructure.

TESS: Oh, there are various techniques for that. You can migrate the data between devices. You can checksum it, so that the data itself has a kind of referential, recoverable identifier that lets you recreate the missing bits if necessary. RAID hard drive arrays have been doing this for over a decade.
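The checksumming Tess mentions is easy to do by hand with standard tools. Here's a small sketch with hypothetical file names: record a SHA-256 checksum next to the archive when you create the backup, then verify it before trusting a restore (or periodically, on each stored copy, to catch bit rot).

```shell
# Hypothetical backup: a database dump rolled into an archive.
echo "pretend this is a database dump" > db-dump.sql
tar -czf site-backup.tar.gz db-dump.sql

# Record the checksum alongside the archive; ship both files offsite.
sha256sum site-backup.tar.gz > site-backup.tar.gz.sha256

# Later, verify the copy before restoring from it.
sha256sum -c site-backup.tar.gz.sha256   # prints "site-backup.tar.gz: OK"
```

If a stored copy has rotted, the verification step fails loudly, and you can fall back to one of the other copies in the 3-2-1 set.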

IVAN: Many times, yeah. So, now that we have all these backups, we're safe, right? We don't have to worry about whether or not the backups work. (laughing) That's a leading question. I'm sorry. What I meant to say was, we should be testing our backups as often as possible, right? That's where the disaster recovery part of this podcast title comes in. We have all these backups, that's great, but what good do they do you if you don't know that they work? So, you have to have some sort of restore process, right?

TESS: Mm hmm. Your backups are only as good as the last time you tried to restore directly from them.

IVAN: Alright. When was the last time we tried to restore the TEN7.com website?

TESS: When’s the last time someone downloaded something from S3?

IVAN: Ok, so. (laughing) Ok, so we should probably try to restore something for fun, right?

TESS: I mean, if you want to, but one thing that's kind of unique about our industry is that we regularly use our own backup mechanisms to recreate sites locally: we download the site code, we download the database, and then we recover from that database locally to stand up the site again.

IVAN: You know, that never occurred to me. We're actually doing the restore, the recovery, all the time, setting up a different environment locally.

TESS: Mm hmm.

IVAN: Ok, but we're not doing it in an unknown place. So, if Linode went bust, and our site was no longer available, we would still be able to go to Google or S3 to get the files. What if we tried setting it up on Digital Ocean?

TESS: We could probably do that.

IVAN: That would be a good test, right?

TESS: Mm hmm. Setting it up in another environment isn't a bad idea. Even another server within the same service provider, as long as its data is isolated from the existing one.

IVAN: What would your recommendation be to listeners who have websites that aren't worked on very often, so their disaster recovery and restore process isn't as frequent as ours? What would you recommend those people do to test the backups they have in place?

TESS: So, usually what I would recommend is, if you have another laptop or desktop, get a webstack up and running on it, and then, using only your backup files – none of the regular development tools you have at your disposal – try to recreate the site and run it locally. This can even be inside a virtual machine on the same system. You just have to make sure that the target system is isolated from any external input other than the backup.
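The restore drill Tess describes can be sketched as a script. The archive and dump names here are hypothetical stand-ins for a real site backup, and the database commands at the end are left as comments, since they assume a local MySQL server:

```shell
# Stand-in for a real backup: an archive containing a SQL dump.
mkdir -p backups
echo "-- pretend database dump" > dump.sql
tar -czf backups/site-backup.tar.gz dump.sql
rm dump.sql

# The drill: in a fresh, isolated directory, unpack ONLY the backup –
# no Git checkout, no dev tooling – and confirm that everything needed
# to stand the site up is actually inside it.
mkdir -p restore-test
tar -xzf backups/site-backup.tar.gz -C restore-test

# From here you would load the dump into a fresh, isolated database
# and point a local webstack at the unpacked docroot, e.g.:
#   mysql -e 'CREATE DATABASE restore_drill'
#   mysql restore_drill < restore-test/dump.sql
```

If any step needs a file the backup didn't contain, the backup is incomplete, which is exactly what the drill is meant to catch.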

IVAN: And how often should we do that?

TESS: That depends. For sites with fairly low activity, it's not a bad idea to do that yearly. For sites with much higher activity, monthly isn't a terrible idea. And when it comes to ecommerce sites, you're probably going to want to do that weekly.

IVAN: How do we do that for a site I have on Squarespace?

TESS: I'm not sure if you can. Squarespace is a proprietary platform, and there might not be a way for you to export that data in the first place.

IVAN: That sounds bad.

TESS: Well, this is what happens when you start using a proprietary platform. We're seeing this with Tumblr lately, where a lot of people are leaving it and have to either export their site data and reimplement it somewhere else, or just lose it. And a lot of people who don't have the necessary technical acumen, or the necessary attachment to their data, just lose it.

IVAN: Yeah, so, if it's easy and almost free, you're getting the ease of use and it being up and running very quickly, but you're kind of losing ownership over your own stuff, right? You're locked into this proprietary system.

TESS: I recently did this, in effect, with my own site. I had deleted my own Gitlab server because it wasn't particularly well implemented, and I didn't have the time to fix it or reimplement it in a better way. I wanted to move everything over to Gitlab.com, so I didn't even boot up that server; I tried to recover from what I had just on my own laptop. And I managed to do that successfully, with only about three minutes of downtime when I realized I had a different database. That wasn't a big deal though.

IVAN: So, let's try to summarize what we just talked about. The rule of thumb is: back up your website, because it's as important as the other devices you have around. Do that by backing up the code that runs the website, which is probably in a Git repository somewhere; the database that your website talks to; any uploaded files generated through the use of your website; and any synchronized configuration you might generate. Backing up means making copies. Put those copies somewhere – the server where the website is hosted is probably an okay place – and do so on a regular basis: keep the last week's worth of daily snapshots, keep about the last month's worth of weekly snapshots, and keep as many monthly snapshots as you have storage for. Then don't rely on just that location; put them in one and, if you can, two other offsite locations. Something like Amazon is a good idea. Google is another good idea. That way your copies are distributed across different infrastructures, different providers, so you're spreading the risk. And when you have those backups, test them. Try to recover from just the files you've backed up, because you might lose your site, you might get hacked. Something might happen. It's going to happen. Try getting your site up and running on your local machine in an isolated environment, or on a different host, using only the things you created with the backups. How's that for a summary?

TESS: Sounds pretty good to me.
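The retention schedule in Ivan's summary (a week of dailies, a month of weeklies, monthlies as storage allows) is usually enforced with a small pruning step in the same cron job that makes the backup. Here's a sketch for the daily tier, assuming dated archives with hypothetical names like `backup-YYYY-MM-DD.tar.gz`:

```shell
# Hypothetical layout: daily snapshots live in backups/daily.
mkdir -p backups/daily

# Keep the last week's worth of daily snapshots; delete anything older.
find backups/daily -name 'backup-*.tar.gz' -mtime +7 -delete
```

The weekly and monthly tiers would be separate directories pruned the same way with longer `-mtime` windows (e.g. `+31` for weeklies), run right after each backup completes.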

IVAN: Awesome, and maybe a good way to wrap the Podcast. Thank you, Tess, for spending so much time with me. (laughing)

TESS: That’s a lot of minutes just to talk about backups.

IVAN: But, it's fun. It's so fun. (laughing)

TESS: Always back up your stuff, you never know when Godzilla will eat your database.

IVAN: Exactly. (laughing) You’ve been listening to the TEN7 Podcast. Find us online at ten7.com/podcast. And if you have a second, do send us a message. We love hearing from you. Our email address is podcast@ten7.com. Until next time, this is Ivan Stegic. Thank you for listening.

Ivan Stegic

Founder and President
 

Words that describe Ivan: Relentlessly optimistic. Kind. Equally concerned with client and employee happiness. Bowtie lover. Physicist. Ethical. Lighthearted and cheerful. Finds joy in the technical stuff. Inspiring. Loyal. Hires smart, curious and kind employees who want to create more good in the world. His favorite things right now: the TEN7 podcast and becoming the next Björn Borg.

Tess Flynn

DevOps Engineer
 

Tess is TEN7’s Swiss Army knife. She’s an ever-present force in Drupal and a frequent speaker at events, where she's known for comic book-style illustrations in her presentations. Her superpower is problem-solving—she’s always finding ways to improve a site’s infrastructure and efficiency, and she has the rare ability to look holistically at a situation through human requirements, not just those of technology and business. She also loves sleuthing out the source of hacks, especially the ugly and ingenious ones. Tess has encyclopedic knowledge of horror/sci-fi ranging from schlocky and campy to highbrow. She loves Star Trek, where the engineers use their skills to help people.