Platform Architect, TEN7
Why we don’t just say “yes” to a client that gives us money to support their site
The topics of the TEN7Audit: security, infrastructure, UX and theming, content types
Why we take the time to present our audit findings to you (in three tiers) vs. dumping the PDF in an email
How we take backups for TEN7Care so seriously we created a product (Tractorbeam) to do it for us (and you)
IVAN STEGIC: Hey everyone you're listening to the TEN7 podcast, where we get together every fortnight and sometimes more often, to talk about technology, business and the humans in it. I'm your host Ivan Stegic. We recently published a blog post called ‘Becoming a TEN7 Support Client, You Can’t Just Give Us Cash’, and we think it’s a good description of the process of becoming a client in which we support a site that we didn’t build. I thought it might be useful to talk about this whole process as well in a podcast, and that’s what we’re going to be focusing on in this episode. To help me flesh out the details, I have Tess Flynn, our DevOps Engineer joining me once again. Hey Tess.
IVAN: I wanted to start by talking about TEN7’s business, and how clients come to us, and how we have distinct categories of clients. And then we’ll kind of get into the nitty gritty of the process of onboarding a client, and that’s where I thought you could help us out. Essentially, we have two distinct categories of clients.
One category is a new site build, or a new feature build, and this is a client that specifically wants us to create something from scratch or add on to an existing deployment. And usually that process is discovery and design, some sort of strategy, then development and launch, then we support the site we’ve built. On the other side, the other major category of business we do is supporting an existing site. So, that’s supporting a client or a prospect that comes to us that has a Drupal site that someone’s already built for them. And for whatever reason, they need it supported and maintained and looked after for an extended period of time. The process there is similar. We audit the site, we improve it and then we support it, and the products we have for those services are called TEN7Audit, TEN7Improve and TEN7Care. Now, there’s a natural progression, I think, from learning everything we can about you and what you have, to recommending things we could improve, to then supporting and maintaining that site for you. But the reason we came up with this process was a particularly difficult client that we had some years ago. I did something as an owner that I shouldn’t have done, and that was to say yes to supporting a client whose site I knew absolutely nothing about, whose site no one on our team evaluated or saw, and that turned out not to be a brochure site, but a complex interconnection of many different modules, some custom, some not. We ended up with a commitment with a client for two years that went on for two years longer than it should have, negatively affecting our team and the morale on our team, and that resulted in some internal soul searching within the organization. We discovered the DiSC analysis process, we applied it to everyone in the company, we focused on our own mission, our own values. And when we came out on the other side, I basically didn’t want this to ever happen again, right?
From our perspective, we wanted to do our best work for the client, and we were being handcuffed, because we didn’t do our due diligence right in the beginning of the engagement. And, of course from the clients' perspective, they expected nothing less, right? They wanted us to do the best work for them, we possibly could be doing, and so we came up with this three-step process that sets us both up for success. Essentially, we do an evaluation and an audit of an existing Drupal site. We call that the TEN7Audit. Once we’ve evaluated it, we kind of have to get to know the code base a little better, and nothing’s ever perfect, there’s always some amount of work that needs to be done. And so we do that. The next step we call the TEN7Improve step. And once that’s complete, we offer TEN7Care, our support agreement that keeps the client's Drupal site humming, kind of the way it should be. So, it’s three steps, TEN7Audit, TEN7Improve, TEN7Care, and what I’d like to do now, is talk about the details of TEN7Audit. So, a prospect comes to us, and they ask us if we want to support their site, and we say, “You seem nice, I think we’d like to work with you. Let’s take a look under your hood.” And, that’s what the TEN7Audit really is. Tess, you’re the principal engineer that’s responsible for the TEN7Audit. So, when a prospect comes to us and I say, “Okay, let's do the audit,” it lands in your lap, right? And, you’re then tasked with doing this audit. And we’ve done so many of them we have a good process around what we do, and I want to find out about what exactly happens during the audit? What’s the first thing you do?
TESS: The first thing I usually do when it comes to doing an audit is that I’m going to need access to the underlying infrastructure. So, I’m going to need access to SSH in order to get to the file system on which the site is hosted. And I will also need database access credentials, so that I can get a copy of the website itself. Now, I like doing some audits entirely in place if it’s possible, without copying it to a local environment, but increasingly this is difficult, because depending on the client and depending on the site that might not be possible. For some shared hosting providers, for example, adding an additional tool in place can be a risky proposition, because it can cause some issues with the site, in order to perform the audit. Also, certain platform-as-a-service hosts actually don’t really let you do that very easily, in which case you do have to copy the site off and then run it locally. And then, in some cases, you might have the ability to run additional tools in place, but you’ll have to run it locally anyways, because the actual environment that the site is running on has something so fundamentally wrong with it that you can’t even run those tools in the first place. So, it all comes down to: how do I get my hands on the site code and the site database? That is always the first problem when it comes to doing an audit. And, the thing with doing that is, some clients actually find that kind of intimidating, because they don’t know who you are, they don’t know if they can trust you, and they’re not sure if they really want to give that information over. And, it’s really necessary in order to perform an effective audit in the first place. Otherwise, the best I can do is look around a little bit. Now, if I do get that access, then I can start tearing down how the site itself is built, and this becomes an interesting process. The very first thing to do is to run several auditing tools.
Depending on the site, that could be Healthcheck, or the Site Audit module, or the Hacked! module. There’s a number of different tools that we use in order to facilitate gathering the best information necessary, to see what the red flags are in the site, if there are any. What’s interesting about this process is, already the attempt to run these tools will tell us something about the site itself. Can we run them in place? If we can’t run them in place, why can’t we run them in place? What’s the underlying problem that’s preventing us from doing that? Is it because of the hosting provider? Is it because of the way that the server infrastructure is set up? Is something else wrong that we need to take note of, that is something that we need to remark on in our document? Once we actually run the tools, we examine the output of those tools, and usually this gives us kind of a 1,000-foot perspective of the general health of the site, but it doesn’t allow us to really uncover the underlying causes of some of this. An auditing tool can tell you, say, you’re running out of disk space. But why are you running out of disk space? You might have, well, there’s excessive activity in the database, or there’s excessive CPU draw, well why? What’s really doing that? So, all these tools just paint a more detailed picture, and at some point, you do have to start breaking those down and going after them and investigating them yourself.
IVAN: So, I was going to ask you about the tools that you use, and you mentioned that you usually SSH into the clients website. I would imagine that if there’s already some continuous integration in place, or some codebase in place, maybe you’re using Git to get those files as well. And then you mentioned Site Audit module and the Healthcheck module and Hacked!. And it’s not just modules that you use to determine the health of a website too, right? You are also evaluating the infrastructure, so, is it a shared host? Is there Varnish installed? Is there Memcached? How is the host configured? What kind of access do you have? Those are also other things that we evaluate. Could you talk about the next step? Once you’ve done, kind of the tools evaluation, what are the other things you evaluate?
TESS: It’s kind of like an episode of Car Talk really. A caller calls in, says, “Oh, hey, my site is making this weird sound,” and then after, you should not turn it like that, so it doesn’t make that sound. Then you actually ask, “Okay, when does it make the sound?” “How long has it made the sound?” And you start following the investigative chain. Using the tools really are just the first step in that process, because often with websites, as with a lot of modern vehicles, they are so complicated we don’t really know what’s wrong with them intuitively. We will only be able to know after analysis, and usually that requires an additional amount of technical expertise. So, the tooling basically gives us a rough topology of what the site health is like. And, afterwards, we need to investigate each one of those vectors. And a lot of the time, it comes down to how our audit document is structured, which allows me to investigate that. So, what I tend to do is, I will first start by running the tools and see if there’s any red flags in there. If there aren’t, then the next thing I do is see if the client has mentioned anything in particular that we want to look at when it comes to the site. Sometimes that can give us a good clue and sometimes that could be a false positive, and sometimes we don’t have that information. So, it depends on what we have available to track down the necessary clues. With our audit document, we actually break that out to several different sections. There’s security findings, infrastructure findings, UX and theming findings, and content findings. Each one of these is a section by which we can go and do further investigation on how the site health is working. Usually the next thing after I do the first pass is to check when was the last time the site was updated. This sounds like such a simple, easy thing, but it tells you a lot about, not necessarily the technology, but the people around the site, and how they worked with the site and regarded it. 
Everything we do in technology is about people, so you have to understand the underlying human story around the technology, and that will allow you to effectively resolve any problems that come up with the technology. So, the first thing that I usually do is, see when it was last updated, and that, ominously, I use the status update page just to check it and see what the security updates are. If there’s a lot of them and if there’s a numerous amount of them, and if there’s several years in the past, since the last update, that tells me a lot about the human management around the site, that it might not necessarily have enough technical people around it, or people don’t know that they have to update it, or a number of different human problems related to that. Then, once I have that information, then it’s down to going through each individual section. So, I note all the different modules that require an update, which ones need security updates. Sometimes sites will specifically hold back one module or another, several versions, and that doesn’t necessarily speak to neglect, but it might be an intentional holdback, because of some bit of custom functionality built around that module that could not be reimplemented easily with the available skill levels that they have within the organization after doing an update. So, that also tells me something. Then it comes down to, okay, let’s look at the infrastructure of the site. Are they on a platform-as-a-service provider like Acquia or Pantheon or Platform.sh? Are they on shared hosting? Are they on a virtual private server like Linode or DigitalOcean? Are they on self-managed hosting? Because some organizations mandate self-managed hosting, particularly governments and schools will have a mandate for self-hosting by default. And each one of those tells me something.
If it’s on shared hosting, that already tells me about the kind of price tier that they’re looking at, how they regard the amount of performance of their site. Do we need to investigate if they have outgrown that? If they’re on a virtual private server, when was the last time the server infrastructure was updated? What distribution of Linux or Unix are they working under? Do we have access to underlying abilities like accessing root so we can perform even more invasive checks, like disk sizes? What software has been installed? What are the user permissions that are used? Who else is using the server? If it’s on a platform-as-a-service provider, that gets a little bit different. Usually those I tend not to audit for infrastructure too deeply, mostly because they tend to work out pretty well by themselves. They’re intended to actually be fairly ‘use it and forget it’. So, a quick cursory check is important for those, but unless something specifically stands out to me, I usually don’t investigate them very deeply. So, we’ve covered security, we’ve covered infrastructure, then I start looking at content. What kind of content types do we have? Are we using content types? That sounds like a ridiculous question to some people, but yes, some sites decide, “I don’t know about this Drupal thing. I’m just going to use our raw table and some code, and slap it in there, that’s good enough for me.”
IVAN: We’ve seen it.
TESS: We’ve seen it, and that comes with pluses and minuses, and it’s important that we bring those forth to the client. That is something else. What’s important through all of these little details that we’ve covered is that, it’s not just noting a thing exists, it’s going why does that exist? Why has that happened? Find the underlying story behind the motivation that led to this current situation. Everything is really about documenting each one of these finer details, and the interesting thing is that usually as you document these details, you start asking better questions yourself, and then you need to go investigate those questions. So, with content, you might ask, what kind of content types do you use? Do those make sense with the kind of site that they have? Do you have a number of duplicate content types, like news and blog and press release? Are they the same kind of content really just in different categories? Do you have a large number of fields that are unused? Do you have too few fields that you’re making do too many things? Do you shove entire bits of layout into your content? Trust me, we’ve all done it, it’s okay [laughing], but we need to do better than that. There’s a lot of these little bits of story that come too. Once we’ve investigated the content types and those structures, usually I try seeing what kind of custom integrations that they have, as well. Do we interact with any third-party APIs or commerce organizations or survey organizations? Do we have any dependencies that can be a bit of a risk for us in order to manage going forward? Because if it’s outside of the realm of Drupal, those can be a little brittle and we do need to actually be careful about how those are implemented. Eventually we do come down to custom functionality. You notice we’ve done all of this other stuff, and only now, twenty minutes into the podcast, are we talking about custom functionality.
Because custom functionality in general, with a lot of Drupal sites that we’ve audited, tends to be a lot less than you expect. Usually a well-managed site has only a minor amount of custom code, just enough to pull the site together. Some sites, on the other hand, have an inordinate amount of custom code, and that also tells us a story. How much custom code do you have? Do you need that amount of custom code for the site that you’re running? Why did that custom code get used? You have to examine each one of these decisions in order to see what the whole picture of the story is.
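Editor's sketch: the update check Tess describes (how stale is the site, which modules need security updates, which are held back on purpose) can be pictured as a small triage step. This is only an illustration under stated assumptions; the `ModuleStatus` class, the bucket names, and the sample module names are hypothetical and are not output from Drupal's update report or from any TEN7 tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ModuleStatus:
    """One row of a hypothetical, hand-collected update report."""
    name: str
    installed: str
    recommended: str
    security_update: bool
    held_back_reason: str = ""  # filled in when the client explains a holdback


def years_stale(last_updated: date, today: date) -> float:
    """Approximate number of years since the site was last updated."""
    return (today - last_updated).days / 365.25


def triage(modules):
    """Sort modules into buckets; security updates always win."""
    buckets = {"current": [], "security": [], "held_back": [], "routine": []}
    for m in modules:
        if m.installed == m.recommended:
            buckets["current"].append(m.name)
        elif m.security_update:
            buckets["security"].append(m.name)
        elif m.held_back_reason:
            buckets["held_back"].append(m.name)
        else:
            buckets["routine"].append(m.name)
    return buckets


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    report = [
        ModuleStatus("example_gallery", "1.0", "1.0", False),
        ModuleStatus("example_forms", "1.0", "1.2", True),
        ModuleStatus("example_maps", "1.0", "2.0", False, "custom integration"),
    ]
    print(triage(report))
    print(round(years_stale(date(2016, 1, 1), date(2019, 1, 1)), 1))
```

The buckets only organize what was found. As Tess says, the audit still has to ask why each module ended up where it did.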
IVAN: It’s a lengthy, involved process that we undertake, isn’t it? And, I want to make sure that we are clear about what isn’t in the audit. So, the audit is mostly a health check of your site, your infrastructure, and your processes. We do a cursory look at your analytics and a cursory look at your content, and a cursory look at your accessibility. But as far as doing a deep dive into a content audit, or a deep dive into an accessibility audit, which we have done and which we do, that is not part of the deal here. The main point is to get to a point where we can give you a report, and a status quo, and a set of recommendations about the things that we think you need to fix. Now, let’s just talk about the audit itself. What do you actually get? You get a PDF and for those of you listening, you can go to ten7.com/audit to see an example of a PDF of one audit that we’ve done. It’s been anonymized so there’s no actual client information in there, but you’ll get the gist from the PDF itself. From when we kick off to when we've created the PDF it usually takes about four weeks and at the end of those four weeks, we have a document, a PDF that we then present to the client. We don’t email the client with this PDF and say, “Hey, take a look at this thing. Tell us what you think.” And we definitely don’t send the email with the PDF in it to the client before we present it to them. That video conference, that presentation of the TEN7Audit is very important. It’s very important to provide that to our clients in real time. Tess, can you talk about that meeting and what that meeting feels like and looks like? What’s the goal of that meeting?
TESS: So, first let’s frame what the document looks like. On average these audit documents run 18-35 pages. That's right, pages. I’m a bit wordy. [laughing]
IVAN: That’s right. It’s a big one. [laughing] Right, I mean, this is a serious audit, right? It’s not going to be a couple pages long.
TESS: And, the problem with a document that size that is that comprehensive is that it’s really easy to get drowned in it. There’s just so much detail. There’s no framing around it. There’s no discussion around it. There’s no opportunity to ask questions, and suddenly you easily forget points and questions that you had three pages ago, because you have new ones that have already filled up your entire internal question queue. So as a result, it’s really important to have this meeting at the same time that we hand over the document, because it allows us to make this a conversation, not just, "Here’s the results." Because no one wants "Here’s the results," we really do want to have a conversation about it. So, the way that it works is, generally we start with the document itself and we briefly talk about the methodology involved. And because sites are all unique, sometimes we do have to adapt our methodology accordingly. We’ll point out if we have to run the audit on a local copy for various reasons, and then we start talking about the actual audit findings. And the way that the audit findings are structured is also important, because at the very front we have critical findings. These are the most important things that would need to be fixed with the site immediately. These are things that are going to be possible security attack vectors, critical updates that have yet to be applied, or other critical infrastructure things that need to be resolved as soon as possible. All of these things need to be acted on relatively quickly to prevent downtime or possible data destruction. Those are usually the first things that we talk about, and they’re the big, big items. And, the idea and the intention behind this is so that we can stress the things that are the most important to fix right now, before we get to other underlying things that might require a longer-term effort. Basically, we want to make sure that we dampen down the campfire, so it doesn’t start a wildfire.
TESS: And once we’ve done that, then we go through every different section that is in the audit document and this can be a long meeting. Usually these meetings take about an hour, and we outline each individual point. We don’t read the document because everyone can just read the document, but we point out the things that are the most important that I found with that and give additional context. If there are questions, we can answer them at that point. That way no one feels that their questions go unanswered or that they forgot them, they can always have them right there and we can answer them right there. We go through each individual section and sometimes we will have a finding that is, I don’t know why it was built like this. There’s probably a good reason for this, but I don’t know what it is, and usually at that point I might ask you, the client, why was it built like this? Because sometimes there is no right answer for some of these things. Sometimes we find, “Oh, well we used a custom table here because we actually have another integration with a GIS application and somewhere else that requires database access.” “Oh, that makes perfect sense.” “Sure, okay. That’s understandable.” Now I don’t need to worry about that particular issue. Now I know that I don’t need to make a recommendation to fix that underlying issue and make it more Drupal-like, because it was intentionally done that way. So, because this is a two-way process, this is really, really important. Once we get all the way through the different categories, and usually by the time we get to the end of it, we’re talking content and theme and UX and then a brief touch on the analytics findings. Then we talk about recommendations, and our recommendations usually come in three distinct tiers. The first tier of recommendations are usually things that we want to do right now, in order to make sure that we don’t have a wildfire. 
Things that fix immediate, most critical issues with the site, applying secure updates, fixing any potential security attack vectors, DDOS possibilities, fixing other underlying configuration problems like, caching was disabled for some reason, or maybe we should look at turning Varnish on, or maybe the setting was incorrect, or why do you have user registration open when you’re a brochure site? [laughing] Things that are really simple and really actionable that can be done generally within a week after giving the audit document over.
Then the next tier of recommendations are things that we want to try to do to maximize the site as it currently exists, without fundamentally changing the functionality of the site. So, that’s going to be things like, “Well, do you think that you can enable this kind of cache configuration with Varnish or Memcache? Maybe you can change the way this functionality works so that this bit of functionality will work better for you going forward. Maybe your theme is a little wonky here and needs some correction.” Sometimes we might make a recommendation to change hosting providers at that point as well, because if you’re on shared hosting you might have outgrown that. If you’re on Acquia or Pantheon, you might need to change your hosting plan. If you’re on a VPS (virtual private server), you might also need to change your pricing plan to get more vCPUs or more disk space, or more network transfer storage, those kinds of things.
IVAN: Or caching even.
TESS: Or caching. The third tier is going to be things that allow the site to reach its full potential, which may involve fundamentally changing certain aspects of how the site functions. So, we might want to say, “Maybe you should make a new theme. Maybe you should take this bit of functionality that was implemented this way and reimplement it this way instead.” Those tend to be bigger projects that require several weeks to months to implement, depending on the kind of site. And some of those might not be something that you want to work on immediately. Some of those might be, “Yeah, we were thinking about redoing the entire site in Drupal 8 and we’re on Drupal 7.” That’s one of those recommendations, and doing a site rebuild does take time and that’s where that recommendation goes. These three tiers allow you to prioritize which aspects of the site you want to act on as a client without feeling like, “Oh, geez, my site is terrible, and everything is wrong and on fire.” [laughing] No, we break that up for you so that you can know, “Okay, these are the things that we need to fix now, because you don’t want your wheel to fall off while you’re on the highway. Here’s the things that we should probably fix because that’s not good, winter's going to happen eventually, and you need to replace that heater core in your car, because you’re going to get cold eventually (laughing), and then finally maybe you just need a new car." [laughing] Everything comes under car analogies.
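Editor's sketch: the three tiers Tess describes amount to a simple grouping of recommendations, from which the client picks what to act on. The sketch below is illustrative only; the `Recommendation` class, the tier labels, and the sample titles are hypothetical and do not come from the audit document itself.

```python
from collections import defaultdict
from dataclasses import dataclass

# Tier labels paraphrase the discussion above; the wording is an assumption.
TIERS = {
    1: "Fix now (generally within a week)",
    2: "Maximize the site as it currently exists",
    3: "Reach full potential (several weeks to months)",
}


@dataclass
class Recommendation:
    title: str
    tier: int
    selected: bool = False  # True once the client picks it for follow-up work


def group_by_tier(recs):
    """Return {tier: [titles]} with every tier present, most urgent first."""
    grouped = defaultdict(list)
    for rec in recs:
        if rec.tier not in TIERS:
            raise ValueError(f"unknown tier: {rec.tier}")
        grouped[rec.tier].append(rec.title)
    return {tier: grouped[tier] for tier in sorted(TIERS)}


def selected_tasks(recs):
    """Titles the client picked, ordered by tier (stable within a tier)."""
    return [r.title for r in sorted(recs, key=lambda r: r.tier) if r.selected]


if __name__ == "__main__":
    # Hypothetical sample recommendations, for illustration only.
    sample = [
        Recommendation("Rebuild the theme", 3),
        Recommendation("Apply security updates", 1, selected=True),
        Recommendation("Tune the caching configuration", 2, selected=True),
        Recommendation("Close open user registration", 1, selected=True),
    ]
    for tier, titles in group_by_tier(sample).items():
        print(TIERS[tier], titles)
    print(selected_tasks(sample))
```

Keeping the tier on each recommendation, rather than keeping three separate lists, means the selected subset can be re-sorted by urgency whenever the client changes their mind.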
IVAN: Yes, it does. Or tractors, right? So, that was a great summary of that list of recommendations and the three tiers, Tier 1, Tier 2, Tier 3. And you’re essentially cherry picking those recommendations that make sense for your needs, for your budget, for your organization moving forward. And what we do as the next step in our process, so this is step one right TEN7Audit, it’s about four weeks, get out of that with an audit report and a list of recommendations. And once we’ve done that you cherry pick the list of recommendations and those become tasks for us. And with that list of tasks, of things that you want changed, things that you want improved, based on your budget and our recommendations, we package that up into the TEN7Improve contract and the next step of the process. And usually that takes between four to eight weeks of our time, and it’s really dependent on kind of the results of the audit.
TESS: The thing that’s also important in this entire thing that often goes unsaid, is that an audit is a wonderful "get-to-know-you" activity. Because now after the audit, we as an organization as TEN7, know your site and have a lot of knowledge about how your site works, and what your motivations are, and what your perspective of your site is. And also, you know us, and you know our processes, and you know our names and our faces, so that you can actually know who to talk to. An audit is a wonderful get-to-know-you exercise, and I cannot stress the importance of that human connection enough in what is otherwise a very dry technical field.
IVAN: The importance of that human process is not just getting to know each other, but laying the groundwork and the foundation for a long-term relationship after that audit's happened. And I think the next step, the TEN7Improve step, that’s kind of getting to know your code base, getting to know how it’s configured more deeply, not just one person getting a higher-level view of the site, but more than one person getting a deeper level understanding of the technical debt that’s in the site, the way that things are configured exactly, so that there’s not just one person who knows how your site is configured and deployed. And I think the TEN7Improve process is also a good next step for the relationship, because now we’re spending more time with each other, getting to know what each other’s work styles are, what your needs are, what our needs are. So I guess you could say the TEN7Audit is kind of the dating part of the relationship, and the Improve step is kind of the engagement part of the relationship? I guess it’s the time when you get to know the deep-down, dirty secrets of the code base [laughing].
TESS: Why does the site always leave the socks on the floor? [laughing]
IVAN: [laughing] Exactly. That’s exactly what the Improve process is, the answer to the socks on the floor question, right? So, four weeks for the Audit, four to eight weeks for the Improve process, the outcome of the Improve process is a site that we now know quite well. We know how much technical debt there is, we know how it’s configured, we’ve improved it, we’ve updated it, it’s in a state now that we would be comfortable saying, "We can support this for you from now on." Don’t you think?
TESS: That’s the entire goal of the Improve process, is to get us to a point where we can start working with it regularly without having to worry that the site's going to completely blow up for whatever reason, be it infrastructure or code or simply lack of knowledge.
IVAN: And so now we know the site, right? So we can offer a support agreement and that’s the last step of the process, TEN7Care. The way that the support agreement is structured is, it’s an annual agreement. We agree to some minimum number of hours that we will have every month with you, and the agreement typically covers things like Drupal site maintenance, so we maintain and update the core and contributed modules that are installed. We provide 24/7 uptime monitoring and response, and so that part is really dependent on the hosting provider that you have. So in some cases Pantheon is already monitoring their sites, we’re monitoring in addition to that, and sometimes we don’t have any control about whether Pantheon is up or down, and so we have to revert to their knowledge and them working on an emergency, and we are simply the conduit for you. The other thing that TEN7Care provides is regular backups and archiving and that’s really important isn’t it Tess?
TESS: I can’t stress that enough—how important a backup is, because life is unpredictable, and you want to make sure that you have a backup just in case life throws something very, very nasty in your direction.
IVAN: And we’ve got a number of blog posts and podcasts that we’ve done where we've talked about backups and details of what you should have. We use Tractorbeam, the open-source solution that we’ve published and provided to the community to do those backups of your website. Remind me again, Tess, it’s daily, weekly, monthly, right? And, two different off-site locations.
TESS: Mm-hmm. Correct.
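Editor's sketch: a daily, weekly, monthly schedule copied to two off-site locations can be reasoned about as "which backup tiers are due on a given date, and where does each copy go?" The code below is a rough illustration and does not describe how Tractorbeam is actually configured; running weekly backups on Sunday and monthly backups on the first of the month, and the destination names, are assumptions made for the example.

```python
from datetime import date

# Hypothetical destination labels; real off-site locations are not implied.
DESTINATIONS = ["offsite-a", "offsite-b"]


def backup_tiers(day: date) -> list:
    """Return the backup tiers due on `day` under the assumed schedule."""
    tiers = ["daily"]
    if day.weekday() == 6:  # Sunday (assumed weekly backup day)
        tiers.append("weekly")
    if day.day == 1:  # first of the month (assumed monthly backup day)
        tiers.append("monthly")
    return tiers


def backup_plan(day: date, destinations=DESTINATIONS) -> list:
    """Pair every due tier with every off-site destination."""
    return [(tier, dest) for tier in backup_tiers(day) for dest in destinations]


if __name__ == "__main__":
    for d in (date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 8)):
        print(d.isoformat(), backup_plan(d))
```

Writing every tier to every destination is the point of having more than one off-site location: losing one provider should not mean losing a whole tier of history.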
IVAN: Great. So, AWS (Amazon Web Services), Google Cloud and DigitalOcean, those are the three different companies, three different places. So that’s covered under your TEN7Care support agreement. And then all of the CI, continuous integration, and automation goodness that Tess absolutely loves and that I am a huge proponent of, that comes as part of TEN7Care as well, right? So, our regular release process, the use of feature branches, the use of code review, the fact that we can push code and it deploys to numerous different environments, automatically. Do you want to say a couple things about that? I don’t want to prevent you from geeking out here [laughing], so do say something about that.
TESS: Mostly the reason why CI is particularly wonderful is because the problem is that human beings are inconsistent. You’ve had a bad night of sleep or you’ve read something upsetting and you might be distracted, and that can cause real downtime and real outages and real technical problems. The idea behind using CI is so that you remove more human hands from the process, and outsource that to a piece of technology that can do that consistently every time and be a lot more situationally aware of what’s going on when you do the deploy. So, having CI allows us to respond to changes a lot more quickly, a lot faster, and make sure there’s accountability at every step of the process with regards to updates and feature deployment.
IVAN: I couldn’t have said it better myself. Just wonderful, thank you. So, TEN7Care is the last step in the process, preceded by TEN7Improve and, of course, the TEN7Audit right at the beginning. I think we’ve kind of gone through the whole process, right, beginning to end.
TESS: That’s the whole thing.
IVAN: That’s the whole thing. Thanks again for being on the podcast. It’s always such a pleasure to talk to you, Tess.
TESS: Not a problem.
IVAN: So, if this podcast sounded interesting to you, and you think we might be able to help your organization in some way, we’d love to hear from you. Send us an email, send it to [email protected] to start a conversation. You can also find out more on our website at ten7.com/welcome. That’ll take you to our blog post on the whole process, and you’ll see a link to the example of the audit and an example of the support agreement as well. You’ve been listening to the TEN7 Podcast. Find us online at ten7.com/podcast. And if you have a second, do send us a message, we love hearing from you. Our email address is [email protected]. Until next time, this is Ivan Stegic. Thank you for listening.