We Give Back: Drupal Module Framework for Migrations
We recently completed a special data integration project for the University of Minnesota, Office of the Executive Vice President and Provost, Faculty and Academic Affairs. Faculty Affairs, as they are referred to, uses a product called Activity Insight from Digital Measures (referred to internally as “Works”) to capture and organize faculty information. Works acts as both a departmental LinkedIn and their internal performance review tool. They needed help getting faculty data out of Works in order to display it in web profiles on departmental websites. Many universities use Digital Measures to manage their faculty information, so when we took on the project, we wanted to be able to open source the solution we came up with and make the code available to those universities who wanted to do similar projects. Luckily, Faculty Affairs was a willing partner in working towards that goal!
The Digital Measures Migration
Once the project was kicked off, our DevOps Engineer Tess Flynn took a peek under the covers. “Digital Measures posed a significant challenge to open sourcing anything,” said Tess, “because it doesn’t really have a set data schema. It’s a REST-based application for storing structured data. It’s not as specific as a database; it’s somewhere between a document store and a database. All Digital Measures lets you do is say: this is a data object that we want to store, and these are the fields it can have. The University creates the structure, and the data can then be accessed through an API.”
For this project, we created one module specific to Digital Measures (Digitalmeasures Migrate) to access the Digital Measures API from within Drupal. This, in turn, allows data extracted from the API to be leveraged through Drupal 8 migrations. Due to the volume of data, we write it to a “staging” database table. This way, we have more flexibility in processing it, without bringing down the Digital Measures API, or encountering rate limitations. Once we get the XML out of the API and store it in the database, then we can do the transform part. That’s where the bulk of the work is done.
The other seven modules (process plugins) solve specific migration problems in the transform step of the Extract-Transform-Load (ETL) process. We wrote a series of Drupal migrations for Faculty Affairs which relied on these plugins.
Here are the modules we created and contributed to Drupal.org:
- Digitalmeasures Migrate
Provides a method to access Digital Measures API through Drupal.
- Migrate Process XML
One of the most-used modules when writing migrations for Faculty Affairs. It reads XML and allows you to extract particular key sections using XPath.
- Migrate Process S3
This is the second-most useful module. It allows you to download objects from an S3 bucket as a file to your Drupal site. For Faculty Affairs, this was used to download a professor’s profile photo, resume/CV, and other attached files. Once downloaded, this is then attached to the professor’s profile so that direct linking to S3 is not required.
- Migrate Process URL
Allows you to manipulate URL values that are provided within the data.
- Migrate Process Regex
While a small module, it’s useful in a surprising number of cases! This module provides a way to use Regular Expressions in a Drupal migration. This allows for matching and text replacements where XPath alone wouldn’t be enough. “This is a tiny thing, but it’s really handy in migrations,” said Tess. “For example, when you have two quotes next to each other that should have been one quote.”
- Migrate Process Vardump
Often used for debugging, this module takes any data given to it and dumps it to the terminal output and then passes it on. “It doesn’t sound very useful, unless you’re writing migrations, but then it’s the best thing EVER!” joked Tess. Every time Tess has written migrations, she said she had to write a custom version of this. So having this as a standard module will help many users. This module currently has the most users on Drupal.org out of all the modules in the framework.
- Migrate Process Skip
When processing lists of data, the migration system has very specific ideas of what’s considered “empty” or not: zero = false, empty string, and NULL, which of these means ”empty”? Drupal has a “skip_on_empty” migration process plugin, but it may not give the desired result if you don’t know what you’re doing. This module provides a few different mechanisms to define what is “empty” and should be skipped.
- Migrate Process Trim
Often, we need to remove leading or trailing characters—such as spaces—after extraction. This module provides a quick and simple means to achieve this in a Drupal migration.
We Open Sourced the Tools, Not the Solution
“Originally I really wanted to open source a solution, but it didn’t end up working that way,” remarked Tess. While multiple colleges and departments within the University of Minnesota (and outside of it) are all using Digital Measures, they all have their own customized implementation of it. Because of this, there will always be custom work required to display the Digital Measures data in each department’s web profiles. So the best you can do is to release tools to get to the solution. The framework we created will give each department’s IT team a head start on the process. “The tools we created are like socket wrenches to work on a car,” said Tess. “Every car is different, but the socket wrenches make the work easier.”
If we’d open sourced the Faculty Affairs solution, it would have been specific to them, and not transferable to another school. But what we can open-source are the tools to create a solution.
“There are mechanisms to retrieve types of data, but the types of data are up to the user of the API to decide,” Tess continued. “Faculty Affairs created the schema and data formats. All we did was used their existing forms, and tweaked it here and there as we needed to, and then wrote tools to create the necessary things to import data from Digital Measures into Drupal. Digital Measures is so customizable, you can’t do anything but release tools.”
This framework has broader applications than just Digital Measures. “Each module solves a specific issue that can be used for Digital Measures.” said Tess, “But the specific problems they solve are common to migration processes in general. With the exception of Digitalmeasures Migrate, these modules are fairly generic and can also be used for a variety of other migrations.”
If you want to use the framework yourself on a Digital Measures migration, the place to start is with the Digitalmeasures Migrate module. “That gets you the data,” said Tess, “but it doesn’t necessarily put it into a form that you can immediately use in Drupal, because unfortunately there’s no way of telling how the data is structured, so it’s up to you to make that part.” Tess did include a rough process for using the framework in the README for the Digitalmeasures Migrate module.
Got a Data Migration Problem? We Can Solve It!
Are you using Digital Measures and wishing you could pull the data into your Drupal deployment? Do you have any other Drupal problem we can help you solve? Drop us a line!