HE Cooperative approached us with a problem. Their Drupal-based virtual learning environment (VLE) was slow, crashed frequently and was costing ever more to keep running. Their clients were losing faith in the VLE, and their support team was being swamped with requests.
Their VLE is used across various health sectors for web-based learning and the delivery of resources. Usage spikes sharply at the start and end of terms, when many users are online at once, so any solution had to handle these peaks in concurrent users without compromising system performance.
Before even considering the hosting platform, we wanted a good understanding of the system we were inheriting. Complaints of sluggish behaviour, intermittent slowdowns and poor reliability were a little worrying. The first problem was a VERY out-of-date Drupal installation running on an old PHP version. After upgrading Drupal, we moved PHP from 5.2 to 5.6 (we tried PHP 7 but ran into a few compatibility issues). Newer PHP versions tend to run faster, so this was an added benefit. The upgrade also let us enable PHP's bytecode cache, OPcache (bundled with PHP since 5.5), to speed up this huge Drupal site.
Once the Drupal and PHP upgrades were complete, the site seemed a little better, but it still slowed down intermittently. Looking through the logs and modules, we noticed that the Backup and Migrate module (backup_migrate) was installed and scheduled to run on the live server. Not great. With a huge database and thousands of self-hosted videos, regular backups through this module just weren't feasible. Looking at cron, we found the backups were taking so long that runs overlapped and never completed. We had found the cause of our intermittent slowdowns and crashes.
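The overlap is easy to see with a little arithmetic. The Python sketch below is a simplified model, assuming a backup starts at a fixed cron interval and always takes the same time; the interval and duration values are hypothetical, since the real schedule isn't given here. Whenever a backup takes longer than the interval between runs, a new run starts before the previous one finishes, and the number of backups running at once settles at ceil(duration / interval).

```python
import math


def concurrent_runs(interval_min: float, duration_min: float, at_min: float) -> int:
    """Count backups still running at time `at_min`.

    Simplified model (an assumption, not the real cron setup): a backup
    starts every `interval_min` minutes from t=0 and each one takes a
    fixed `duration_min` minutes.
    """
    if at_min < 0:
        return 0
    # Start times: 0, interval, 2 * interval, ... up to and including at_min.
    starts = [k * interval_min for k in range(int(at_min // interval_min) + 1)]
    # A run is active if it has started and not yet finished.
    return sum(1 for s in starts if s <= at_min < s + duration_min)


def peak_overlap(interval_min: float, duration_min: float) -> int:
    """Steady-state number of overlapping runs under the same model."""
    return math.ceil(duration_min / interval_min)


# Hypothetical example: an hourly schedule with a 150-minute backup.
print(concurrent_runs(60, 150, 300))  # 3 backups running at once
print(peak_overlap(60, 45))           # 1: no overlap when duration <= interval
```

On a real server the picture is worse than this model suggests: each extra concurrent backup competes for CPU, disk and database time and slows the others down, so durations keep growing and runs may never finish.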
"Our support requests dropped immediately, within a week we celebrated our first day with no new support tickets."- Joanna Tate (HE Cooperative)
We use Amazon Web Services (AWS) exclusively for all our sites and apps. This means our database is backed up automatically: we have daily backups and a two-week point-in-time recovery window, so we can roll back to ANY time (down to the second) in the previous two weeks. It's a far more powerful backup solution, and it doesn't slow down the site. Next, we needed to make sure all the files were backed up, which is again simple with AWS: EBS snapshots take regular copies of the disk and all the files on it. The whole codebase is deployed from Git, which provides version control (in essence, a backup of each and every change). With our infrastructure providing all the backups we needed, we could get rid of Backup and Migrate. Almost immediately we saw an improvement: no more slowdowns, no more crashing.
Happy with our results so far, we now needed to move the site onto AWS. We wanted decent baseline performance all year round, plus the ability to boost performance at the start and end of term. We decided to use an autoscaling group, which lets us run the site on a single server when demand is low. As demand increases, more web servers are automatically launched and added to the autoscaling group, and traffic is balanced across all the servers in the group. As soon as traffic subsides, the additional web servers are terminated.
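To illustrate how an autoscaling group decides how many web servers to run, here is a simplified Python model of a target-tracking policy. It assumes scaling on average CPU utilisation with a hypothetical 60% target and hypothetical group limits; the metric and limits used for the VLE aren't stated, and AWS's real algorithm also applies warm-up and cooldown periods that this sketch ignores. The idea is that total load stays roughly constant, so the number of servers needed is roughly the current count scaled by actual over target utilisation, clamped to the group's minimum and maximum size.

```python
import math


def desired_capacity(current: int, avg_cpu: float, target_cpu: float,
                     min_size: int, max_size: int) -> int:
    """Simplified target-tracking rule (an illustration, not AWS's exact algorithm).

    current    -- number of web servers currently in the group
    avg_cpu    -- average CPU utilisation across them, in percent
    target_cpu -- hypothetical utilisation the policy aims to hold
    min_size / max_size -- hypothetical group size limits
    """
    if current < 1 or target_cpu <= 0:
        raise ValueError("current must be >= 1 and target_cpu must be > 0")
    # Load is spread evenly, so servers needed ~= current * (actual / target).
    needed = math.ceil(current * avg_cpu / target_cpu)
    # Never drop below the minimum or grow past the maximum.
    return max(min_size, min(max_size, needed))


# Term-time spike: one server at 90% CPU against a 60% target -> 2 servers.
print(desired_capacity(1, 90, 60, min_size=1, max_size=4))
# Quiet period: three servers at 20% CPU -> back down to 1 server.
print(desired_capacity(3, 20, 60, min_size=1, max_size=4))
```

With a minimum size of 1, the group falls back to a single server when demand is low, matching the setup described above, while the maximum size caps cost during start- and end-of-term peaks.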
Autoscaling Drupal does introduce a few problems of its own, which we needed to resolve. We used AWS CodeDeploy to automatically deploy the latest version of the VLE to each web server as it starts up. We also switched file storage from EBS to EFS so that all the web servers share the same file directory. Sessions are handled at the database layer, so they are shared between the servers.
The result is a much faster, far more reliable Drupal VLE that provides a smooth service for HE Cooperative and all their clients, and that autoscales to cope with high demand. Happier clients, happier support team. And the cherry on top? Our solution saved HE Cooperative money.