Knowledge Bank

From how to best plan and execute a server migration through to utilising MySQL replication, this section is full of white papers and best practice guidelines, produced by the experts here at ForLinux to help you get the most from your Linux server.

Best Practices: Development, Testing and Go Live for New and Updated Sites

19/07/2011

Introduction

Over the past 10 years, we have managed and supported hundreds, if not thousands, of site migrations and new site deployments. The paper presented below aims to share some of our experience, explaining steps and procedures that can help you successfully develop, test and then deploy a web site, while avoiding some of the more common problems often encountered.

Development

If you have access to a dedicated development server, this can have many benefits over developing on a live server, but this does also raise some additional considerations.

If the new site is to be successfully migrated after development, the dev server environment must mirror the live environment as closely as possible. While not every aspect needs to be considered, it is best to ensure the same versions of the basic LAMP stack exist on both servers. Whether you run Apache 2.0 or 2.2 is unlikely to affect the working of your website, but the versions of MySQL and any scripting languages can cause issues if they are much older or newer than those used on the live environment.

In most cases, a website won't just be simple HTML, but will also include elements coded in PHP, Perl, Python, Ruby, JavaScript, etc. Developing a new version of a site in PHP 5.3 when the live environment runs 5.2 is likely to cause problems, so it's best to match versions between servers to avoid any unnecessary issues.

Make sure the dev server also has all the relevant modules installed. If developing a PHP site, you can run php -m on both servers to generate a list of installed modules. Any that are missing can then be installed. Also check for any additional libraries, such as PEAR modules. If you are using Perl, check which CPAN modules are installed, if using Ruby check which Gems are installed, and so on, depending on your scripting language of choice.

Remember, if any additional modules need to be installed to make the development version of the site work, these will also need to be installed on the live server before the site is made live.
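As a rough sketch of the module comparison described above, the output of php -m from each server can be saved to a file and the two lists compared. The function names and file names here are our own illustrative choices, not part of PHP:

```shell
#!/bin/sh
# Strip blank lines and section headers such as "[PHP Modules]" from
# `php -m` output, then sort so the lists can be compared line by line.
clean_module_list() {
    grep -v -e '^\[' -e '^$' "$1" | sort -u
}

# Print modules that are in the first list but missing from the second.
missing_modules() {
    first=$(mktemp) && second=$(mktemp) || return 1
    clean_module_list "$1" > "$first"
    clean_module_list "$2" > "$second"
    comm -23 "$first" "$second"   # -23 keeps lines unique to the first file
    rm -f "$first" "$second"
}

# Example usage (file names are hypothetical):
#   php -m > live-modules.txt      # run on the live server
#   php -m > dev-modules.txt       # run on the dev server
#   missing_modules live-modules.txt dev-modules.txt
```

Swapping the two arguments shows the reverse case: modules installed on the dev server that the live server will need before Go Live.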

Developing on a live server removes these issues of environment compatibility, but this approach does introduce some problems of its own.

Any changes made to the existing server environment to allow the dev site to function have the potential to affect live sites on the server - particularly if you update an application or module that is being used by a live site. This is often because the newer version removes or deprecates functions that are currently in use, or changes how they behave. Always check documentation and changelogs before upgrading any application or module, and assess the potential risks to existing code.

Another potential risk when developing a new version of an existing site, on a server that hosts the current live version, is accidentally editing the live site. This becomes a bigger risk if the dev username is very similar to the live username, resulting in similar looking file paths.

Where possible, make dev usernames different, or make sure they are obviously dev usernames - e.g. use exampledev, where the live version username is example. This also applies to databases, as it only takes a moment's distraction to accidentally drop a live database instead of the dev version.

While developing a new site, it is common to block access to it. Usually this is done using an .htaccess file, adding password protection and/or restricting access by IP address until the site is ready to go live.

In addition to this, it is also a good idea to use a robots.txt file to stop the dev site being indexed while it is still being developed. A global disallow rule looks like this:

User-agent: *
Disallow: /
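The password protection mentioned above can be sketched as follows. This is a minimal example using Apache Basic authentication, assuming the server allows authentication directives in .htaccess files (AllowOverride AuthConfig). The function name, realm text, user and paths are hypothetical:

```shell
#!/bin/sh
# Write a password-protecting .htaccess into the given document root.
# Usage: write_dev_htaccess DOCROOT PASSWORD_FILE
write_dev_htaccess() {
    docroot=$1
    passwd_file=$2
    cat > "$docroot/.htaccess" <<EOF
AuthType Basic
AuthName "Development site"
AuthUserFile $passwd_file
Require valid-user
EOF
}

# Example usage (paths and user name are hypothetical):
#   htpasswd -c /home/exampledev/.htpasswd devuser   # prompts for a password
#   write_dev_htaccess /home/exampledev/public_html /home/exampledev/.htpasswd
```

Keeping the password file outside public_html, as in the usage example, means it cannot be downloaded through the web server.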

Testing

Once a new site has been developed, it needs to be thoroughly tested to ensure everything works as expected.

* Previewing the site:

If the site is brand new, or a dev version of an existing site, the domain's DNS is unlikely to be resolving to the correct IP address, so you need to use one or more different methods to view the site to test it.

If your server has mod_userdir enabled, you can preview the site using this test URL format, substituting the server's IP address and the account username:

http://server-ip-address/~username

This is a useful quick and dirty test, just to make sure the site is displaying, but when carrying out full testing, it is better to edit your local PC's hosts file. Your PC will always attempt to look up DNS requests internally first, and to do this it checks the hosts file.

On Linux machines it is located at: /etc/hosts

On Windows machines (XP/Vista/7) it is located at: c:\windows\system32\drivers\etc\hosts

Open it in a text editor (e.g. Notepad) and, by default, you should see something similar to:

127.0.0.1 localhost

Note: If you're using Windows Vista or 7, you will need to right-click on Notepad and select the option to Run as Administrator, otherwise you will not be able to save any changes.

You can add additional lines below this to point DNS requests for the dev site to the IP address it is being hosted on. For example, if you wanted to check a site called devsite.co.uk, hosted on a dev server on the IP address 10.0.0.1, you would add the following:

10.0.0.1 devsite.co.uk
10.0.0.1 www.devsite.co.uk

Save the changes and restart your browser, and you should now see the dev site instead of the live site.

On a Windows PC, you might also need to flush your DNS cache. Open a command prompt: go to Start > Run, type 'cmd' and press Enter. Then type: ipconfig /flushdns

To check that the site is now resolving to the correct IP address, it's recommended that you use Firefox and the ShowIP plugin. Similar plugins are available for other browsers, but this is the favourite of many web designers.

Another popular trick is to add a short phrase, such as 'dev', to the header(s) of the dev site, as a quick visual indication that you are on the correct site.
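On a Linux machine, the hosts entry can also be checked from the command line. The sketch below is one way to do it. The hosts_ip_for helper is our own illustrative function, while getent and curl's --resolve option are standard tools:

```shell
#!/bin/sh
# Print the IP address that a hosts-format file maps a hostname to.
# Usage: hosts_ip_for HOSTNAME [FILE]   (FILE defaults to /etc/hosts)
hosts_ip_for() {
    awk -v name="$1" '
        { sub(/#.*/, "") }                 # strip comments
        { for (i = 2; i <= NF; i++)        # fields after the IP are hostnames
              if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# Example usage on the machine where the hosts file was edited:
#   hosts_ip_for devsite.co.uk          # prints the IP from the hosts entry
#   getent hosts devsite.co.uk          # asks the system resolver (Linux)
# curl can also be pointed at the dev server without touching the hosts file:
#   curl -I --resolve devsite.co.uk:80:10.0.0.1 http://devsite.co.uk/
```

The curl form is handy for a one-off check, as it leaves the hosts file alone and so cannot be forgotten about after Go Live.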

You may find that previewing sites via an edited hosts file doesn't always work as expected. Sites that have internal redirects (common on WordPress and other blog/CMS systems) may not display correctly. Workarounds are possible in some cases, so it's best to check the documentation and forums for your software.

Any server-side code that uses absolute URLs (i.e. URLs that contain the domain name), rather than relative paths (containing just the file name, or a file path relative to the document root), will still pull data from the live site, as the server itself will resolve the domain name to the live IP address. You may be able to work around this by editing the server's hosts file too.

SSL certificates will work, but will display errors until the DNS is switched over.

* Check your code and scripts:

Once any DNS issues have been resolved, the site should be thoroughly tested to ensure the code and any scripts work as expected.

The first, and most basic, point is to always test your site in multiple browsers. Browsers use different layout engines, so they can render pages slightly differently. There are 5 major layout engines, which can be tested by checking the site in Firefox, Internet Explorer, Opera, Chrome and Konqueror.

There is some cross-over between browsers, for example: both Chrome and Safari use the WebKit engine, so you only need to test one of them.

If the dev site is being developed on the live server, compatibility with the live environment obviously shouldn't be a problem. And as long as you have accurately mirrored the live environment, with any libraries or modules required by your coding language of choice, your scripts should work when migrated to the live platform.

If there is any reason to suspect there might be problems with a particular script, it can be a good idea to test it on the live environment, prior to the full go-live. Set up a temporary dev subdomain (and database, if necessary), upload the script(s) to be tested into this dev area and make sure they work as expected.

If you have any mail forms that call a specific MTA (Mail Transfer Agent) - such as exim, sendmail, qmail, etc - you should check that both the live and dev platforms are configured to use the same MTA.

For example, a CGI mail form written in Perl might call sendmail directly:

open ( MAIL, "| /usr/lib/sendmail -t" );

This will not work if the live environment has exim installed instead of sendmail and provides no sendmail-compatible binary at that path, so you should always check which MTA is installed on both platforms, and where.
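A quick way to compare the two platforms is to see which sendmail-style binaries exist on each and what they point to. The sketch below assumes a Linux system with readlink -f available. The list of paths covers common locations only and is not exhaustive:

```shell
#!/bin/sh
# Report which sendmail-style binaries exist and what they really point to.
# The candidate paths are common locations, not a complete list.
report_mta() {
    for path in /usr/sbin/sendmail /usr/lib/sendmail /usr/sbin/exim /usr/sbin/exim4; do
        if [ -e "$path" ]; then
            # readlink -f resolves symlinks, e.g. a sendmail link to exim
            printf '%s -> %s\n' "$path" "$(readlink -f "$path")"
        else
            printf '%s (not present)\n' "$path"
        fi
    done
}

report_mta
```

Run it on both the dev and live servers and compare the output. If the path used in your script is reported as not present on either server, the mail form will need updating.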

It is also a good idea to check your markup (HTML, XHTML, etc) and CSS files using the World Wide Web Consortium (W3C) validation websites:

http://validator.w3.org/
http://jigsaw.w3.org/css-validator/

More useful checks can be run using the Firefox plugin YSlow. This runs a series of tests and grades the tested page A-F, depending upon the performance of key factors - content, CSS, JavaScript and server - and makes recommendations about improvements that can be made to speed up page loading, and make the site more standards compliant.

* Load Testing:

If you expect the site to experience a high volume of traffic, it is advisable to stress test it using a load testing and benchmarking utility, such as Siege. This simulates multiple concurrent connections to the site, putting it under 'siege'.

It can be downloaded from:

http://www.joedog.org/index/siege-home

The URLs to be tested are listed in the file /usr/local/etc/urls.txt, and the siege command is then run with options to control the number of connections and the duration of the test, e.g.

siege -t5m -c50

This will run the test for 5 minutes, simulating connections from 50 users.

While this is running, you can monitor the server's performance - CPU and memory usage, size of Apache processes, etc - and the responsiveness of the site. If you are running the dev site on a server which also runs live sites, be aware that these tests will impact the live sites, so it is always best to test on an external server, unless you run the tests outside of your core business hours.

There are many more options available for Siege, which are documented on the official website.
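If you would rather not edit the default URL list, siege can read URLs from a file of your own using its -f option. The following is a sketch only. The URL paths, the file location and the RUN_SIEGE safety variable are our own assumptions for illustration:

```shell
#!/bin/sh
# Build a URL list for siege, one URL per line.
# The host name and paths below are hypothetical examples.
urls_file=${TMPDIR:-/tmp}/siege-urls.txt
base=http://devsite.co.uk

for path in / /blog/ /contact.php; do
    printf '%s%s\n' "$base" "$path"
done > "$urls_file"

# -f tells siege to read URLs from this file instead of its default list.
# The test only starts when RUN_SIEGE=yes, to avoid loading a server by accident.
if command -v siege >/dev/null 2>&1 && [ "${RUN_SIEGE:-no}" = yes ]; then
    siege -f "$urls_file" -t5m -c50
else
    echo "URL list written to $urls_file; set RUN_SIEGE=yes to start the test"
fi
```

Listing several representative pages, rather than just the front page, gives a more realistic picture of how the site behaves under load.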

* Automated Testing:

If you need to run more complex testing - for example, repeated testing of a multi-step payment portal - it can be time-saving to automate the process.

Watir (pronounced 'water') is an open source web testing application written in Ruby.

The main site is http://watir.com, but a good place to start is the '5 minute documentation' at:

You need Ruby installed on the server, and some familiarity with Ruby is certainly an advantage, but the tests themselves are fairly simple to set up.

It is primarily designed to use Internet Explorer or Firefox, although there are drivers for Opera and Safari too.

Another popular web testing application is Selenium, which can be run through your web browser. Primarily designed for Firefox, it is also compatible with the other major browsers, although the IDE only runs on Firefox. It supports the major programming languages, such as C#, Java, Perl, PHP, Python, Ruby and others.

As it's installed as a Firefox plugin, it is very easy to set up and doesn't require any server-side configuration, which makes it a very popular option for web testing.

Downloads and full documentation can be found on the web site:

http://seleniumhq.org/

Go-Live Preparation

A major consideration, around which many of your other decisions will be based, is which day to actually Go Live with the new site.

Monitoring your traffic statistics, and daily sales figures, over a period of weeks, can help you pinpoint when is likely to be the quietest day and time to make the site live. This is particularly important if you are replacing a live site with a new version of the site, to try and minimise any impact on the site's business.

If it's a new site, the Go Live date will probably be determined by your project deadlines.

Other considerations may affect the preferred Go Live date and time, such as the availability of key technical staff, or project overruns, but where possible it is best to make any changes to live systems either early in the morning or later in the evening.

* DNS changes:

If the Go Live requires a change of IP address, the main thing to consider is the TTL (Time To Live) of the DNS records. The TTL is defined in seconds - usually something like 3600 or 14400 - and sets how long resolvers may cache the DNS records. This prevents them from being cached for too long and becoming out-of-date.

If, for example, the TTL is set to 14400 seconds (4 hours), any DNS requests for these records will be cached for 4 hours before expiring. So, if you update the zone file to change the IP address of your site, anyone who has requested the site within the previous 4 hours, usually just by viewing the site, will continue to have the old IP address cached until the TTL period is reached. This can result in access problems for up to 4 hours for these clients, until the old records have expired.

To prevent this, prior to Go Live, reduce the TTL to 60 seconds and then allow sufficient time for the existing cached records to expire (4 hours, in our example). Any changes will then only be cached for a maximum of 60 seconds, so should appear relatively seamless for most clients.

Note: If your browser keeps trying to access the site on the old IP address, after the TTL has been reached, try clearing the browser cache. If this doesn't work, restart it or try a different browser.
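To see the TTL currently being served for a record, you can query it with dig, which prints the TTL as the second column of each answer line. The helper below is a small sketch of our own for pulling that value out. Note that a caching resolver reports the time remaining rather than the configured TTL, so query one of the domain's authoritative nameservers for the configured value:

```shell
#!/bin/sh
# Extract the TTL (second column) from `dig +noall +answer` output on stdin.
ttl_from_answer() {
    awk 'NF >= 5 { print $2; exit }'
}

# Example usage (needs dig and network access; the nameserver is hypothetical):
#   dig +noall +answer devsite.co.uk A | ttl_from_answer
#   dig @ns1.example.net +noall +answer devsite.co.uk A | ttl_from_answer
```

Once the value reported has dropped to the new, lower TTL and the old TTL period has passed, it should be safe to go ahead with the IP address change.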

* Matching file paths:

If you are developing a replacement version of an existing site, it is important that the file paths match those of the live environment. If your code just uses relative paths, i.e. paths that are relative to the document root, the file paths should be okay. However, if you have used any absolute paths, these will need updating to match the absolute path of the live environment before Go Live.

For example, let's assume we are developing a new version of the site example.com. We don't have access to a dedicated dev server, so it has been designed and built on the live platform.

The live site has the username example, so the full path to the site's document root is:

/home/example/public_html

As we cannot have two sites with the same username, the dev version of the site has the username exampledev, so the full path is:

/home/exampledev/public_html

Relative file paths are relative to the document root, so the username of the account does not affect them, e.g. to reference the directory blog inside public_html, the relative path would be simply "./blog", whereas the absolute path (on the live site) would be: "/home/example/public_html/blog".

As you can see, if code using absolute paths from the dev site is moved into the live environment, these paths will be incorrect, referring to exampledev instead of example, so must be updated prior to Go Live.

Ideally, unless specifically required, it is best to use relative paths, to avoid such problems.
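To find any absolute paths that still refer to the dev account, a recursive grep over the dev document root is usually enough. This is a minimal sketch. The function name is our own, and the default path follows the exampledev account used in this section:

```shell
#!/bin/sh
# List every line under a directory that mentions the dev account's home path.
# Usage: find_dev_paths DIRECTORY [DEV_PATH]
find_dev_paths() {
    # -r recurse, -n show line numbers, -F treat the path as a fixed string
    grep -rnF -- "${2:-/home/exampledev}" "$1"
}

# Example usage:
#   find_dev_paths /home/exampledev/public_html
```

Each line of output shows the file name, the line number and the offending line, giving you a checklist of paths to update before Go Live.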

* Code freeze:

You should apply a code freeze prior to the Go Live date. Once the site has been tested and determined to be working, no further changes should be made, so that the version that goes live is exactly the version that was tested.

This point is often ignored, as last minute changes are made as a result of 'feature creep'. However, the race to add final changes immediately prior to, or even during, the Go Live, can seriously complicate the procedure and cause functionality problems with the new site once live.

If a last minute change is requested, determine whether it is critical for this deployment of the site. If it is, it might be better to postpone the Go Live date, to give you time to make the changes and thoroughly test the site again, to ensure the changes haven't adversely affected the running of the site.

If it isn't a critical change, consider whether it can be held back for an update deployment at a later date, once the changes have been properly tested on the dev environment.

Once you decide on a date for the code freeze, make sure everyone working on the site, and anyone involved in the Go Live / migration, is notified.

* Backup live sites / databases:

To allow the existing site to be rapidly rolled back, if serious problems are encountered with the newly deployed version of the site, the live site should be backed up prior to Go Live.

You may already have a backup solution, which you can restore from, but it can speed things up considerably if you make some backups specifically for the Go Live.

Assuming the site is entirely contained within the document root, make a backup of this, so it can be quickly switched back, if needed.

For example, if the document root folder is public_html, assuming you have space on your account, make a local copy of it by running something similar to:

cp -pR public_html/ OLD-public_html/

This will make a copy of public_html/ called OLD-public_html/. The -p switch preserves the ownerships, permissions and timestamps, and the -R switch recursively copies all files and folders within the parent directory.

If you have any other folders within the home directory that are used by the site, they can be backed up in the same way.

If the site is database driven, you should also make a dump of any databases used. Assuming you are using MySQL, you can use the mysqldump command, e.g.

mysqldump example > /root/example.sql

This will dump the database called example into the /root directory as an SQL file, which can be quickly re-imported if necessary.

Note: Before making backups, remember to check the size of the folders and databases involved, and check that the location you are backing them up to has enough space free.
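The space check in the note above can be scripted. The sketch below compares the size of the folder to be backed up with the free space on the destination filesystem, using du and df. The function name is our own, and it only covers folders: a database dump can differ in size from the database files on disk, so check that separately.

```shell
#!/bin/sh
# Succeed if the filesystem holding DEST has more free space than SRC uses.
# Usage: has_space_for SRC DEST
has_space_for() {
    needed_kb=$(du -sk "$1" | awk '{ print $1 }')
    # df -Pk prints one POSIX-format line per filesystem; column 4 is free KB.
    free_kb=$(df -Pk "$2" | awk 'NR == 2 { print $4 }')
    [ "$free_kb" -gt "$needed_kb" ]
}

# Example usage, run from the account's home directory:
#   has_space_for public_html . && cp -pR public_html/ OLD-public_html/
```

Chaining the copy with && means the backup is only attempted when the check passes.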

Go Live!

* Synchronizing data:

The most useful commands for migrating data during a Go Live are:

1. Rsync is used to copy entire directories over ssh. For example, assume we want to move the public_html of the dev version of a site across to the live server, which uses the IP address 10.0.0.2. We would use a command string like this:

rsync -e ssh -avz --delete /home/exampledev/public_html/ 10.0.0.2:/home/example/public_html/

When run on the dev server, this will copy the entire local dev public_html across to the live server, into the live public_html, deleting anything in that folder that does not exist in the dev copy - so make sure you've made a backup before running this command with the delete switch.

The above example also assumes that ssh runs on port 22. If it doesn't, you can pass ssh's -p switch to define the correct port. For example, if ssh on the live server uses port 2222, you would define it like this:

rsync -e "ssh -p2222" -avz ... etc

You can also use the --dry-run switch, which will run the rsync and output what it would do, but won't actually make any changes. This is a useful test option if you are not sure you have got your paths, or other options, correct.

The start of the command string would look like this with dry-run in place:

rsync -e "ssh -p 2222" -avz --delete --dry-run /home/exampledev... etc

You can find a full list of options and further examples at:

http://linux.die.net/man/1/rsync

2. SCP (Secure Copy) is used to copy single files over ssh. During Go Live, a common use is to copy the sql dump files of the dev site across to the live server, e.g.

scp -P2222 /root/exampledev.sql 10.0.0.2:/root/example.sql

This will copy the sql file exampledev.sql across to the live server (on 10.0.0.2), into the root directory as the filename example.sql.

Note: The above example assumes ssh runs on port 2222. Notice that while both commands use a p switch to set the ssh port number, the ssh command passed to rsync uses a lower-case -p, and scp uses an upper-case -P. If ssh runs on the default port 22, you don't need to use this switch, as both commands will assume port 22 unless told otherwise.

Further scp options and examples can be found at:

http://linux.die.net/man/1/scp

3. The Linux mv (move) and cp (copy) commands can be used if the dev site is also hosted on the live server. Using cp is the safer option, as it leaves the original files/folders in place, but you must remember to use the -p and -R switches to ensure permissions are preserved and that all files and sub-folders within public_html are copied recursively, e.g.

cp -pR /home/exampledev/public_html/. /home/example/public_html/

The trailing /. copies the contents of the dev public_html into the existing live public_html, rather than creating a nested public_html folder inside it. Make sure you make a backup of the live public_html, as mentioned in the Go Live Preparation section, before running this command.

In the above example, the ownerships will also need to be corrected, as the files in the new live site are still owned by exampledev, rather than example. Use the chown command to correct this:

chown -R example:example /home/example/public_html

Using the -R switch, this will recursively change the user and group ownerships to the example user.
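After running chown, it is worth confirming that nothing was missed. One way to do this, sketched below with a helper name of our own, is to ask find for anything in the document root that is not owned by the expected user:

```shell
#!/bin/sh
# List anything under DIR that is not owned by USER.
# No output means every file and folder has the expected owner.
# Usage: not_owned_by DIR USER
not_owned_by() {
    find "$1" ! -user "$2"
}

# Example usage:
#   not_owned_by /home/example/public_html example
```

Any paths listed will still belong to another account, such as exampledev, and can be corrected by running chown again on those paths.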

* Change DNS:

If the Go Live is an updated version of an existing site, the DNS will usually not need to be changed, as it will already be resolving to the correct IP address. In some cases you might be making a new version of the site live on a different server, or IP address, in which case the DNS will need updating.

If you have access to the nameservers used by the domain, you should have reduced the TTL down to 60 seconds, prior to the Go Live - as discussed in the Go Live Preparation section. Once changes have been made, you should start seeing the new site within 60 seconds of the change.

If you do not have direct access to the nameservers, and need to request changes via a third party, you will need to schedule the DNS changes for a specific time with them in order to minimise downtime. This is not an ideal situation, and can result in serious downtime. If the third party controlling the DNS cannot guarantee exact scheduling of the DNS changes, it would be advisable to move the DNS to a different provider, if possible.

If the front page of your new site looks very similar to the old version, a useful tool to check that the DNS has updated is the Firefox plugin ShowIP - which is discussed in the Testing section.

Post 'Go Live' Checks

* Dealing with problems:

Any serious problems will usually be immediately apparent: errors on the front page, or the entire site failing to load.

Some common causes of problems are:

1. A missing file - if you have multiple include files, check they were all copied across. This commonly happens if the site uses include files that lie outside of the site's home folder.

2. Incorrect file paths - As mentioned in the Go Live Preparation section, absolute paths can cause a problem if the site's username has changed, resulting in the file path to the document root changing.

3. Missing modules - If you did not mirror the dev and live environments, as discussed in the Development section, sections of your code might fail to work.

4. Permissions and Ownership - If the permissions of the working dev site were not preserved when it was copied across, areas of the site might not have sufficient permission to run properly. If the ownerships of the document root have not been updated after being moved from a dev area, this can cause the site to fail to load.

* Manual vs Automated:

Some web-hosting control panel systems, such as cPanel, Plesk, etc, have their own automated migration tools. These can greatly speed up and simplify the migration process, but they are probably of much greater use when moving multiple sites from one server to another, rather than for a single-site Go Live, as they do not allow for some of the fine-tuning that is often necessary when replacing one live site with another.

As you will have seen at various points in this document, changes to file paths, permissions, etc, are often required when rolling out a new site, so it is often much simpler to make all changes manually, and then you can account for every change made, and schedule each change as needed.

Conclusion

At the heart of any successful Go Live is careful planning and scheduling, and clear and regular communication with all parties involved in the process.

On top of this, we hope you will find that this document has given you some additional tools to help you manage and deploy your sites. While no document can cover every eventuality, the above tips should help you avoid some of the more common issues encountered.
