Recommendation for a Dynamic Web Content Hosting Environment at UNT
By Shannon Eric Peevey, UNT Central Web Support

As completely static Web content slowly disappears from the World Wide Web, more and more Web sites are relying on various dynamic content engines to generate the pages that we see. (For example, the University of North Texas uses Macromedia ColdFusion, PHP, and Zope to generate many of its wonderful pages.) This means that our Web hosting environment is no longer made up of a simple Web server with a location for content manipulation, which in turn means that our environments are becoming much more complex. We now have five or six points of failure instead of one. Obviously, as a setup becomes more complex, we have all the more reason to implement some sort of mechanism to make sure that when one of the components of our Web hosting environment fails, our end users' experience is not interrupted. In an attempt to implement these mechanisms, it has been my privilege to explore many of the options that are available to us. In the rest of this article, I will discuss the component technologies currently in production at UNT and the mechanisms available to improve the quality of service for end users, and then evaluate and recommend among those mechanisms based upon testing and reviews.

Background

It has been my pleasure as a system administrator to work with many of our great Web developers during my time at UNT. Some of them are full-time staff, like myself, but there is also a plethora of student workers, graduate assistants, and outside contractors. This variety has been a pleasure, as every one of these people brings something unique to the table with regard to technology, technique, and design. Some are geared more towards design, some more towards technology, but all are excited about doing something unique to place their stamp on the university's Web presence. In response to these diverse needs, it has been necessary to offer a variety of technologies to fit the needs of the great majority of our developers. Therefore, we currently offer the following technologies:
We have seen much user satisfaction with the technologies that we offer, and have had a growing developer base as developers begin to integrate dynamic content into their static sites. This has been an exciting time for us, as the dynamic content server has grown from 100,000 hits a month when I started to over 2,000,000 hits a month at the present time. But this has also placed quite a bit of load on each of the component technologies, as we have moved from a dual 300 MHz machine with 256 MB of RAM, to a dual 1 GHz machine with 4 GB of RAM, to five machines running one or two component tools apiece. (All in a little over two years!) With this complexity has come failure. Most of the time, these failures can be traced to either the Macromedia ColdFusion Server or the Microsoft SQL Servers, but nevertheless, the downtime can be catastrophic to our quality of service. Therefore, it has been decided that it is time to take the next step and put some sort of mechanism into place. It has been my group's job to explore these mechanisms.

High Availability

High availability is infrastructure design that minimizes downtime for a system. In other words, it is preventative design that allows an online system, in our case, to automatically fall back on another machine running exactly the same services when a component fails. In simple terms, it is just duplication: we have multiple machines running identical copies of a component technology, with identical configurations, so that one can be brought online in the event of a failure. Our team understood this. What we didn't understand was the correct technology for this type of high availability. We had heard of clustering, load-balancing, and every other buzzword that has been spoken on this topic, but we didn't understand what they were. In short, clustering is the sharing of processing power and/or load over multiple machines which act as one; this is usually done internally on the machines by software. Load-balancing is the sharing of load over multiple identical machines by an outside tool, such as a load-balancing router or proxy server. Doesn't seem like much difference, does it? In the next section, we will begin to explore the complexity of our undertaking, and how these tools are going to be used in conjunction with our component tools to bring high availability to our dynamic hosting environment.

Complexity

Our findings led us to the conclusion that our environment was too complex for a single solution of only clustering, or only load-balancing, to provide high availability. First of all, we have to look at the technology of the component tools, and how they retrieve data from their sources. A Web server is a daemon, or service, that sits and listens for incoming calls from clients on ports 80 and 443. It then reacts to these calls in a variety of ways; for our purposes here, we are only interested in requests for HTML pages. The server has been configured to know the location of these files, so it makes an input/output call for the file on the storage device and then returns the page to the client. These calls are fairly simple (and relatively slow), so it is possible to place all of these files on a central server and mount that location on the Web server machine. (This makes the file location appear to be local to the Web server, though it is actually acquired through remote calls across a distributed file system.)
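To make the idea concrete, here is a minimal sketch of what such a shared-content setup might look like on one of the Web servers, assuming an NFS export from a central file server; the hostname and paths below are purely illustrative, not our actual configuration:

```
# /etc/fstab on each Web server: mount the central content store read-only
# (contentstore.unt.edu and both paths are hypothetical examples)
contentstore.unt.edu:/export/www   /var/www/html   nfs   ro,hard,intr   0 0

# httpd.conf on the same machine: the Web server treats the mounted
# directory as if it were a local document root
DocumentRoot "/var/www/html"
```

Any number of Web servers can mount the same export this way, which is what makes the load-balanced front end described next possible.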
Thousands, even tens of thousands, of Web servers can all mount this remote directory, and therefore it is possible for them to access the contents of the files concurrently (depending on how the distributed file system deals with file locking). In this case, you would place a tool, often a router, in front of the machines, and the router would send requests to the servers either randomly or through some sort of load-checking algorithm. The key factor here is that distributed file systems are "slow", or at least slower than calls to the local file system. Therefore, it is fine to run many cheaper machines, attached to a central storage unit, with a load-balancer in front of them taking care of the distribution of load. But in the case of a database server, the server needs to access and manipulate rapidly changing data, which needs to be correct all of the time. (If you are familiar with relational, or even object, databases, then you know that any mistake in the execution of a command can leave your data and your database corrupted and basically unusable.) Also, the ratio of database transactions to Web page calls is weighted quite heavily towards the database server. (For example, one call to a popular Web page on one of our machines makes over 40 calls to the database server. This page is called about 200,000 times a month, for around 8,000,000 calls to the database per month, from one page.) Therefore, this "slow" distributed file system is going to become a bottleneck for our hosting environment. There might also be problems with calls being out of sync with each other because of the distributed file system, again creating a potential situation for data corruption. How, then, do we create a fail-over environment for databases? Clustering seems to be used most often in this situation. The clustering software is usually integrated into the database server itself and is given control of the transactions as they are sent to the data repository. It also controls the redundancy of the data, because most of the data is duplicated across multiple machines, and it even controls how that data is updated on each machine, usually through a master/slave arrangement.

High Availability Implementation

Put simply, we need hardware and/or software that will automate fail-over and load-balancing for us in our dynamic Web hosting environment. As mentioned in the last section, the complexity of our setup forces us to use a mixture of high availability technologies. In this section, we are going to discuss this mixture in detail. As you may have gathered from the technology list in the Background section and the information delivered in the previous section, we will:
Let's break it down in detail:
Recommendations

At this time, we would like to make our recommendations for creating a highly available Web hosting environment here at the University of North Texas. First, we will list the high availability options that have already been decided for us. These are dictated by the component tools themselves, so there is no choice to make. They are:
Finally, the issue of Web server high availability arises. We are basically locked into clustering or load-balancing solutions on every aspect except the Web servers. Our group has found that there are essentially two options available for load-balancing our Web servers: we can use the load-balancing routers that are already available from the Unix Services team, or we can create our own highly available Web infrastructure using Linux-HA.
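To give a sense of what the Linux-HA route might involve, here is a minimal sketch of a two-node front end, assuming the Heartbeat package for fail-over of a virtual IP address and the Linux Virtual Server tool (ipvsadm) for spreading requests across identical Web servers. The node names and addresses are hypothetical, and this is only one possible arrangement, not a finished design:

```
# /etc/ha.d/ha.cf (identical on both director nodes) -- basic Heartbeat settings
keepalive 2
deadtime 30
bcast eth0
auto_failback on
node director1.unt.edu
node director2.unt.edu

# /etc/ha.d/haresources -- director1 normally owns the virtual IP;
# Heartbeat moves it to director2 if director1 fails
director1.unt.edu IPaddr::192.168.85.10

# Load distribution on the active director: a round-robin virtual HTTP
# service on the virtual IP, forwarding to two identical Web servers (NAT)
ipvsadm -A -t 192.168.85.10:80 -s rr
ipvsadm -a -t 192.168.85.10:80 -r 192.168.85.21:80 -m
ipvsadm -a -t 192.168.85.10:80 -r 192.168.85.22:80 -m
```

In practice a health-checking tool such as ldirectord would normally maintain the ipvsadm table automatically, but even this sketch shows the division of labor: Heartbeat handles fail-over of the virtual IP, while the LVS rules handle the distribution of load, which is the same role the load-balancing routers from the Unix Services team would play in hardware.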