
By Austin Laird, Distance Learning Administrator, Central Web Support
Online learning here at the University of North
Texas has grown by leaps and bounds over the past several years. Our
primary and centralized means of delivering course content online is
via WebCT. This semester (Fall 2003) we have over 955 UNT course
sections enrolled in over 473 WebCT courses. Our total unique student
enrollment is over 18,300 for a total number of seats in WebCT courses
just under 30,000. What is more exciting is that 4,200 students are
strictly distance education students. That means that nearly 7% of the
student population of the University of North Texas doesn’t
necessarily ever set foot on the actual campus. For these students,
the University of North Texas is found online and in WebCT.
Why did we move to a new machine?
These numbers show that WebCT is a mission-critical application here
at the University of North Texas. As the Distributed Learning
Administrator, I am charged with keeping WebCT available 24x7 for
our students and faculty. Our WebCT users are accustomed to this high
level of availability and expect that anytime they go to
webct.unt.edu
they’ll find a functioning and responsive application. Recently,
however, we experienced several unexpected downtimes due to server
hardware failure. We also recently experienced slower than normal
performance from our WebCT server.
Unplanned downtime and slow server response are
unacceptable in our WebCT environment and thus we quickly sought a
path to moving to new WebCT hardware to remedy these problems.
How did we make the transition?
We had been planning to
move to new WebCT hardware during the next semester break and already
had this hardware in place. We decided to go ahead and move to this
new hardware during the semester because we could not risk more
unexpected downtime. Sun Microsystems had replaced every part of our
old WebCT server, but we were still having problems with hardware
failures. On Tuesday, September 16th, we made the
transition. This, of course, is all easier said than done, so what
follows is the path we took to a successful migration.
System Specification
Hardware:
Old: Sun UltraSparc II E450 with dual 400 MHz
processors, 1 GB of RAM, 120 GB mirrored internal hard drives
New: Sun Fire UltraSparc III 280R with dual 1
GHz processors, 4 GB of RAM, 170 GB fibre-attached SAN (expandable as
we need it).
Software:
Old: Solaris 8, Sun Solstice Disksuite Volume
Management software, UFS, WebCT 3.8
New: Solaris 8, Veritas Volume Manager, Veritas
File System, WebCT 3.8
We chose to move to Veritas File System because it is better at
handling our millions of small files than UFS. We have Veritas Volume
Manager in place to build the large volume from our drives that are
presented to the machine from the SAN.
Data:
Approximately 120 gigabytes of data, made up of
8.5+ million files in the WebCT file system structure!
Method:
We were challenged with finding a way to move these 8.5+ million files
from the old server to the new one without losing changes to them that
the students and faculty were making. We did not have the option to
take the machine offline long enough to directly copy all the data
from one machine to the other; WebCT would have been down for nearly
36 hours had we chosen this route. We also could not simply copy the
files while the system was live, both because files were constantly
being changed and because the copy would have degraded our I/O performance.
After much investigation and testing, we decided on the following
method:
- We took the latest full system tape backup of
the old WebCT system and restored it to the new WebCT system. This
process brought our new system up to date as of September 7th.
This process was also our first indication of the performance
improvement that the new hardware and the new file system would give
us: it took 7 hours to restore the 120+ GB of data in 8+ million files,
whereas on the old system it would have taken approximately 36
hours!
- Once this process was completed, and we had
worked out some bugs in the Veritas File System settings, we began
synchronizing the two systems with
rsync. Rsync is an open source tool that provides incremental
file transfer by comparing the source system (in this case the old
WebCT machine) to the new system and determining what files have
changed and thus what files need to be copied from the old machine
to the new one. Initially, beginning on September 12th,
we ran rsync across the entire file system. This was a very
processor-intensive and time-consuming process (30+ hours), but it
ensured the new system was up to date as of at least the 12th.
- Next, we began running rsync only on the parts
of the WebCT file system that we knew had been changed since
September 12th. We determined what to rsync based on what
courses had been accessed (as reflected in the Apache web server
logs of WebCT). We ran an rsync on Monday, September 15th
overnight to catch up all changes that had been made to the system
since the 12th.
- Throughout the day Tuesday, we ran several
incremental rsyncs to maintain consistency between the two machines.
- Our last rsync before we brought down the
WebCT server was at approximately 9:00PM. We took down the WebCT
application at 11:30PM so that users were no longer able to log in to
the system and make changes. With the system down, we could take
information from the latest Apache logs (from 9:00PM to 11:30PM) and
rsync just those courses that had been accessed during that 2½-hour
block, thus minimizing our downtime. As it turns out, over 300
courses had been accessed by thousands of users during that 2½-hour
timeframe! We began an rsync of these 300+ courses that lasted until
about 3:00AM.
- After we had completed this final rsync to
ensure all the most recent updates and changes were copied to the
new system, we had to reinstall the WebCT application. This was
necessary to ensure that all of the paths that are hard-coded in
WebCT scripts were set properly (we had changed the mount point of
the WebCT application). This process took approximately 45 minutes.
We also re-configured the new machine to reflect the same IP address
and hostname as the old machine.
- Once the re-install of the WebCT software was
complete, we were able to begin comparing the two installations. We
checked key parts of key courses that we knew were accessed
frequently and compared them to make sure they were identical. We
also tested all the pieces of WebCT to make sure they were working.
FrontPage access to the server was not completely functional at this
point, but it is not necessary for the functioning of WebCT. We
fixed that piece throughout the days following the move to the new
server.
- At approximately 6:30AM on Wednesday, September 17th,
we gave everyone access to WebCT again, on the new system.
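The log-driven incremental syncs described in the steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the scripts we actually ran. It assumes that the course ID appears in the request path right after a /SCRIPT/ prefix and that each course lives in its own directory under a common root; the course IDs, paths, and host name used with it are hypothetical. The only rsync option used is -a (archive mode).

```python
import re
from typing import Iterable, List, Set

# Assumption: a course ID is the path segment that follows /SCRIPT/ in the
# request line of an Apache access log entry. Adjust for the real URL layout.
COURSE_RE = re.compile(r'"(?:GET|POST) /SCRIPT/([A-Za-z0-9_.-]+)/')


def courses_accessed(log_lines: Iterable[str]) -> Set[str]:
    """Return the set of course IDs that appear in the given log lines."""
    found: Set[str] = set()
    for line in log_lines:
        match = COURSE_RE.search(line)
        if match:
            found.add(match.group(1))
    return found


def rsync_commands(courses: Iterable[str], src_root: str,
                   dest_host: str, dest_root: str) -> List[List[str]]:
    """Build one rsync command (as an argument list) per accessed course.

    The commands are only built here, not executed; each could be passed
    to subprocess.run() on the old server.
    """
    commands = []
    for course in sorted(courses):
        commands.append([
            "rsync", "-a",
            f"{src_root}/{course}/",               # trailing slash: copy contents
            f"{dest_host}:{dest_root}/{course}/",
        ])
    return commands
```

Restricting each pass to the courses named in the logs is what keeps the final sync short: rsync only has to walk a few hundred course directories rather than the entire file system.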
Results
Our method for transferring and synchronizing files was very
successful. We have had literally only one or two problems that were
attributable to this transition. In those cases, we were able to go
back to the old server and pull the needed information that did not
properly copy from one machine to the other. We’ve found that our
server performance is vastly improved. The improvement comes from a
combination of the faster Veritas File System and the improved
hardware. Transactions that might have taken minutes to complete on
the old server (user queries, for instance) now take a few seconds. As
one user told us, “the new server is sooooooooooooooooooooooo much
faster!!” Most importantly to us, we now have a reliable WebCT server
again.
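For readers who want to check for files that did not copy properly, one possible approach is to compare checksums of the two directory trees. The sketch below is only an illustration of that idea and is not the procedure we used; it assumes both trees are reachable as local paths (for example, with the old file system mounted on the new machine), and the function names are hypothetical.

```python
import hashlib
import os
from typing import Dict, List


def tree_digests(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, root)
            hasher = hashlib.sha256()
            with open(full_path, "rb") as handle:
                # Read in 1 MB chunks so large files do not fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    hasher.update(chunk)
            digests[rel_path] = hasher.hexdigest()
    return digests


def differing_files(old_root: str, new_root: str) -> List[str]:
    """List files under old_root that are missing or different under new_root."""
    old = tree_digests(old_root)
    new = tree_digests(new_root)
    return sorted(path for path, digest in old.items()
                  if new.get(path) != digest)
```

Hashing millions of files is itself I/O-heavy, so a check like this is better run against individual course directories than against the whole file system.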