NEWS
New 64-Bit Linux Server with Stata/MP

The SSCC is pleased to announce that FALCON, a 64-bit Linux server with Stata/MP installed, is now available. FALCON is not intended for general use, and for most purposes it will perform no better than HAL. But Stata jobs will run faster on it (some of them much faster), and FALCON can run some jobs that simply cannot be run on other servers.

64-Bit Linux

Most Pentium processors (and their AMD competitors) work with 32 bits of information at a time. This limits the total amount of memory the processor can keep track of to four gigabytes (2^32 bytes), and in most cases the maximum that can be assigned to a single task is two gigabytes. As memory has become cheaper and computers with two gigabytes of RAM or more have become common, this limitation has become increasingly problematic.
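The four-gigabyte ceiling is simple arithmetic: a 32-bit address can take 2^32 distinct values, one per byte of memory. The short Python sketch below (Python is used only for illustration; nothing on FALCON requires it) compares the 32-bit and 64-bit cases. The 64-bit figure is a theoretical limit; real processors and operating systems support far less than that.

```python
# An n-bit address can name 2**n distinct byte locations.
GIB = 2**30  # one gigabyte in the binary sense used above (1,073,741,824 bytes)

def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses an n-bit address can express."""
    return 2**bits

for bits in (32, 64):
    total = addressable_bytes(bits)
    print(f"{bits}-bit: {total:,} bytes = {total // GIB:,} GiB")
```

The 32-bit line works out to exactly 4 GiB, while the 64-bit line is about 17 billion GiB, which is why the address space itself stops being the constraint.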

The solution is processors that work with 64 bits of information at a time, along with operating systems and software written specifically for them. FALCON is the first such server at the SSCC. It currently has four gigabytes of physical RAM, and a single job can use all of it and more. However, claiming more than the physical RAM forces the server to use swap space (disk), and Stata jobs that use swap space run very slowly. If we find that users need more memory than this, we can add it.

Note that most jobs that need no more than two gigabytes of RAM will not benefit from a 64-bit processor. Unless you want to use Stata/MP, continue to run such jobs on KITE or HAL.

Stata/MP

FALCON is also the first SSCC server to run Stata/MP, though it will not be the last. Stata/MP is a special version of Stata written to take advantage of machines with multiple processors.

All of the SSCC's servers have two physical processors. In addition, a technology called hyperthreading allows each physical processor to work on two jobs (threads) at once, which is why the servers appear to have four processors. Having additional processors allows a server to work efficiently on many jobs at the same time. However, a given job can run on only one processor at a time, so additional processors do not make any single job run faster.
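The relationship between physical and apparent processors is a simple product. This minimal Python sketch computes it for the configuration described above and then asks the operating system how many logical processors the current machine reports. Python's standard os.cpu_count() counts hyperthreaded processors separately, so on a two-processor hyperthreaded server it would report four.

```python
import os

def logical_processors(physical: int, threads_per_processor: int) -> int:
    """Processors the operating system reports when each physical processor
    runs several hardware threads (hyperthreading runs two)."""
    return physical * threads_per_processor

# Two physical processors, each hyperthreaded to run two threads, appear as four.
print(logical_processors(physical=2, threads_per_processor=2))

# Ask the OS how many logical processors this machine has (None if unknown).
print(os.cpu_count())
```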

Additional processors could help if the job could be broken into pieces that run at the same time on different processors. This is known as parallel processing, and because chip makers are finding it easier to provide multiple processors than to keep making individual processors faster, it is a very hot topic in computing. But it is not always possible: in many tasks, later steps cannot even begin without the results of earlier steps. Even when parallelization is possible, the program must be rewritten to take advantage of it. (If you are writing your own programs and want to use parallel processing, keep in mind that the SSCC has a Beowulf cluster for parallel processing jobs.)
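Whether work can be parallelized depends on which steps need the results of others. The Python sketch below is a generic illustration, not a description of anything Stata does internally: it groups hypothetical tasks into "rounds" in which every task's prerequisites have already finished. Tasks in the same round could run simultaneously on different processors, so the number of rounds is the minimum number of sequential steps no matter how many processors are available.

```python
from typing import Dict, List

def parallel_rounds(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Group tasks into rounds. Every task in a round has all of its
    prerequisites finished in earlier rounds, so the tasks within one
    round could run at the same time on different processors."""
    remaining = dict(deps)
    done = set()
    rounds = []
    while remaining:
        ready = sorted(t for t, pre in remaining.items() if all(p in done for p in pre))
        if not ready:
            raise ValueError("circular or unknown dependency")
        rounds.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return rounds

# Four independent tasks: a single round, so up to four processors can help.
independent = {"a": [], "b": [], "c": [], "d": []}
# A chain in which each step needs the previous result: four rounds, one task each.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}

print(parallel_rounds(independent))
print(parallel_rounds(chain))
```

The task names and dependencies are made up; the point is that the chain takes four rounds however many processors you throw at it, while the independent tasks finish in one.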

In creating Stata/MP, Stata Corporation found as many places as it could where tasks could be parallelized and rewrote Stata itself accordingly. As a result, your do files will take advantage of parallel processing automatically: Stata's syntax is unchanged, and existing do files run without any modification. How much a given job benefits depends on what it does. Linear regression, for example, can be heavily parallelized and will run nearly twice as fast. Most time-series methods, on the other hand, cannot be heavily parallelized and benefit much less. Stata Corporation claims an average performance increase of 40%. (For full details, including the performance gains for every Stata command, see the Stata/MP web site.)
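One standard way to reason about these numbers is Amdahl's law (a general model, not Stata Corporation's own method): if a fraction p of a job's run time can be split evenly across n processors and the rest cannot, the best possible speedup is 1 / ((1 - p) + p/n). The Python sketch below evaluates that formula, assuming perfect splitting with no overhead, and also inverts it to ask what fraction of the work would have to be parallelized to produce a given speedup.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Best-case speedup when only `parallel_fraction` of the run time can be
    split evenly across `processors`; the remainder still runs on one."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)

def parallel_fraction_for(speedup: float, processors: int) -> float:
    """Invert Amdahl's law: the fraction of work that must be parallelized
    to reach `speedup` on `processors` processors."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / processors)

# Two physical processors (hyperthreading does not double real throughput).
print(round(amdahl_speedup(1.0, 2), 2))          # fully parallel: 2.0
print(round(amdahl_speedup(0.95, 2), 2))         # 95% parallel: 1.9
print(round(parallel_fraction_for(1.4, 2), 2))   # fraction needed for 1.4x: 0.57
```

With two processors even a fully parallel command tops out at 2x, consistent with the "nearly twice as fast" figure for regression. If the 40% average is read as a 1.4x speedup (an interpretation, not something stated here), the model implies that a bit over half of a typical command's work is parallelized.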

Our plan is to make Stata/MP available both on FALCON and through Condor. We have ordered upgraded servers for our Condor flock, and they should be ready for use, with Stata/MP installed, by fall. For now, however, FALCON is the only server with Stata/MP. Since Stata/MP is designed to use all of a server's processors, running several Stata/MP jobs at the same time would defeat its purpose. FALCON is therefore restricted to one Stata user at a time on a first-come, first-served basis: if someone is already running a Stata job on FALCON, Stata will not allow anyone else to run Stata there. Until Stata/MP is also available through Condor, we ask that users follow these guidelines:

Guidelines for Use

  • Only use FALCON to run Stata/MP if having your Stata job finish more quickly would be a significant help to your work
  • Only use FALCON for batch jobs, not interactive sessions
  • Do not run Stata jobs that will take more than three days to complete (jobs still running after three days will be terminated automatically)

This will give everyone a fair chance to use Stata/MP. Once the new Condor servers are available and people have other places to run Stata/MP, these restrictions will be lifted.

Keep in mind that you can see opportunities for parallelization that Stata/MP cannot. For example, if you run bootstrap replications in a foreach or while loop, Stata/MP will try to parallelize each command within a replication, but the replications themselves will still run one after another. Alternatively, you could modify your program so that six different Condor jobs each do one-sixth of the replications, and the whole process would finish in roughly one-sixth of the time. Stata/MP alone might reduce the time by only a third.
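Splitting the replications means each Condor job has to know which replications it is responsible for, and each job needs its own random-number seed; if every job used the same seed, they would all draw the same bootstrap samples. The Python sketch below shows one way to do that bookkeeping. The function names, the 1,000-replication total, and the base_seed + job seed scheme are all hypothetical; in practice you would pass each job's range and seed to your do file, for example as arguments.

```python
from typing import List, Tuple

def split_replications(total: int, jobs: int) -> List[Tuple[int, int]]:
    """Split replications 1..total into `jobs` contiguous, inclusive ranges.
    Any remainder is spread one per job over the first few jobs."""
    base, extra = divmod(total, jobs)
    ranges = []
    start = 1
    for job in range(jobs):
        count = base + (1 if job < extra else 0)
        ranges.append((start, start + count - 1))
        start += count
    return ranges

def job_seed(base_seed: int, job: int) -> int:
    """Hypothetical seed scheme: give each job a different seed so the jobs
    do not draw identical bootstrap samples."""
    return base_seed + job

for job, (first, last) in enumerate(split_replications(1000, 6)):
    print(f"job {job}: replications {first}-{last}, seed {job_seed(12345, job)}")
```

Because 1,000 does not divide evenly by six, the first four jobs get 167 replications and the last two get 166, so every replication is run exactly once.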