Difference between revisions of "Random number generation"

From reSIProcate
Jump to: navigation, search
m (fix file name)
(added observations)
Line 37: Line 37:
 
* N. Each child process generates C cycles of random numbers, and main process does nothing.
 
* N. Each child process generates C cycles of random numbers, and main process does nothing.
  
 +
The test results for 64-bit Linux 2.6 kernels on two different platforms are in table below. The columns are:
 +
* random --> default, using native random()
 +
* mutex --> using THREAD_MUTEX
 +
* local --> using THREAD_LOCAL
 +
 +
(See more text below the picture -- the image has lots of excess white-space).
 
[[File:RandomPerfResult-1.png]]
 
[[File:RandomPerfResult-1.png]]
 +
 +
Observations:
 +
* Thread local storage provides better performance ALL cases, including all single-thread cases. Platforms that support random_r() should use this configuration.
 +
* The mutex'd cases ("random" and "mutex") show expected linear performance on single-core processor. But they show much WORSE than linear performance on multi-core processor. That is, running many threads on multi-core processor actually slower than running on single-core processor. This result doesn't really make sense, and I hope someone else can replicate/explain it.
 +
* Linux random() optimizes the true single-thread (T=-1) case. The multi-threaded case has 3x penalty even when no thread collisions (e.g., just the overhead of manipulating the mutex structure).
 +
* The resip Mutex class has a ~25% penalty compared to the native mutex in libc's random(). Also, the resip Mutex doesn't special-case the single-thread case.

Revision as of 00:25, 8 February 2011

Configuration and performance of the rutil/Random class.

[This describes changes not yet committed to SVN. -- kennard]

Configuration

The Random class has compile time options to control how it generates random numbers. It is configured via pre-processor symbols, defined either by changing Random.hxx or in the build environment.

The configuration choices primarily affect how multi-threaded applications behave. The behavior is important because the Random class is used to generate SIP CallIds, branch identifiers, and (from/to) tags.

The options are described below. See Random.hxx for most up-to-date descriptions.

[WIN32-default] The Random class uses rand().

RESIP_RANDOM_WIN32_RTL The Random class uses Window's RtlGenRandom (aka SystemFunction036) on XP and higher system, and falls back to rand() on older Windows platforms.

[POSIX-default] The Random class uses libc's random(). The random state is shared across all threads and all libraries of the application. While Linux's libc implementation has an internal mutex for thread protection, POSIX doesn't require this and other platforms may have corruption. Also, any part of the application may "corrupt" the state by calling srandom with "stupid" values.

RESIP_RANDOM_THREAD_MUTEX The Random class itself keeps a Mutex, and will lock the mutex prior to calling random_r(). Of course, this requires that the platform support random_r(). This provides assured thread-safety. Also, the random state is private to resiprocate, and other parts of the application cannot corrupt it.

RESIP_RANDOM_THREAD_LOCAL The Random class allocate state for each thread and tracks that using ThreadLocalStorage. As above, this requires that the platform support random_r(). This avoids serialization among threads (no mutex), and also prevents corruption by non-resiprocate parts of the application.


Linux Performance results

See rutil/test/{testRandomThread.cxx,sweepRandom.sh} for details.

The test program starts T threads, and runs C cycles in each thread, where each cycle generates one million random integers (each integer is 32 bits). It then reports the total time from start of all the threads to completion of the last one. With respect to the number of threads, there are some special cases. Values of T (number of threads):

  • -1. The test is run in the main process, without creating any child threads.
  • 0. The test is run in the main process, but there is also a dummy thread hanging around doing nothing. Turns out this makes a difference, because libc's mutex protection of random() optimizes for the single-thread case.
  • 1. The test is run in a child process, and the main process does nothing.
  • N. Each child process generates C cycles of random numbers, and main process does nothing.

The test results for 64-bit Linux 2.6 kernels on two different platforms are in table below. The columns are:

  • random --> default, using native random()
  • mutex --> using THREAD_MUTEX
  • local --> using THREAD_LOCAL

(See more text below the picture -- the image has lots of excess white-space). RandomPerfResult-1.png

Observations:

  • Thread local storage provides better performance ALL cases, including all single-thread cases. Platforms that support random_r() should use this configuration.
  • The mutex'd cases ("random" and "mutex") show expected linear performance on single-core processor. But they show much WORSE than linear performance on multi-core processor. That is, running many threads on multi-core processor actually slower than running on single-core processor. This result doesn't really make sense, and I hope someone else can replicate/explain it.
  • Linux random() optimizes the true single-thread (T=-1) case. The multi-threaded case has 3x penalty even when no thread collisions (e.g., just the overhead of manipulating the mutex structure).
  • The resip Mutex class has a ~25% penalty compared to the native mutex in libc's random(). Also, the resip Mutex doesn't special-case the single-thread case.