I've compiled a summary (kinda) of the concurrency chapter of Josh Bloch's excellent Effective Java book. Included are also some notes on Goetz's Java Concurrency in Practice.
Item 66 : Synchronize access to shared mutable data
- Reading or writing a variable is atomic unless the variable is of type long or double.
- You may hear that you should avoid synchronization when reading or writing atomic data because it improves performance, but this advice is wrong. For example, reading and writing a boolean field is atomic, so you might be tempted to skip synchronization when one thread polls a boolean that another thread sets:
while(!stop) {
i++;
}
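- Without synchronization there is no guarantee that the looping thread ever sees the change to stop. On a HotSpot VM the loop may never terminate, because the VM is allowed to transform it into :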
if(!stop) {
while(true) {
i++;
}
}
- This is known as hoisting.
- To fix, either guard access to the boolean via synchronized methods or make the boolean volatile so that any thread that reads the field will see the most recently written value.
- Synchronization has no effect unless both the read and the write operations are synchronized.
- Increment (++) is not atomic because it performs two operations. Interleaving of threads might break a serial number generator program which uses a volatile int variable and ++ to generate serials.
- For this, either use synchronized method or use AtomicLong, which is part of java.util.concurrent.atomic and is likely to perform better than the synchronized version.
- If you need only inter-thread communication (e.g., checking a boolean value), and not mutual exclusion, the volatile modifier is an acceptable form of synchronization, but it can be tricky to use correctly.
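As a rough sketch of the two fixes described in this item, the code below uses a volatile stop flag for a polling loop and an AtomicLong for serial numbers. The class and method names (SerialNumbers, Item66Sketch, workerStopsWhenAsked, issuedAfter) are made up for illustration and do not come from the book.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hands out unique serial numbers; AtomicLong makes the increment atomic. */
final class SerialNumbers {
    private final AtomicLong next = new AtomicLong();

    long generate() {
        return next.getAndIncrement();   // read, add and write as one atomic step
    }

    long issued() {
        return next.get();
    }
}

final class Item66Sketch {
    // volatile: a write by one thread is visible to later reads by other threads,
    // so the read cannot be hoisted out of the loop.
    private static volatile boolean stop;

    /** Starts a spinning worker, requests a stop, and reports whether it ended. */
    static boolean workerStopsWhenAsked() {
        stop = false;
        Thread worker = new Thread(() -> {
            long i = 0;
            while (!stop) {
                i++;
            }
        });
        worker.start();
        stop = true;
        try {
            worker.join(5_000);          // give the worker up to five seconds
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    /** Generates serial numbers from several threads and returns how many were issued. */
    static long issuedAfter(int threads, int perThread) {
        SerialNumbers serials = new SerialNumbers();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    serials.generate();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return serials.issued();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + workerStopsWhenAsked());
        System.out.println("serials issued: " + issuedAfter(4, 10_000));
    }
}
```

With a plain (non-volatile) long and ++ in place of the AtomicLong, the issued count could come out lower than threads * perThread because increments from different threads can interleave.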
Item 67 : Avoid excessive synchronization
- To avoid liveness and safety failures, never cede control to the client within a synchronized method or block.
- CopyOnWriteArrayList is a variant of ArrayList in which all write operations are implemented by making a fresh copy of the entire underlying array. Iteration requires no locking and is very fast because the internal array is never modified.
- They are perfect for observer lists, which are rarely modified and often traversed.
- An alien method invoked outside of a synchronized region is known as an open call. These greatly increase concurrency.
- As a rule, you should do as little work as possible inside synchronized regions.
- The real cost of excessive synchronization is not the time spent obtaining locks; it is the lost opportunities for parallelism and the delays imposed by the need to ensure that every core has a consistent view of memory. It can also limit the VM's ability to optimize code execution.
- StringBuffer instances are almost always used by a single thread, yet they perform internal synchronization. That's why StringBuffer was largely supplanted by the unsynchronized StringBuilder.
- If a method modifies a static field, you must synchronize access to this field, even if the method is typically used only by a single thread. You can't expect clients to synchronize access to such fields, because unrelated clients cannot coordinate with one another.
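The advice in this item (a small synchronized region, open calls, and CopyOnWriteArrayList for observer lists) can be sketched roughly as follows. ObservableCounter and ObserverDemo are hypothetical names invented for this example; it is one possible arrangement, not the book's code.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.IntConsumer;

final class ObservableCounter {
    // Rarely modified, often traversed: a good fit for CopyOnWriteArrayList.
    private final List<IntConsumer> observers = new CopyOnWriteArrayList<>();
    private int value;                   // guarded by "this"

    void addObserver(IntConsumer o) {
        observers.add(o);
    }

    boolean removeObserver(IntConsumer o) {
        return observers.remove(o);
    }

    int increment() {
        int snapshot;
        synchronized (this) {            // do as little as possible under the lock
            snapshot = ++value;
        }
        // Open call: the alien methods run without our lock being held.
        for (IntConsumer o : observers) {
            o.accept(snapshot);
        }
        return snapshot;
    }
}

final class ObserverDemo {
    /** Registers an observer that removes itself at value 2; returns how often it ran. */
    static int run() {
        ObservableCounter counter = new ObservableCounter();
        int[] calls = {0};
        IntConsumer selfRemoving = new IntConsumer() {
            @Override
            public void accept(int v) {
                calls[0]++;
                if (v == 2) {
                    // Safe during iteration: the iterator works on a snapshot array.
                    counter.removeObserver(this);
                }
            }
        };
        counter.addObserver(selfRemoving);
        for (int i = 0; i < 5; i++) {
            counter.increment();
        }
        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println("observer calls: " + run());
    }
}
```

Had the observers been kept in a plain ArrayList, the self-removal during iteration would typically fail with a ConcurrentModificationException.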
Item 68 : Prefer executors and tasks to threads
- Prefer executors and tasks to working with threads directly. e.g.,
// Create work queue
ExecutorService exec = Executors.newSingleThreadExecutor();
// Submit runnable for execution
exec.execute(runnable);
// Tell executor to terminate. Otherwise, your VM won't exit
exec.shutdown();
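A slightly fuller sketch of the same pattern, assuming a fixed pool of four threads and a ten-second timeout (both arbitrary choices for this example), waits for the submitted tasks to finish after calling shutdown. ExecutorShutdownSketch and runTasks are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ExecutorShutdownSketch {
    /** Runs taskCount trivial tasks on a pool and returns how many completed. */
    static int runTasks(int taskCount) {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService exec = Executors.newFixedThreadPool(4);
        try {
            for (int i = 0; i < taskCount; i++) {
                exec.execute(completed::incrementAndGet);
            }
        } finally {
            exec.shutdown();             // no new tasks; already queued tasks still run
        }
        try {
            if (!exec.awaitTermination(10, TimeUnit.SECONDS)) {
                exec.shutdownNow();      // give up and interrupt the stragglers
            }
        } catch (InterruptedException e) {
            exec.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runTasks(100));
    }
}
```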
Item 69 : Prefer concurrency utilities to wait and notify
- Given the difficulty of using wait and notify correctly, you should use the higher-level concurrency utilities instead. These utilities fall into three categories: the executor framework, concurrent collections, and synchronizers.
- Concurrent collections manage their own synchronization internally, therefore it is impossible to exclude concurrent activity from a concurrent collection.
- This means that clients can't atomically compose method invocations on concurrent collections.
- ConcurrentHashMap is optimized for retrieval operations, like get. It offers excellent concurrency and is very fast. Use ConcurrentHashMap in preference to Collections.synchronizedMap or Hashtable (the legacy synchronized map class).
- Use BlockingQueue for work queues (also known as producer consumer queues). Operations block until they can be successfully performed.
BlockingQueue<Object> sharedQueue
= new ArrayBlockingQueue<Object>(BUFFER_SIZE);
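To make the blocking behaviour concrete, here is a minimal producer-consumer sketch built on ArrayBlockingQueue. The end-of-stream marker (POISON_PILL) and the names ProducerConsumerSketch and sumThroughQueue are assumptions made for this example only.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class ProducerConsumerSketch {
    private static final int POISON_PILL = -1;   // hypothetical end-of-stream marker

    /** Sends 1..itemCount through a bounded queue and returns the consumer's sum. */
    static long sumThroughQueue(int itemCount, int capacity) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        long[] sum = {0};

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    int item = queue.take();     // blocks while the queue is empty
                    if (item == POISON_PILL) {
                        return;
                    }
                    sum[0] += item;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (int i = 1; i <= itemCount; i++) {
                queue.put(i);                    // blocks while the queue is full
            }
            queue.put(POISON_PILL);
            consumer.join();                     // join makes the consumer's writes visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println("sum: " + sumThroughQueue(100, 8));
    }
}
```

Neither side calls wait or notify; the queue handles all the coordination.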
// Simple framework for timing concurrent execution
public static long time(Executor executor, int concurrency,
        final Runnable action) throws InterruptedException {
    final CountDownLatch ready = new CountDownLatch(concurrency);
    final CountDownLatch start = new CountDownLatch(1);
    final CountDownLatch done = new CountDownLatch(concurrency);
    for (int i = 0; i < concurrency; i++) {
        executor.execute(new Runnable() {
            public void run() {
                ready.countDown(); // Tell timer we're ready
                try {
                    start.await(); // Wait till peers are ready
                    action.run();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown(); // Tell timer we're done
                }
            }
        });
    }
    ready.await(); // Wait for all workers to be ready
    long startNanos = System.nanoTime();
    start.countDown(); // And they're off!
    done.await(); // Wait for all workers to finish
    return System.nanoTime() - startNanos;
}
// The standard idiom for using the wait method
synchronized (obj) {
while (<condition does not hold>)
obj.wait(); // (Releases lock, and reacquires on wakeup)
... // Perform action appropriate to condition
}
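- Always use the wait loop idiom to invoke wait; never invoke it outside of a loop. A thread might wake up when the condition does not hold for several reasons:
- Another thread could have obtained the lock and changed the guarded state between the time a thread invoked notify and the time the waiting thread woke.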
- Another thread could have invoked notify accidentally or maliciously when the condition did not hold. Classes expose themselves to this sort of mischief by waiting on publicly accessible objects. Any wait contained in a synchronized method of a publicly accessible object is susceptible to this problem.
- The notifying thread could be overly “generous” in waking waiting threads. For example, the notifying thread might invoke notifyAll even if only some of the waiting threads have their condition satisfied.
- The waiting thread could (rarely) wake up in the absence of a notify. This is known as a spurious wakeup [Posix, 11.4.3.6.1; JavaSE6].
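For code that must still use wait and notify, the loop idiom might look like the single-slot mailbox below. This is only a sketch: Mailbox and MailboxDemo are invented names, and new code would normally reach for a BlockingQueue instead. Waiting on a private lock object also sidesteps the accidental or malicious notify mentioned above.

```java
final class Mailbox {
    private final Object lock = new Object();    // private, so outsiders cannot notify it
    private String message;                      // guarded by lock; null means "empty"

    void put(String m) throws InterruptedException {
        synchronized (lock) {
            while (message != null) {            // re-check the condition after every wakeup
                lock.wait();
            }
            message = m;
            lock.notifyAll();                    // wake any thread waiting in take()
        }
    }

    String take() throws InterruptedException {
        synchronized (lock) {
            while (message == null) {            // guards against spurious wakeups too
                lock.wait();
            }
            String m = message;
            message = null;
            lock.notifyAll();                    // wake any thread waiting in put()
            return m;
        }
    }
}

final class MailboxDemo {
    /** Passes the messages through a Mailbox on two threads; returns them concatenated. */
    static String roundTrip(String... messages) {
        Mailbox box = new Mailbox();
        Thread producer = new Thread(() -> {
            try {
                for (String m : messages) {
                    box.put(m);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        StringBuilder received = new StringBuilder();
        try {
            for (int i = 0; i < messages.length; i++) {
                received.append(box.take());
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received.toString();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("a", "b", "c"));
    }
}
```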
Item 70 : Document thread safety
- Document how your class behaves when subjected to concurrent use.
- The presence of the synchronized modifier in a method declaration is an implementation detail, not a part of its exported API. It does not reliably indicate that a method is thread-safe.
- Moreover, there are several levels of thread safety. Your class needs to document what level of thread safety it supports.
- immutable - Instances of this class appear constant. No external synchronization is necessary. e.g., String, Long and BigInteger.
- unconditionally thread-safe - Instances of this class are mutable, but the class has sufficient internal synchronization that its instances can be used concurrently without the need for any external synchronization. e.g., Random and ConcurrentHashMap.
- conditionally thread-safe - Some methods require external synchronization for safe concurrent use. e.g., Collections returned by the Collections.synchronized wrappers, whose iterators require external synchronization. Failure to follow this advice may result in non-deterministic behaviour.
Map<K, V> m = Collections.synchronizedMap(new HashMap<K, V>());
Set<K> s = m.keySet(); // Need not be synchronized
synchronized(m) { // Synchronizing on m, not s!
    for (K key : s)
        key.f();
}
- not thread-safe - Clients must surround each method invocation (or invocation sequence) with external synchronization of the client's choosing. e.g., ArrayList or HashMap.
- thread-hostile - Not safe for concurrent use at all. Usually results from modifying static data without synchronization. Such classes result from the failure to consider concurrency, and no one writes these on purpose. e.g., System.runFinalizersOnExit, which is deprecated.
- (In the context of unconditionally thread-safe classes) If a class commits to using a publicly accessible lock (e.g., synchronized methods, which lock on the instance itself), it enables clients to execute a sequence of method invocations atomically, but there is a cost for this flexibility. It is incompatible with high-performance internal concurrency control (of the kind used by ConcurrentHashMap). Also, a client can mount a denial-of-service attack by holding the lock for a prolonged period.
- To prevent this, use a private lock object (which is inaccessible to clients of the class) instead of synchronized methods.
// Private lock object idiom - thwarts denial-of-service attack
private final Object lock = new Object();
public void foo() {
    synchronized(lock) {
        ...
    }
}
- This idiom is only for unconditionally thread-safe classes. Conditionally thread-safe classes can't use this idiom because they must document which lock their clients are to acquire when performing certain method invocation sequences.
- Note that the lock field has to be final, which prevents you from changing its contents, which could result in catastrophic unsynchronized access to the containing object.
Item 71 : Use lazy initialization judiciously
- Lazy initialization is the act of delaying the initialization of a field until its value is needed. If the value is never needed, the field is never initialized. This technique is applicable to both static and instance fields.
- While lazy initialization is primarily an optimization, it can also be used to break harmful circularities in class and instance initialization [Bloch05, Puzzle 51].
- As is the case for most optimizations, the best advice for lazy initialization is “don’t do it unless you need to” (Item 55).
- Lazy initialization is a double-edged sword. It decreases the cost of initializing a class or creating an instance, at the expense of increasing the cost of accessing the lazily initialized field.
- Depending on what fraction of lazily initialized fields eventually require initialization, how expensive it is to initialize them, and how often each field is accessed, lazy initialization can (like many “optimizations”) actually harm performance.
- That said, lazy initialization has its uses. If a field is accessed only on a fraction of the instances of a class and it is costly to initialize the field, then lazy initialization may be worthwhile.
- The only way to know for sure is to measure the performance of the class with and without lazy initialization.
- In the presence of multiple threads, lazy initialization is tricky. If two or more threads share a lazily initialized field, it is critical that some form of synchronization be employed, or severe bugs can result (Item 66).
- All of the initialization techniques discussed in this item are thread-safe.
- Under most circumstances, normal initialization is preferable to lazy initialization.
- Here is a typical declaration for a normally initialized instance field. Note the use of the final modifier (Item 15):
// Normal initialization of an instance field
private final FieldType field = computeFieldValue();
- If you use lazy initialization to break an initialization circularity, use a synchronized accessor, as it is the simplest, clearest alternative:
// Lazy initialization of instance field - synchronized accessor
private FieldType field;
synchronized FieldType getField() {
    if (field == null)
        field = computeFieldValue();
    return field;
}
- Both of these idioms (normal initialization and lazy initialization with a synchronized accessor) are unchanged when applied to static fields, except that you add the static modifier to the field and accessor declarations.
- If you need to use lazy initialization for performance on a static field, use the lazy initialization holder class idiom. This idiom (also known as the initialize on-demand holder class idiom) exploits the guarantee that a class will not be initialized until it is used [JLS, 12.4.1].
// Lazy initialization holder class idiom for static fields
private static class FieldHolder {
    static final FieldType field = computeFieldValue();
}
static FieldType getField() {
    return FieldHolder.field;
}
- When the getField method is invoked for the first time, it reads FieldHolder.field for the first time, causing the FieldHolder class to get initialized.
- The beauty of this idiom is that the getField method is not synchronized and performs only a field access, so lazy initialization adds practically nothing to the cost of access. A modern VM will synchronize field access only to initialize the class.
- Once the class is initialized, the VM will patch the code so that subsequent access to the field does not involve any testing or synchronization.
- If you need to use lazy initialization for performance on an instance field, use the double-check idiom. This idiom avoids the cost of locking when accessing the field after it has been initialized (Item 67).
- The idea behind the idiom is to check the value of the field twice (hence the name double-check): once without locking, and then, if the field appears to be uninitialized, a second time with locking.
- Only if the second check indicates that the field is uninitialized does the call initialize the field. Because there is no locking if the field is already initialized, it is critical that the field be declared volatile (Item 66).
// Double-check idiom for lazy initialization of instance fields
private volatile FieldType field;
FieldType getField() {
    FieldType result = field;
    if (result == null) { // First check (no locking)
        synchronized(this) {
            result = field;
            if (result == null) // Second check (with locking)
                field = result = computeFieldValue();
        }
    }
    return result;
}
- This code may appear a bit convoluted. In particular, the need for the local variable result may be unclear. What this variable does is to ensure that field is read only once in the common case where it’s already initialized. While not strictly necessary, this may improve performance and is more elegant by the standards applied to low-level concurrent programming. Bloch reports that on his machine the method above is about 25 percent faster than the obvious version without a local variable.
- Prior to release 1.5, the double-check idiom did not work reliably because the semantics of the volatile modifier were not strong enough to support it [Pugh01]. The memory model introduced in release 1.5 fixed this problem [JLS, 17, Goetz06 16].
- Today, the double-check idiom is the technique of choice for lazily initializing an instance field. While you can apply the double-check idiom to static fields as well, there is no reason to do so: the lazy initialization holder class idiom is a better choice.
- Two variants of the double-check idiom bear noting. Occasionally, you may need to lazily initialize an instance field that can tolerate repeated initialization. If you find yourself in this situation, you can use a variant of the double-check idiom that dispenses with the second check. It is, not surprisingly, known as the single-check idiom. Note that field is still declared volatile:
// Single-check idiom - can cause repeated initialization!
private volatile FieldType field;
private FieldType getField() {
    FieldType result = field;
    if (result == null)
        field = result = computeFieldValue();
    return result;
}
- All of the initialization techniques discussed in this item apply to primitive fields as well as object reference fields.
- When the double-check or single-check idiom is applied to a numerical primitive field, the field’s value is checked against 0 (the default value for numerical primitive variables) rather than null.
- If you don’t care whether every thread recalculates the value of a field, and the type of the field is a primitive other than long or double, then you may choose to remove the volatile modifier from the field declaration in the single-check idiom. This variant is known as the racy single-check idiom. It speeds up field access on some architectures, at the expense of additional initializations (up to one per thread that accesses the field). This is definitely an exotic technique, not for everyday use. It is, however, used by String instances to cache their hash codes.
- In summary, you should initialize most fields normally, not lazily. If you must initialize a field lazily in order to achieve your performance goals, or to break a harmful initialization circularity, then use the appropriate lazy initialization technique. For instance fields, it is the double-check idiom; for static fields, the lazy initialization holder class idiom. For instance fields that can tolerate repeated initialization, you may also consider the single-check idiom.
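The racy single-check idiom mentioned above is the only variant in this item without a listing, so here is a minimal sketch of it applied to a cached int hash code. LazyHashPoint is a hypothetical class written for this example; it follows the same general shape as the hash caching described for String, but it is not the JDK's code.

```java
final class LazyHashPoint {
    private final int x;
    private final int y;
    private int hash;    // deliberately not volatile; 0 means "not computed yet"

    LazyHashPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof LazyHashPoint)) {
            return false;
        }
        LazyHashPoint p = (LazyHashPoint) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        int h = hash;            // read the field exactly once
        if (h == 0) {            // 0 is the default value for an int field
            h = 31 * x + y;      // every thread computes the same value
            hash = h;            // racy write: threads may each repeat this once
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(new LazyHashPoint(3, 4).hashCode());
    }
}
```

This is safe only because the field is an int (so reads and writes are atomic), the computation depends solely on final fields, and recomputation is harmless. If the computed value happens to be 0, it is simply recomputed on every call.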
Item 72 : Don’t depend on the thread scheduler
- The thread scheduler determines which threads get to run and for how long.
- Any program that relies on the thread scheduler for correctness or performance is likely to be nonportable.
- The best way to write a robust, responsive, portable program is to ensure that the average number of runnable threads is not significantly greater than the number of processors.
- This leaves the thread scheduler with little choice: it simply runs the runnable threads till they’re no longer runnable.
- The program’s behaviour doesn’t vary too much, even under radically different thread-scheduling policies.
- Note that the number of runnable threads isn’t the same as the total number of threads, which can be much higher. Threads that are waiting are not runnable.
- The main technique for keeping the number of runnable threads down is to have each thread do some useful work and then wait for more.
- Threads should not run if they aren’t doing useful work.
- In terms of the Executor Framework (Item 68), this means sizing your thread pools appropriately (Goetz 8.2; look below after item 73), and keeping tasks reasonably small and independent of one another.
- Tasks shouldn’t be too small, or dispatching overhead will harm performance.
- Threads should not busy-wait.
- When faced with a program that barely works because some threads aren’t getting enough CPU time relative to others, resist the temptation to “fix” the program by putting in calls to Thread.yield. You may succeed in getting the program to work after a fashion, but it will not be portable. Thread.yield also has no testable semantics.
- A better course of action is to restructure the application to reduce the number of concurrently runnable threads.
- A related technique, to which similar caveats apply, is adjusting thread priorities. Thread priorities are among the least portable features of the Java platform.
- It is not unreasonable to tune the responsiveness of an application by tweaking a few thread priorities, but it is rarely necessary and is not portable.
- It is unreasonable to solve a serious liveness problem by adjusting thread priorities. The problem is likely to return until you find and fix the underlying cause.
- You should use Thread.sleep(1) instead of Thread.yield for concurrency testing. Do not use Thread.sleep(0), which can return immediately.
- Thread.yield and thread priorities are merely hints to the scheduler.
- Thread priorities may be used sparingly to improve the quality of service of an already working program, but they should never be used to “fix” a program that barely works.
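As a small illustration of "threads should not busy-wait", the sketch below has a caller block on a CountDownLatch instead of spinning on a flag with Thread.yield. The names NoBusyWaitSketch and waitForWorker and the five-second timeout are assumptions for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class NoBusyWaitSketch {
    /** Blocks until a worker signals completion; returns false on timeout or interrupt. */
    static boolean waitForWorker() {
        CountDownLatch finished = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            // ... useful work would go here ...
            finished.countDown();        // signal the waiting thread
        });
        worker.start();
        try {
            // The caller is not runnable while it waits, so it consumes no CPU time.
            return finished.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("worker finished: " + waitForWorker());
    }
}
```

A spinning alternative such as a loop that checks a flag and calls Thread.yield would keep the waiting thread runnable the whole time, which is exactly what this item warns against.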
Item 73 : Avoid thread groups
- Thread groups were originally envisioned as a mechanism for isolating applets for security purposes.
- Thread groups don't provide much in the way of useful functionality and much of the functionality they provide is flawed. Just ignore their existence.
- If you design a class that deals with logical groups of threads, you should probably use thread pool executors (Item 68).
Goetz 8.2 :
- The ideal size for a thread pool depends on the types of tasks that will be submitted and the characteristics of the deployment system.
- Thread pool sizes should rarely be hardcoded; instead pool sizes should be provided by a configuration mechanism or computed dynamically by consulting Runtime.availableProcessors.
- Sizing thread pools is not an exact science, but fortunately you need only avoid the extremes of "too big" and "too small".
- If a thread pool is too big, then threads compete for scarce CPU and memory resources, resulting in higher memory usage and possible resource exhaustion.
- If it is too small, throughput suffers as processors go unused despite available work.
- To size a thread pool properly, you need to understand your computing environment, your resource budget, and the nature of your tasks.
- How many processors does the deployment system have? How much memory? Do tasks perform mostly computation, I/O, or some combination? Do they require a scarce resource, such as a JDBC connection?
- If you have different categories of tasks with very different behaviors, consider using multiple thread pools so each can be tuned according to its workload.
- For compute intensive tasks, an Ncpu processor system usually achieves optimum utilization with a thread pool of Ncpu +1 threads. (Even compute intensive threads occasionally take a page fault or pause for some other reason, so an "extra" runnable thread prevents CPU cycles from going unused when this happens.)
- For tasks that also include I/O or other blocking operations, you want a larger pool, since not all of the threads will be schedulable at all times.
- In order to size the pool properly, you must estimate the ratio of waiting time to compute time for your tasks; this estimate need not be precise and can be obtained through profiling or instrumentation. Alternatively, the size of the thread pool can be tuned by running the application using several different pool sizes under a benchmark load and observing the level of CPU utilization.
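- Given these definitions :
Ncpu = number of CPUs
Ucpu = target CPU utilization, 0 <= Ucpu <= 1
W/C = ratio of wait time to compute time
- The optimal pool size for keeping the processors at the desired utilization is :
Nthreads = Ncpu * Ucpu * (1 + W/C)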
- How to determine the number of CPUs :
int nCpus = Runtime.getRuntime().availableProcessors();
- Of course, CPU cycles are not the only resource you might want to manage using thread pools. Other resources that can contribute to sizing constraints are memory, file handles, socket handles, and database connections.
- Calculating pool size constraints for these types of resources is easier: just add up how much of that resource each task requires and divide that into the total quantity available. The result will be an upper bound on the pool size.
- When tasks require a pooled resource such as database connections, thread pool size and resource pool size affect each other. If each task requires a connection, the effective size of the thread pool is limited by the connection pool size.
- Similarly, when the only consumers of connections are pool tasks, the effective size of the connection pool is limited by the thread pool size.
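The sizing rule from Goetz 8.2, Nthreads = Ncpu * Ucpu * (1 + W/C), together with the resource-based upper bound, can be turned into a small helper along these lines. PoolSizing, forCpu and boundedBy are hypothetical names, and rounding up to a whole thread is an assumption of this sketch rather than something the book prescribes.

```java
final class PoolSizing {
    /** Nthreads = Ncpu * Ucpu * (1 + W/C), rounded up (assumption), never below 1. */
    static int forCpu(int nCpus, double targetUtilization, double waitToComputeRatio) {
        if (nCpus < 1 || targetUtilization <= 0 || targetUtilization > 1
                || waitToComputeRatio < 0) {
            throw new IllegalArgumentException("invalid sizing parameters");
        }
        double ideal = nCpus * targetUtilization * (1 + waitToComputeRatio);
        return Math.max(1, (int) Math.ceil(ideal));
    }

    /** Caps a pool size by a limited resource: total available divided by need per task. */
    static int boundedBy(int cpuBasedSize, int resourceTotal, int resourcePerTask) {
        if (resourcePerTask < 1) {
            throw new IllegalArgumentException("resourcePerTask must be positive");
        }
        return Math.max(1, Math.min(cpuBasedSize, resourceTotal / resourcePerTask));
    }

    public static void main(String[] args) {
        int nCpus = Runtime.getRuntime().availableProcessors();
        // Example: tasks that wait three times as long as they compute.
        int size = forCpu(nCpus, 1.0, 3.0);
        // Example: each task needs one of ten pooled database connections.
        System.out.println("pool size: " + boundedBy(size, 10, 1));
    }
}
```

The wait-to-compute ratio has to come from profiling or instrumentation, so treat the result as a starting point for benchmarking, not a final answer.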
Code snippet 1 (unrelated to Bloch or Goetz) :
- Use ScheduledExecutorService instead of the Timer class. Why?
- Timer can be sensitive to changes in the system clock; ScheduledThreadPoolExecutor isn't.
- Timer has only one execution thread, so long-running task can delay other tasks. ScheduledThreadPoolExecutor can be configured with any number of threads. Furthermore, you have full control over created threads, if you want (by providing ThreadFactory).
- A runtime exception thrown in a TimerTask kills the Timer's single thread, i.e., scheduled tasks will not run anymore. ScheduledThreadPoolExecutor not only catches runtime exceptions, but it lets you handle them if you want (by overriding the afterExecute method from ThreadPoolExecutor). The task that threw the exception will be canceled, but other tasks will continue to run.
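import static java.util.concurrent.TimeUnit.SECONDS;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;

class BeeperControl {
    private final ScheduledExecutorService scheduler =
        Executors.newScheduledThreadPool(1);

    public void beepForAnHour() {
        // Beep every 10 seconds, starting 10 seconds from now
        final Runnable beeper = new Runnable() {
            public void run() { System.out.println("beep"); }
        };
        final ScheduledFuture<?> beeperHandle =
            scheduler.scheduleAtFixedRate(beeper, 10, 10, SECONDS);
        // Cancel the beeper after one hour
        scheduler.schedule(new Runnable() {
            public void run() { beeperHandle.cancel(true); }
        }, 60 * 60, SECONDS);
    }
}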
Code snippet 2 (unrelated to Bloch or Goetz) :
- You can choose to work with Callables too: the ExecutorService submit method takes a Callable and returns a parameterized Future (you can call get() on the Future to obtain the String returned by the call method).
Callable<String> c = new Callable<String>() {
    @Override
    public String call() throws Exception {
        return "foo";
    }
};
ExecutorService scheduler = Executors.newCachedThreadPool();
Future<String> future = scheduler.submit(c);
System.out.println(future.get()); // Prints "foo"