How many mutexes





















The best way to avoid locking is to avoid shared mutable state. When shared mutable state is needed, use a mutex. If you experience lock contention, consider using more mutexes, each protecting less data (that is, make the locking finer-grained). If you feel you must access a shared mutable variable without a mutex, and have data that shows it is worth the maintenance expense of doing so, ask an expert how to do it. To do otherwise is to court disaster.

Despite this advice, some programmers enjoy the intellectual challenge of using lower-level primitives ("atomic" types, compare-and-swap, memory barriers) instead of mutexes. There are many problems with such code; they are discussed later in this document.

A related temptation is the reentrant (recursive) mutex, which causes problems of its own. Recall that a mutex's primary purpose is to allow the programmer to maintain monitor invariants, and that the programmer may assume a monitor invariant just after acquiring the appropriate mutex. Consider a Java method M0 that acquires a mutex mu protecting invariant Inv.

The author of M0 is entitled to assume Inv at the moment he acquires mu. Now consider another method M1 that acquires mu, invalidates Inv during an update, calls M0, restores Inv, and releases mu. If mu is reentrant, M0 acquires it without blocking even though M1 still holds it; M0 assumes Inv, but Inv is untrue when M0 is called, so M0 fails. Remember that both M0 and M1 may have multiple implementations written by multiple authors over years, and perhaps multiple implementations in the same program, due to inheritance. The source code of M0 may not be available to the author of M1, or vice versa.

Without remarkably good discipline and documentation standards, the programmers may not understand why M0 is not functioning correctly. If a programmer attempted to do the same thing with a non-reentrant mutex such as Lock, his code would instantly deadlock, and a stack trace would show that a thread is attempting to reacquire a lock it already holds.

Not only is the error discovered immediately, but the fix is usually trivial: write a small method M0 that acquires mu and calls a private method M0', which does the real work, assuming Inv but without acquiring mu. The specifications of M0 and M0' differ only in their locking behaviour, so the programmer almost always documents this difference, often in the names of M0 and M0'. The presence of the additional method and the corresponding name or comment provides an additional reminder to the author of M1.
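A minimal sketch of this split, written in C++ with std::mutex for concreteness; the class, the invariant, and the method names are illustrative, not taken from any particular codebase:

```cpp
#include <mutex>
#include <vector>

class Counter {
 public:
  // "M0": public entry point. Acquires mu_ and delegates to the locked variant.
  void Increment() {
    std::lock_guard<std::mutex> lock(mu_);
    IncrementLocked();
  }

 private:
  // "M0'": does the real work. Callers must hold mu_ and must have established
  // the invariant (here: history_ is empty or its last element equals total_).
  void IncrementLocked() {
    total_ += 1;
    history_.push_back(total_);
  }

  std::mutex mu_;
  long total_ = 0;              // protected by mu_
  std::vector<long> history_;   // protected by mu_
};
```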

The author of M1 realizes that by calling M0' rather than M0, he has the burden of establishing Inv before the call; it is not guaranteed automatically by the monitor invariant. This solution is not a panacea, but disallowing reentrancy at least makes the error apparent, rather than hiding it. The second problem with reentrancy is associated with condition variables. In the example above, imagine that M0 waits on a condition variable and thus effectively contains more than one critical section.

Normally M0 will work, but if it is called from M1 with mu held, it is unclear what should happen, and neither plausible outcome is satisfactory. If the wait primitive decrements mu's acquisition count by one, mu does not become free, the condition never becomes true, and the thread deadlocks.

If instead the acquisition count is set to zero by Wait, mu becomes free during a critical section initiated by M1. This is likely to cause M1 to malfunction silently. In the non-reentrant case, M0 must be split into M0 and M0' as before. Since M0' waits on a condition variable, it now has an interesting specification: it temporarily releases a mutex that it did not acquire. This is unusual, and usually very dangerous, so one might expect a comment to that effect.

This comment then tells the author of M1 that he must be especially careful if he calls M0'.

Automatic locking has its own pitfalls. Consider a shared hash table whose methods might be insert, lookup, and remove. If the table provides its own synchronization, perhaps inserted automatically in each method by some mechanism, we eliminate data races within the table itself, but this does not help the client, who typically needs a sequence of calls (such as a lookup followed by an insert) to be atomic.
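A sketch of the problem, with an internally synchronized table and a client whose compound operation is still racy; the class and method names are illustrative:

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

// Every method locks internally, so there are no data races inside the table.
class SyncTable {
 public:
  bool lookup(const std::string& key, int* value) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = map_.find(key);
    if (it == map_.end()) return false;
    *value = it->second;
    return true;
  }
  void insert(const std::string& key, int value) {
    std::lock_guard<std::mutex> lock(mu_);
    map_[key] = value;
  }

 private:
  std::mutex mu_;
  std::unordered_map<std::string, int> map_;
};

// Client code: each call is atomic, but the pair of calls is not.
// Two threads can both see "absent" and both insert, one overwriting the other.
void InsertIfAbsent(SyncTable& table, const std::string& key, int value) {
  int existing;
  if (!table.lookup(key, &existing)) {   // race window begins here
    table.insert(key, value);            // another thread may have inserted already
  }
}
```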

A further problem is that the designers of automatic locking mechanisms often desire to reduce the amount of typing needed to implement a monitor, rather than to improve the readability and maintainability of the code. All too often, these two desires are in conflict; some code is more readable if one can tell when a lock is released.

An alternative to mutexes is message passing, in which each piece of shared data is owned by a single thread and other threads operate on it by sending that thread messages. This style is a dual of the mutex-based style (see Lauer and Needham's oft-cited paper on the subject): a message-send corresponds to acquiring a mutex; running in a critical section corresponds to executing code within the owning thread; and receiving a reply corresponds to releasing the mutex. Thus, the most obvious difference between the approaches is that in message-passing all the code that manipulates a particular data item is brought together into one thread, while with mutexes the data accesses can be interleaved with other code.
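A toy sketch of the owning-thread style follows; OwnerThread and its methods are invented for this illustration and are a stand-in, not Chrome's TaskRunner/PostTask API:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// One thread owns the data; other threads "send messages" by posting closures.
class OwnerThread {
 public:
  OwnerThread() : worker_([this] { Run(); }) {}
  ~OwnerThread() {
    Post([this] { stop_ = true; });
    worker_.join();
  }

  // "Message send": corresponds to acquiring the mutex in the locking style.
  void Post(std::function<void()> task) {
    std::lock_guard<std::mutex> lock(mu_);
    tasks_.push(std::move(task));
    cv_.notify_one();
  }

 private:
  void Run() {
    while (!stop_) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        while (tasks_.empty()) cv_.wait(lock);
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // "critical section": runs on the owning thread
    }
  }

  std::mutex mu_;                        // protects only the task queue itself
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool stop_ = false;                    // touched only on the owner thread
  std::thread worker_;
};
```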

Message passing and mutexes can be intermixed; often one is preferred either because the author is comfortable with the style, or because one leads to a clearer module than the other. Mutexes work well when critical sections are short and may be invoked in many places, or when reader-writer locks can be used effectively.

In Chrome code, message passing is much more common (via TaskRunner and PostTask), and low-level primitives like locks and condition variables are used only when necessary. Both models allow high-throughput implementations, and both can suffer from races and deadlocks.

Returning to atomic operations: programmers enjoy the intellectual puzzle of using them.

This leads to clever, but ill-advised, and often broken code. Algorithms involving atomic operations are extremely subtle. For example, a general-purpose, efficient, lock-free, singly-linked list algorithm took significant research to discover, and requires care to implement. Almost all programmers make mistakes when they attempt direct use of atomic operations. Even when they don't make mistakes, the resulting code is hard for others to maintain.

Both CPUs and compilers can rearrange reads and writes in ways that lead to subtle race conditions. The simple-sounding pattern of double-checked locking is actually extremely subtle and is usually implemented incorrectly. Programmers assume that locking is expensive, and that using atomic operations will be more efficient. But in reality, acquiring and releasing a lock is cheaper than a cache miss; attention to cache behaviour is usually a more fruitful way to improve performance.
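To illustrate the subtlety, here is a sketch of one correct C++11 formulation of double-checked locking for a lazily created object; Config and GetConfig are illustrative names. Compare the care the memory ordering needs with the simpler alternatives of taking the lock on every call or using a function-local static:

```cpp
#include <atomic>
#include <mutex>

struct Config { int verbosity = 0; };   // illustrative lazily created object

std::atomic<Config*> g_config{nullptr};
std::mutex g_config_mu;

Config* GetConfig() {
  Config* c = g_config.load(std::memory_order_acquire);   // fast path, no lock
  if (c == nullptr) {
    std::lock_guard<std::mutex> lock(g_config_mu);
    c = g_config.load(std::memory_order_relaxed);          // re-check under the lock
    if (c == nullptr) {
      c = new Config();
      g_config.store(c, std::memory_order_release);        // publish the constructed object
    }
  }
  return c;
}
```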

Furthermore, lock-free data structures are often more expensive than using locks. This is because a lock allows arbitrary changes to be made to a complex data structure; if the same changes must be made without a lock, the result is likely to take more atomic read-modify-write instructions, not fewer. People usually turn to these techniques because they wish to avoid lock contention when concurrency is high, but that problem is better solved by partitioning locked data structures so that contention is reduced.

For example, it is easier, more efficient, and more useful to build a high-concurrency hash table from many normal hash tables, each with its own lock, than to build one lock-free hash table using atomic operations.
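A sketch of that construction; the shard count, key type, and class name are illustrative:

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

// A high-concurrency map built from many ordinary maps, each with its own mutex.
class ShardedMap {
 public:
  void Set(const std::string& key, int value) {
    Shard& s = ShardFor(key);
    std::lock_guard<std::mutex> lock(s.mu);
    s.map[key] = value;
  }

  bool Get(const std::string& key, int* value) {
    Shard& s = ShardFor(key);
    std::lock_guard<std::mutex> lock(s.mu);
    auto it = s.map.find(key);
    if (it == s.map.end()) return false;
    *value = it->second;
    return true;
  }

 private:
  static constexpr size_t kNumShards = 16;   // keeps contention low under load

  struct Shard {
    std::mutex mu;
    std::unordered_map<std::string, int> map;   // protected by mu
  };

  Shard& ShardFor(const std::string& key) {
    return shards_[std::hash<std::string>{}(key) % kNumShards];
  }

  Shard shards_[kNumShards];
};
```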

A single-threaded application can use only one CPU, which typically makes it far slower than other options, even when the overheads of locking are taken into account. If the application is simple enough, one may be able to run multiple copies of the same program on each machine, but this introduces two inefficiencies: cross-address-space context switches are more expensive than thread context switches, because threads share TLB entries while address spaces do not; and the address-space separation precludes sharing some resources (caches, ports, etc.).

Some programs, such as network servers, exhibit natural concurrency: they must deal with many concurrent client requests, so some mechanism is needed to allow this.

When a program switches to a new activity, it incurs cache and TLB misses as it touches the data and instructions associated with the new activity. This cost is the most important one, and it is essentially the same regardless of whether the new activity takes place in a different thread or in the same thread.

The cost occurs not only when a multithreaded program performs a thread context switch, but also when an event-driven program processes a new event, or when a co-operative multithreaded program switches context in user space. Multithreaded programs rarely context switch due to time-slicing because time-slices are large. The cost of user-kernel mode switches is sometimes counted as part of the context-switch cost between threads.

However, multithreaded programs usually context switch when they have already entered the kernel for other reasons: typically, via some system call or to service an interrupt. A single-threaded program incurs these same mode switches, and thus they are common to all models. One might expect mutex and condition variable calls to add to the number of system calls, but this effect is modest because uncontended mutexes induce no system calls, while contended mutexes and condition-variable operations should be relatively rare.

Context switches between address spaces are more expensive because TLB entries must be replaced.

Now consider the event-driven style, and suppose that a routine foo, called from within a loop, may block. In an event-driven program, a blocking call inside foo stalls the entire event loop; we can fix the performance problem if we change foo's signature to include a continuation to be called when foo completes. However, this is not a local change: the loop must be restructured so that the loop variable is encoded in the continuation state. Worse, every use of foo must be similarly modified, not only to handle the new signature, but to break the calling procedure into two: the prefix before the call to foo, and the continuation.

Thus, a local change in foo has affected an arbitrary number of modules in a significant way: the event-driven style does not preserve modularity.
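A sketch of the transformation, using invented stand-ins for foo and its caller (foo_blocking, foo_async, SumBlocking, and SumAsync are illustrative, not real APIs):

```cpp
#include <functional>
#include <vector>

int foo_blocking(int request) { return request * 2; }   // pretend this may block

void foo_async(int request, std::function<void(int)> done) {
  done(request * 2);   // a real event-driven system would defer this callback,
}                      // in which case capturing locals by reference would be unsafe

// Before: the caller is a simple loop; its state lives on the stack.
int SumBlocking(const std::vector<int>& requests) {
  int total = 0;
  for (int r : requests) total += foo_blocking(r);
  return total;
}

// After: the caller must be split into a prefix and a continuation, and the
// loop index and running total must be encoded in the continuation's state.
void SumAsync(const std::vector<int>& requests, size_t i, int total,
              std::function<void(int)> done) {
  if (i == requests.size()) {
    done(total);                        // the code that used to follow the loop
    return;
  }
  foo_async(requests[i], [&requests, i, total, done](int result) {
    SumAsync(requests, i + 1, total + result, done);
  });
}
```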

Notice how this differs from the multithreaded case. Just as the event-driven style requires that foo be non-blocking, the multithreaded style requires that foo be thread-safe.

However, this constraint is easier to live with. First, most libraries are thread-safe either naturally or by design, while few are designed for use in event-driven systems. For example, fprintf is thread-safe, but provides no callback mechanism. Second, if foo were not thread-safe, calls to it could be made safe either by a change to foo or by wrapping foo with a new routine foo' that acquires a lock before calling the old foo. Either change is local, and does not affect other aspects of the interface, so modularity is preserved.
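A sketch of the wrapper approach; the stand-in foo and the wrapper's name are illustrative:

```cpp
#include <mutex>

// Illustrative stand-in for a routine that is not thread-safe.
static int g_calls = 0;
int foo() { return ++g_calls; }        // unsynchronized shared state

// Local fix: a wrapper (the "foo'" of the text) that serializes calls to foo.
// Nothing else about the interface changes, so modularity is preserved.
std::mutex g_foo_mu;
int locked_foo() {
  std::lock_guard<std::mutex> lock(g_foo_mu);
  return foo();
}
```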

The problem with the event-driven style is worse for routines like printf whose signatures cannot be changed lightly. A further problem with the event-driven style is that the resulting code becomes quite difficult to understand, maintain, and debug. This is primarily because it is harder to tell from reading the code which routine caused the current one to be called, and which routine will be called next.

Standard debugging and performance tools become less effective too, as they rarely have support for tracing through continuations, and continuation mechanisms rarely maintain call history. In contrast, non-event-driven programs maintain much of their state and history conveniently on the thread stacks; debugging tools can reveal a great deal simply by displaying stack traces.

For these reasons, unless a program is quite simple it usually pays in both performance and maintainability to use multiple threads, and to protect shared variables explicitly with mutexes or to communicate between threads with messages.

When waiting on a condition variable, wait in a while-loop that re-tests the condition. The presence of the while-loop allows one to tell by local inspection that the condition is true when the loop exits. Hoare's original precise semantics required inspection of all code that could potentially call Signal, which made some errors rather harder to debug.
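The idiom, sketched with std::condition_variable; the queue and the condition are illustrative:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

std::mutex mu;
std::condition_variable cv;
std::queue<int> items;   // protected by mu; the condition C is "items is non-empty"

int TakeItem() {
  std::unique_lock<std::mutex> lock(mu);
  while (items.empty()) {   // the while-loop: re-test C after every wakeup
    cv.wait(lock);
  }
  // By local inspection, C holds here: items is non-empty.
  int item = items.front();
  items.pop();
  return item;
}

void PutItem(int item) {
  std::lock_guard<std::mutex> lock(mu);
  items.push(item);
  cv.notify_one();          // signalled inside the critical section (see below)
}
```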

The while-loop also tolerates spurious wakeups, which gives the programmer the opportunity to trade performance for ease of programming. Suppose he arranges for threads always to signal the condition variable whenever they modify the protected state, rather than only when they make a specific condition true.

This allows modularity between waiters and wakers: the wakers don't need to know what conditions the waiters are waiting for, and each waiter can wait for a different condition without affecting the code of the wakers.

The while-loop allows the condition variable implementation more freedom to schedule woken threads in any order. Consider a thread T0 that wakes thread T1 that was waiting for condition C.

If the runtime semantics guarantee that T1 will enter the critical section next, T1 can assume C. But context switches have overhead, so it is usually more efficient merely to add T1 to the run queue while continuing to run T0 and perhaps other threads, which may then enter the critical section before T1. If any of those threads falsifies C, T1 must not assume C on entering the critical section; scheduling has made it appear that it has received a spurious wakeup.

The while-loop ensures that T1 tests C, and continues only if C is really true. Thus, the while-loop effectively allows more freedom in choosing an efficient scheduling discipline. Finally, timed waits become less error-prone. A timed wait may cause a thread to wake before its condition C is true. Suppose the programmer forgets to test for a timeout. If he is forced to use a while-loop, his thread will go to sleep again and his program will probably deadlock, allowing easy detection of the bug. Without the while-loop, the thread would falsely assume C, and cause arbitrarily bad behaviour.
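A sketch of a timed wait that keeps the while-loop and tests the timeout explicitly; the flag ready is illustrative:

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

std::mutex mu;
std::condition_variable cv;
bool ready = false;   // the condition C, protected by mu

// Returns true if C became true before the timeout, false otherwise.
bool WaitForReady(std::chrono::milliseconds timeout) {
  std::unique_lock<std::mutex> lock(mu);
  auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!ready) {
    if (cv.wait_until(lock, deadline) == std::cv_status::timeout) {
      return ready;   // timed out: report whether C happens to hold, never assume it
    }
  }
  return true;
}
```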

Without the while-loop, the thread would falsely assume C, and cause arbitrarily bad behaviour. Signal before W calls cv. Some texts recommend putting Signal after the critical section because this makes it more likely that the mutex is free when a thread attempts to reacquire it on return from Wait. If the Signal were inside the critical section, a naive implementation might wake the thread which could then block once more on the mutex held by the very thread that woke it.

Chrome's condition variables, and most reasonable implementations, detect this case and delay waking the waiting thread until the mutex is free (TODO: verify this). Hence, there is no performance penalty for calling Signal inside the critical section. In rare cases it is incorrect to call Signal after the critical section, so we recommend always calling it inside the critical section.

The following code can attempt to access the condition variable after it has been deleted, but could be safe if Signal were called inside the critical section.
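The original listing is not reproduced in this copy; the following sketch, with illustrative names, shows the kind of code meant, with the unsafe Signal placement marked:

```cpp
#include <condition_variable>
#include <mutex>

struct Rendezvous {
  std::mutex mu;
  std::condition_variable cv;
  bool done = false;
};

void Waiter(Rendezvous* r) {           // thread W
  {
    std::unique_lock<std::mutex> lock(r->mu);
    while (!r->done) r->cv.wait(lock);
  }
  delete r;                            // W believes no one else will use *r
}

void Signaller(Rendezvous* r) {        // thread S
  {
    std::lock_guard<std::mutex> lock(r->mu);
    r->done = true;
  }                                    // mutex released here
  r->cv.notify_one();                  // UNSAFE: W may already have deleted *r
  // Moving notify_one inside the critical section prevents W from reacquiring
  // the mutex, and therefore from deleting *r, until S has finished with it.
}
```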

Lock contention can also arise unexpectedly. A network server may become overloaded or see a changed pattern of use, causing a mutex to be used more heavily than it normally would be. A program may be run on an upgraded machine with more CPUs, causing contention on a mutex that was previously lightly loaded. And because we encourage abstraction between the parts of our programs, the authors and clients of a module may have different expectations of how the module will be used.

In particular, a client may cause contention on a mutex that he is unaware of. Ideally, a critical section should provide approximately the same rate of progress to many contending threads as it can to a single thread. Mutex implementations can approximate this ideal by not providing fairness, and by preventing multiple threads that have already blocked from simultaneously competing for the lock.

If the predicate does not hold, no memory ordering should be assumed from the Lock operations. This surprises programmers who expect the simplest possible implementation, with no optimizations.

Unfortunately, this expectation is reinforced by some API standards. We discourage such assumptions because they make various transformations more difficult.

As an example of finer-grained locking, consider a singly linked list in which each node contains its own lock. To remove a node from the list, first search the list starting at ListHead (which itself is never removed) until the desired node is found.

To protect this search from the effects of concurrent deletions, lock each node before any of its contents are accessed. Because all searches start at ListHead and locks are always taken in list order, deadlock cannot occur. When the desired node is found, lock both that node and its predecessor, since the change involves both nodes. Because the predecessor's lock is always taken first, you are again protected from deadlock.

The first sketch below shows code to remove an item from such a singly linked list. The second converts the list into a circular list: there is no longer a distinguished head node, and a thread might instead be associated with a particular node, performing operations on that node and its neighbor. Note that lock hierarchies do not work easily here, because the obvious hierarchy (following the links) is circular. The second sketch therefore acquires the locks on two nodes in a different way before performing an operation involving both of them.
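The original C listings are not included in this copy, so the following are reconstructed sketches in C++ (std::mutex, with an illustrative Node type) that convey the same two techniques:

```cpp
#include <functional>
#include <mutex>
#include <utility>

struct Node {
  std::mutex lock;
  int value = 0;
  Node* next = nullptr;
};

Node ListHead;   // dummy head node; never removed, so every search may start here

// Sketch 1: remove the first node with the given value, taking locks in list
// order (predecessor before successor), which rules out deadlock.
void ListDelete(int value) {
  Node* prev = &ListHead;
  prev->lock.lock();
  for (Node* curr = prev->next; curr != nullptr; curr = prev->next) {
    curr->lock.lock();                 // lock a node before reading its contents
    if (curr->value == value) {
      prev->next = curr->next;         // unlink while holding both locks
      curr->lock.unlock();
      prev->lock.unlock();
      delete curr;                     // unreachable now, safe to free
      return;
    }
    prev->lock.unlock();               // hand over hand: release the predecessor,
    prev = curr;                       // keep the current node locked
  }
  prev->lock.unlock();
}

// Sketch 2: for the circular list there is no natural hierarchy, so impose an
// arbitrary global order on the locks themselves (here, by node address) so
// that any pair of neighbours is always locked in the same order. Assumes the
// list has at least two nodes; reading node->next without a lock is tolerable
// here only because, in this scenario, a thread stays with its own node.
void WithNeighbourLocked(Node* node, const std::function<void(Node*, Node*)>& op) {
  Node* neighbour = node->next;
  Node* first = node;
  Node* second = neighbour;
  if (std::less<Node*>()(second, first)) std::swap(first, second);
  first->lock.lock();
  second->lock.lock();
  op(node, neighbour);
  second->lock.unlock();
  first->lock.unlock();
}
```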



