Programming Languages - Concurrency
Concurrency is not Parallelism. Concurrency is not Asynchronicity either.
Concurrent vs Parallel
- Parallel: two tasks run in parallel, simultaneously; doing lots of things at once
- Concurrent: two tasks may run simultaneously, or alternatively (on a single core, not parallel), dealing with lots of things at once.
2 Ways to Improve Concurrency
- Async, non-blocking: An asynchronous API returns to the caller before its result is ready.E.g. Node.js/V8, Nginx.
- Sync, multithreading
Async != Concurrency: a program could call an asynchronous function and then sit idle waiting for the results.
Direct vs Continuation-passing style
- direct style: handle concurrent tasks by explicitly waiting for them to complete so that you can consume their results. This normally leads to much simpler code.
- continuation-passing style: avoid blocking or waiting and instead specify callbacks to be executed when concurrent tasks complete.
Source of concurrency
Processes, threads, pools, event loops, fibers, actors, etc.
Async (non-blocking)
“asynchronous” patterns — callbacks; futures and queues
Fibers / Coroutines
Learn more about fibers
In languages:
- C++20
- Kotlin
Future / Promise
In the Future pattern, instead of returning the result, the function returns a proxy object that allows the caller to wait for the result at some later point.
Sometimes uses async
and await
.
In languages:
- C++11: uses both
future
andpromise
and they are differentstd::promise
(move-only) is used by the "producer/writer" of the asynchronous operation.std::future
(move-only) /std::shared_future
(copyable, but gives only const access to the value) are used by the "consumer/reader" of the asynchronous operation.- the reason is to hide the "write/set" functionality from the "consumer/reader", so that
future
provides a read-only view - to get a future from a promise:
auto future = promise.get_future();
- Java: uses the word
Future
only. - JavaScript: uses the word
Promise
only. - The Go analogue to a Future is a single-element buffered channel.
Communicating Sequential Processes (CSP)
- The idea is that there can be two processes or threads that act independently of one another but share a "channel", which one process / thread puts data into and the other process / thread consumes.
- the channel is shared, and can be shared by multiple producers and consumers
- limited to the current runtime and cannot be distributed, even between two runtimes on the same physical box.
- only buffered channel is async; sender / receiver is sync, may be blocked
In languages:
- Go: Goroutines
- Clojure:
core.async
Reactive Programming
Take a = b + c
for example:
- in imperative programming,
a
is assigned the value ofb + c
, after thatb
andc
can change without affecting the value ofa
- in reactive programming, whenever
b
andc
changes,a
will be re-evaluated.
Imperative programming is the pull model, where the sum of b
and c
is "pulled" into a
, and only the values of b
and c
at that time matters; and reactive programming is the push model, that over the time, the changes of b
and c
will be pushed into a
.
One good example of "reactive" is the speadsheet: if a cell is derived from other cells by a formula, say SUM
, whenever any of the other cells change, the SUM
cell will change also.
The popular React framework is not really reactive, since it compares the Virtual DOM and the real DOM and update the real DOM if necessary.
In languages:
Not all languages natively support reactive programming, but there are third party libraries available. For example, ReactiveX adds Observables
to different languages, including RxJava for Java; reactive-streams is another effort for Java.
Actor Model
- Actors have their own mailbox
- you have to have a reference (Akka) or PID (Erlang) to the other actor in order to send it a message
- fault tolerance
- actor can have mutable state inside of it and a guarantee of no multithreaded access to the state
- sender is async, only receiver can be blocked
- good for distributed system
In languages:
- Scala: Akka
- Erlang
Sync (Blocking, Multiprocessing / Multithreading)
Common problems: Memory-interference, race conditions, deadlock, live lock and starvation.
Work-sharing vs Work-stealing
Source: https://rakyll.org/scheduler/
In multi-threaded computation, two scheduling paradigms: work-sharing and work-stealing.
- Work-sharing: When a processor generates new threads, it attempts to migrate some of them to the other processors with the hopes of them being utilized by the idle / underutilized processors.
- Work-stealing: An underutilized processor actively looks for other processor’s threads and “steal” some.
IPC
IPC: Inter Process Communication.
- Shared Memory
- Pipes
- Sockets
- RPC (Remote procedure call)
IPC is used not just for communication between processes on the same system, but processes on different systems.
Threads
Threads spawned in descendant Threads would be unknown to the parent without some kind of explicit record keeping system.
Thread Pools
Typically there are fewer threads than tasks.
Pros:
- avoid repeated, expensive thread creation and destruction
- manage resource consumption: limit the number of simultaneously created and active threads, both in total and those devoted to a particular task, which can prevent the application from running out of memory or from robbing other critical tasks of the resources they need.
- make application-specific decisions about how to schedule the tasks, by decoupling threads from tasks.
Cons:
- If there are no idle threads available in the pool, tasks aren't executed immediately.
- The task creator may even be blocked to wait for work to drain from the thread pool.
- The kernel is unaware of any task scheduling being done at the application level.
- Cannot make assumptions about the order of execution of tasks: threads can be context-switched by the kernel at any time, potentially causing an essentially arbitrary delay for any particular task at any point.
- Deadlock: The amount of concurrency is bounded by the number of threads, and the number of threads is always finite. These limits on concurrency and asynchrony can lead to deadlock in non-intuitive situations. In general, they make it dangerous to do producer-consumer synchronization between tasks in the same thread pool. All tasks given to a thread pool must be independent.
- Out of Memory / Address Space (OOM): need to bound both the thread pool size and the maximum queue length.
- System Limitations on Numbers of Threads: the kernel and pthread library may have limits on the number of simultaneous threads they can run well.
By Languages
C++
- C++11:
std::promise
,std::future
,std::mutex
- C++20: coroutine
Java
Built-in solution:
- multi-thread. JVM natively supports multithreading running on multiple cores.
Future
(since Java 5) andCompletableFuture
(since Java 8)
Third party libs: ReactiveX, Akka.
Read more: Java Concurrency
JavaScript
JavaScript executes in a single-threaded event loop. async
, await
, and promises.
Python: GIL (Global interpreter lock)
CPython is effectively single-threaded: the interpreter doesn’t support fine-grained locking mechanism like JVM, any thread must hold the GIL to access the memory space, i.e. a Python object can be used by only one thread at a time.
Multithreaded, CPU-bound operations either need to be handled by way of C extensions or multiple instances of CPython.
Go
Use Channels and goroutines.
Kotlin
Coroutines.
PHP
PHP runs in Nginx which doesn't have a thread based architecture, so we do not use locks or spawn new threads.
Go
- “Start goroutines when you have concurrent work.”
- “Share by communicating.”
Don't use asynchronous callbacks.
The Go analogue to a Future is a single-element buffered channel. ch := make(chan Item, 1)
.
A buffered channel can be used like a semaphore.
goroutine
A goroutine is a lightweight thread managed by the Go runtime.
go func() {
// ...
}
A select
blocks until one of its cases can run, then it executes that case. It chooses one at random if multiple are ready.
func main() {
tick := time.Tick(100 * time.Millisecond)
boom := time.After(500 * time.Millisecond)
for {
select {
case <-tick:
fmt.Println("tick.")
case <-boom:
fmt.Println("BOOM!")
return
default:
fmt.Println(" .")
time.Sleep(50 * time.Millisecond)
}
}
}
Channels
Channels are a typed conduit through which you can send and receive values with the channel operator, <-
:
// Create a channel (unbuffered, i.e. buffer size = 0).
// Sends and receives block until the other side is ready.
ch := make(chan int)
// Create a buffered channel.
// Sends to a buffered channel block only when the buffer is full.
// Receives block when the buffer is empty.
ch := make(chan int, 100)
// Send v to channel ch.
ch <- v
// Receive from ch, and assign value to v.
v := <-ch
// Senders can close the channel.
// Receiver should never close a channel.
// (Channels do not have to be closed; only when receivers need to be informed.)
close(ch)
// Receivers can test whether a channel has been closed.
v, ok := <-ch
// Receive values from the channel repeatedly until it is closed.
for i := range c
Unbuffered channel allows goroutines to synchronize without explicit locks or condition variables.
Mutex
In the standard library: sync.Mutex
. 2 methods: Lock()
and Unlock()
.
// Create a Mutex
mu sync.Mutex
// Lock
mu.Lock()
// Do something
// Unlock
mu.Unlock()