March 30, 2018

Python Threads , GIL and Concurrency demystified.

Threads work a little differently in python if you are coming from C/C++ background. In python, Only one thread can be in running state at a given time.This means Threads in python cannot truly leverage the power of multiple processing cores since by design it’s not possible for threads to run parallelly on multiple cores.

As the memory management in python is not thread-safe each thread require an exclusive access to data structures in python interpreter.This exclusive access is acquired by a mechanism called GIL ( global interpretr lock ).

Why does python use GIL?

In order to prevent multiple threads from accessing interpreter state simultaneously and corrupting the interpreter state.

The idea is whenever a thread is being executed (even if it’s the main thread), a GIL is acquired and after some predefined interval of time the GIL is released by the current thread and reacquired by some other thread( if any).

Why not simply remove GIL?

It is not that its impossible to remove GIL, its just that in prcoess of doing so we end up putting mutiple locks inside interpreter in order to serialize access, which makes even a single threaded application less performant.

so the cost of removing GIL is paid off by reduced performance of a single threaded application, which is never desired.

So when does thread switching occurs in python?

Thread switch occurs when GIL is released.So when is GIL Released? There are two scenarios to take into consideration.

If a Thread is doing CPU Bound operations(Ex image processing).

In Older versions of python , Thread switching used to occur after a fixed no of python instructions.It was by default set to 100.It turned out that its not a very good policy to decide when switching should occur since the time spent executing a single instruction can very wildly from millisecond to even a second.Therefore releasing GIL after every 100 instructions regardless of the time they take to execute is a poor policy.

In new versions instead of using instruction count as a metric to switch thread , a configurable time interval is used. The default switch interval is 5 can get the current switch interval using sys.getswitchinterval(). This can be altered using sys.setswitchinterval()

If a Thread is doing some IO Bound Operations(Ex filesystem access or network IO)

GIL is release whenever the thread is waiting for some for IO operation to get completed.

Which thread to switch to next?

The interpreter doesn’t have its own scheduler.which thread becomes scheduled at the end of the interval is the operating system’s decision. .