workqueue: reimplement WQ_HIGHPRI using a separate worker_pool
WQ_HIGHPRI was implemented by queueing highpri work items at the head of the global worklist. Other than queueing at the head, they weren't handled differently; unfortunately, this could lead to execution latency of a few seconds on heavily loaded systems. Now that workqueue code has been updated to deal with multiple worker_pools per global_cwq, this patch reimplements WQ_HIGHPRI using a separate worker_pool. NR_WORKER_POOLS is bumped to two and gcwq->pools[0] is used for normal pri work items and ->pools[1] for highpri. Highpri workers get -20 nice level and has 'H' suffix in their names. Note that this change increases the number of kworkers per cpu. POOL_HIGHPRI_PENDING, pool_determine_ins_pos() and highpri chain wakeup code in process_one_work() are no longer used and removed. This allows proper prioritization of highpri work items and removes high execution latency of highpri work items. v2: nr_running indexing bug in get_pool_nr_running() fixed. v3: Refreshed for the get_pool_nr_running() update in the previous patch. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Josh Hunt <joshhunt00@gmail.com> LKML-Reference: <CAKA=qzaHqwZ8eqpLNFjxnO2fX-tgAOjmpvxgBFjv6dJeQaOW1w@mail.gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com>
Šī revīzija ir iekļauta:
@@ -89,25 +89,28 @@ called thread-pools.
|
||||
|
||||
The cmwq design differentiates between the user-facing workqueues that
|
||||
subsystems and drivers queue work items on and the backend mechanism
|
||||
which manages thread-pool and processes the queued work items.
|
||||
which manages thread-pools and processes the queued work items.
|
||||
|
||||
The backend is called gcwq. There is one gcwq for each possible CPU
|
||||
and one gcwq to serve work items queued on unbound workqueues.
|
||||
and one gcwq to serve work items queued on unbound workqueues. Each
|
||||
gcwq has two thread-pools - one for normal work items and the other
|
||||
for high priority ones.
|
||||
|
||||
Subsystems and drivers can create and queue work items through special
|
||||
workqueue API functions as they see fit. They can influence some
|
||||
aspects of the way the work items are executed by setting flags on the
|
||||
workqueue they are putting the work item on. These flags include
|
||||
things like CPU locality, reentrancy, concurrency limits and more. To
|
||||
get a detailed overview refer to the API description of
|
||||
things like CPU locality, reentrancy, concurrency limits, priority and
|
||||
more. To get a detailed overview refer to the API description of
|
||||
alloc_workqueue() below.
|
||||
|
||||
When a work item is queued to a workqueue, the target gcwq is
|
||||
determined according to the queue parameters and workqueue attributes
|
||||
and appended on the shared worklist of the gcwq. For example, unless
|
||||
specifically overridden, a work item of a bound workqueue will be
|
||||
queued on the worklist of exactly that gcwq that is associated to the
|
||||
CPU the issuer is running on.
|
||||
When a work item is queued to a workqueue, the target gcwq and
|
||||
thread-pool is determined according to the queue parameters and
|
||||
workqueue attributes and appended on the shared worklist of the
|
||||
thread-pool. For example, unless specifically overridden, a work item
|
||||
of a bound workqueue will be queued on the worklist of either normal
|
||||
or highpri thread-pool of the gcwq that is associated to the CPU the
|
||||
issuer is running on.
|
||||
|
||||
For any worker pool implementation, managing the concurrency level
|
||||
(how many execution contexts are active) is an important issue. cmwq
|
||||
@@ -115,26 +118,26 @@ tries to keep the concurrency at a minimal but sufficient level.
|
||||
Minimal to save resources and sufficient in that the system is used at
|
||||
its full capacity.
|
||||
|
||||
Each gcwq bound to an actual CPU implements concurrency management by
|
||||
hooking into the scheduler. The gcwq is notified whenever an active
|
||||
worker wakes up or sleeps and keeps track of the number of the
|
||||
currently runnable workers. Generally, work items are not expected to
|
||||
hog a CPU and consume many cycles. That means maintaining just enough
|
||||
concurrency to prevent work processing from stalling should be
|
||||
optimal. As long as there are one or more runnable workers on the
|
||||
CPU, the gcwq doesn't start execution of a new work, but, when the
|
||||
last running worker goes to sleep, it immediately schedules a new
|
||||
worker so that the CPU doesn't sit idle while there are pending work
|
||||
items. This allows using a minimal number of workers without losing
|
||||
execution bandwidth.
|
||||
Each thread-pool bound to an actual CPU implements concurrency
|
||||
management by hooking into the scheduler. The thread-pool is notified
|
||||
whenever an active worker wakes up or sleeps and keeps track of the
|
||||
number of the currently runnable workers. Generally, work items are
|
||||
not expected to hog a CPU and consume many cycles. That means
|
||||
maintaining just enough concurrency to prevent work processing from
|
||||
stalling should be optimal. As long as there are one or more runnable
|
||||
workers on the CPU, the thread-pool doesn't start execution of a new
|
||||
work, but, when the last running worker goes to sleep, it immediately
|
||||
schedules a new worker so that the CPU doesn't sit idle while there
|
||||
are pending work items. This allows using a minimal number of workers
|
||||
without losing execution bandwidth.
|
||||
|
||||
Keeping idle workers around doesn't cost other than the memory space
|
||||
for kthreads, so cmwq holds onto idle ones for a while before killing
|
||||
them.
|
||||
|
||||
For an unbound wq, the above concurrency management doesn't apply and
|
||||
the gcwq for the pseudo unbound CPU tries to start executing all work
|
||||
items as soon as possible. The responsibility of regulating
|
||||
the thread-pools for the pseudo unbound CPU try to start executing all
|
||||
work items as soon as possible. The responsibility of regulating
|
||||
concurrency level is on the users. There is also a flag to mark a
|
||||
bound wq to ignore the concurrency management. Please refer to the
|
||||
API section for details.
|
||||
@@ -205,31 +208,22 @@ resources, scheduled and executed.
|
||||
|
||||
WQ_HIGHPRI
|
||||
|
||||
Work items of a highpri wq are queued at the head of the
|
||||
worklist of the target gcwq and start execution regardless of
|
||||
the current concurrency level. In other words, highpri work
|
||||
items will always start execution as soon as execution
|
||||
resource is available.
|
||||
Work items of a highpri wq are queued to the highpri
|
||||
thread-pool of the target gcwq. Highpri thread-pools are
|
||||
served by worker threads with elevated nice level.
|
||||
|
||||
Ordering among highpri work items is preserved - a highpri
|
||||
work item queued after another highpri work item will start
|
||||
execution after the earlier highpri work item starts.
|
||||
|
||||
Although highpri work items are not held back by other
|
||||
runnable work items, they still contribute to the concurrency
|
||||
level. Highpri work items in runnable state will prevent
|
||||
non-highpri work items from starting execution.
|
||||
|
||||
This flag is meaningless for unbound wq.
|
||||
Note that normal and highpri thread-pools don't interact with
|
||||
each other. Each maintain its separate pool of workers and
|
||||
implements concurrency management among its workers.
|
||||
|
||||
WQ_CPU_INTENSIVE
|
||||
|
||||
Work items of a CPU intensive wq do not contribute to the
|
||||
concurrency level. In other words, runnable CPU intensive
|
||||
work items will not prevent other work items from starting
|
||||
execution. This is useful for bound work items which are
|
||||
expected to hog CPU cycles so that their execution is
|
||||
regulated by the system scheduler.
|
||||
work items will not prevent other work items in the same
|
||||
thread-pool from starting execution. This is useful for bound
|
||||
work items which are expected to hog CPU cycles so that their
|
||||
execution is regulated by the system scheduler.
|
||||
|
||||
Although CPU intensive work items don't contribute to the
|
||||
concurrency level, start of their executions is still
|
||||
@@ -239,14 +233,6 @@ resources, scheduled and executed.
|
||||
|
||||
This flag is meaningless for unbound wq.
|
||||
|
||||
WQ_HIGHPRI | WQ_CPU_INTENSIVE
|
||||
|
||||
This combination makes the wq avoid interaction with
|
||||
concurrency management completely and behave as a simple
|
||||
per-CPU execution context provider. Work items queued on a
|
||||
highpri CPU-intensive wq start execution as soon as resources
|
||||
are available and don't affect execution of other work items.
|
||||
|
||||
@max_active:
|
||||
|
||||
@max_active determines the maximum number of execution contexts per
|
||||
@@ -328,20 +314,7 @@ If @max_active == 2,
|
||||
35 w2 wakes up and finishes
|
||||
|
||||
Now, let's assume w1 and w2 are queued to a different wq q1 which has
|
||||
WQ_HIGHPRI set,
|
||||
|
||||
TIME IN MSECS EVENT
|
||||
0 w1 and w2 start and burn CPU
|
||||
5 w1 sleeps
|
||||
10 w2 sleeps
|
||||
10 w0 starts and burns CPU
|
||||
15 w0 sleeps
|
||||
15 w1 wakes up and finishes
|
||||
20 w2 wakes up and finishes
|
||||
25 w0 wakes up and burns CPU
|
||||
30 w0 finishes
|
||||
|
||||
If q1 has WQ_CPU_INTENSIVE set,
|
||||
WQ_CPU_INTENSIVE set,
|
||||
|
||||
TIME IN MSECS EVENT
|
||||
0 w0 starts and burns CPU
|
||||
|
Atsaukties uz šo jaunā problēmā
Block a user