| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| An authenticated admin user with access to both the management WebUI and command line interface on a Firebox can enable a diagnostic debug shell by uploading a platform and version-specific diagnostic package and executing a leftover diagnostic command.
This issue affects Fireware OS: from 12.0 before 12.11.2. |
| In the Linux kernel, the following vulnerability has been resolved:
LoongArch: Fix warnings during S3 suspend
The enable_gpe_wakeup() function calls acpi_enable_all_wakeup_gpes(),
and the later one may call the preempt_schedule_common() function,
resulting in a thread switch and causing the CPU to be in an interrupt
enabled state after the enable_gpe_wakeup() function returns, leading
to the warnings as follow.
[ C0] WARNING: ... at kernel/time/timekeeping.c:845 ktime_get+0xbc/0xc8
[ C0] ...
[ C0] Call Trace:
[ C0] [<90000000002243b4>] show_stack+0x64/0x188
[ C0] [<900000000164673c>] dump_stack_lvl+0x60/0x88
[ C0] [<90000000002687e4>] __warn+0x8c/0x148
[ C0] [<90000000015e9978>] report_bug+0x1c0/0x2b0
[ C0] [<90000000016478e4>] do_bp+0x204/0x3b8
[ C0] [<90000000025b1924>] exception_handlers+0x1924/0x10000
[ C0] [<9000000000343bbc>] ktime_get+0xbc/0xc8
[ C0] [<9000000000354c08>] tick_sched_timer+0x30/0xb0
[ C0] [<90000000003408e0>] __hrtimer_run_queues+0x160/0x378
[ C0] [<9000000000341f14>] hrtimer_interrupt+0x144/0x388
[ C0] [<9000000000228348>] constant_timer_interrupt+0x38/0x48
[ C0] [<90000000002feba4>] __handle_irq_event_percpu+0x64/0x1e8
[ C0] [<90000000002fed48>] handle_irq_event_percpu+0x20/0x80
[ C0] [<9000000000306b9c>] handle_percpu_irq+0x5c/0x98
[ C0] [<90000000002fd4a0>] generic_handle_domain_irq+0x30/0x48
[ C0] [<9000000000d0c7b0>] handle_cpu_irq+0x70/0xa8
[ C0] [<9000000001646b30>] handle_loongarch_irq+0x30/0x48
[ C0] [<9000000001646bc8>] do_vint+0x80/0xe0
[ C0] [<90000000002aea1c>] finish_task_switch.isra.0+0x8c/0x2a8
[ C0] [<900000000164e34c>] __schedule+0x314/0xa48
[ C0] [<900000000164ead8>] schedule+0x58/0xf0
[ C0] [<9000000000294a2c>] worker_thread+0x224/0x498
[ C0] [<900000000029d2f0>] kthread+0xf8/0x108
[ C0] [<9000000000221f28>] ret_from_kernel_thread+0xc/0xa4
[ C0]
[ C0] ---[ end trace 0000000000000000 ]---
The root cause is acpi_enable_all_wakeup_gpes() uses a mutex to protect
acpi_hw_enable_all_wakeup_gpes(), and acpi_ut_acquire_mutex() may cause
a thread switch. Since there is no longer concurrent execution during
loongarch_acpi_suspend(), we can call acpi_hw_enable_all_wakeup_gpes()
directly in enable_gpe_wakeup().
The solution is similar to commit 22db06337f590d01 ("ACPI: sleep: Avoid
breaking S3 wakeup due to might_sleep()"). |
| In the Linux kernel, the following vulnerability has been resolved:
PCI: rcar-ep: Fix incorrect variable used when calling devm_request_mem_region()
The rcar_pcie_parse_outbound_ranges() uses the devm_request_mem_region()
macro to request a needed resource. A string variable that lives on the
stack is then used to store a dynamically computed resource name, which
is then passed on as one of the macro arguments. This can lead to
undefined behavior.
Depending on the current contents of the memory, the manifestations of
errors may vary. One possible output may be as follows:
$ cat /proc/iomem
30000000-37ffffff :
38000000-3fffffff :
Sometimes, garbage may appear after the colon.
In very rare cases, if no NULL-terminator is found in memory, the system
might crash because the string iterator will overrun which can lead to
access of unmapped memory above the stack.
Thus, fix this by replacing outbound_name with the name of the previously
requested resource. With the changes applied, the output will be as
follows:
$ cat /proc/iomem
30000000-37ffffff : memory2
38000000-3fffffff : memory3
[kwilczynski: commit log] |
| In the Linux kernel, the following vulnerability has been resolved:
RDMA/rtrs: Add missing deinit() call
A warning is triggered when repeatedly connecting and disconnecting the
rnbd:
list_add corruption. prev->next should be next (ffff88800b13e480), but was ffff88801ecd1338. (prev=ffff88801ecd1340).
WARNING: CPU: 1 PID: 36562 at lib/list_debug.c:32 __list_add_valid_or_report+0x7f/0xa0
Workqueue: ib_cm cm_work_handler [ib_cm]
RIP: 0010:__list_add_valid_or_report+0x7f/0xa0
? __list_add_valid_or_report+0x7f/0xa0
ib_register_event_handler+0x65/0x93 [ib_core]
rtrs_srv_ib_dev_init+0x29/0x30 [rtrs_server]
rtrs_ib_dev_find_or_add+0x124/0x1d0 [rtrs_core]
__alloc_path+0x46c/0x680 [rtrs_server]
? rtrs_rdma_connect+0xa6/0x2d0 [rtrs_server]
? rcu_is_watching+0xd/0x40
? __mutex_lock+0x312/0xcf0
? get_or_create_srv+0xad/0x310 [rtrs_server]
? rtrs_rdma_connect+0xa6/0x2d0 [rtrs_server]
rtrs_rdma_connect+0x23c/0x2d0 [rtrs_server]
? __lock_release+0x1b1/0x2d0
cma_cm_event_handler+0x4a/0x1a0 [rdma_cm]
cma_ib_req_handler+0x3a0/0x7e0 [rdma_cm]
cm_process_work+0x28/0x1a0 [ib_cm]
? _raw_spin_unlock_irq+0x2f/0x50
cm_req_handler+0x618/0xa60 [ib_cm]
cm_work_handler+0x71/0x520 [ib_cm]
Commit 667db86bcbe8 ("RDMA/rtrs: Register ib event handler") introduced a
new element .deinit but never used it at all. Fix it by invoking the
`deinit()` to appropriately unregister the IB event handler. |
| In the Linux kernel, the following vulnerability has been resolved:
net: let net.core.dev_weight always be non-zero
The following problem was encountered during stability test:
(NULL net_device): NAPI poll function process_backlog+0x0/0x530 \
returned 1, exceeding its budget of 0.
------------[ cut here ]------------
list_add double add: new=ffff88905f746f48, prev=ffff88905f746f48, \
next=ffff88905f746e40.
WARNING: CPU: 18 PID: 5462 at lib/list_debug.c:35 \
__list_add_valid_or_report+0xf3/0x130
CPU: 18 UID: 0 PID: 5462 Comm: ping Kdump: loaded Not tainted 6.13.0-rc7+
RIP: 0010:__list_add_valid_or_report+0xf3/0x130
Call Trace:
? __warn+0xcd/0x250
? __list_add_valid_or_report+0xf3/0x130
enqueue_to_backlog+0x923/0x1070
netif_rx_internal+0x92/0x2b0
__netif_rx+0x15/0x170
loopback_xmit+0x2ef/0x450
dev_hard_start_xmit+0x103/0x490
__dev_queue_xmit+0xeac/0x1950
ip_finish_output2+0x6cc/0x1620
ip_output+0x161/0x270
ip_push_pending_frames+0x155/0x1a0
raw_sendmsg+0xe13/0x1550
__sys_sendto+0x3bf/0x4e0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x5b/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The reproduction command is as follows:
sysctl -w net.core.dev_weight=0
ping 127.0.0.1
This is because when the napi's weight is set to 0, process_backlog() may
return 0 and clear the NAPI_STATE_SCHED bit of napi->state, causing this
napi to be re-polled in net_rx_action() until __do_softirq() times out.
Since the NAPI_STATE_SCHED bit has been cleared, napi_schedule_rps() can
be retriggered in enqueue_to_backlog(), causing this issue.
Making the napi's weight always non-zero solves this problem.
Triggering this issue requires system-wide admin (setting is
not namespaced). |
| In the Linux kernel, the following vulnerability has been resolved:
block: fix queue freeze vs limits lock order in sysfs store methods
queue_attr_store() always freezes a device queue before calling the
attribute store operation. For attributes that control queue limits, the
store operation will also lock the queue limits with a call to
queue_limits_start_update(). However, some drivers (e.g. SCSI sd) may
need to issue commands to a device to obtain limit values from the
hardware with the queue limits locked. This creates a potential ABBA
deadlock situation if a user attempts to modify a limit (thus freezing
the device queue) while the device driver starts a revalidation of the
device queue limits.
Avoid such deadlock by not freezing the queue before calling the
->store_limit() method in struct queue_sysfs_entry and instead use the
queue_limits_commit_update_frozen helper to freeze the queue after taking
the limits lock.
This also removes taking the sysfs lock for the store_limit method as
it doesn't protect anything here, but creates even more nesting.
Hopefully it will go away from the actual sysfs methods entirely soon.
(commit log adapted from a similar patch from Damien Le Moal) |
| In the Linux kernel, the following vulnerability has been resolved:
net: xdp: Disallow attaching device-bound programs in generic mode
Device-bound programs are used to support RX metadata kfuncs. These
kfuncs are driver-specific and rely on the driver context to read the
metadata. This means they can't work in generic XDP mode. However, there
is no check to disallow such programs from being attached in generic
mode, in which case the metadata kfuncs will be called in an invalid
context, leading to crashes.
Fix this by adding a check to disallow attaching device-bound programs
in generic mode. |
| In the Linux kernel, the following vulnerability has been resolved:
driver core: class: Fix wild pointer dereferences in API class_dev_iter_next()
There are a potential wild pointer dereferences issue regarding APIs
class_dev_iter_(init|next|exit)(), as explained by below typical usage:
// All members of @iter are wild pointers.
struct class_dev_iter iter;
// class_dev_iter_init(@iter, @class, ...) checks parameter @class for
// potential class_to_subsys() error, and it returns void type and does
// not initialize its output parameter @iter, so caller can not detect
// the error and continues to invoke class_dev_iter_next(@iter) even if
// @iter still contains wild pointers.
class_dev_iter_init(&iter, ...);
// Dereference these wild pointers in @iter here once suffer the error.
while (dev = class_dev_iter_next(&iter)) { ... };
// Also dereference these wild pointers here.
class_dev_iter_exit(&iter);
Actually, all callers of these APIs have such usage pattern in kernel tree.
Fix by:
- Initialize output parameter @iter by memset() in class_dev_iter_init()
and give callers prompt by pr_crit() for the error.
- Check if @iter is valid in class_dev_iter_next(). |
| In the Linux kernel, the following vulnerability has been resolved:
timers/migration: Fix off-by-one root mis-connection
Before attaching a new root to the old root, the children counter of the
new root is checked to verify that only the upcoming CPU's top group have
been connected to it. However since the recently added commit b729cc1ec21a
("timers/migration: Fix another race between hotplug and idle entry/exit")
this check is not valid anymore because the old root is pre-accounted
as a child to the new root. Therefore after connecting the upcoming
CPU's top group to the new root, the children count to be expected must
be 2 and not 1 anymore.
This omission results in the old root to not be connected to the new
root. Then eventually the system may run with more than one top level,
which defeats the purpose of a single idle migrator.
Also the old root is pre-accounted but not connected upon the new root
creation. But it can be connected to the new root later on. Therefore
the old root may be accounted twice to the new root. The propagation of
such overcommit can end up creating a double final top-level root with a
groupmask incorrectly initialized. Although harmless given that the final
top level roots will never have a parent to walk up to, this oddity
opportunistically reported the core issue:
WARNING: CPU: 8 PID: 0 at kernel/time/timer_migration.c:543 tmigr_requires_handle_remote
CPU: 8 UID: 0 PID: 0 Comm: swapper/8
RIP: 0010:tmigr_requires_handle_remote
Call Trace:
<IRQ>
? tmigr_requires_handle_remote
? hrtimer_run_queues
update_process_times
tick_periodic
tick_handle_periodic
__sysvec_apic_timer_interrupt
sysvec_apic_timer_interrupt
</IRQ>
Fix the problem by taking the old root into account in the children count
of the new root so the connection is not omitted.
Also warn when more than one top level group exists to better detect
similar issues in the future. |
| In the Linux kernel, the following vulnerability has been resolved:
hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING
hrtimers are migrated away from the dying CPU to any online target at
the CPUHP_AP_HRTIMERS_DYING stage in order not to delay bandwidth timers
handling tasks involved in the CPU hotplug forward progress.
However wakeups can still be performed by the outgoing CPU after
CPUHP_AP_HRTIMERS_DYING. Those can result again in bandwidth timers being
armed. Depending on several considerations (crystal ball power management
based election, earliest timer already enqueued, timer migration enabled or
not), the target may eventually be the current CPU even if offline. If that
happens, the timer is eventually ignored.
The most notable example is RCU which had to deal with each and every of
those wake-ups by deferring them to an online CPU, along with related
workarounds:
_ e787644caf76 (rcu: Defer RCU kthreads wakeup when CPU is dying)
_ 9139f93209d1 (rcu/nocb: Fix RT throttling hrtimer armed from offline CPU)
_ f7345ccc62a4 (rcu/nocb: Fix rcuog wake-up from offline softirq)
The problem isn't confined to RCU though as the stop machine kthread
(which runs CPUHP_AP_HRTIMERS_DYING) reports its completion at the end
of its work through cpu_stop_signal_done() and performs a wake up that
eventually arms the deadline server timer:
WARNING: CPU: 94 PID: 588 at kernel/time/hrtimer.c:1086 hrtimer_start_range_ns+0x289/0x2d0
CPU: 94 UID: 0 PID: 588 Comm: migration/94 Not tainted
Stopper: multi_cpu_stop+0x0/0x120 <- stop_machine_cpuslocked+0x66/0xc0
RIP: 0010:hrtimer_start_range_ns+0x289/0x2d0
Call Trace:
<TASK>
start_dl_timer
enqueue_dl_entity
dl_server_start
enqueue_task_fair
enqueue_task
ttwu_do_activate
try_to_wake_up
complete
cpu_stopper_thread
Instead of providing yet another bandaid to work around the situation, fix
it in the hrtimers infrastructure instead: always migrate away a timer to
an online target whenever it is enqueued from an offline CPU.
This will also allow to revert all the above RCU disgraceful hacks. |
| In the Linux kernel, the following vulnerability has been resolved:
mm/compaction: fix UBSAN shift-out-of-bounds warning
syzkaller reported a UBSAN shift-out-of-bounds warning of (1UL << order)
in isolate_freepages_block(). The bogus compound_order can be any value
because it is union with flags. Add back the MAX_PAGE_ORDER check to fix
the warning. |
| In the Linux kernel, the following vulnerability has been resolved:
block: mark GFP_NOIO around sysfs ->store()
sysfs ->store is called with queue freezed, meantime we have several
->store() callbacks(update_nr_requests, wbt, scheduler) to allocate
memory with GFP_KERNEL which may run into direct reclaim code path,
then potential deadlock can be caused.
Fix the issue by marking NOIO around sysfs ->store() |
| In the Linux kernel, the following vulnerability has been resolved:
Revert "drm/amd/display: Use HW lock mgr for PSR1"
This reverts commit
a2b5a9956269 ("drm/amd/display: Use HW lock mgr for PSR1")
Because it may cause system hang while connect with two edp panel. |
| In the Linux kernel, the following vulnerability has been resolved:
fbdev: omap: use threaded IRQ for LCD DMA
When using touchscreen and framebuffer, Nokia 770 crashes easily with:
BUG: scheduling while atomic: irq/144-ads7846/82/0x00010000
Modules linked in: usb_f_ecm g_ether usb_f_rndis u_ether libcomposite configfs omap_udc ohci_omap ohci_hcd
CPU: 0 UID: 0 PID: 82 Comm: irq/144-ads7846 Not tainted 6.12.7-770 #2
Hardware name: Nokia 770
Call trace:
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x54/0x5c
dump_stack_lvl from __schedule_bug+0x50/0x70
__schedule_bug from __schedule+0x4d4/0x5bc
__schedule from schedule+0x34/0xa0
schedule from schedule_preempt_disabled+0xc/0x10
schedule_preempt_disabled from __mutex_lock.constprop.0+0x218/0x3b4
__mutex_lock.constprop.0 from clk_prepare_lock+0x38/0xe4
clk_prepare_lock from clk_set_rate+0x18/0x154
clk_set_rate from sossi_read_data+0x4c/0x168
sossi_read_data from hwa742_read_reg+0x5c/0x8c
hwa742_read_reg from send_frame_handler+0xfc/0x300
send_frame_handler from process_pending_requests+0x74/0xd0
process_pending_requests from lcd_dma_irq_handler+0x50/0x74
lcd_dma_irq_handler from __handle_irq_event_percpu+0x44/0x130
__handle_irq_event_percpu from handle_irq_event+0x28/0x68
handle_irq_event from handle_level_irq+0x9c/0x170
handle_level_irq from generic_handle_domain_irq+0x2c/0x3c
generic_handle_domain_irq from omap1_handle_irq+0x40/0x8c
omap1_handle_irq from generic_handle_arch_irq+0x28/0x3c
generic_handle_arch_irq from call_with_stack+0x1c/0x24
call_with_stack from __irq_svc+0x94/0xa8
Exception stack(0xc5255da0 to 0xc5255de8)
5da0: 00000001 c22fc620 00000000 00000000 c08384a8 c106fc00 00000000 c240c248
5dc0: c113a600 c3f6ec30 00000001 00000000 c22fc620 c5255df0 c22fc620 c0279a94
5de0: 60000013 ffffffff
__irq_svc from clk_prepare_lock+0x4c/0xe4
clk_prepare_lock from clk_get_rate+0x10/0x74
clk_get_rate from uwire_setup_transfer+0x40/0x180
uwire_setup_transfer from spi_bitbang_transfer_one+0x2c/0x9c
spi_bitbang_transfer_one from spi_transfer_one_message+0x2d0/0x664
spi_transfer_one_message from __spi_pump_transfer_message+0x29c/0x498
__spi_pump_transfer_message from __spi_sync+0x1f8/0x2e8
__spi_sync from spi_sync+0x24/0x40
spi_sync from ads7846_halfd_read_state+0x5c/0x1c0
ads7846_halfd_read_state from ads7846_irq+0x58/0x348
ads7846_irq from irq_thread_fn+0x1c/0x78
irq_thread_fn from irq_thread+0x120/0x228
irq_thread from kthread+0xc8/0xe8
kthread from ret_from_fork+0x14/0x28
As a quick fix, switch to a threaded IRQ which provides a stable system. |
| In the Linux kernel, the following vulnerability has been resolved:
ptp: vmclock: Set driver data before its usage
If vmclock_ptp_register() fails during probing, vmclock_remove() is
called to clean up the ptp clock and misc device.
It uses dev_get_drvdata() to access the vmclock state.
However the driver data is not yet set at this point.
Assign the driver data earlier. |
| In the Linux kernel, the following vulnerability has been resolved:
batman-adv: Drop unmanaged ELP metric worker
The ELP worker needs to calculate new metric values for all neighbors
"reachable" over an interface. Some of the used metric sources require
locks which might need to sleep. This sleep is incompatible with the RCU
list iterator used for the recorded neighbors. The initial approach to work
around of this problem was to queue another work item per neighbor and then
run this in a new context.
Even when this solved the RCU vs might_sleep() conflict, it has a major
problems: Nothing was stopping the work item in case it is not needed
anymore - for example because one of the related interfaces was removed or
the batman-adv module was unloaded - resulting in potential invalid memory
accesses.
Directly canceling the metric worker also has various problems:
* cancel_work_sync for a to-be-deactivated interface is called with
rtnl_lock held. But the code in the ELP metric worker also tries to use
rtnl_lock() - which will never return in this case. This also means that
cancel_work_sync would never return because it is waiting for the worker
to finish.
* iterating over the neighbor list for the to-be-deactivated interface is
currently done using the RCU specific methods. Which means that it is
possible to miss items when iterating over it without the associated
spinlock - a behaviour which is acceptable for a periodic metric check
but not for a cleanup routine (which must "stop" all still running
workers)
The better approch is to get rid of the per interface neighbor metric
worker and handle everything in the interface worker. The original problems
are solved by:
* creating a list of neighbors which require new metric information inside
the RCU protected context, gathering the metric according to the new list
outside the RCU protected context
* only use rcu_trylock inside metric gathering code to avoid a deadlock
when the cancel_delayed_work_sync is called in the interface removal code
(which is called with the rtnl_lock held) |
| In the Linux kernel, the following vulnerability has been resolved:
ipmi: ipmb: Add check devm_kasprintf() returned value
devm_kasprintf() can return a NULL pointer on failure but this
returned value is not checked. |
| In the Linux kernel, the following vulnerability has been resolved:
remoteproc: core: Fix ida_free call while not allocated
In the rproc_alloc() function, on error, put_device(&rproc->dev) is
called, leading to the call of the rproc_type_release() function.
An error can occurs before ida_alloc is called.
In such case in rproc_type_release(), the condition (rproc->index >= 0) is
true as rproc->index has been initialized to 0.
ida_free() is called reporting a warning:
[ 4.181906] WARNING: CPU: 1 PID: 24 at lib/idr.c:525 ida_free+0x100/0x164
[ 4.186378] stm32-display-dsi 5a000000.dsi: Fixed dependency cycle(s) with /soc/dsi@5a000000/panel@0
[ 4.188854] ida_free called for id=0 which is not allocated.
[ 4.198256] mipi-dsi 5a000000.dsi.0: Fixed dependency cycle(s) with /soc/dsi@5a000000
[ 4.203556] Modules linked in: panel_orisetech_otm8009a dw_mipi_dsi_stm(+) gpu_sched dw_mipi_dsi stm32_rproc stm32_crc32 stm32_ipcc(+) optee(+)
[ 4.224307] CPU: 1 UID: 0 PID: 24 Comm: kworker/u10:0 Not tainted 6.12.0 #442
[ 4.231481] Hardware name: STM32 (Device Tree Support)
[ 4.236627] Workqueue: events_unbound deferred_probe_work_func
[ 4.242504] Call trace:
[ 4.242522] unwind_backtrace from show_stack+0x10/0x14
[ 4.250218] show_stack from dump_stack_lvl+0x50/0x64
[ 4.255274] dump_stack_lvl from __warn+0x80/0x12c
[ 4.260134] __warn from warn_slowpath_fmt+0x114/0x188
[ 4.265199] warn_slowpath_fmt from ida_free+0x100/0x164
[ 4.270565] ida_free from rproc_type_release+0x38/0x60
[ 4.275832] rproc_type_release from device_release+0x30/0xa0
[ 4.281601] device_release from kobject_put+0xc4/0x294
[ 4.286762] kobject_put from rproc_alloc.part.0+0x208/0x28c
[ 4.292430] rproc_alloc.part.0 from devm_rproc_alloc+0x80/0xc4
[ 4.298393] devm_rproc_alloc from stm32_rproc_probe+0xd0/0x844 [stm32_rproc]
[ 4.305575] stm32_rproc_probe [stm32_rproc] from platform_probe+0x5c/0xbc
Calling ida_alloc earlier in rproc_alloc ensures that the rproc->index is
properly set. |
| In the Linux kernel, the following vulnerability has been resolved:
rxrpc: Fix handling of received connection abort
Fix the handling of a connection abort that we've received. Though the
abort is at the connection level, it needs propagating to the calls on that
connection. Whilst the propagation bit is performed, the calls aren't then
woken up to go and process their termination, and as no further input is
forthcoming, they just hang.
Also add some tracing for the logging of connection aborts. |
| In the Linux kernel, the following vulnerability has been resolved:
idpf: convert workqueues to unbound
When a workqueue is created with `WQ_UNBOUND`, its work items are
served by special worker-pools, whose host workers are not bound to
any specific CPU. In the default configuration (i.e. when
`queue_delayed_work` and friends do not specify which CPU to run the
work item on), `WQ_UNBOUND` allows the work item to be executed on any
CPU in the same node of the CPU it was enqueued on. While this
solution potentially sacrifices locality, it avoids contention with
other processes that might dominate the CPU time of the processor the
work item was scheduled on.
This is not just a theoretical problem: in a particular scenario
misconfigured process was hogging most of the time from CPU0, leaving
less than 0.5% of its CPU time to the kworker. The IDPF workqueues
that were using the kworker on CPU0 suffered large completion delays
as a result, causing performance degradation, timeouts and eventual
system crash.
* I have also run a manual test to gauge the performance
improvement. The test consists of an antagonist process
(`./stress --cpu 2`) consuming as much of CPU 0 as possible. This
process is run under `taskset 01` to bind it to CPU0, and its
priority is changed with `chrt -pQ 9900 10000 ${pid}` and
`renice -n -20 ${pid}` after start.
Then, the IDPF driver is forced to prefer CPU0 by editing all calls
to `queue_delayed_work`, `mod_delayed_work`, etc... to use CPU 0.
Finally, `ktraces` for the workqueue events are collected.
Without the current patch, the antagonist process can force
arbitrary delays between `workqueue_queue_work` and
`workqueue_execute_start`, that in my tests were as high as
`30ms`. With the current patch applied, the workqueue can be
migrated to another unloaded CPU in the same node, and, keeping
everything else equal, the maximum delay I could see was `6us`. |