Search Results (311941 CVEs found)

CVE Vendors Products Updated CVSS v3.1
CVE-2025-39798 1 Linux 1 Linux Kernel 2025-09-29 N/A
In the Linux kernel, the following vulnerability has been resolved: NFS: Fix the setting of capabilities when automounting a new filesystem Capabilities cannot be inherited when we cross into a new filesystem. They need to be reset to the minimal defaults, and then probed for again.
CVE-2025-39797 1 Linux 1 Linux Kernel 2025-09-29 N/A
In the Linux kernel, the following vulnerability has been resolved: xfrm: Duplicate SPI Handling The issue originates when Strongswan initiates an XFRM_MSG_ALLOCSPI Netlink message, which triggers the kernel function xfrm_alloc_spi(). This function is expected to ensure uniqueness of the Security Parameter Index (SPI) for inbound Security Associations (SAs). However, it can return success even when the requested SPI is already in use, leading to duplicate SPIs assigned to multiple inbound SAs, differentiated only by their destination addresses. This behavior causes inconsistencies during SPI lookups for inbound packets. Since the lookup may return an arbitrary SA among those with the same SPI, packet processing can fail, resulting in packet drops. According to RFC 4301 section 4.4.2, for inbound processing a unicast SA is uniquely identified by the SPI and optionally the protocol. Reproducing the issue reliably: to consistently reproduce the problem, restrict the available SPI range in charon.conf: spi_min = 0x10000000 spi_max = 0x10000002 This limits the system to only 2 usable SPI values. Next, create more than 2 Child SAs, each using a unique pair of src/dst addresses. As soon as the 3rd Child SA is initiated, it will be assigned a duplicate SPI, since the SPI pool is already exhausted. With a narrow SPI range, the issue is consistently reproducible; with a broader/default range, it becomes rare and unpredictable. Current implementation: the xfrm_spi_hash() lookup function computes the hash using daddr, proto, and family. So if two SAs have the same SPI but different destination addresses, they will: a. hash into different buckets, b. be stored in different linked lists (byspi + h), c. not be seen in the same hlist_for_each_entry_rcu() iteration. As a result, the lookup returns NULL and the kernel allows the duplicate SPI. Proposed change: xfrm_state_lookup_spi_proto() performs a truly global search across all states, regardless of hash bucket, matching on SPI and proto.
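To make the bucket problem concrete, here is a standalone toy model (not kernel code: the hash and structures are simplified stand-ins for xfrm_spi_hash() and the byspi table), showing how a daddr-keyed hash hides a duplicate SPI from a bucket-local check while a global scan, in the spirit of xfrm_state_lookup_spi_proto(), catches it:

#include <stdio.h>

#define NBUCKETS 16

struct sa {
    unsigned int spi;
    unsigned int daddr;
    struct sa *next;            /* per-bucket chain */
};

static struct sa *byspi[NBUCKETS];

/* Mirrors the shape of xfrm_spi_hash(): the bucket depends on daddr. */
static unsigned int spi_hash(unsigned int spi, unsigned int daddr)
{
    return (spi ^ daddr) % NBUCKETS;
}

/* Bucket-local check: only sees SAs whose daddr hashes the same way. */
static int spi_in_use_bucket(unsigned int spi, unsigned int daddr)
{
    for (struct sa *s = byspi[spi_hash(spi, daddr)]; s; s = s->next)
        if (s->spi == spi)
            return 1;
    return 0;
}

/* Global check: walk every bucket and match on SPI alone. */
static int spi_in_use_global(unsigned int spi)
{
    for (int b = 0; b < NBUCKETS; b++)
        for (struct sa *s = byspi[b]; s; s = s->next)
            if (s->spi == spi)
                return 1;
    return 0;
}

int main(void)
{
    static struct sa a = { .spi = 0x10000000, .daddr = 0x0a000001 };
    unsigned int h = spi_hash(a.spi, a.daddr);

    a.next = byspi[h];
    byspi[h] = &a;

    /* Same SPI, different destination: lands in a different bucket. */
    printf("bucket-local check finds duplicate: %d\n",
           spi_in_use_bucket(0x10000000, 0x0a000002));  /* 0: missed */
    printf("global check finds duplicate:       %d\n",
           spi_in_use_global(0x10000000));              /* 1: caught */
    return 0;
}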
CVE-2025-39796 1 Linux 1 Linux Kernel 2025-09-29 N/A
In the Linux kernel, the following vulnerability has been resolved: net: lapbether: ignore ops-locked netdevs Syzkaller managed to trigger lock dependency in xsk_notify via register_netdevice. As discussed in [0], using register_netdevice in the notifiers is problematic so skip adding lapbeth for ops-locked devices. xsk_notifier+0xa4/0x280 net/xdp/xsk.c:1645 notifier_call_chain+0xbc/0x410 kernel/notifier.c:85 call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230 call_netdevice_notifiers_extack net/core/dev.c:2268 [inline] call_netdevice_notifiers net/core/dev.c:2282 [inline] unregister_netdevice_many_notify+0xf9d/0x2700 net/core/dev.c:12077 unregister_netdevice_many net/core/dev.c:12140 [inline] unregister_netdevice_queue+0x305/0x3f0 net/core/dev.c:11984 register_netdevice+0x18f1/0x2270 net/core/dev.c:11149 lapbeth_new_device drivers/net/wan/lapbether.c:420 [inline] lapbeth_device_event+0x5b1/0xbe0 drivers/net/wan/lapbether.c:462 notifier_call_chain+0xbc/0x410 kernel/notifier.c:85 call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230 call_netdevice_notifiers_extack net/core/dev.c:2268 [inline] call_netdevice_notifiers net/core/dev.c:2282 [inline] __dev_notify_flags+0x12c/0x2e0 net/core/dev.c:9497 netif_change_flags+0x108/0x160 net/core/dev.c:9526 dev_change_flags+0xba/0x250 net/core/dev_api.c:68 devinet_ioctl+0x11d5/0x1f50 net/ipv4/devinet.c:1200 inet_ioctl+0x3a7/0x3f0 net/ipv4/af_inet.c:1001 0: https://lore.kernel.org/netdev/20250625140357.6203d0af@kernel.org/
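A hedged sketch of the fix shape follows; netdev_need_ops_lock() is assumed to be the predicate for ops-locked devices (the exact helper may differ by kernel version), and the body elides the driver's existing event handling:

#include <linux/netdevice.h>

static int lapbeth_device_event(struct notifier_block *this,
                                unsigned long event, void *ptr)
{
    struct net_device *dev = netdev_notifier_info_to_dev(ptr);

    /*
     * register_netdevice() from inside a notifier is problematic for
     * devices holding the instance ops lock, so skip them entirely.
     */
    if (netdev_need_ops_lock(dev))      /* assumed helper name */
        return NOTIFY_DONE;

    /* ... existing NETDEV_UP/GOING_DOWN handling continues here ... */
    return NOTIFY_DONE;
}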
CVE-2025-39795 1 Linux 1 Linux Kernel 2025-09-29 N/A
In the Linux kernel, the following vulnerability has been resolved: block: avoid possible overflow for chunk_sectors check in blk_stack_limits() In blk_stack_limits(), we check that the t->chunk_sectors value is a multiple of the t->physical_block_size value. However, converting the chunk_sectors value to bytes may overflow the unsigned int that holds chunk_sectors, so change the check to be based on sectors.
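A standalone illustration of the wraparound (values simplified; the kernel fields live on the stacked queue_limits): converting sectors to bytes in 32 bits silently wraps, while comparing in sector units does not:

#include <stdio.h>

#define SECTOR_SHIFT 9

int main(void)
{
    unsigned int chunk_sectors = 1U << 24;   /* 8 GiB in 512-byte sectors */
    unsigned int physical_block_size = 4096; /* bytes */

    /* Buggy shape: (1 << 24) << 9 == 1 << 33, which wraps to 0 in a
     * 32-bit unsigned int. */
    unsigned int chunk_bytes = chunk_sectors << SECTOR_SHIFT;
    printf("chunk size in bytes (wrapped): %u\n", chunk_bytes);

    /* Fixed shape: stay in sector units and convert the block size
     * down instead, as the patch does. */
    if (chunk_sectors % (physical_block_size >> SECTOR_SHIFT))
        printf("misaligned\n");
    else
        printf("aligned\n");
    return 0;
}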
CVE-2025-39794 1 Linux 1 Linux Kernel 2025-09-29 N/A
In the Linux kernel, the following vulnerability has been resolved: ARM: tegra: Use I/O memcpy to write to IRAM KASAN crashes the kernel trying to check boundaries when the normal memcpy() is used to write to IRAM.
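A minimal sketch of the fix shape, assuming a mapped IRAM pointer; memcpy_toio() is the standard MMIO-safe copy helper, and the wrapper name is illustrative:

#include <linux/io.h>

/* IRAM is device memory: copy through the MMIO-safe helper instead of
 * plain memcpy(), which KASAN instruments and bounds-checks. */
static void tegra_iram_write(void __iomem *iram, const void *src,
                             size_t len)
{
    memcpy_toio(iram, src, len);
}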
CVE-2025-39793 1 Linux 1 Linux Kernel 2025-09-29 N/A
In the Linux kernel, the following vulnerability has been resolved: io_uring/memmap: cast nr_pages to size_t before shifting If the allocated size exceeds UINT_MAX, then it's necessary to cast the mr->nr_pages value to size_t to prevent it from overflowing. In practice this isn't much of a concern, as the required memory size will have been validated upfront and accounted to the user, and sizes greater than 4 GB are necessary for the missing cast to become a problem, which greatly exceeds normal user locked_vm settings that are generally in the KB to MB range. However, if root is used, then accounting isn't done, and it becomes possible to hit this issue.
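A standalone demonstration of the missing widening cast (PAGE_SHIFT and the variable name follow the description; the rest is scaffolding):

#include <stdio.h>
#include <stddef.h>

#define PAGE_SHIFT 12  /* 4 KiB pages */

int main(void)
{
    unsigned int nr_pages = 2u * 1024 * 1024; /* 2^21 pages = 8 GiB */

    /* Buggy shape: the shift is evaluated in 32 bits and wraps to 0
     * before the result is widened for the assignment. */
    size_t bad = nr_pages << PAGE_SHIFT;

    /* Fixed shape: widen to size_t first, then shift. */
    size_t good = (size_t)nr_pages << PAGE_SHIFT;

    printf("without cast: %zu\nwith cast:    %zu\n", bad, good);
    return 0;
}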
CVE-2025-39792 1 Linux 1 Linux Kernel 2025-09-29 N/A
In the Linux kernel, the following vulnerability has been resolved: dm: Always split write BIOs to zoned device limits Any zoned DM target that requires zone append emulation will use the block layer zone write plugging. In that case, DM target drivers must not split BIOs using dm_accept_partial_bio(), as doing so can potentially lead to deadlocks with queue freeze operations. Regular write operations used to emulate zone append operations also cannot be split by the target driver, as that would result in an invalid written sector value being returned in the BIO sector. In order for zoned DM target drivers to avoid such incorrect BIO splitting, we must ensure that large BIOs are split before being passed to the map() function of the target, thus guaranteeing that the limits for the mapped device are not exceeded. dm-crypt and dm-flakey are the only target drivers supporting zoned devices and using dm_accept_partial_bio(). In the case of dm-crypt, this function is used to split BIOs to the internal max_write_size limit (which will be suppressed in a different patch). However, crypt_alloc_buffer() uses a bioset allowing only up to BIO_MAX_VECS (256) vectors in a BIO, so the dm-crypt device max_segments limit, which is not set and so defaults to BLK_MAX_SEGMENTS (128), must still be respected and write BIOs split accordingly. In the case of dm-flakey, since zone append emulation is not required, the block layer zone write plugging is not used and no splitting of BIOs is required. Modify the function dm_zone_bio_needs_split() to use the block layer helper function bio_needs_zone_write_plugging() to force a call to bio_split_to_limits() in dm_split_and_process_bio(). This allows DM target drivers to avoid using dm_accept_partial_bio() for write operations on zoned DM devices.
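A hedged sketch of what the reworked check could look like; bio_needs_zone_write_plugging(), bio_split_to_limits() and dm_split_and_process_bio() are named by the description, while dm_emulate_zone_append() is an assumed helper for the emulation test:

static inline bool dm_zone_bio_needs_split(struct mapped_device *md,
                                           struct bio *bio)
{
    /*
     * Only mapped devices that emulate zone append use zone write
     * plugging (the emulation test is paraphrased here).
     */
    if (!dm_emulate_zone_append(md))    /* assumed helper */
        return false;

    /*
     * Defer to the block layer: any BIO that will go through zone
     * write plugging must be split to the device limits by
     * bio_split_to_limits() in dm_split_and_process_bio() before it
     * reaches the target's map() function.
     */
    return bio_needs_zone_write_plugging(bio);
}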
CVE-2025-39791 1 Linux 1 Linux Kernel 2025-09-29 5.5 Medium
In the Linux kernel, the following vulnerability has been resolved: dm: dm-crypt: Do not partially accept write BIOs with zoned targets Read and write operations issued to a dm-crypt target may be split according to the dm-crypt internal limits defined by the max_read_size and max_write_size module parameters (default is 128 KB). The intent is to improve the processing time of large BIOs by splitting them into smaller operations that can be parallelized on different CPUs. For zoned dm-crypt targets, this BIO splitting is still done, but without the parallel execution, to ensure that the issuing order of write operations to the underlying devices remains sequential. However, the splitting itself causes other problems: 1) Since dm-crypt relies on the block layer zone write plugging to handle zone append emulation using regular write operations, the remainder of a split write BIO will always be plugged into the target zone write plug. Once the ongoing write BIO finishes, this remainder BIO is unplugged and issued from the zone write plug work. If this remainder BIO itself needs to be split, the remainder will be re-issued and plugged again, but that causes a call to blk_queue_enter(), which may block if a queue freeze operation was initiated. This results in a deadlock, as DM submission still holds BIOs that the queue freeze side is waiting for. 2) dm-crypt relies on the emulation done by the block layer using regular write operations for processing zone append operations. This still requires properly returning the written sector as the BIO sector of the original BIO. However, this can be done correctly only if there is a single clone BIO used for processing the original zone append operation issued by the user. If the size of a zone append operation is larger than dm-crypt's max_write_size, then the original BIO will be split and processed as a chain of regular write operations. Such chaining results in an incorrect written sector being returned to the zone append issuer using the original BIO sector. This in turn results in file system data corruption with xfs or btrfs. Fix this by modifying get_max_request_size() to always return the size of the BIO, to avoid it being split with dm_accept_partial_bio() in crypt_map(). get_max_request_size() is renamed to get_max_request_sectors() to clarify the unit of the value returned, and its interface is changed to take a struct dm_target pointer and a pointer to the struct bio being processed. In addition to this change, to ensure that crypt_alloc_buffer() works correctly, set the dm-crypt device max_hw_sectors limit to be at most BIO_MAX_VECS << PAGE_SECTORS_SHIFT (1 MB with a 4 KB page architecture). This forces DM core to split write BIOs before passing them to crypt_map(), thus guaranteeing that dm-crypt can always accept an entire write BIO without needing to split it. This change does not have any effect on the read path of dm-crypt: read operations can still be split and the BIO fragments processed in parallel. There is also no impact on the performance of the write path, given that all zone write BIOs were already processed inline instead of in parallel. This change also does not affect regular dm-crypt block devices in any way.
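A hedged sketch of the renamed helper; the interface matches the description, but the read-path clamp is illustrative, not the exact dm-crypt code:

static unsigned int get_max_request_sectors(struct dm_target *ti,
                                            struct bio *bio)
{
    /*
     * Writes are never split here: DM core has already split the BIO
     * to the device limits (max_hw_sectors capped at
     * BIO_MAX_VECS << PAGE_SECTORS_SHIFT), so accept it whole and
     * keep zone append emulation correct on zoned targets.
     */
    if (op_is_write(bio_op(bio)))
        return bio_sectors(bio);

    /*
     * Reads may still be split to the max_read_size module parameter
     * (bytes converted to sectors) and decrypted in parallel.
     */
    return min_not_zero(max_read_size >> SECTOR_SHIFT, bio_sectors(bio));
}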
CVE-2025-39790 1 Linux 1 Linux Kernel 2025-09-29 7.0 High
In the Linux kernel, the following vulnerability has been resolved: bus: mhi: host: Detect events pointing to unexpected TREs When a remote device sends a completion event to the host, it contains a pointer to the consumed TRE. The host uses this pointer to process all of the TREs between it and the host's local copy of the ring's read pointer. This works when processing completions for chained transactions, but can lead to nasty results if the device sends an event for a single-element transaction with a read pointer that is multiple elements ahead of the host's read pointer. For instance, if the host accesses an event ring while the device is updating it, the pointer inside the event might still point to an old TRE. If the host uses the channel's xfer_cb() to directly free the buffer pointed to by the TRE, the buffer will be double-freed. This behavior was observed on an endpoint that used the upstream EP stack without commit 6f18d174b73d ("bus: mhi: ep: Update read pointer only after buffer is written"), where the device updated the event ring pointer before updating the event contents, leaving a window during which the host was able to access the stale data the event pointed to before the device had a chance to update it. The usual pattern was that the host received an event pointing to a TRE that is not immediately after the last processed one, so it got treated as if it were a chained transaction, processing all of the TREs in between the two read pointers. This commit aims to harden the host by ensuring that transactions where the event points to a TRE that isn't local_rp + 1 are in fact chained. [mani: added stable tag and reworded commit message]
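An illustrative sketch of the hardening idea; the function name is invented, though the struct mhi_ring fields used here (rp, base, len, el_size) mirror the host stack's ring bookkeeping:

static bool mhi_event_points_to_expected_tre(struct mhi_ring *ring,
                                             void *dev_rp, bool chained)
{
    void *next = ring->rp + ring->el_size;

    /* Wrap around the end of the ring buffer. */
    if (next >= ring->base + ring->len)
        next = ring->base;

    /*
     * A non-chained completion must point at exactly the TRE after the
     * host's local read pointer; anything further ahead is a stale or
     * inconsistent event and must not be processed as if it were a
     * chained transaction.
     */
    return chained || dev_rp == next;
}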
CVE-2025-39789 1 Linux 1 Linux Kernel 2025-09-29 5.5 Medium
In the Linux kernel, the following vulnerability has been resolved: crypto: x86/aegis - Add missing error checks The skcipher_walk functions can allocate memory and can fail, so checking for errors is necessary.
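A sketch of the fix shape, assuming the usual skcipher_walk pattern; the function name and the elided per-chunk processing are illustrative:

#include <crypto/internal/skcipher.h>

static int aegis_crypt_checked(struct aead_request *req)
{
    struct skcipher_walk walk;
    int err;

    /* May allocate and fail: the return value must not be ignored. */
    err = skcipher_walk_aead_encrypt(&walk, req, false);
    if (err)
        return err;

    while (walk.nbytes) {
        /* ... process walk.src.virt.addr / walk.dst.virt.addr ... */
        err = skcipher_walk_done(&walk, 0);
        if (err)
            return err;
    }
    return 0;
}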
CVE-2025-39788 1 Linux 1 Linux Kernel 2025-09-29 N/A
In the Linux kernel, the following vulnerability has been resolved: scsi: ufs: exynos: Fix programming of HCI_UTRL_NEXUS_TYPE On Google gs101, the number of UTP transfer request slots (nutrs) is 32, and in this case the driver ends up programming UTRL_NEXUS_TYPE incorrectly as 0. This is because the left-hand side of the shift is 1, which is of type int, i.e. with only 31 usable value bits. Shifting by more than that width results in undefined behaviour. Fix this by switching to the BIT() macro, which applies correct type casting as required. This ensures the correct value is written to UTRL_NEXUS_TYPE (0xffffffff on gs101), and it also fixes a UBSAN shift warning: UBSAN: shift-out-of-bounds in drivers/ufs/host/ufs-exynos.c:1113:21 shift exponent 32 is too large for 32-bit type 'int' For consistency, apply the same change to the nutmrs / UTMRL_NEXUS_TYPE write.
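A standalone demonstration of the shift problem; BIT() here copies the kernel macro's shape (an unsigned long shift), and the buggy line is left commented out because it is undefined behaviour:

#include <stdio.h>

/* Same shape as the kernel macro: the shifted value is unsigned long. */
#define BIT(nr) (1UL << (nr))

int main(void)
{
    int nutrs = 32; /* number of UTP transfer request slots on gs101 */

    /*
     * Buggy shape: 1 is a 32-bit int, so "1 << 32" over-shifts; in
     * practice it often yields 1, making the mask (1 << 32) - 1 == 0.
     *
     * unsigned int bad = (1 << nutrs) - 1;
     */

    /* Fixed shape: on a 64-bit kernel, 1UL is 64 bits wide, so
     * BIT(32) - 1 is 0xffffffff, the intended all-slots mask. */
    unsigned int good = (unsigned int)(BIT(nutrs) - 1);

    printf("UTRL_NEXUS_TYPE mask: 0x%08x\n", good);
    return 0;
}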
CVE-2025-39787 1 Linux 1 Linux Kernel 2025-09-29 5.5 Medium
In the Linux kernel, the following vulnerability has been resolved: soc: qcom: mdt_loader: Ensure we don't read past the ELF header When the MDT loader is used in remoteproc, the ELF header is sanitized beforehand, but that's not necessarily the case for other clients. Validate the size of the firmware buffer to ensure that we don't read past the end as we iterate over the header. e_phentsize and e_shentsize are validated as well, to ensure that the assumptions about step size in the traversal are valid.
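A hedged sketch of the kind of bounds check described (the function name is illustrative; the real mdt_loader validates more fields, including e_shentsize and the section headers):

#include <linux/elf.h>
#include <linux/firmware.h>

static int mdt_phdrs_in_bounds(const struct firmware *fw)
{
    const struct elf32_hdr *ehdr;
    size_t phend;

    if (fw->size < sizeof(*ehdr))
        return -EINVAL;
    ehdr = (const struct elf32_hdr *)fw->data;

    /* The traversal steps by e_phentsize, so pin it down. */
    if (ehdr->e_phentsize != sizeof(struct elf32_phdr))
        return -EINVAL;

    /* Iterating e_phnum program headers must stay inside fw->data. */
    phend = (size_t)ehdr->e_phoff +
            (size_t)ehdr->e_phnum * sizeof(struct elf32_phdr);
    if (phend > fw->size)
        return -EINVAL;

    return 0;
}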
CVE-2025-39786 1 Linux 1 Linux Kernel 2025-09-29 5.5 Medium
In the Linux kernel, the following vulnerability has been resolved: iio: adc: ad7173: fix channels index for syscalib_mode Fix the index used to look up the channel when accessing the syscalib_mode attribute. The address field is a 0-based index (same as scan_index) that is used to access the channel in the ad7173_channels array throughout the driver. The channel field, on the other hand, may not match the address field, depending on the channel configuration specified in the device tree, and could result in an out-of-bounds access.
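A sketch of the fix shape; the callback signature and state fields are illustrative, the point being the index swap from chan->channel to chan->address:

static ssize_t ad7173_read_syscalib_mode(struct iio_dev *indio_dev,
                                         const struct iio_chan_spec *chan,
                                         char *buf)
{
    struct ad7173_state *st = iio_priv(indio_dev);

    /*
     * Was: st->channels[chan->channel] - chan->channel follows the DT
     * configuration and can exceed the array. chan->address is the
     * 0-based index matching ad7173_channels, so it is always in range.
     */
    return sysfs_emit(buf, "%d\n",
                      st->channels[chan->address].syscalib_mode);
}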
CVE-2025-39785 1 Linux 1 Linux Kernel 2025-09-29 5.5 Medium
In the Linux kernel, the following vulnerability has been resolved: drm/hisilicon/hibmc: fix request_irq()'s irq name variable being local A local variable was passed to request_irq() as the irq name; it is freed when the function returns, creating a use-after-free problem that can make request_irq() fail. Use a global irq name instead to fix this.
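A minimal before/after sketch of the lifetime bug; the handler and name format are illustrative, not the actual hibmc code:

#include <linux/interrupt.h>

static irqreturn_t hibmc_irq_handler(int irq, void *arg) /* illustrative */
{
    return IRQ_HANDLED;
}

/* Buggy pattern: "name" lives on the stack, but the IRQ core keeps the
 * pointer for the lifetime of the IRQ (e.g. for /proc/interrupts), so
 * it later reads freed stack memory and request_irq() misbehaves. */
static int bad_register(int irq, void *dev)
{
    char name[32];

    snprintf(name, sizeof(name), "hibmc-%d", irq);
    return request_irq(irq, hibmc_irq_handler, IRQF_SHARED, name, dev);
}

/* Fixed pattern: give the name static (global) lifetime. */
static const char hibmc_irq_name[] = "hibmc_drm";

static int good_register(int irq, void *dev)
{
    return request_irq(irq, hibmc_irq_handler, IRQF_SHARED,
                       hibmc_irq_name, dev);
}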
CVE-2025-39784 1 Linux 1 Linux Kernel 2025-09-29 5.5 Medium
In the Linux kernel, the following vulnerability has been resolved: PCI: Fix link speed calculation on retrain failure When pcie_failed_link_retrain() fails to retrain, it tries to revert to the previous link speed. However it calculates that speed from the Link Control 2 register without masking out non-speed bits first. PCIE_LNKCTL2_TLS2SPEED() converts such incorrect values to PCI_SPEED_UNKNOWN (0xff), which in turn causes a WARN splat in pcie_set_target_speed(): pci 0000:00:01.1: [1022:14ed] type 01 class 0x060400 PCIe Root Port pci 0000:00:01.1: broken device, retraining non-functional downstream link at 2.5GT/s pci 0000:00:01.1: retraining failed WARNING: CPU: 1 PID: 1 at drivers/pci/pcie/bwctrl.c:168 pcie_set_target_speed RDX: 0000000000000001 RSI: 00000000000000ff RDI: ffff9acd82efa000 pcie_failed_link_retrain pci_device_add pci_scan_single_device Mask out the non-speed bits in PCIE_LNKCTL2_TLS2SPEED() and PCIE_LNKCAP_SLS2SPEED() so they don't incorrectly return PCI_SPEED_UNKNOWN. [bhelgaas: commit log, add details from https://lore.kernel.org/r/1c92ef6bcb314ee6977839b46b393282e4f52e74.1750684771.git.lukas@wunner.de]
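A standalone model of the masking fix; the field mask matches PCI_EXP_LNKCTL2_TLS and the returned codes match the kernel's enum pci_bus_speed, while the function approximates PCIE_LNKCTL2_TLS2SPEED():

#include <stdio.h>

/* Target Link Speed field mask from the PCIe spec. */
#define LNKCTL2_TLS_MASK  0x000f
#define PCI_SPEED_UNKNOWN 0xff

static unsigned char lnkctl2_tls2speed(unsigned short lnkctl2)
{
    /* Mask out the non-speed bits first, as the fix does. */
    switch (lnkctl2 & LNKCTL2_TLS_MASK) {
    case 1: return 0x14; /* PCIE_SPEED_2_5GT  */
    case 2: return 0x15; /* PCIE_SPEED_5_0GT  */
    case 3: return 0x16; /* PCIE_SPEED_8_0GT  */
    case 4: return 0x17; /* PCIE_SPEED_16_0GT */
    default: return PCI_SPEED_UNKNOWN;
    }
}

int main(void)
{
    /* Speed field says 2.5GT/s, but a non-speed bit (e.g. Enter
     * Compliance, bit 4) is also set; without the mask this raw value
     * matched no case and mapped to PCI_SPEED_UNKNOWN, triggering the
     * WARN in pcie_set_target_speed(). */
    unsigned short lnkctl2 = 0x0011;

    printf("bus speed code: 0x%02x\n", lnkctl2_tls2speed(lnkctl2));
    return 0;
}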
CVE-2025-39783 1 Linux 1 Linux Kernel 2025-09-29 7.0 High
In the Linux kernel, the following vulnerability has been resolved: PCI: endpoint: Fix configfs group list head handling Doing a list_del() on the epf_group field of struct pci_epf_driver in pci_epf_remove_cfs() is not correct as this field is a list head, not a list entry. This list_del() call triggers a KASAN warning when an endpoint function driver which has a configfs attribute group is torn down: ================================================================== BUG: KASAN: slab-use-after-free in pci_epf_remove_cfs+0x17c/0x198 Write of size 8 at addr ffff00010f4a0d80 by task rmmod/319 CPU: 3 UID: 0 PID: 319 Comm: rmmod Not tainted 6.16.0-rc2 #1 NONE Hardware name: Radxa ROCK 5B (DT) Call trace: show_stack+0x2c/0x84 (C) dump_stack_lvl+0x70/0x98 print_report+0x17c/0x538 kasan_report+0xb8/0x190 __asan_report_store8_noabort+0x20/0x2c pci_epf_remove_cfs+0x17c/0x198 pci_epf_unregister_driver+0x18/0x30 nvmet_pci_epf_cleanup_module+0x24/0x30 [nvmet_pci_epf] __arm64_sys_delete_module+0x264/0x424 invoke_syscall+0x70/0x260 el0_svc_common.constprop.0+0xac/0x230 do_el0_svc+0x40/0x58 el0_svc+0x48/0xdc el0t_64_sync_handler+0x10c/0x138 el0t_64_sync+0x198/0x19c ... Remove this incorrect list_del() call from pci_epf_remove_cfs().
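A userspace illustration of the bug class with a minimal list.h-style list: list_del() is only valid on entries, and calling it on a list head splices the head out of its own list and poisons it, which is the use-after-free KASAN reported:

#include <stdio.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void list_add(struct list_head *new, struct list_head *head)
{
    new->next = head->next;
    new->prev = head;
    head->next->prev = new;
    head->next = new;
}

static void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry->prev = NULL; /* poison, like the kernel's LIST_POISON */
}

int main(void)
{
    struct list_head head, a;

    INIT_LIST_HEAD(&head);
    list_add(&a, &head);

    list_del(&a);   /* correct: delete the ENTRY */
    /* list_del(&head) would be the bug pattern: the head gets poisoned
     * while other code still traverses it, so the next walk
     * dereferences freed/poisoned memory. */

    printf("list empty: %d\n", head.next == &head);
    return 0;
}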
CVE-2025-39782 1 Linux 1 Linux Kernel 2025-09-29 5.5 Medium
In the Linux kernel, the following vulnerability has been resolved: jbd2: prevent softlockup in jbd2_log_do_checkpoint() Both jbd2_log_do_checkpoint() and jbd2_journal_shrink_checkpoint_list() periodically release j_list_lock after processing a batch of buffers to avoid long hold times on the j_list_lock. However, since both functions contend for j_list_lock, the combined time spent waiting and processing can be significant. jbd2_journal_shrink_checkpoint_list() explicitly calls cond_resched() when need_resched() is true to avoid softlockups during prolonged operations. But jbd2_log_do_checkpoint() only exits its loop when need_resched() is true, relying on potentially sleeping functions like __flush_batch() or wait_on_buffer() to trigger rescheduling. If those functions do not sleep, the kernel may hit a softlockup. watchdog: BUG: soft lockup - CPU#3 stuck for 156s! [kworker/u129:2:373] CPU: 3 PID: 373 Comm: kworker/u129:2 Kdump: loaded Not tainted 6.6.0+ #10 Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.27 06/13/2017 Workqueue: writeback wb_workfn (flush-7:2) pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : native_queued_spin_lock_slowpath+0x358/0x418 lr : jbd2_log_do_checkpoint+0x31c/0x438 [jbd2] Call trace: native_queued_spin_lock_slowpath+0x358/0x418 jbd2_log_do_checkpoint+0x31c/0x438 [jbd2] __jbd2_log_wait_for_space+0xfc/0x2f8 [jbd2] add_transaction_credits+0x3bc/0x418 [jbd2] start_this_handle+0xf8/0x560 [jbd2] jbd2__journal_start+0x118/0x228 [jbd2] __ext4_journal_start_sb+0x110/0x188 [ext4] ext4_do_writepages+0x3dc/0x740 [ext4] ext4_writepages+0xa4/0x190 [ext4] do_writepages+0x94/0x228 __writeback_single_inode+0x48/0x318 writeback_sb_inodes+0x204/0x590 __writeback_inodes_wb+0x54/0xf8 wb_writeback+0x2cc/0x3d8 wb_do_writeback+0x2e0/0x2f8 wb_workfn+0x80/0x2a8 process_one_work+0x178/0x3e8 worker_thread+0x234/0x3b8 kthread+0xf0/0x108 ret_from_fork+0x10/0x20 So explicitly call cond_resched() in jbd2_log_do_checkpoint() to avoid softlockup.
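The shape of the fix, as a sketch with invented batch helpers (not the verbatim jbd2 code):

static void checkpoint_until_done(journal_t *journal)
{
    while (checkpoint_list_nonempty(journal)) {  /* invented helper */
        spin_lock(&journal->j_list_lock);
        process_one_batch(journal);              /* invented helper */
        spin_unlock(&journal->j_list_lock);

        /*
         * If the batch was serviced without ever sleeping, this is the
         * only scheduling point; omitting it is what let the loop
         * monopolize the CPU for 156 seconds in the splat above.
         */
        cond_resched();
    }
}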
CVE-2025-39781 1 Linux 1 Linux Kernel 2025-09-29 5.5 Medium
In the Linux kernel, the following vulnerability has been resolved: parisc: Drop WARN_ON_ONCE() from flush_cache_vmap I have observed this warning to occasionally trigger.
CVE-2025-39780 1 Linux 1 Linux Kernel 2025-09-29 5.5 Medium
In the Linux kernel, the following vulnerability has been resolved: sched/ext: Fix invalid task state transitions on class switch When enabling a sched_ext scheduler, we may trigger invalid task state transitions, resulting in warnings like the following (which can be easily reproduced by running the hotplug selftest in a loop): sched_ext: Invalid task state transition 0 -> 3 for fish[770] WARNING: CPU: 18 PID: 787 at kernel/sched/ext.c:3862 scx_set_task_state+0x7c/0xc0 ... RIP: 0010:scx_set_task_state+0x7c/0xc0 ... Call Trace: <TASK> scx_enable_task+0x11f/0x2e0 switching_to_scx+0x24/0x110 scx_enable.isra.0+0xd14/0x13d0 bpf_struct_ops_link_create+0x136/0x1a0 __sys_bpf+0x1edd/0x2c30 __x64_sys_bpf+0x21/0x30 do_syscall_64+0xbb/0x370 entry_SYSCALL_64_after_hwframe+0x77/0x7f This happens because we skip initialization for tasks that are already dead (with their usage counter set to zero), but we don't exclude them during the scheduling class transition phase. Fix this by also skipping dead tasks during class switching, preventing invalid task state transitions.
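A sketch of the fix idea; the iterator and switch helpers are invented, but tryget_task_struct() is the real refcount test that fails once a task's usage count has dropped to zero:

static void scx_switch_tasks_to_scx(struct scx_task_iter *iter) /* invented type */
{
    struct task_struct *p;

    while ((p = scx_task_iter_next(iter))) {  /* invented helper */
        /*
         * A dead task with a zero usage count was skipped during scx
         * init, so switching its class here would produce the invalid
         * 0 -> 3 state transition shown above. Skip it the same way.
         */
        if (!tryget_task_struct(p))
            continue;

        switch_class_to_scx(p);               /* invented helper */
        put_task_struct(p);
    }
}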
CVE-2025-39779 1 Linux 1 Linux Kernel 2025-09-29 5.5 Medium
In the Linux kernel, the following vulnerability has been resolved: btrfs: subpage: keep TOWRITE tag until folio is cleaned btrfs_subpage_set_writeback() calls folio_start_writeback() the first time a folio is written back, and it also clears the PAGECACHE_TAG_TOWRITE tag even if there are still dirty blocks in the folio. This can break ordering guarantees, such as those required by btrfs_wait_ordered_extents(). That ordering breakage leads to a real failure. For example, running generic/464 on a zoned setup will hit the following ASSERT. This happens because the broken ordering fails to flush existing dirty pages before the file size is truncated. assertion failed: !list_empty(&ordered->list) :: 0, in fs/btrfs/zoned.c:1899 ------------[ cut here ]------------ kernel BUG at fs/btrfs/zoned.c:1899! Oops: invalid opcode: 0000 [#1] SMP NOPTI CPU: 2 UID: 0 PID: 1906169 Comm: kworker/u130:2 Kdump: loaded Not tainted 6.16.0-rc6-BTRFS-ZNS+ #554 PREEMPT(voluntary) Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021 Workqueue: btrfs-endio-write btrfs_work_helper [btrfs] RIP: 0010:btrfs_finish_ordered_zoned.cold+0x50/0x52 [btrfs] RSP: 0018:ffffc9002efdbd60 EFLAGS: 00010246 RAX: 000000000000004c RBX: ffff88811923c4e0 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff827e38b1 RDI: 00000000ffffffff RBP: ffff88810005d000 R08: 00000000ffffdfff R09: ffffffff831051c8 R10: ffffffff83055220 R11: 0000000000000000 R12: ffff8881c2458c00 R13: ffff88811923c540 R14: ffff88811923c5e8 R15: ffff8881c1bd9680 FS: 0000000000000000(0000) GS:ffff88a04acd0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f907c7a918c CR3: 0000000004024000 CR4: 0000000000350ef0 Call Trace: <TASK> ? srso_return_thunk+0x5/0x5f btrfs_finish_ordered_io+0x4a/0x60 [btrfs] btrfs_work_helper+0xf9/0x490 [btrfs] process_one_work+0x204/0x590 ? srso_return_thunk+0x5/0x5f worker_thread+0x1d6/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0x118/0x230 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x205/0x260 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Consider process A calling writepages() with WB_SYNC_NONE. In zoned mode or for compressed writes, it locks several folios for delalloc and starts writing them out. Let's call the last locked folio folio X. Suppose the write range only partially covers folio X, leaving some pages dirty. Process A calls btrfs_subpage_set_writeback() when building a bio. This function call clears the TOWRITE tag of folio X (folio size = 8K, block size = 4K), which is then in the following state:

0     4K    8K
|/////|/////|   (flag: DIRTY, tag: DIRTY)
<----->          Process A will write this range.

Now suppose process B concurrently calls writepages() with WB_SYNC_ALL. It calls tag_pages_for_writeback() to tag dirty folios with PAGECACHE_TAG_TOWRITE. Since folio X is still dirty, it gets tagged. Then, B collects tagged folios using filemap_get_folios_tag() and must wait for folio X to be written before returning from writepages().

0     4K    8K
|/////|/////|   (flag: DIRTY, tag: DIRTY|TOWRITE)

However, between tagging and collecting, process A may call btrfs_subpage_set_writeback() and clear folio X's TOWRITE tag.

0     4K    8K
|     |/////|   (flag: DIRTY|WRITEBACK, tag: DIRTY)

As a result, process B won't see folio X in its batch, and returns without waiting for it. This breaks the WB_SYNC_ALL ordering requirement. Fix this by using btrfs_subpage_set_writeback_keepwrite(), which retains the TOWRITE tag. We now manually clear the tag only after the folio becomes clean, via the xas operation.
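A hedged sketch of the two halves of the fix; __folio_start_writeback() with keep_write=true and the XArray calls are real mm APIs, while the wrapper names are illustrative:

#include <linux/pagemap.h>
#include <linux/xarray.h>

/* Start writeback but keep PAGECACHE_TAG_TOWRITE (keep_write=true), so
 * a concurrent WB_SYNC_ALL writer still finds the folio in its batch. */
static void subpage_set_writeback_keepwrite(struct folio *folio)
{
    if (!folio_test_writeback(folio))
        __folio_start_writeback(folio, true);
}

/* Once every block in the folio is clean, drop the TOWRITE tag by hand. */
static void subpage_clear_towrite(struct address_space *mapping,
                                  struct folio *folio)
{
    XA_STATE(xas, &mapping->i_pages, folio->index);

    xas_lock_irq(&xas);
    xas_load(&xas);
    xas_clear_mark(&xas, PAGECACHE_TAG_TOWRITE);
    xas_unlock_irq(&xas);
}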