Commit Graph

295 Commits

Author SHA1 Message Date
Miklos Szeredi
24d464d38b BACKPORT: fuse: fix pipe buffer lifetime for direct_io
commit 0c4bcfdecb1ac0967619ee7ff44871d93c08c909 upstream.

In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls
fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then
imports the write buffer with fuse_get_user_pages(), which uses
iov_iter_get_pages() to grab references to userspace pages instead of
actually copying memory.

On the filesystem device side, these pages can then either be read to
userspace (via fuse_dev_read()), or splice()d over into a pipe using
fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops.

This is wrong because after fuse_dev_do_read() unlocks the FUSE request,
the userspace filesystem can mark the request as completed, causing write()
to return. At that point, the userspace filesystem should no longer have
access to the pipe buffer.

Fix by copying pages coming from the user address space to new pipe
buffers.

Bug: 226679409
Reported-by: Jann Horn <jannh@google.com>
Fixes: c3021629a0 ("fuse: support splice() reading from fuse device")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I57a98e96e36bb97ce3e7b1ebf88917c6c8b0247d
2022-04-28 11:33:22 +00:00
Greg Kroah-Hartman
c4d08791d9 Merge 5.10.87 into android12-5.10-lts
Changes in 5.10.87
	nfc: fix segfault in nfc_genl_dump_devices_done
	drm/msm/dsi: set default num_data_lanes
	KVM: arm64: Save PSTATE early on exit
	s390/test_unwind: use raw opcode instead of invalid instruction
	Revert "tty: serial: fsl_lpuart: drop earlycon entry for i.MX8QXP"
	net/mlx4_en: Update reported link modes for 1/10G
	ALSA: hda: Add Intel DG2 PCI ID and HDMI codec vid
	ALSA: hda/hdmi: fix HDA codec entry table order for ADL-P
	parisc/agp: Annotate parisc agp init functions with __init
	i2c: rk3x: Handle a spurious start completion interrupt flag
	net: netlink: af_netlink: Prevent empty skb by adding a check on len.
	drm/amd/display: Fix for the no Audio bug with Tiled Displays
	drm/amd/display: add connector type check for CRC source set
	tracing: Fix a kmemleak false positive in tracing_map
	KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI req
	staging: most: dim2: use device release method
	bpf: Fix integer overflow in argument calculation for bpf_map_area_alloc
	fuse: make sure reclaim doesn't write the inode
	hwmon: (dell-smm) Fix warning on /proc/i8k creation error
	ethtool: do not perform operations on net devices being unregistered
	perf inject: Fix itrace space allowed for new attributes
	perf intel-pt: Fix some PGE (packet generation enable/control flow packets) usage
	perf intel-pt: Fix sync state when a PSB (synchronization) packet is found
	perf intel-pt: Fix intel_pt_fup_event() assumptions about setting state type
	perf intel-pt: Fix state setting when receiving overflow (OVF) packet
	perf intel-pt: Fix next 'err' value, walking trace
	perf intel-pt: Fix missing 'instruction' events with 'q' option
	perf intel-pt: Fix error timestamp setting on the decoder error path
	memblock: free_unused_memmap: use pageblock units instead of MAX_ORDER
	memblock: align freed memory map on pageblock boundaries with SPARSEMEM
	memblock: ensure there is no overflow in memblock_overlaps_region()
	arm: extend pfn_valid to take into account freed memory map alignment
	arm: ioremap: don't abuse pfn_valid() to check if pfn is in RAM
	Linux 5.10.87

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I56719d03237a8607e6cc0bc357421d0b4a479084
2021-12-17 14:04:19 +01:00
Miklos Szeredi
c31470a30c fuse: make sure reclaim doesn't write the inode
commit 5c791fe1e2a4f401f819065ea4fc0450849f1818 upstream.

In writeback cache mode mtime/ctime updates are cached, and flushed to the
server using the ->write_inode() callback.

Closing the file will result in a dirty inode being immediately written,
but in other cases the inode can remain dirty after all references are
dropped.  This result in the inode being written back from reclaim, which
can deadlock on a regular allocation while the request is being served.

The usual mechanisms (GFP_NOFS/PF_MEMALLOC*) don't work for FUSE, because
serving a request involves unrelated userspace process(es).

Instead do the same as for dirty pages: make sure the inode is written
before the last reference is gone.

 - fallocate(2)/copy_file_range(2): these call file_update_time() or
   file_modified(), so flush the inode before returning from the call

 - unlink(2), link(2) and rename(2): these call fuse_update_ctime(), so
   flush the ctime directly from this helper

Reported-by: chenguanyou <chenguanyou@xiaomi.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Ed Tsai <ed.tsai@mediatek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-17 10:14:41 +01:00
Greg Kroah-Hartman
77b971b479 Merge 5.10.63 into android12-5.10-lts
Changes in 5.10.63
	ext4: fix race writing to an inline_data file while its xattrs are changing
	fscrypt: add fscrypt_symlink_getattr() for computing st_size
	ext4: report correct st_size for encrypted symlinks
	f2fs: report correct st_size for encrypted symlinks
	ubifs: report correct st_size for encrypted symlinks
	Revert "ucounts: Increase ucounts reference counter before the security hook"
	Revert "cred: add missing return error code when set_cred_ucounts() failed"
	Revert "Add a reference to ucounts for each cred"
	static_call: Fix unused variable warn w/o MODULE
	xtensa: fix kconfig unmet dependency warning for HAVE_FUTEX_CMPXCHG
	ARM: OMAP1: ams-delta: remove unused function ams_delta_camera_power
	gpu: ipu-v3: Fix i.MX IPU-v3 offset calculations for (semi)planar U/V formats
	reset: reset-zynqmp: Fixed the argument data type
	qed: Fix the VF msix vectors flow
	net: macb: Add a NULL check on desc_ptp
	qede: Fix memset corruption
	perf/x86/intel/pt: Fix mask of num_address_ranges
	ceph: fix possible null-pointer dereference in ceph_mdsmap_decode()
	perf/x86/amd/ibs: Work around erratum #1197
	perf/x86/amd/power: Assign pmu.module
	cryptoloop: add a deprecation warning
	ALSA: hda/realtek: Quirk for HP Spectre x360 14 amp setup
	ALSA: hda/realtek: Workaround for conflicting SSID on ASUS ROG Strix G17
	ALSA: pcm: fix divide error in snd_pcm_lib_ioctl
	serial: 8250: 8250_omap: Fix possible array out of bounds access
	spi: Switch to signed types for *_native_cs SPI controller fields
	new helper: inode_wrong_type()
	fuse: fix illegal access to inode with reused nodeid
	media: stkwebcam: fix memory leak in stk_camera_probe
	Linux 5.10.63

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5d461fa0b4dd5ba2457663bd20da1001936feaca
2021-09-08 09:08:09 +02:00
Amir Goldstein
ad5e13f15d fuse: fix illegal access to inode with reused nodeid
commit 15db16837a35d8007cb8563358787412213db25e upstream.

Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
with ourarg containing nodeid and generation.

If a fuse inode is found in inode cache with the same nodeid but different
generation, the existing fuse inode should be unhashed and marked "bad" and
a new inode with the new generation should be hashed instead.

This can happen, for example, with passhrough fuse filesystem that returns
the real filesystem ino/generation on lookup and where real inode numbers
can get recycled due to real files being unlinked not via the fuse
passthrough filesystem.

With current code, this situation will not be detected and an old fuse
dentry that used to point to an older generation real inode, can be used to
access a completely new inode, which should be accessed only via the new
dentry.

Note that because the FORGET message carries the nodeid w/o generation, the
server should wait to get FORGET counts for the nlookup counts of the old
and reused inodes combined, before it can free the resources associated to
that nodeid.

Stable backport notes:
* This is not a regression. The bug has been in fuse forever, but only
  a certain class of low level fuse filesystems can trigger this bug
* Because there is no way to check if this fix is applied in runtime,
  libfuse test_examples.py tests this fix with hardcoded check for
  kernel version >= 5.14
* After backport to stable kernel(s), the libfuse test can be updated
  to also check minimal stable kernel version(s)
* Depends on "fuse: fix bad inode" which is already applied to stable
  kernels v5.4.y and v5.10.y
* Required backporting helper inode_wrong_type()

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxi8DymG=JO_sAU+wS8akFdzh+PuXwW3Ebgahd2Nwnh7zA@mail.gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-08 08:49:02 +02:00
Alessio Balsini
5e424f8596 ANDROID: fuse/passthrough: API V2 with __u32 open argument
The initial FUSE passthrough interface has the issue of introducing an
ioctl which receives as a parameter a data structure containing a
pointer. What happens is that, depending on the architecture, the size
of this struct might change, and especially for 32-bit userspace running
on 64-bit kernel, the size mismatch results into different a single
ioctl the behavior of which depends on the data that is passed (e.g.,
with an enum). This is just a poor ioctl design as mentioned by Arnd
Bergmann [1].

Introduce the new FUSE_PASSTHROUGH_OPEN ioctl which only gets the fd of
the lower file system, which is a fixed-size __u32, dropping the
confusing fuse_passthrough_out data structure.

[1] https://lore.kernel.org/lkml/CAK8P3a2K2FzPvqBYL9W=Yut58SFXyetXwU4Fz50G5O3TsS0pPQ@mail.gmail.com/

Bug: 175195837
Signed-off-by: Alessio Balsini <balsini@google.com>
Change-Id: I486d71cbe20f3c0c87544fa75da4e2704fe57c7c
2021-05-31 09:48:43 +01:00
Greg Kroah-Hartman
a1ac3f3093 Merge 5.10.36 into android12-5.10
Changes in 5.10.36
	bus: mhi: core: Fix check for syserr at power_up
	bus: mhi: core: Clear configuration from channel context during reset
	bus: mhi: core: Sanity check values from remote device before use
	nitro_enclaves: Fix stale file descriptors on failed usercopy
	dyndbg: fix parsing file query without a line-range suffix
	s390/disassembler: increase ebpf disasm buffer size
	s390/zcrypt: fix zcard and zqueue hot-unplug memleak
	vhost-vdpa: fix vm_flags for virtqueue doorbell mapping
	tpm: acpi: Check eventlog signature before using it
	ACPI: custom_method: fix potential use-after-free issue
	ACPI: custom_method: fix a possible memory leak
	ftrace: Handle commands when closing set_ftrace_filter file
	ARM: 9056/1: decompressor: fix BSS size calculation for LLVM ld.lld
	arm64: dts: marvell: armada-37xx: add syscon compatible to NB clk node
	arm64: dts: mt8173: fix property typo of 'phys' in dsi node
	ecryptfs: fix kernel panic with null dev_name
	fs/epoll: restore waking from ep_done_scan()
	mtd: spi-nor: core: Fix an issue of releasing resources during read/write
	Revert "mtd: spi-nor: macronix: Add support for mx25l51245g"
	mtd: spinand: core: add missing MODULE_DEVICE_TABLE()
	mtd: rawnand: atmel: Update ecc_stats.corrected counter
	mtd: physmap: physmap-bt1-rom: Fix unintentional stack access
	erofs: add unsupported inode i_format check
	spi: stm32-qspi: fix pm_runtime usage_count counter
	spi: spi-ti-qspi: Free DMA resources
	scsi: qla2xxx: Fix crash in qla2xxx_mqueuecommand()
	scsi: mpt3sas: Block PCI config access from userspace during reset
	mmc: uniphier-sd: Fix an error handling path in uniphier_sd_probe()
	mmc: uniphier-sd: Fix a resource leak in the remove function
	mmc: sdhci: Check for reset prior to DMA address unmap
	mmc: sdhci-pci: Fix initialization of some SD cards for Intel BYT-based controllers
	mmc: sdhci-tegra: Add required callbacks to set/clear CQE_EN bit
	mmc: block: Update ext_csd.cache_ctrl if it was written
	mmc: block: Issue a cache flush only when it's enabled
	mmc: core: Do a power cycle when the CMD11 fails
	mmc: core: Set read only for SD cards with permanent write protect bit
	mmc: core: Fix hanging on I/O during system suspend for removable cards
	irqchip/gic-v3: Do not enable irqs when handling spurious interrups
	cifs: Return correct error code from smb2_get_enc_key
	cifs: fix out-of-bound memory access when calling smb3_notify() at mount point
	cifs: detect dead connections only when echoes are enabled.
	smb2: fix use-after-free in smb2_ioctl_query_info()
	btrfs: handle remount to no compress during compression
	x86/build: Disable HIGHMEM64G selection for M486SX
	btrfs: fix metadata extent leak after failure to create subvolume
	intel_th: pci: Add Rocket Lake CPU support
	btrfs: fix race between transaction aborts and fsyncs leading to use-after-free
	posix-timers: Preserve return value in clock_adjtime32()
	fbdev: zero-fill colormap in fbcmap.c
	cpuidle: tegra: Fix C7 idling state on Tegra114
	bus: ti-sysc: Probe for l4_wkup and l4_cfg interconnect devices first
	staging: wimax/i2400m: fix byte-order issue
	spi: ath79: always call chipselect function
	spi: ath79: remove spi-master setup and cleanup assignment
	bus: mhi: core: Destroy SBL devices when moving to mission mode
	crypto: api - check for ERR pointers in crypto_destroy_tfm()
	crypto: qat - fix unmap invalid dma address
	usb: gadget: uvc: add bInterval checking for HS mode
	usb: webcam: Invalid size of Processing Unit Descriptor
	x86/sev: Do not require Hypervisor CPUID bit for SEV guests
	crypto: hisilicon/sec - fixes a printing error
	genirq/matrix: Prevent allocation counter corruption
	usb: gadget: f_uac2: validate input parameters
	usb: gadget: f_uac1: validate input parameters
	usb: dwc3: gadget: Ignore EP queue requests during bus reset
	usb: xhci: Fix port minor revision
	kselftest/arm64: mte: Fix compilation with native compiler
	ARM: tegra: acer-a500: Rename avdd to vdda of touchscreen node
	PCI: PM: Do not read power state in pci_enable_device_flags()
	kselftest/arm64: mte: Fix MTE feature detection
	ARM: dts: BCM5301X: fix "reg" formatting in /memory node
	ARM: dts: ux500: Fix up TVK R3 sensors
	x86/build: Propagate $(CLANG_FLAGS) to $(REALMODE_FLAGS)
	x86/boot: Add $(CLANG_FLAGS) to compressed KBUILD_CFLAGS
	efi/libstub: Add $(CLANG_FLAGS) to x86 flags
	soc/tegra: pmc: Fix completion of power-gate toggling
	arm64: dts: imx8mq-librem5-r3: Mark buck3 as always on
	tee: optee: do not check memref size on return from Secure World
	soundwire: cadence: only prepare attached devices on clock stop
	perf/arm_pmu_platform: Use dev_err_probe() for IRQ errors
	perf/arm_pmu_platform: Fix error handling
	random: initialize ChaCha20 constants with correct endianness
	usb: xhci-mtk: support quirk to disable usb2 lpm
	fpga: dfl: pci: add DID for D5005 PAC cards
	xhci: check port array allocation was successful before dereferencing it
	xhci: check control context is valid before dereferencing it.
	xhci: fix potential array out of bounds with several interrupters
	bus: mhi: core: Clear context for stopped channels from remove()
	ARM: dts: at91: change the key code of the gpio key
	tools/power/x86/intel-speed-select: Increase string size
	platform/x86: ISST: Account for increased timeout in some cases
	spi: dln2: Fix reference leak to master
	spi: omap-100k: Fix reference leak to master
	spi: qup: fix PM reference leak in spi_qup_remove()
	usb: gadget: tegra-xudc: Fix possible use-after-free in tegra_xudc_remove()
	usb: musb: fix PM reference leak in musb_irq_work()
	usb: core: hub: Fix PM reference leak in usb_port_resume()
	usb: dwc3: gadget: Check for disabled LPM quirk
	tty: n_gsm: check error while registering tty devices
	intel_th: Consistency and off-by-one fix
	phy: phy-twl4030-usb: Fix possible use-after-free in twl4030_usb_remove()
	crypto: sun8i-ss - Fix PM reference leak when pm_runtime_get_sync() fails
	crypto: sun8i-ce - Fix PM reference leak in sun8i_ce_probe()
	crypto: stm32/hash - Fix PM reference leak on stm32-hash.c
	crypto: stm32/cryp - Fix PM reference leak on stm32-cryp.c
	crypto: sa2ul - Fix PM reference leak in sa_ul_probe()
	crypto: omap-aes - Fix PM reference leak on omap-aes.c
	platform/x86: intel_pmc_core: Don't use global pmcdev in quirks
	spi: sync up initial chipselect state
	btrfs: do proper error handling in create_reloc_root
	btrfs: do proper error handling in btrfs_update_reloc_root
	btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'s
	drm: Added orientation quirk for OneGX1 Pro
	drm/qxl: do not run release if qxl failed to init
	drm/qxl: release shadow on shutdown
	drm/ast: Fix invalid usage of AST_MAX_HWC_WIDTH in cursor atomic_check
	drm/amd/display: changing sr exit latency
	drm/ast: fix memory leak when unload the driver
	drm/amd/display: Check for DSC support instead of ASIC revision
	drm/amd/display: Don't optimize bandwidth before disabling planes
	drm/amdgpu/display: buffer INTERRUPT_LOW_IRQ_CONTEXT interrupt work
	drm/amd/display/dc/dce/dce_aux: Remove duplicate line causing 'field overwritten' issue
	scsi: lpfc: Fix incorrect dbde assignment when building target abts wqe
	scsi: lpfc: Fix pt2pt connection does not recover after LOGO
	drm/amdgpu: Fix some unload driver issues
	sched/pelt: Fix task util_est update filtering
	kvfree_rcu: Use same set of GFP flags as does single-argument
	scsi: target: pscsi: Fix warning in pscsi_complete_cmd()
	media: ite-cir: check for receive overflow
	media: drivers: media: pci: sta2x11: fix Kconfig dependency on GPIOLIB
	media: imx: capture: Return -EPIPE from __capture_legacy_try_fmt()
	atomisp: don't let it go past pipes array
	power: supply: bq27xxx: fix power_avg for newer ICs
	extcon: arizona: Fix some issues when HPDET IRQ fires after the jack has been unplugged
	extcon: arizona: Fix various races on driver unbind
	media: media/saa7164: fix saa7164_encoder_register() memory leak bugs
	media: gspca/sq905.c: fix uninitialized variable
	power: supply: Use IRQF_ONESHOT
	backlight: qcom-wled: Use sink_addr for sync toggle
	backlight: qcom-wled: Fix FSC update issue for WLED5
	drm/amdgpu: mask the xgmi number of hops reported from psp to kfd
	drm/amdkfd: Fix UBSAN shift-out-of-bounds warning
	drm/amdgpu : Fix asic reset regression issue introduce by 8f211fe8ac7c4f
	drm/amd/pm: fix workload mismatch on vega10
	drm/amd/display: Fix UBSAN warning for not a valid value for type '_Bool'
	drm/amd/display: DCHUB underflow counter increasing in some scenarios
	drm/amd/display: fix dml prefetch validation
	scsi: qla2xxx: Always check the return value of qla24xx_get_isp_stats()
	drm/vkms: fix misuse of WARN_ON
	scsi: qla2xxx: Fix use after free in bsg
	mmc: sdhci-esdhc-imx: validate pinctrl before use it
	mmc: sdhci-pci: Add PCI IDs for Intel LKF
	mmc: sdhci-brcmstb: Remove CQE quirk
	ata: ahci: Disable SXS for Hisilicon Kunpeng920
	drm/komeda: Fix bit check to import to value of proper type
	nvmet: return proper error code from discovery ctrl
	selftests/resctrl: Enable gcc checks to detect buffer overflows
	selftests/resctrl: Fix compilation issues for global variables
	selftests/resctrl: Fix compilation issues for other global variables
	selftests/resctrl: Clean up resctrl features check
	selftests/resctrl: Fix missing options "-n" and "-p"
	selftests/resctrl: Use resctrl/info for feature detection
	selftests/resctrl: Fix incorrect parsing of iMC counters
	selftests/resctrl: Fix checking for < 0 for unsigned values
	power: supply: cpcap-charger: Add usleep to cpcap charger to avoid usb plug bounce
	scsi: smartpqi: Use host-wide tag space
	scsi: smartpqi: Correct request leakage during reset operations
	scsi: smartpqi: Add new PCI IDs
	scsi: scsi_dh_alua: Remove check for ASC 24h in alua_rtpg()
	media: em28xx: fix memory leak
	media: vivid: update EDID
	drm/msm/dp: Fix incorrect NULL check kbot warnings in DP driver
	clk: socfpga: arria10: Fix memory leak of socfpga_clk on error return
	power: supply: generic-adc-battery: fix possible use-after-free in gab_remove()
	power: supply: s3c_adc_battery: fix possible use-after-free in s3c_adc_bat_remove()
	media: tc358743: fix possible use-after-free in tc358743_remove()
	media: adv7604: fix possible use-after-free in adv76xx_remove()
	media: i2c: adv7511-v4l2: fix possible use-after-free in adv7511_remove()
	media: i2c: tda1997: Fix possible use-after-free in tda1997x_remove()
	media: i2c: adv7842: fix possible use-after-free in adv7842_remove()
	media: platform: sti: Fix runtime PM imbalance in regs_show
	media: sun8i-di: Fix runtime PM imbalance in deinterlace_start_streaming
	media: dvb-usb: fix memory leak in dvb_usb_adapter_init
	media: gscpa/stv06xx: fix memory leak
	sched/fair: Ignore percpu threads for imbalance pulls
	drm/msm/mdp5: Configure PP_SYNC_HEIGHT to double the vtotal
	drm/msm/mdp5: Do not multiply vclk line count by 100
	drm/amdgpu/ttm: Fix memory leak userptr pages
	drm/radeon/ttm: Fix memory leak userptr pages
	drm/amd/display: Fix debugfs link_settings entry
	drm/amd/display: Fix UBSAN: shift-out-of-bounds warning
	drm/amdkfd: Fix cat debugfs hang_hws file causes system crash bug
	amdgpu: avoid incorrect %hu format string
	drm/amd/display: Try YCbCr420 color when YCbCr444 fails
	drm/amdgpu: fix NULL pointer dereference
	scsi: lpfc: Fix crash when a REG_RPI mailbox fails triggering a LOGO response
	scsi: lpfc: Fix error handling for mailboxes completed in MBX_POLL mode
	scsi: lpfc: Remove unsupported mbox PORT_CAPABILITIES logic
	mfd: intel-m10-bmc: Fix the register access range
	mfd: da9063: Support SMBus and I2C mode
	mfd: arizona: Fix rumtime PM imbalance on error
	scsi: libfc: Fix a format specifier
	perf: Rework perf_event_exit_event()
	sched,fair: Alternative sched_slice()
	block/rnbd-clt: Fix missing a memory free when unloading the module
	s390/archrandom: add parameter check for s390_arch_random_generate
	sched,psi: Handle potential task count underflow bugs more gracefully
	power: supply: cpcap-battery: fix invalid usage of list cursor
	ALSA: emu8000: Fix a use after free in snd_emu8000_create_mixer
	ALSA: hda/conexant: Re-order CX5066 quirk table entries
	ALSA: sb: Fix two use after free in snd_sb_qsound_build
	ALSA: usb-audio: Explicitly set up the clock selector
	ALSA: usb-audio: Add dB range mapping for Sennheiser Communications Headset PC 8
	ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 445 G7
	ALSA: hda/realtek: GA503 use same quirks as GA401
	ALSA: hda/realtek: fix mic boost on Intel NUC 8
	ALSA: hda/realtek - Headset Mic issue on HP platform
	ALSA: hda/realtek: fix static noise on ALC285 Lenovo laptops
	ALSA: hda/realtek: Add quirk for Intel Clevo PCx0Dx
	tools/power/turbostat: Fix turbostat for AMD Zen CPUs
	btrfs: fix race when picking most recent mod log operation for an old root
	arm64/vdso: Discard .note.gnu.property sections in vDSO
	Makefile: Move -Wno-unused-but-set-variable out of GCC only block
	fs: fix reporting supported extra file attributes for statx()
	virtiofs: fix memory leak in virtio_fs_probe()
	kcsan, debugfs: Move debugfs file creation out of early init
	ubifs: Only check replay with inode type to judge if inode linked
	f2fs: fix error handling in f2fs_end_enable_verity()
	f2fs: fix to avoid out-of-bounds memory access
	mlxsw: spectrum_mr: Update egress RIF list before route's action
	openvswitch: fix stack OOB read while fragmenting IPv4 packets
	ACPI: GTDT: Don't corrupt interrupt mappings on watchdow probe failure
	NFS: fs_context: validate UDP retrans to prevent shift out-of-bounds
	NFS: Don't discard pNFS layout segments that are marked for return
	NFSv4: Don't discard segments marked for return in _pnfs_return_layout()
	Input: ili210x - add missing negation for touch indication on ili210x
	jffs2: Fix kasan slab-out-of-bounds problem
	jffs2: Hook up splice_write callback
	powerpc/powernv: Enable HAIL (HV AIL) for ISA v3.1 processors
	powerpc/eeh: Fix EEH handling for hugepages in ioremap space.
	powerpc/kexec_file: Use current CPU info while setting up FDT
	powerpc/32: Fix boot failure with CONFIG_STACKPROTECTOR
	powerpc: fix EDEADLOCK redefinition error in uapi/asm/errno.h
	intel_th: pci: Add Alder Lake-M support
	tpm: efi: Use local variable for calculating final log size
	tpm: vtpm_proxy: Avoid reading host log when using a virtual device
	crypto: arm/curve25519 - Move '.fpu' after '.arch'
	crypto: rng - fix crypto_rng_reset() refcounting when !CRYPTO_STATS
	md/raid1: properly indicate failure when ending a failed write request
	dm raid: fix inconclusive reshape layout on fast raid4/5/6 table reload sequences
	fuse: fix write deadlock
	exfat: fix erroneous discard when clear cluster bit
	sfc: farch: fix TX queue lookup in TX flush done handling
	sfc: farch: fix TX queue lookup in TX event handling
	security: commoncap: fix -Wstringop-overread warning
	Fix misc new gcc warnings
	jffs2: check the validity of dstlen in jffs2_zlib_compress()
	smb3: when mounting with multichannel include it in requested capabilities
	smb3: do not attempt multichannel to server which does not support it
	Revert 337f13046f ("futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op")
	futex: Do not apply time namespace adjustment on FUTEX_LOCK_PI
	x86/cpu: Initialize MSR_TSC_AUX if RDTSCP *or* RDPID is supported
	kbuild: update config_data.gz only when the content of .config is changed
	ext4: annotate data race in start_this_handle()
	ext4: annotate data race in jbd2_journal_dirty_metadata()
	ext4: fix check to prevent false positive report of incorrect used inodes
	ext4: do not set SB_ACTIVE in ext4_orphan_cleanup()
	ext4: fix error code in ext4_commit_super
	ext4: fix ext4_error_err save negative errno into superblock
	ext4: fix error return code in ext4_fc_perform_commit()
	ext4: allow the dax flag to be set and cleared on inline directories
	ext4: Fix occasional generic/418 failure
	media: dvbdev: Fix memory leak in dvb_media_device_free()
	media: dvb-usb: Fix use-after-free access
	media: dvb-usb: Fix memory leak at error in dvb_usb_device_init()
	media: staging/intel-ipu3: Fix memory leak in imu_fmt
	media: staging/intel-ipu3: Fix set_fmt error handling
	media: staging/intel-ipu3: Fix race condition during set_fmt
	media: v4l2-ctrls: fix reference to freed memory
	media: venus: hfi_parser: Don't initialize parser on v1
	usb: gadget: dummy_hcd: fix gpf in gadget_setup
	usb: gadget: Fix double free of device descriptor pointers
	usb: gadget/function/f_fs string table fix for multiple languages
	usb: dwc3: gadget: Remove FS bInterval_m1 limitation
	usb: dwc3: gadget: Fix START_TRANSFER link state check
	usb: dwc3: core: Do core softreset when switch mode
	usb: dwc2: Fix session request interrupt handler
	tty: fix memory leak in vc_deallocate
	rsi: Use resume_noirq for SDIO
	tools/power turbostat: Fix offset overflow issue in index converting
	tracing: Map all PIDs to command lines
	tracing: Restructure trace_clock_global() to never block
	dm persistent data: packed struct should have an aligned() attribute too
	dm space map common: fix division bug in sm_ll_find_free_block()
	dm integrity: fix missing goto in bitmap_flush_interval error handling
	dm rq: fix double free of blk_mq_tag_set in dev remove after table load fails
	lib/vsprintf.c: remove leftover 'f' and 'F' cases from bstr_printf()
	thermal/drivers/cpufreq_cooling: Fix slab OOB issue
	thermal/core/fair share: Lock the thermal zone while looping over instances
	Linux 5.10.36

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I7b8075de5edd8de69205205cddb9a3273d7d0810
2021-05-13 14:22:11 +02:00
Vivek Goyal
1c525c2656 fuse: fix write deadlock
commit 4f06dd92b5d0a6f8eec6a34b8d6ef3e1f4ac1e10 upstream.

There are two modes for write(2) and friends in fuse:

a) write through (update page cache, send sync WRITE request to userspace)

b) buffered write (update page cache, async writeout later)

The write through method kept all the page cache pages locked that were
used for the request.  Keeping more than one page locked is deadlock prone
and Qian Cai demonstrated this with trinity fuzzing.

The reason for keeping the pages locked is that concurrent mapped reads
shouldn't try to pull possibly stale data into the page cache.

For full page writes, the easy way to fix this is to make the cached page
be the authoritative source by marking the page PG_uptodate immediately.
After this the page can be safely unlocked, since mapped/cached reads will
take the written data from the cache.

Concurrent mapped writes will now cause data in the original WRITE request
to be updated; this however doesn't cause any data inconsistency and this
scenario should be exceedingly rare anyway.

If the WRITE request returns with an error in the above case, currently the
page is not marked uptodate; this means that a concurrent read will always
read consistent data.  After this patch the page is uptodate between
writing to the cache and receiving the error: there's window where a cached
read will read the wrong data.  While theoretically this could be a
regression, it is unlikely to be one in practice, since this is normal for
buffered writes.

In case of a partial page write to an already uptodate page the locking is
also unnecessary, with the above caveats.

Partial write of a not uptodate page still needs to be handled.  One way
would be to read the complete page before doing the write.  This is not
possible, since it might break filesystems that don't expect any READ
requests when the file was opened O_WRONLY.

The other solution is to serialize the synchronous write with reads from
the partial pages.  The easiest way to do this is to keep the partial pages
locked.  The problem is that a write() may involve two such pages (one head
and one tail).  This patch fixes it by only locking the partial tail page.
If there's a partial head page as well, then split that off as a separate
WRITE request.

Reported-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/linux-fsdevel/4794a3fa3742a5e84fb0f934944204b55730829b.camel@lca.pw/
Fixes: ea9b9907b8 ("fuse: implement perform_write")
Cc: <stable@vger.kernel.org> # v2.6.26
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11 14:47:36 +02:00
Greg Kroah-Hartman
061e6e09f0 Merge 5.10.25 into android12-5.10-lts
Changes in 5.10.25
	crypto: aesni - Use TEST %reg,%reg instead of CMP $0,%reg
	crypto: x86/aes-ni-xts - use direct calls to and 4-way stride
	bpf: Prohibit alu ops for pointer types not defining ptr_limit
	bpf: Fix off-by-one for area size in creating mask to left
	bpf: Simplify alu_limit masking for pointer arithmetic
	bpf: Add sanity check for upper ptr_limit
	bpf, selftests: Fix up some test_verifier cases for unprivileged
	RDMA/srp: Fix support for unpopulated and unbalanced NUMA nodes
	fuse: fix live lock in fuse_iget()
	Revert "nfsd4: remove check_conflicting_opens warning"
	Revert "nfsd4: a client's own opens needn't prevent delegations"
	ALSA: usb-audio: Don't avoid stopping the stream at disconnection
	net: dsa: b53: Support setting learning on port
	Linux 5.10.25

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0a19cd5f8dda58a2fa8fdfbe7cbabd2c32cb57bd
2021-03-20 11:28:25 +01:00
Amir Goldstein
d955f13ea2 fuse: fix live lock in fuse_iget()
commit 775c5033a0d164622d9d10dd0f0a5531639ed3ed upstream.

Commit 5d069dbe8aaf ("fuse: fix bad inode") replaced make_bad_inode()
in fuse_iget() with a private implementation fuse_make_bad().

The private implementation fails to remove the bad inode from inode
cache, so the retry loop with iget5_locked() finds the same bad inode
and marks it bad forever.

kmsg snip:

[ ] rcu: INFO: rcu_sched self-detected stall on CPU
...
[ ]  ? bit_wait_io+0x50/0x50
[ ]  ? fuse_init_file_inode+0x70/0x70
[ ]  ? find_inode.isra.32+0x60/0xb0
[ ]  ? fuse_init_file_inode+0x70/0x70
[ ]  ilookup5_nowait+0x65/0x90
[ ]  ? fuse_init_file_inode+0x70/0x70
[ ]  ilookup5.part.36+0x2e/0x80
[ ]  ? fuse_init_file_inode+0x70/0x70
[ ]  ? fuse_inode_eq+0x20/0x20
[ ]  iget5_locked+0x21/0x80
[ ]  ? fuse_inode_eq+0x20/0x20
[ ]  fuse_iget+0x96/0x1b0

Fixes: 5d069dbe8aaf ("fuse: fix bad inode")
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-20 10:43:43 +01:00
Alessio Balsini
32c13ae5ff FROMLIST: fuse: Introduce passthrough for mmap
Enabling FUSE passthrough for mmap-ed operations not only affects
performance, but has also been shown as mandatory for the correct
functioning of FUSE passthrough.
yanwu noticed [1] that a FUSE file with passthrough enabled may suffer
data inconsistencies if the same file is also accessed with mmap. What
happens is that read/write operations are directly applied to the lower
file system (and its cache), while mmap-ed operations are affecting the
FUSE cache.

Extend the FUSE passthrough implementation to also handle memory-mapped
FUSE file, to both fix the cache inconsistencies and extend the
passthrough performance benefits to mmap-ed operations.

[1] https://lore.kernel.org/lkml/20210119110654.11817-1-wu-yan@tcl.com/

Bug: 168023149
Link: https://lore.kernel.org/lkml/20210125153057.3623715-9-balsini@android.com/
Signed-off-by: Alessio Balsini <balsini@android.com>
Change-Id: Ifad4698b0380f6e004c487940ac6907b9a9f2964
Signed-off-by: Alessio Balsini <balsini@google.com>
2021-01-26 19:07:13 +00:00
Alessio Balsini
aa29f32988 FROMLIST: fuse: Use daemon creds in passthrough mode
When using FUSE passthrough, read/write operations are directly
forwarded to the lower file system file through VFS, but there is no
guarantee that the process that is triggering the request has the right
permissions to access the lower file system. This would cause the
read/write access to fail.

In passthrough file systems, where the FUSE daemon is responsible for
the enforcement of the lower file system access policies, often happens
that the process dealing with the FUSE file system doesn't have access
to the lower file system.
Being the FUSE daemon in charge of implementing the FUSE file
operations, that in the case of read/write operations usually simply
results in the copy of memory buffers from/to the lower file system
respectively, these operations are executed with the FUSE daemon
privileges.

This patch adds a reference to the FUSE daemon credentials, referenced
at FUSE_DEV_IOC_PASSTHROUGH_OPEN ioctl() time so that they can be used
to temporarily raise the user credentials when accessing lower file
system files in passthrough.
The process accessing the FUSE file with passthrough enabled temporarily
receives the privileges of the FUSE daemon while performing read/write
operations. Similar behavior is implemented in overlayfs.
These privileges will be reverted as soon as the IO operation completes.
This feature does not provide any higher security privileges to those
processes accessing the FUSE file system with passthrough enabled. This
is because it is still the FUSE daemon responsible for enabling or not
the passthrough feature at file open time, and should enable the feature
only after appropriate access policy checks.

Bug: 168023149
Link: https://lore.kernel.org/lkml/20210125153057.3623715-8-balsini@android.com/
Signed-off-by: Alessio Balsini <balsini@android.com>
Change-Id: Idb4f03a2ce7c536691e5eaf8fadadfcf002e1677
Signed-off-by: Alessio Balsini <balsini@google.com>
2021-01-26 17:20:10 +00:00
Alessio Balsini
b10e3c9c24 FROMLIST: fuse: Introduce synchronous read and write for passthrough
All the read and write operations performed on fuse_files which have the
passthrough feature enabled are forwarded to the associated lower file
system file via VFS.

Sending the request directly to the lower file system avoids the
userspace round-trip that, because of possible context switches and
additional operations might reduce the overall performance, especially
in those cases where caching doesn't help, for example in reads at
random offsets.

Verifying if a fuse_file has a lower file system file associated with
can be done by checking the validity of its passthrough_filp pointer.
This pointer is not NULL only if passthrough has been successfully
enabled via the appropriate ioctl().
When a read/write operation is requested for a FUSE file with
passthrough enabled, a new equivalent VFS request is generated, which
instead targets the lower file system file.
The VFS layer performs additional checks that allow for safer operations
but may cause the operation to fail if the process accessing the FUSE
file system does not have access to the lower file system.

This change only implements synchronous requests in passthrough,
returning an error in the case of asynchronous operations, yet covering
the majority of the use cases.

Bug: 168023149
Link: https://lore.kernel.org/lkml/20210125153057.3623715-6-balsini@android.com/
Signed-off-by: Alessio Balsini <balsini@android.com>
Change-Id: Ifbe6a247fe7338f87d078fde923f0252eeaeb668
Signed-off-by: Alessio Balsini <balsini@google.com>
2021-01-26 17:20:10 +00:00
Alessio Balsini
9634f0e4e2 FROMLIST: fuse: Definitions and ioctl for passthrough
Expose the FUSE_PASSTHROUGH interface to user space and declare all the
basic data structures and functions as the skeleton on top of which the
FUSE passthrough functionality will be built.

As part of this, introduce the new FUSE passthrough ioctl, which allows
the FUSE daemon to specify a direct connection between a FUSE file and a
lower file system file. Such ioctl requires user space to pass the file
descriptor of one of its opened files through the fuse_passthrough_out
data structure introduced in this patch. This structure includes extra
fields for possible future extensions.
Also, add the passthrough functions for the set-up and tear-down of the
data structures and locks that will be used both when fuse_conns and
fuse_files are created/deleted.

Bug: 168023149
Link: https://lore.kernel.org/lkml/20210125153057.3623715-4-balsini@android.com/
Signed-off-by: Alessio Balsini <balsini@android.com>
Change-Id: I732532581348adadda5b5048a9346c2b0868d539
Signed-off-by: Alessio Balsini <balsini@google.com>
2021-01-26 17:20:09 +00:00
Alessio Balsini
95209e20ae Revert "FROMLIST: fuse: Definitions and ioctl() for passthrough"
This reverts commit 314603f83d.

Change-Id: I7c5d8406e1e5c61f8d804bfc68eaea140d7a0c1e
Signed-off-by: Alessio Balsini <balsini@google.com>
2021-01-26 17:20:09 +00:00
Alessio Balsini
fc4ac7f0c4 Revert "FROMLIST: fuse: Introduce synchronous read and write for passthrough"
This reverts commit f32d3e5eb8.

Change-Id: Ie0910ff6c510ba19b56fa3ca22ae13e96f6f37d2
Signed-off-by: Alessio Balsini <balsini@google.com>
2021-01-26 17:20:09 +00:00
Alessio Balsini
6ee117d8c1 Revert "FROMLIST: fuse: Use daemon creds in passthrough mode"
This reverts commit 9ee350d7ec.

Change-Id: Ia7c469a7f908217d2e74b486b544605245bfba10
Signed-off-by: Alessio Balsini <balsini@google.com>
2021-01-26 17:20:08 +00:00
Greg Kroah-Hartman
0290a41d05 Merge 5.10.6 into android12-5.10
Changes in 5.10.6
	Revert "drm/amd/display: Fix memory leaks in S3 resume"
	Revert "mtd: spinand: Fix OOB read"
	rtc: pcf2127: move watchdog initialisation to a separate function
	rtc: pcf2127: only use watchdog when explicitly available
	dt-bindings: rtc: add reset-source property
	kdev_t: always inline major/minor helper functions
	Bluetooth: Fix attempting to set RPA timeout when unsupported
	ALSA: hda/realtek - Modify Dell platform name
	ALSA: hda/hdmi: Fix incorrect mutex unlock in silent_stream_disable()
	drm/i915/tgl: Fix Combo PHY DPLL fractional divider for 38.4MHz ref clock
	scsi: ufs: Allow an error return value from ->device_reset()
	scsi: ufs: Re-enable WriteBooster after device reset
	RDMA/core: remove use of dma_virt_ops
	RDMA/siw,rxe: Make emulated devices virtual in the device tree
	fuse: fix bad inode
	perf: Break deadlock involving exec_update_mutex
	rwsem: Implement down_read_killable_nested
	rwsem: Implement down_read_interruptible
	exec: Transform exec_update_mutex into a rw_semaphore
	mwifiex: Fix possible buffer overflows in mwifiex_cmd_802_11_ad_hoc_start
	Linux 5.10.6

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id4c57a151a1e8f2162163d2337b6055f04edbe9b
2021-01-13 10:28:55 +01:00
Miklos Szeredi
36cf9ae54b fuse: fix bad inode
[ Upstream commit 5d069dbe8aaf2a197142558b6fb2978189ba3454 ]

Jan Kara's analysis of the syzbot report (edited):

  The reproducer opens a directory on FUSE filesystem, it then attaches
  dnotify mark to the open directory.  After that a fuse_do_getattr() call
  finds that attributes returned by the server are inconsistent, and calls
  make_bad_inode() which, among other things does:

          inode->i_mode = S_IFREG;

  This then confuses dnotify which doesn't tear down its structures
  properly and eventually crashes.

Avoid calling make_bad_inode() on a live inode: switch to a private flag on
the fuse inode.  Also add the test to ops which the bad_inode_ops would
have caught.

This bug goes back to the initial merge of fuse in 2.6.14...

Reported-by: syzbot+f427adf9324b92652ccc@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-09 13:46:24 +01:00
Daniel Rosenberg
aca265111a ANDROID: fuse: Add support for d_canonical_path
Allows FUSE to report to inotify that it is acting as a layered filesystem.
The userspace component returns a string representing the location of the
underlying file. If the string cannot be resolved into a path, the top
level path is returned instead.

Bug: 23904372
Bug: 171780975
Test: FileObserverTest and FileObserverTestLegacyPath on cuttlefish
Change-Id: Iabdca0bbedfbff59e9c820c58636a68ef9683d9f
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Alessio Balsini <balsini@google.com>
2020-11-06 18:49:25 +00:00
Alessio Balsini
9ee350d7ec FROMLIST: fuse: Use daemon creds in passthrough mode
When using FUSE passthrough, read/write operations are directly forwarded
to the lower file system file through VFS, but there is no guarantee that
the process that is triggering the request has the right permissions to
access the lower file system. This would cause the read/write access to
fail.

In passthrough file systems, where the FUSE daemon is responsible for the
enforcement of the lower file system access policies, often happens that
the process dealing with the FUSE file system doesn't have access to the
lower file system.
Being the FUSE daemon in charge of implementing the FUSE file operations,
that in the case of read/write operations usually simply results in the
copy of memory buffers from/to the lower file system respectively, these
operations are executed with the FUSE daemon privileges.

This patch adds a reference to the FUSE daemon credentials, referenced at
FUSE_DEV_IOC_PASSTHROUGH_OPEN ioctl() time so that they can be used to
temporarily raise the user credentials when accessing lower file system
files in passthrough.
The process accessing the FUSE file with passthrough enabled temporarily
receives the privileges of the FUSE daemon while performing read/write
operations. Similar behavior is implemented in overlayfs.
These privileges will be reverted as soon as the IO operation completes.
This feature does not provide any higher security privileges to those
processes accessing the FUSE file system with passthrough enabled. This is
because it is still the FUSE daemon responsible for enabling or not the
passthrough feature at file open time, and should enable the feature only
after appropriate access policy checks.

Bug: 168023149
Link: https://lore.kernel.org/lkml/20201026125016.1905945-6-balsini@android.com/
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Alessio Balsini <balsini@google.com>
Change-Id: I1123f8113578eb8713f2b777a1b5ec76882bd762
2020-11-02 19:15:36 +00:00
Alessio Balsini
f32d3e5eb8 FROMLIST: fuse: Introduce synchronous read and write for passthrough
All the read and write operations performed on fuse_files which have the
passthrough feature enabled are forwarded to the associated lower file
system file via VFS.

Sending the request directly to the lower file system avoids the userspace
round-trip that, because of possible context switches and additional
operations might reduce the overall performance, especially in those cases
where caching doesn't help, for example in reads at random offsets.

Verifying if a fuse_file has a lower file system file associated with can
be done by checking the validity of its passthrough_filp pointer. This
pointer is not NULL only if passthrough has been successfully enabled via
the appropriate ioctl().
When a read/write operation is requested for a FUSE file with passthrough
enabled, a new equivalent VFS request is generated, which instead targets
the lower file system file.
The VFS layer performs additional checks that allow for safer operations
but may cause the operation to fail if the process accessing the FUSE file
system does not have access to the lower file system.

This change only implements synchronous requests in passthrough, returning
an error in the case of asynchronous operations, yet covering the majority
of the use cases.

Bug: 168023149
Link: https://lore.kernel.org/lkml/20201026125016.1905945-4-balsini@android.com/
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Alessio Balsini <balsini@google.com>
Change-Id: If76bb8725e1ac567f9dbe3edb79ebb4d43d77dfb
2020-11-02 19:15:36 +00:00
Alessio Balsini
314603f83d FROMLIST: fuse: Definitions and ioctl() for passthrough
Expose the FUSE_PASSTHROUGH interface to userspace and declare all the
basic data structures and functions as the skeleton on top of which the
FUSE passthrough functionality will be built.

As part of this, introduce the new FUSE passthrough ioctl(), which
allows
the FUSE daemon to specify a direct connection between a FUSE file and a
lower file system file. Such ioctl() requires userspace to pass the file
descriptor of one of its opened files through the fuse_passthrough_out
data
structure introduced in this patch. This structure includes extra fields
for possible future extensions.
Also, add the passthrough functions for the set-up and tear-down of the
data structures and locks that will be used both when fuse_conns and
fuse_files are created/deleted.

Bug: 168023149
Link: https://lore.kernel.org/lkml/20201026125016.1905945-2-balsini@android.com/
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Alessio Balsini <balsini@google.com>
Change-Id: I6dd150b93607e10ed53f7e7975b35b6090080fa2
2020-11-02 19:15:35 +00:00
Greg Kroah-Hartman
caf51e6b49 Merge 694565356c ("Merge tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse") into android-mainline
Steps on the way to 5.10-rc1

Resolves conflicts in:
	fs/fuse/fuse_i.h

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifa200ce8fae0e3b38c86351006824c62328c00f7
2020-10-26 10:13:20 +01:00
Max Reitz
bf109c6404 fuse: implement crossmounts
FUSE servers can indicate crossmount points by setting FUSE_ATTR_SUBMOUNT
in fuse_attr.flags.  The inode will then be marked as S_AUTOMOUNT, and the
.d_automount implementation creates a new submount at that location, so
that the submount gets a distinct st_dev value.

Note that all submounts get a distinct superblock and a distinct st_dev
value, so for virtio-fs, even if the same filesystem is mounted more than
once on the host, none of its mount points will have the same st_dev.  We
need distinct superblocks because the superblock points to the root node,
but the different host mounts may show different trees (e.g. due to
submounts in some of them, but not in others).

Right now, this behavior is only enabled when fuse_conn.auto_submounts is
set, which is the case only for virtio-fs.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-10-09 16:33:47 +02:00
Max Reitz
1866d779d5 fuse: Allow fuse_fill_super_common() for submounts
Submounts have their own superblock, which needs to be initialized.
However, they do not have a fuse_fs_context associated with them, and
the root node's attributes should be taken from the mountpoint's node.

Extend fuse_fill_super_common() to work for submounts by making the @ctx
parameter optional, and by adding a @submount_finode parameter.

(There is a plain "unsigned" in an existing code block that is being
indented by this commit.  Extend it to "unsigned int" so checkpatch does
not complain.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-18 15:17:41 +02:00
Max Reitz
fcee216beb fuse: split fuse_mount off of fuse_conn
We want to allow submounts for the same fuse_conn, but with different
superblocks so that each of the submounts has its own device ID.  To do
so, we need to split all mount-specific information off of fuse_conn
into a new fuse_mount structure, so that multiple mounts can share a
single fuse_conn.

We need to take care only to perform connection-level actions once (i.e.
when the fuse_conn and thus the first fuse_mount are established, or
when the last fuse_mount and thus the fuse_conn are destroyed).  For
example, fuse_sb_destroy() must invoke fuse_send_destroy() until the
last superblock is released.

To do so, we keep track of which fuse_mount is the root mount and
perform all fuse_conn-level actions only when this fuse_mount is
involved.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-18 15:17:41 +02:00
Max Reitz
8f622e9497 fuse: drop fuse_conn parameter where possible
With the last commit, all functions that handle some existing fuse_req
no longer need to be given the associated fuse_conn, because they can
get it from the fuse_req object.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-18 15:17:41 +02:00
Max Reitz
24754db272 fuse: store fuse_conn in fuse_req
Every fuse_req belongs to a fuse_conn.  Right now, we always know which
fuse_conn that is based on the respective device, but we want to allow
multiple (sub)mounts per single connection, and then the corresponding
filesystem is not going to be so trivial to obtain.

Storing a pointer to the associated fuse_conn in every fuse_req will
allow us to trivially find any request's superblock (and thus
filesystem) even then.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-18 15:17:40 +02:00
Vivek Goyal
9a752d18c8 virtiofs: add logic to free up a memory range
Add logic to free up a busy memory range. Freed memory range will be
returned to free pool. Add a worker which can be started to select
and free some busy memory ranges.

Process can also steal one of its busy dax ranges if free range is not
available. I will refer it to as direct reclaim.

If free range is not available and nothing can't be stolen from same
inode, caller waits on a waitq for free range to become available.

For reclaiming a range, as of now we need to hold following locks in
specified order.

	down_write(&fi->i_mmap_sem);
	down_write(&fi->dax->sem);

We look for a free range in following order.

A. Try to get a free range.
B. If not, try direct reclaim.
C. If not, wait for a memory range to become free

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-10 11:39:23 +02:00
Vivek Goyal
6ae330cad6 virtiofs: serialize truncate/punch_hole and dax fault path
Currently in fuse we don't seem have any lock which can serialize fault
path with truncate/punch_hole path. With dax support I need one for
following reasons.

1. Dax requirement

  DAX fault code relies on inode size being stable for the duration of
  fault and want to serialize with truncate/punch_hole and they explicitly
  mention it.

  static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
                               const struct iomap_ops *ops)
        /*
         * Check whether offset isn't beyond end of file now. Caller is
         * supposed to hold locks serializing us with truncate / punch hole so
         * this is a reliable test.
         */
        max_pgoff = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);

2. Make sure there are no users of pages being truncated/punch_hole

  get_user_pages() might take references to page and then do some DMA
  to said pages. Filesystem might truncate those pages without knowing
  that a DMA is in progress or some I/O is in progress. So use
  dax_layout_busy_page() to make sure there are no such references
  and I/O is not in progress on said pages before moving ahead with
  truncation.

3. Limitation of kvm page fault error reporting

  If we are truncating file on host first and then removing mappings in
  guest lateter (truncate page cache etc), then this could lead to a
  problem with KVM. Say a mapping is in place in guest and truncation
  happens on host. Now if guest accesses that mapping, then host will
  take a fault and kvm will either exit to qemu or spin infinitely.

  IOW, before we do truncation on host, we need to make sure that guest
  inode does not have any mapping in that region or whole file.

4. virtiofs memory range reclaim

 Soon I will introduce the notion of being able to reclaim dax memory
 ranges from a fuse dax inode. There also I need to make sure that
 no I/O or fault is going on in the reclaimed range and nobody is using
 it so that range can be reclaimed without issues.

Currently if we take inode lock, that serializes read/write. But it does
not do anything for faults. So I add another semaphore fuse_inode->i_mmap_sem
for this purpose.  It can be used to serialize with faults.

As of now, I am adding taking this semaphore only in dax fault path and
not regular fault path because existing code does not have one. May
be existing code can benefit from it as well to take care of some
races, but that we can fix later if need be. For now, I am just focussing
only on DAX path which is new path.

Also added logic to take fuse_inode->i_mmap_sem in
truncate/punch_hole/open(O_TRUNC) path to make sure file truncation and
fuse dax fault are mutually exlusive and avoid all the above problems.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-10 11:39:23 +02:00
Vivek Goyal
c2d0ad00d9 virtiofs: implement dax read/write operations
This patch implements basic DAX support. mmap() is not implemented
yet and will come in later patches. This patch looks into implemeting
read/write.

We make use of interval tree to keep track of per inode dax mappings.

Do not use dax for file extending writes, instead just send WRITE message
to daemon (like we do for direct I/O path). This will keep write and
i_size change atomic w.r.t crash.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-10 11:39:23 +02:00
Stefan Hajnoczi
fd1a1dc6f5 virtiofs: implement FUSE_INIT map_alignment field
The device communicates FUSE_SETUPMAPPING/FUSE_REMOVMAPPING alignment
constraints via the FUST_INIT map_alignment field.  Parse this field and
ensure our DAX mappings meet the alignment constraints.

We don't actually align anything differently since our mappings are
already 2MB aligned.  Just check the value when the connection is
established.  If it becomes necessary to honor arbitrary alignments in
the future we'll have to adjust how mappings are sized.

The upshot of this commit is that we can be confident that mappings will
work even when emulating x86 on Power and similar combinations where the
host page sizes are different.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-10 11:39:22 +02:00
Vivek Goyal
1dd539577c virtiofs: add a mount option to enable dax
Add a mount option to allow using dax with virtio_fs.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-10 11:39:22 +02:00
Vivek Goyal
f4fd4ae354 virtiofs: get rid of no_mount_options
This option was introduced so that for virtio_fs we don't show any mounts
options fuse_show_options(). Because we don't offer any of these options
to be controlled by mounter.

Very soon we are planning to introduce option "dax" which mounter should
be able to specify. And no_mount_options does not work anymore.


Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-10 11:39:22 +02:00
Alistair Delva
4c090c8696 ANDROID: GKI: Don't compact fuse_req when CONFIG_VIRTIO_FS=n
Otherwise we cannot enable VIRTIO_FS downstream.

Bug: 161843089
Change-Id: I317b8c425ab96a1bd484b85b41ce3cb036327117
Signed-off-by: Alistair Delva <adelva@google.com>
2020-07-26 15:00:00 +00:00
Maxim Patlasov
6b2fb79963 fuse: optimize writepages search
Re-work fi->writepages, replacing list with rb-tree.  This improves
performance because kernel fuse iterates through fi->writepages for each
writeback page and typical number of entries is about 800 (for 100MB of
fuse writeback).

Before patch:

10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 41.3473 s, 260 MB/s

 2  1      0 57445400  40416 6323676    0    0    33 374743 8633 19210  1  8 88  3  0

  29.86%  [kernel]               [k] _raw_spin_lock
  26.62%  [fuse]                 [k] fuse_page_is_writeback

After patch:

10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 21.4954 s, 500 MB/s

 2  9      0 53676040  31744 10265984    0    0    64 854790 10956 48387  1  6 88  6  0

  23.55%  [kernel]             [k] copy_user_enhanced_fast_string
   9.87%  [kernel]             [k] __memcpy
   3.10%  [kernel]             [k] _raw_spin_lock

Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-19 14:50:38 +02:00
Vivek Goyal
bb737bbe48 virtiofs: schedule blocking async replies in separate worker
In virtiofs (unlike in regular fuse) processing of async replies is
serialized.  This can result in a deadlock in rare corner cases when
there's a circular dependency between the completion of two or more async
replies.

Such a deadlock can be reproduced with xfstests:generic/503 if TEST_DIR ==
SCRATCH_MNT (which is a misconfiguration):

 - Process A is waiting for page lock in worker thread context and blocked
   (virtio_fs_requests_done_work()).
 - Process B is holding page lock and waiting for pending writes to
   finish (fuse_wait_on_page_writeback()).
 - Write requests are waiting in virtqueue and can't complete because
   worker thread is blocked on page lock (process A).

Fix this by creating a unique work_struct for each async reply that can
block (O_DIRECT read).

Fixes: a62a8ef9d9 ("virtio-fs: add virtiofs filesystem")
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-04-20 17:01:34 +02:00
Miklos Szeredi
3e8cb8b2ea fuse: fix stack use after return
Normal, synchronous requests will have their args allocated on the stack.
After the FR_FINISHED bit is set by receiving the reply from the userspace
fuse server, the originating task may return and reuse the stack frame,
resulting in an Oops if the args structure is dereferenced.

Fix by setting a flag in the request itself upon initializing, indicating
whether it has an asynchronous ->end() callback.

Reported-by: Kyle Sanderson <kyle.leet@gmail.com>
Reported-by: Michael Stapelberg <michael+lkml@stapelberg.ch>
Fixes: 2b319d1f6f ("fuse: don't dereference req->args on finished request")
Cc: <stable@vger.kernel.org> # v5.4
Tested-by: Michael Stapelberg <michael+lkml@stapelberg.ch>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-02-13 09:16:07 +01:00
Miklos Szeredi
eb59bd17d2 fuse: verify attributes
If a filesystem returns negative inode sizes, future reads on the file were
causing the cpu to spin on truncate_pagecache.

Create a helper to validate the attributes.  This now does two things:

 - check the file mode
 - check if the file size fits in i_size without overflowing

Reported-by: Arijit Banerjee <arijit@rubrik.com>
Fixes: d8a5ba4545 ("[PATCH] FUSE - core")
Cc: <stable@vger.kernel.org> # v2.6.14
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-11-12 11:49:04 +01:00
Miklos Szeredi
3f22c74671 virtio-fs: don't show mount options
Virtio-fs does not accept any mount options, so it's confusing and wrong to
show any in /proc/mounts.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com> 
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-15 16:11:41 +02:00
Stefan Hajnoczi
a62a8ef9d9 virtio-fs: add virtiofs filesystem
Add a basic file system module for virtio-fs.  This does not yet contain
shared data support between host and guest or metadata coherency speedups.
However it is already significantly faster than virtio-9p.

Design Overview
===============

With the goal of designing something with better performance and local file
system semantics, a bunch of ideas were proposed.

 - Use fuse protocol (instead of 9p) for communication between guest and
   host.  Guest kernel will be fuse client and a fuse server will run on
   host to serve the requests.

 - For data access inside guest, mmap portion of file in QEMU address space
   and guest accesses this memory using dax.  That way guest page cache is
   bypassed and there is only one copy of data (on host).  This will also
   enable mmap(MAP_SHARED) between guests.

 - For metadata coherency, there is a shared memory region which contains
   version number associated with metadata and any guest changing metadata
   updates version number and other guests refresh metadata on next access.
   This is yet to be implemented.

How virtio-fs differs from existing approaches
==============================================

The unique idea behind virtio-fs is to take advantage of the co-location of
the virtual machine and hypervisor to avoid communication (vmexits).

DAX allows file contents to be accessed without communication with the
hypervisor.  The shared memory region for metadata avoids communication in
the common case where metadata is unchanged.

By replacing expensive communication with cheaper shared memory accesses,
we expect to achieve better performance than approaches based on network
file system protocols.  In addition, this also makes it easier to achieve
local file system semantics (coherency).

These techniques are not applicable to network file system protocols since
the communications channel is bypassed by taking advantage of shared memory
on a local machine.  This is why we decided to build virtio-fs rather than
focus on 9P or NFS.

Caching Modes
=============

Like virtio-9p, different caching modes are supported which determine the
coherency level as well.  The “cache=FOO” and “writeback” options control
the level of coherence between the guest and host filesystems.

 - cache=none
   metadata, data and pathname lookup are not cached in guest.  They are
   always fetched from host and any changes are immediately pushed to host.

 - cache=always
   metadata, data and pathname lookup are cached in guest and never expire.

 - cache=auto
   metadata and pathname lookup cache expires after a configured amount of
   time (default is 1 second).  Data is cached while the file is open
   (close to open consistency).

 - writeback/no_writeback
   These options control the writeback strategy.  If writeback is disabled,
   then normal writes will immediately be synchronized with the host fs.
   If writeback is enabled, then writes may be cached in the guest until
   the file is closed or an fsync(2) performed.  This option has no effect
   on mmap-ed writes or writes going through the DAX mechanism.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-18 20:17:50 +02:00
Vivek Goyal
15c8e72e88 fuse: allow skipping control interface and forced unmount
virtio-fs does not support aborting requests which are being
processed. That is requests which have been sent to fuse daemon on host.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:41 +02:00
Miklos Szeredi
783863d647 fuse: dissociate DESTROY from fuseblk
Allow virtio-fs to also send DESTROY request.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:41 +02:00
Miklos Szeredi
8fab010644 fuse: delete dentry if timeout is zero
Don't hold onto dentry in lru list if need to re-lookup it anyway at next
access.  Only do this if explicitly enabled, otherwise it could result in
performance regression.

More advanced version of this patch would periodically flush out dentries
from the lru which have gone stale.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:41 +02:00
Vivek Goyal
0cd1eb9a41 fuse: separate fuse device allocation and installation in fuse_conn
As of now fuse_dev_alloc() both allocates a fuse device and installs it in
fuse_conn list.  fuse_dev_alloc() can fail if fuse_device allocation fails.

virtio-fs needs to initialize multiple fuse devices (one per virtio queue).
It initializes one fuse device as part of call to fuse_fill_super_common()
and rest of the devices are allocated and installed after that.

But, we can't afford to fail after calling fuse_fill_super_common() as we
don't have a way to undo all the actions done by fuse_fill_super_common().
So to avoid failures after the call to fuse_fill_super_common(),
pre-allocate all fuse devices early and install them into fuse connection
later.

This patch provides two separate helpers for fuse device allocation and
fuse device installation in fuse_conn.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:41 +02:00
Stefan Hajnoczi
ae3aad77f4 fuse: add fuse_iqueue_ops callbacks
The /dev/fuse device uses fiq->waitq and fasync to signal that requests are
available.  These mechanisms do not apply to virtio-fs.  This patch
introduces callbacks so alternative behavior can be used.

Note that queue_interrupt() changes along these lines:

  spin_lock(&fiq->waitq.lock);
  wake_up_locked(&fiq->waitq);
+ kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
  spin_unlock(&fiq->waitq.lock);
- kill_fasync(&fiq->fasync, SIGIO, POLL_IN);

Since queue_request() and queue_forget() also call kill_fasync() inside
the spinlock this should be safe.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:41 +02:00
Stefan Hajnoczi
0cc2656cdb fuse: extract fuse_fill_super_common()
fuse_fill_super() includes code to process the fd= option and link the
struct fuse_dev to the fd's struct file.  In virtio-fs there is no file
descriptor because /dev/fuse is not used.

This patch extracts fuse_fill_super_common() so that both classic fuse and
virtio-fs can share the code to initialize a mount.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:40 +02:00
Vivek Goyal
4388c5aac4 fuse: export fuse_dequeue_forget() function
File systems like virtio-fs need to do not have to play directly with
forget list data structures. There is a helper function use that instead.

Rename dequeue_forget() to fuse_dequeue_forget() and export it so that
stacked filesystems can use it.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:40 +02:00
Stefan Hajnoczi
79d96efffd fuse: export fuse_get_unique()
virtio-fs will need unique IDs for FORGET requests from outside
fs/fuse/dev.c.  Make the symbol visible.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:40 +02:00