Merge tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:

 - Two NVMe pull requests:
     - ana log parse fix from Anton
     - nvme quirks support for Apple devices from Ben
     - fix missing bio completion tracing for multipath stack devices
       from Hannes and Mikhail
     - IP TOS settings for nvme rdma and tcp transports from Israel
     - rq_dma_dir cleanups from Israel
     - tracing for Get LBA Status command from Minwoo
     - Some nvme-tcp cleanups from Minwoo, Potnuri and Myself
     - Some consolidation between the fabrics transports for handling
       the CAP register
     - reset race with ns scanning fix for fabrics (move fabrics
       commands to a dedicated request queue with a different lifetime
       from the admin request queue).
     - controller reset and namespace scan races fixes
     - nvme discovery log change uevent support
     - naming improvements from Keith
     - multiple discovery controllers reject fix from James
     - some regular cleanups from various people

 - Series fixing (and re-fixing) null_blk debug printing and nr_devices
   checks (André)

 - A few pull requests from Song, with fixes from Andy, Guoqing,
   Guilherme, Neil, Nigel, and Yufen.

 - REQ_OP_ZONE_RESET_ALL support (Chaitanya)

 - Bio merge handling unification (Christoph)

 - Pick default elevator correctly for devices with special needs
   (Damien)

 - Block stats fixes (Hou)

 - Timeout and support devices nbd fixes (Mike)

 - Series fixing races around elevator switching and device add/remove
   (Ming)

 - sed-opal cleanups (Revanth)

 - Per device weight support for BFQ (Fam)

 - Support for blk-iocost, a new model that can properly account cost of
   IO workloads. (Tejun)

 - blk-cgroup writeback fixes (Tejun)

 - paride queue init fixes (zhengbin)

 - blk_set_runtime_active() cleanup (Stanley)

 - Block segment mapping optimizations (Bart)

 - lightnvm fixes (Hans/Minwoo/YueHaibing)

 - Various little fixes and cleanups

* tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block: (186 commits)
  null_blk: format pr_* logs with pr_fmt
  null_blk: match the type of parameter nr_devices
  null_blk: do not fail the module load with zero devices
  block: also check RQF_STATS in blk_mq_need_time_stamp()
  block: make rq sector size accessible for block stats
  bfq: Fix bfq linkage error
  raid5: use bio_end_sector in r5_next_bio
  raid5: remove STRIPE_OPS_REQ_PENDING
  md: add feature flag MD_FEATURE_RAID0_LAYOUT
  md/raid0: avoid RAID0 data corruption due to layout confusion.
  raid5: don't set STRIPE_HANDLE to stripe which is in batch list
  raid5: don't increment read_errors on EILSEQ return
  nvmet: fix a wrong error status returned in error log page
  nvme: send discovery log page change events to userspace
  nvme: add uevent variables for controller devices
  nvme: enable aen regardless of the presence of I/O queues
  nvme-fabrics: allow discovery subsystems accept a kato
  nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery()
  nvme: Remove redundant assignment of cq vector
  nvme: Assign subsys instance from first ctrl
  ...

@@ -1469,6 +1469,103 @@ IO Interface Files
           8:16 rbytes=1459200 wbytes=314773504 rios=192 wios=353 dbytes=0 dios=0
           8:0 rbytes=90430464 wbytes=299008000 rios=8950 wios=1252 dbytes=50331648 dios=3021
 
+  io.cost.qos
+        A read-write nested-keyed file which exists only on the root
+        cgroup.
+
+        This file configures the Quality of Service of the IO cost
+        model based controller (CONFIG_BLK_CGROUP_IOCOST) which
+        currently implements "io.weight" proportional control. Lines
+        are keyed by $MAJ:$MIN device numbers and not ordered. The
+        line for a given device is populated on the first write for
+        the device on "io.cost.qos" or "io.cost.model". The following
+        nested keys are defined.
+
+          ======  =====================================
+          enable  Weight-based control enable
+          ctrl    "auto" or "user"
+          rpct    Read latency percentile    [0, 100]
+          rlat    Read latency threshold
+          wpct    Write latency percentile   [0, 100]
+          wlat    Write latency threshold
+          min     Minimum scaling percentage [1, 10000]
+          max     Maximum scaling percentage [1, 10000]
+          ======  =====================================
+
+        The controller is disabled by default and can be enabled by
+        setting "enable" to 1. "rpct" and "wpct" parameters default
+        to zero and the controller uses internal device saturation
+        state to adjust the overall IO rate between "min" and "max".
+
+        When a better control quality is needed, latency QoS
+        parameters can be configured. For example::
+
+          8:16 enable=1 ctrl=auto rpct=95.00 rlat=75000 wpct=95.00 wlat=150000 min=50.00 max=150.0
+
+        shows that on sdb, the controller is enabled, will consider
+        the device saturated if the 95th percentile of read completion
+        latencies is above 75ms or that of write completion latencies
+        is above 150ms, and will adjust the overall IO issue rate
+        between 50% and 150% accordingly.
+
+        The lower the saturation point, the better the latency QoS at
+        the cost of aggregate bandwidth. The narrower the allowed
+        adjustment range between "min" and "max", the more conformant
+        to the cost model the IO behavior. Note that the IO issue
+        base rate may be far off from 100% and setting "min" and "max"
+        blindly can lead to a significant loss of device capacity or
+        control quality. "min" and "max" are useful for regulating
+        devices which show wide temporary behavior changes - e.g. an
+        SSD which accepts writes at line speed for a while and
+        then completely stalls for multiple seconds.
+
+        When "ctrl" is "auto", the parameters are controlled by the
+        kernel and may change automatically. Setting "ctrl" to "user"
+        or setting any of the percentile and latency parameters puts
+        it into "user" mode and disables the automatic changes. The
+        automatic mode can be restored by setting "ctrl" to "auto".
+
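Reviewer's illustration (not part of the patch): a minimal sketch of how one line of the nested-keyed format described above could be parsed. The function names are hypothetical, and the conversion assumes "rlat" and "wlat" are expressed in microseconds, which is what the 75000 -> 75ms example in the text implies.

```python
def parse_nested_keyed_line(line):
    """Split '$MAJ:$MIN key=value ...' into ((major, minor), {key: value})."""
    device, *pairs = line.split()
    major, minor = (int(part) for part in device.split(":"))
    # Values are kept as strings; callers convert the ones they need.
    params = dict(pair.split("=", 1) for pair in pairs)
    return (major, minor), params


def saturation_thresholds_ms(params):
    """Return (read_ms, write_ms); assumes rlat/wlat are in microseconds."""
    return int(params["rlat"]) / 1000, int(params["wlat"]) / 1000


# The example line quoted in the documentation text above.
example = ("8:16 enable=1 ctrl=auto rpct=95.00 rlat=75000 "
           "wpct=95.00 wlat=150000 min=50.00 max=150.0")
device, params = parse_nested_keyed_line(example)
read_ms, write_ms = saturation_thresholds_ms(params)
```

The sketch only reads the format; writing settings back would go through the root cgroup's "io.cost.qos" file as the text describes.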
+  io.cost.model
+        A read-write nested-keyed file which exists only on the root
+        cgroup.
+
+        This file configures the cost model of the IO cost model based
+        controller (CONFIG_BLK_CGROUP_IOCOST) which currently
+        implements "io.weight" proportional control. Lines are keyed
+        by $MAJ:$MIN device numbers and not ordered. The line for a
+        given device is populated on the first write for the device on
+        "io.cost.qos" or "io.cost.model". The following nested keys
+        are defined.
+
+          =====  ================================
+          ctrl   "auto" or "user"
+          model  The cost model in use - "linear"
+          =====  ================================
+
+        When "ctrl" is "auto", the kernel may change all parameters
+        dynamically. When "ctrl" is set to "user" or any other
+        parameter is written to, "ctrl" becomes "user" and the
+        automatic changes are disabled.
+
+        When "model" is "linear", the following model parameters are
+        defined.
+
+          =============  ========================================
+          [r|w]bps       The maximum sequential IO throughput
+          [r|w]seqiops   The maximum 4k sequential IOs per second
+          [r|w]randiops  The maximum 4k random IOs per second
+          =============  ========================================
+
+        From the above, the builtin linear model determines the base
+        costs of a sequential and random IO and the cost coefficient
+        for the IO size. While simple, this model can cover most
+        common device classes acceptably.
+
+        The IO cost model isn't expected to be accurate in an absolute
+        sense and is scaled to the device behavior dynamically.
+
+        If needed, tools/cgroup/iocost_coef_gen.py can be used to
+        generate device-specific coefficients.
+
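Reviewer's illustration (not part of the patch): a simplified sketch of how a linear model of this shape could turn the three parameters into a per-page cost plus sequential and random base costs. It follows only the description in the text above and is not claimed to match the kernel's implementation; the function names, the use of seconds of device time as the cost unit, and the device numbers in the usage lines are all assumptions made for illustration.

```python
from fractions import Fraction

PAGE_SIZE = 4096  # the parameter table quotes IOPS for 4k IOs


def linear_coefficients(bps, seqiops, randiops):
    """Return (per_page, seq_base, rand_base) costs in seconds of device time."""
    # Size coefficient: time to transfer one 4k page at full sequential speed.
    per_page = Fraction(PAGE_SIZE, bps)
    # A 4k IO costs base + one page, so base = 1/iops - per_page, floored at 0.
    seq_base = max(Fraction(0), Fraction(1, seqiops) - per_page)
    rand_base = max(Fraction(0), Fraction(1, randiops) - per_page)
    return per_page, seq_base, rand_base


def io_cost(nbytes, is_random, coefs):
    """Cost of one IO: a base cost plus a size-proportional part."""
    per_page, seq_base, rand_base = coefs
    pages = -(-nbytes // PAGE_SIZE)  # round up to whole pages
    base = rand_base if is_random else seq_base
    return base + pages * per_page


# Hypothetical device: 409.6 MB/s, 50k sequential and 10k random 4k IOPS.
coefs = linear_coefficients(bps=409_600_000, seqiops=50_000, randiops=10_000)
random_4k = io_cost(4096, True, coefs)
sequential_8k = io_cost(8192, False, coefs)
```

With these made-up numbers a random 4k IO costs exactly 1/randiops seconds, which is the consistency property the base-cost subtraction is meant to preserve.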
   io.weight
         A read-write flat-keyed file which exists on non-root cgroups.
         The default is "default 100".
@@ -1201,12 +1201,6 @@
|
||||
See comment before function elanfreq_setup() in
|
||||
arch/x86/kernel/cpu/cpufreq/elanfreq.c.
|
||||
|
||||
elevator= [IOSCHED]
|
||||
Format: { "mq-deadline" | "kyber" | "bfq" }
|
||||
See Documentation/block/deadline-iosched.rst,
|
||||
Documentation/block/kyber-iosched.rst and
|
||||
Documentation/block/bfq-iosched.rst for details.
|
||||
|
||||
elfcorehdr=[size[KMG]@]offset[KMG] [IA64,PPC,SH,X86,S390]
|
||||
Specifies physical address of start of kernel core
|
||||
image elf header and optionally the size. Generally
|
||||
|
@@ -274,9 +274,7 @@ To reduce its OS jitter, do any of the following:
                 (based on an earlier one from Gilad Ben-Yossef) that
                 reduces or even eliminates vmstat overhead for some
                 workloads at https://lkml.org/lkml/2013/9/4/379.
-        e.      Boot with "elevator=noop" to avoid workqueue use by
-                the block layer.
-        f.      If running on high-end powerpc servers, build with
+        e.      If running on high-end powerpc servers, build with
                 CONFIG_PPC_RTAS_DAEMON=n. This prevents the RTAS
                 daemon from running on each CPU every second or so.
                 (This will require editing Kconfig files and will defeat
@@ -284,12 +282,12 @@ To reduce its OS jitter, do any of the following:
                 due to the rtas_event_scan() function.
                 WARNING: Please check your CPU specifications to
                 make sure that this is safe on your particular system.
-        g.      If running on Cell Processor, build your kernel with
+        f.      If running on Cell Processor, build your kernel with
                 CBE_CPUFREQ_SPU_GOVERNOR=n to avoid OS jitter from
                 spu_gov_work().
                 WARNING: Please check your CPU specifications to
                 make sure that this is safe on your particular system.
-        h.      If running on PowerMAC, build your kernel with
+        g.      If running on PowerMAC, build your kernel with
                 CONFIG_PMAC_RACKMETER=n to disable the CPU-meter,
                 avoiding OS jitter from rackmeter_do_timer().
 