raid5-ppl: Partial Parity Log write logging implementation
Implement the calculation of partial parity for a stripe and PPL write
logging functionality. The description of PPL is added to the
documentation. More details can be found in the comments in raid5-ppl.c.

Attach a page for holding the partial parity data to stripe_head.
Allocate it only if mddev has the MD_HAS_PPL flag set.

Partial parity is the xor of the unmodified data chunks of a stripe and
is calculated as follows:

- reconstruct-write case:
  xor data from all disks in a stripe that are not being updated

- read-modify-write case:
  xor the old data of all updated disks in a stripe with the old parity

Implement it using the async_tx API and integrate it into raid_run_ops().
It must be called while we still have access to the old data, so do it
when STRIPE_OP_BIODRAIN is set, but before ops_run_prexor5(). The result
is stored in sh->ppl_page.

Partial parity is not meaningful for a full stripe write and is not
stored in the log or used for recovery, so don't attempt to calculate it
when the stripe has STRIPE_FULL_WRITE.

Put the PPL metadata structures into md_p.h because userspace tools
(mdadm) will also need to read/write PPL.

For now, warn about using PPL with the disks' volatile write-back cache
enabled. The warning can be removed once disk cache flushing before
writing PPL is implemented.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
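The two partial parity cases described in the commit message can be modeled in plain C to show why they agree. This is an illustrative sketch only: it treats each chunk as a tiny byte buffer and does not use the kernel's async_tx API, and the names `CHUNK`, `xor_into`, `pp_rcw` and `pp_rmw` are hypothetical. Because parity is the xor of all data chunks, xoring the old parity with the old data of the updated chunks cancels those chunks out and leaves exactly the xor of the chunks that are not updated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 8 /* hypothetical chunk size in bytes, kept tiny for illustration */

/* dst ^= src over one chunk */
void xor_into(uint8_t *dst, const uint8_t *src)
{
	for (size_t i = 0; i < CHUNK; i++)
		dst[i] ^= src[i];
}

/*
 * Reconstruct-write case: xor the (old) data of every data disk that this
 * write does not update. updated[i] is nonzero if data chunk i is written.
 */
void pp_rcw(uint8_t pp[CHUNK], uint8_t data[][CHUNK],
	    const int *updated, int nr_data)
{
	assert(nr_data > 0);
	memset(pp, 0, CHUNK);
	for (int i = 0; i < nr_data; i++)
		if (!updated[i])
			xor_into(pp, data[i]);
}

/*
 * Read-modify-write case: start from the old parity and xor in the old
 * data of every updated disk; the updated chunks cancel out of the parity.
 */
void pp_rmw(uint8_t pp[CHUNK], const uint8_t parity[CHUNK],
	    uint8_t data[][CHUNK], const int *updated, int nr_data)
{
	assert(nr_data > 0);
	memcpy(pp, parity, CHUNK);
	for (int i = 0; i < nr_data; i++)
		if (updated[i])
			xor_into(pp, data[i]);
}
```

In the sketch `data` holds the contents before the write, which mirrors the requirement that partial parity be computed while the old data is still available.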
Committed by: Shaohua Li
Parent: ff875738ed
Commit: 3418d036c8
Documentation/md/raid5-ppl.txt (normal file, 44 lines added)
@@ -0,0 +1,44 @@
Partial Parity Log

Partial Parity Log (PPL) is a feature available for RAID5 arrays. The issue
addressed by PPL is that after a dirty shutdown, parity of a particular stripe
may become inconsistent with data on other member disks. If the array is also
in a degraded state, there is no way to recalculate parity, because one of the
disks is missing. This can lead to silent data corruption when rebuilding the
array or using it in degraded mode - data calculated from parity for array
blocks that have not been touched by a write request during the unclean
shutdown can be incorrect. Such a condition is known as the RAID5 Write Hole.
Because of this, md by default does not allow starting a dirty degraded array.
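A small model makes the write hole concrete. The sketch below assumes a hypothetical stripe of three data disks and one parity disk, with one byte per disk; the names `NR_DATA`, `stripe_parity` and `rebuild_chunk` are illustrative and not part of md. It simulates a write that reaches a data disk but not the parity disk before a crash; if a different, untouched data disk is then missing, rebuilding it from the stale parity gives the wrong value even though that block was never written.

```c
#include <assert.h>
#include <stdint.h>

#define NR_DATA 3 /* hypothetical stripe: 3 data disks + 1 parity disk */

/* Parity of a stripe is the XOR of all its data chunks. */
uint8_t stripe_parity(const uint8_t data[NR_DATA])
{
	uint8_t p = 0;

	for (int i = 0; i < NR_DATA; i++)
		p ^= data[i];
	return p;
}

/*
 * Rebuild data chunk `missing` of a degraded stripe from parity and the
 * surviving data chunks, as plain RAID5 does.
 */
uint8_t rebuild_chunk(const uint8_t data[NR_DATA], uint8_t parity, int missing)
{
	assert(missing >= 0 && missing < NR_DATA);
	uint8_t v = parity;

	for (int i = 0; i < NR_DATA; i++)
		if (i != missing)
			v ^= data[i];
	return v;
}
```

For example, with data `{0x11, 0x22, 0x33}` the parity is `0x00`; if chunk 0 is then overwritten with `0x44` and the parity update is lost, rebuilding chunk 2 yields `0x66` instead of `0x33`.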

Partial parity for a write operation is the XOR of stripe data chunks not
modified by this write. It is just enough data for recovering from the write
hole. XORing partial parity with the modified chunks produces parity for the
stripe that is consistent with the current contents of its data chunks,
regardless of which chunk writes have completed. If one of the unmodified data
disks of this stripe is missing, this updated parity can be used to recover
its contents. PPL recovery is also performed when starting an array after an
unclean shutdown and all disks are available, eliminating the need to resync
the array. Because of this, using a write-intent bitmap and PPL together is not
supported.
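The recovery step can be illustrated with the same kind of one-byte-per-disk model. This is a sketch under stated assumptions, not the md implementation: `NR_DATA` and `ppl_recover_parity` are hypothetical names. Given the partial parity logged before the write and whatever the modified chunks hold after the crash (old data, new data, or a mix), xoring them yields parity that matches the stripe's current contents, so an unmodified chunk on a missing disk can be rebuilt correctly.

```c
#include <assert.h>
#include <stdint.h>

#define NR_DATA 3 /* hypothetical stripe: 3 data disks + 1 parity disk */

/*
 * Recompute stripe parity from the logged partial parity `pp` and the
 * current contents of the chunks that the interrupted write modified.
 * modified[i] is nonzero if data chunk i was a target of the write.
 */
uint8_t ppl_recover_parity(uint8_t pp, const uint8_t data[NR_DATA],
			   const int modified[NR_DATA])
{
	uint8_t p = pp;

	for (int i = 0; i < NR_DATA; i++)
		if (modified[i])
			p ^= data[i];
	return p;
}
```

For instance, with data `{0x11, 0x22, 0x33}` and a write targeting chunks 0 and 1, the logged partial parity is `0x33`. If only chunk 0 reaches the disk (now `0x44`), the recovered parity is `0x55`, and xoring it with chunks 0 and 1 restores chunk 2 as `0x33`.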

When handling a write request, PPL writes partial parity before new data and
parity are dispatched to disks. PPL is a distributed log - it is stored on
array member drives in the metadata area, on the parity drive of a particular
stripe. It does not require a dedicated journaling drive. Write performance is
reduced by up to 30%-40%, but it scales with the number of drives in the array,
and since there is no dedicated journaling drive, no single drive becomes a
bottleneck or a single point of failure.
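Because each stripe's log entry goes to that stripe's parity drive, PPL writes rotate across members along with parity. As a sketch, assuming md's default left-symmetric RAID5 layout (other layouts place parity differently) and a hypothetical helper name `ppl_disk_for_stripe`, the member that receives a stripe's partial parity can be computed as follows:

```c
#include <assert.h>

/*
 * Parity (and therefore PPL) member for a stripe, ASSUMING the
 * left-symmetric RAID5 layout: parity starts on the last member and
 * moves one member to the left for each subsequent stripe.
 */
int ppl_disk_for_stripe(unsigned long long stripe, int raid_disks)
{
	assert(raid_disks > 1);
	return (raid_disks - 1) - (int)(stripe % (unsigned long long)raid_disks);
}
```

With four members, consecutive stripes would log to members 3, 2, 1, 0, 3, and so on, which is what spreads the PPL write load across the array.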

Unlike raid5-cache, the other solution in md for closing the write hole, PPL is
not a true journal. It does not protect from losing in-flight data, only from
silent data corruption. If a dirty disk of a stripe (one that was being written
when the shutdown occurred) is lost, no PPL recovery is performed for this
stripe (parity is not updated). So it is possible to have arbitrary data in the
written part of a stripe if that disk is lost. In such a case the behavior is
the same as in plain raid5.
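The rule for when PPL recovery applies to a stripe can be written as a small predicate. This is a hypothetical model of the behavior described above, not md code: `ppl_can_recover` is an illustrative name, data disks are indexed from 0, and a missing parity disk is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * modified_mask has bit i set if data disk i was written by the
 * interrupted request (a "dirty" disk); missing is the index of the
 * failed data disk, or -1 if all disks are present.
 */
bool ppl_can_recover(unsigned int modified_mask, int missing)
{
	assert(missing < (int)(sizeof(modified_mask) * 8));
	if (missing < 0)
		return true; /* all disks present: just rewrite parity */
	/* a lost dirty disk cannot be recovered; parity is left as-is */
	return !(modified_mask & (1u << missing));
}
```

Recovery is refused only when the lost disk is one of the written ones, which is exactly the case where PPL offers no more protection than plain raid5.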

PPL is available for md version-1 metadata and external (specifically IMSM)
metadata arrays. It can be enabled using the mdadm option
--consistency-policy=ppl.

Currently, volatile write-back cache should be disabled on all member drives
when using PPL. Otherwise PPL cannot guarantee consistency in case of power
failure.