HWPOISON: Add soft page offline support

This is a simpler, gentler variant of memory_failure() for soft page
offlining controlled from user space.  It doesn't kill anything, just
tries to invalidate and if that doesn't work migrate the
page away.

This is useful for predictive failure analysis, where a page has
a high rate of corrected errors, but hasn't gone bad yet. Instead
it can be offlined early and avoided.

The offlining is controlled from sysfs, including a new generic
entry point for hard page offlining for symmetry too.

We use the page isolate facility to prevent re-allocation
race. Normally this is only used by memory hotplug. To avoid
races with memory allocation I am using lock_system_sleep().
This avoids the situation where memory hotplug is about
to isolate a page range and then hwpoison undoes that work.
This is a big hammer currently, but the simplest solution
currently.

When the page is not free or LRU we try to free pages
from slab and other caches. The slab freeing is currently
quite dumb and does not try to focus on the specific slab
cache which might own the page. This could be potentially
improved later.

Thanks to Fengguang Wu and Haicheng Li for some fixes.

[Added fix from Andrew Morton to adapt to new migrate_pages prototype]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
此提交包含在:
Andi Kleen
2009-12-16 12:20:00 +01:00
提交者 Andi Kleen
父節點 2326c467df
當前提交 facb6011f3
共有 5 個檔案被更改,包括 297 行新增7 行删除

查看文件

@@ -341,6 +341,64 @@ static inline int memory_probe_init(void)
}
#endif
#ifdef CONFIG_MEMORY_FAILURE
/*
* Support for offlining pages of memory
*/
/* Soft offline a page */
static ssize_t
store_soft_offline_page(struct class *class, const char *buf, size_t count)
{
int ret;
u64 pfn;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
if (strict_strtoull(buf, 0, &pfn) < 0)
return -EINVAL;
pfn >>= PAGE_SHIFT;
if (!pfn_valid(pfn))
return -ENXIO;
ret = soft_offline_page(pfn_to_page(pfn), 0);
return ret == 0 ? count : ret;
}
/* Forcibly offline a page, including killing processes. */
static ssize_t
store_hard_offline_page(struct class *class, const char *buf, size_t count)
{
int ret;
u64 pfn;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
if (strict_strtoull(buf, 0, &pfn) < 0)
return -EINVAL;
pfn >>= PAGE_SHIFT;
ret = __memory_failure(pfn, 0, 0);
return ret ? ret : count;
}
static CLASS_ATTR(soft_offline_page, 0644, NULL, store_soft_offline_page);
static CLASS_ATTR(hard_offline_page, 0644, NULL, store_hard_offline_page);
static __init int memory_fail_init(void)
{
int err;
err = sysfs_create_file(&memory_sysdev_class.kset.kobj,
&class_attr_soft_offline_page.attr);
if (!err)
err = sysfs_create_file(&memory_sysdev_class.kset.kobj,
&class_attr_hard_offline_page.attr);
return err;
}
#else
static inline int memory_fail_init(void)
{
return 0;
}
#endif
/*
* Note that phys_device is optional. It is here to allow for
* differentiation between which *physical* devices each
@@ -471,6 +529,9 @@ int __init memory_dev_init(void)
}
err = memory_probe_init();
if (!ret)
ret = err;
err = memory_fail_init();
if (!ret)
ret = err;
err = block_size_init();