Make sure to free and deregister the addrmatch and chancounts devices
allocated during probe in all error paths. Also fix use-after-free in a
probe error path and in the remove success path where the devices were
being put before before deregistration.
Signed-off-by: Johan Hovold <johan@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 356f0a3086 ("i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy")
Link: http://lkml.kernel.org/r/20180612124335.6420-2-johan@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
The kbuild test robot reported the following warning:
drivers/edac/altera_edac.c: In function 'ocram_free_mem':
drivers/edac/altera_edac.c:1410:42: warning: cast from pointer to integer
of different size [-Wpointer-to-int-cast]
gen_pool_free((struct gen_pool *)other, (u32)p, size);
^
After adding support for ARM64 architectures, the unsigned long
parameter is 64 bits and causes a build warning on 64-bit configs. Fix
by casting to the correct size (unsigned long) instead of u32.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: c3eea1942a ("EDAC, altera: Add Altera L2 cache and OCRAM support")
Link: http://lkml.kernel.org/r/1526317441-4996-1-git-send-email-thor.thayer@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
... for improved readability. Also, add a local mask variable for the
same reason.
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Tony reported seeing
"Internal error: Can't find EDAC structure"
when injecting correctable errors due to the fact that ghes_edac would
still load even if the whitelist won't hit. Drop the pr_err() in
ghes_edac_report_mem_error() for now due to the hacky way how ghes_edac
depends on ghes.c.
While at it, make ghes_edac_register() return an error if it doesn't hit
in the whitelist as it is the only sensible thing to do in that
situation.
Furthermore, move the call to it to happen last in ghes_probe() so that
GHES initializing properly does not depend on ghes_edac init at all
as latter is only reporting errors and not required for GHES's proper
functioning.
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Tested-by: Sughosh Ganu <sughosh.ganu@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20180420182015.zao3olss4tvvlxki@agluck-desk
Pull EDAC updates from Borislav Petkov:
"Noteworthy is the NVDIMM support:
- NVDIMM support to EDAC (Tony Luck)
- misc fixes"
* tag 'edac_for_4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, sb_edac: Remove variable length array usage
EDAC, skx_edac: Detect non-volatile DIMMs
firmware, DMI: Add function to look up a handle and return DIMM size
acpi, nfit: Add function to look up nvdimm device and provide SMBIOS handle
EDAC: Add new memory type for non-volatile DIMMs
EDAC: Drop duplicated array of strings for memory type names
EDAC, layerscape: Allow building for LS1021A
Pul removal of obsolete architecture ports from Arnd Bergmann:
"This removes the entire architecture code for blackfin, cris, frv,
m32r, metag, mn10300, score, and tile, including the associated device
drivers.
I have been working with the (former) maintainers for each one to
ensure that my interpretation was right and the code is definitely
unused in mainline kernels. Many had fond memories of working on the
respective ports to start with and getting them included in upstream,
but also saw no point in keeping the port alive without any users.
In the end, it seems that while the eight architectures are extremely
different, they all suffered the same fate: There was one company in
charge of an SoC line, a CPU microarchitecture and a software
ecosystem, which was more costly than licensing newer off-the-shelf
CPU cores from a third party (typically ARM, MIPS, or RISC-V). It
seems that all the SoC product lines are still around, but have not
used the custom CPU architectures for several years at this point. In
contrast, CPU instruction sets that remain popular and have actively
maintained kernel ports tend to all be used across multiple licensees.
[ See the new nds32 port merged in the previous commit for the next
generation of "one company in charge of an SoC line, a CPU
microarchitecture and a software ecosystem" - Linus ]
The removal came out of a discussion that is now documented at
https://lwn.net/Articles/748074/. Unlike the original plans, I'm not
marking any ports as deprecated but remove them all at once after I
made sure that they are all unused. Some architectures (notably tile,
mn10300, and blackfin) are still being shipped in products with old
kernels, but those products will never be updated to newer kernel
releases.
After this series, we still have a few architectures without mainline
gcc support:
- unicore32 and hexagon both have very outdated gcc releases, but the
maintainers promised to work on providing something newer. At least
in case of hexagon, this will only be llvm, not gcc.
- openrisc, risc-v and nds32 are still in the process of finishing
their support or getting it added to mainline gcc in the first
place. They all have patched gcc-7.3 ports that work to some
degree, but complete upstream support won't happen before gcc-8.1.
Csky posted their first kernel patch set last week, their situation
will be similar
[ Palmer Dabbelt points out that RISC-V support is in mainline gcc
since gcc-7, although gcc-7.3.0 is the recommended minimum - Linus ]"
This really says it all:
2498 files changed, 95 insertions(+), 467668 deletions(-)
* tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (74 commits)
MAINTAINERS: UNICORE32: Change email account
staging: iio: remove iio-trig-bfin-timer driver
tty: hvc: remove tile driver
tty: remove bfin_jtag_comm and hvc_bfin_jtag drivers
serial: remove tile uart driver
serial: remove m32r_sio driver
serial: remove blackfin drivers
serial: remove cris/etrax uart drivers
usb: Remove Blackfin references in USB support
usb: isp1362: remove blackfin arch glue
usb: musb: remove blackfin port
usb: host: remove tilegx platform glue
pwm: remove pwm-bfin driver
i2c: remove bfin-twi driver
spi: remove blackfin related host drivers
watchdog: remove bfin_wdt driver
can: remove bfin_can driver
mmc: remove bfin_sdh driver
input: misc: remove blackfin rotary driver
input: keyboard: remove bf54x driver
...
The Tile architecture is obsolete and getting removed from the kernel,
this driver appears to only be used there, and not on the ARM based
successors (Tile-Mx, BlueField), so we should remove it as well.
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Currently, bank 4 is reserved on Fam17h, so we chose not to initialize
bank 4 in the smca_banks array. This means that when we check if a bank
is initialized, like during boot or resume, we will see that bank 4 is
not initialized and try to initialize it.
This will cause a call trace, when resuming from suspend, due to
rdmsr_*on_cpu() calls in the init path. The rdmsr_*on_cpu() calls issue
an IPI but we're running with interrupts disabled. This triggers:
WARNING: CPU: 0 PID: 11523 at kernel/smp.c:291 smp_call_function_single+0xdc/0xe0
...
Reserved banks will be read-as-zero, so their MCA_IPID register will be
zero. So, like the smca_banks array, the threshold_banks array will not
have an entry for a reserved bank since all its MCA_MISC* registers will
be zero.
Enumerate a "Reserved" bank type that matches on a HWID_MCATYPE of 0,0.
Use the "Reserved" type when checking if a bank is reserved. It's
possible that other bank numbers may be reserved on future systems.
Don't try to find the block address on reserved banks.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.14.x
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180221101900.10326-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 PTI and Spectre related fixes and updates from Ingo Molnar:
"Here's the latest set of Spectre and PTI related fixes and updates:
Spectre:
- Add entry code register clearing to reduce the Spectre attack
surface
- Update the Spectre microcode blacklist
- Inline the KVM Spectre helpers to get close to v4.14 performance
again.
- Fix indirect_branch_prediction_barrier()
- Fix/improve Spectre related kernel messages
- Fix array_index_nospec_mask() asm constraint
- KVM: fix two MSR handling bugs
PTI:
- Fix a paranoid entry PTI CR3 handling bug
- Fix comments
objtool:
- Fix paranoid_entry() frame pointer warning
- Annotate WARN()-related UD2 as reachable
- Various fixes
- Add Add Peter Zijlstra as objtool co-maintainer
Misc:
- Various x86 entry code self-test fixes
- Improve/simplify entry code stack frame generation and handling
after recent heavy-handed PTI and Spectre changes. (There's two
more WIP improvements expected here.)
- Type fix for cache entries
There's also some low risk non-fix changes I've included in this
branch to reduce backporting conflicts:
- rename a confusing x86_cpu field name
- de-obfuscate the naming of single-TLB flushing primitives"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
x86/entry/64: Fix CR3 restore in paranoid_exit()
x86/cpu: Change type of x86_cache_size variable to unsigned int
x86/spectre: Fix an error message
x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
selftests/x86/mpx: Fix incorrect bounds with old _sigfault
x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]()
x86/speculation: Add <asm/msr-index.h> dependency
nospec: Move array_index_nospec() parameter checking into separate macro
x86/speculation: Fix up array_index_nospec_mask() asm constraint
x86/debug: Use UD2 for WARN()
x86/debug, objtool: Annotate WARN()-related UD2 as reachable
objtool: Fix segfault in ignore_unreachable_insn()
selftests/x86: Disable tests requiring 32-bit support on pure 64-bit systems
selftests/x86: Do not rely on "int $0x80" in single_step_syscall.c
selftests/x86: Do not rely on "int $0x80" in test_mremap_vdso.c
selftests/x86: Fix build bug caused by the 5lvl test which has been moved to the VM directory
selftests/x86/pkeys: Remove unused functions
selftests/x86: Clean up and document sscanf() usage
selftests/x86: Fix vDSO selftest segfault for vsyscall=none
x86/entry/64: Remove the unused 'icebp' macro
...
Fix an uninitialized variable warning in the Octeon EDAC driver, as seen
in MIPS cavium_octeon_defconfig builds since v4.14 with Codescape GNU
Tools 2016.05-03:
drivers/edac/octeon_edac-lmc.c In function ‘octeon_lmc_edac_poll_o2’:
drivers/edac/octeon_edac-lmc.c:87:24: warning: ‘((long unsigned int*)&int_reg)[1]’ may \
be used uninitialized in this function [-Wmaybe-uninitialized]
if (int_reg.s.sec_err || int_reg.s.ded_err) {
^
Iinitialise the whole int_reg variable to zero before the conditional
assignments in the error injection case.
Signed-off-by: James Hogan <jhogan@kernel.org>
Acked-by: David Daney <david.daney@cavium.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.15+
Fixes: 1bc021e815 ("EDAC: Octeon: Add error injection support")
Link: http://lkml.kernel.org/r/20171113161206.20990-1-james.hogan@mips.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull module updates from Jessica Yu:
"Summary of modules changes for the 4.15 merge window:
- treewide module_param_call() cleanup, fix up set/get function
prototype mismatches, from Kees Cook
- minor code cleanups"
* tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: Do not paper over type mismatches in module_param_call()
treewide: Fix function prototypes for module_param_call()
module: Prepare to convert all module_param_call() prototypes
kernel/module: Delete an error message for a failed memory allocation in add_module_usage()
Pull core locking updates from Ingo Molnar:
"The main changes in this cycle are:
- Another attempt at enabling cross-release lockdep dependency
tracking (automatically part of CONFIG_PROVE_LOCKING=y), this time
with better performance and fewer false positives. (Byungchul Park)
- Introduce lockdep_assert_irqs_enabled()/disabled() and convert
open-coded equivalents to lockdep variants. (Frederic Weisbecker)
- Add down_read_killable() and use it in the VFS's iterate_dir()
method. (Kirill Tkhai)
- Convert remaining uses of ACCESS_ONCE() to
READ_ONCE()/WRITE_ONCE(). Most of the conversion was Coccinelle
driven. (Mark Rutland, Paul E. McKenney)
- Get rid of lockless_dereference(), by strengthening Alpha atomics,
strengthening READ_ONCE() with smp_read_barrier_depends() and thus
being able to convert users of lockless_dereference() to
READ_ONCE(). (Will Deacon)
- Various micro-optimizations:
- better PV qspinlocks (Waiman Long),
- better x86 barriers (Michael S. Tsirkin)
- better x86 refcounts (Kees Cook)
- ... plus other fixes and enhancements. (Borislav Petkov, Juergen
Gross, Miguel Bernal Marin)"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE
rcu: Use lockdep to assert IRQs are disabled/enabled
netpoll: Use lockdep to assert IRQs are disabled/enabled
timers/posix-cpu-timers: Use lockdep to assert IRQs are disabled/enabled
sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabled
irq_work: Use lockdep to assert IRQs are disabled/enabled
irq/timings: Use lockdep to assert IRQs are disabled/enabled
perf/core: Use lockdep to assert IRQs are disabled/enabled
x86: Use lockdep to assert IRQs are disabled/enabled
smp/core: Use lockdep to assert IRQs are disabled/enabled
timers/hrtimer: Use lockdep to assert IRQs are disabled/enabled
timers/nohz: Use lockdep to assert IRQs are disabled/enabled
workqueue: Use lockdep to assert IRQs are disabled/enabled
irq/softirqs: Use lockdep to assert IRQs are disabled/enabled
locking/lockdep: Add IRQs disabled/enabled assertion APIs: lockdep_assert_irqs_enabled()/disabled()
locking/pvqspinlock: Implement hybrid PV queued/unfair locks
locking/rwlocks: Fix comments
x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized
block, locking/lockdep: Assign a lock_class per gendisk used for wait_for_completion()
workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes
...
Pull EDAC updates from Borislav Petkov:
"The usual pile of bugfixes, cleanups and minor driver enhancements.
Worth noting are the changes to ghes_edac to use a whitelist of
known-good platforms on which GHES error reporting works relatively
reliably. By Toshi Kani and Borislav Petkov"
* tag 'edac_for_4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, sb_edac: Fix missing break in switch
MAINTAINERS: Split Cavium EDAC entry and add myself
EDAC, sb_edac: Fix missing DIMM sysfs entries with KNL SNC2/SNC4 mode
EDAC, skx_edac: Handle systems with segmented PCI busses
EDAC, thunderx: Remove suspend/resume support
EDAC, skx_edac: Fix detection of single-rank DIMMs
EDAC, sb_edac: Don't create a second memory controller if HA1 is not present
EDAC: Add owner check to the x86 platform drivers
EDAC: Add helper which returns the loaded platform driver
EDAC, ghes: Add platform check
EDAC, ghes: Model a single, logical memory controller
EDAC, ghes: Remove symbol exports
EDAC: Handle return value of kasprintf()
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Several function prototypes for the set/get functions defined by
module_param_call() have a slightly wrong argument types. This fixes
those in an effort to clean up the calls when running under type-enforced
compiler instrumentation for CFI. This is the result of running the
following semantic patch:
@match_module_param_call_function@
declarer name module_param_call;
identifier _name, _set_func, _get_func;
expression _arg, _mode;
@@
module_param_call(_name, _set_func, _get_func, _arg, _mode);
@fix_set_prototype
depends on match_module_param_call_function@
identifier match_module_param_call_function._set_func;
identifier _val, _param;
type _val_type, _param_type;
@@
int _set_func(
-_val_type _val
+const char * _val
,
-_param_type _param
+const struct kernel_param * _param
) { ... }
@fix_get_prototype
depends on match_module_param_call_function@
identifier match_module_param_call_function._get_func;
identifier _val, _param;
type _val_type, _param_type;
@@
int _get_func(
-_val_type _val
+char * _val
,
-_param_type _param
+const struct kernel_param * _param
) { ... }
Two additional by-hand changes are included for places where the above
Coccinelle script didn't notice them:
drivers/platform/x86/thinkpad_acpi.c
fs/lockd/svc.c
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
When figuring out the size of the DIMMs and the cluster mode is SNC2 or SNC4 the
current algorithm ignores the contribution of some of the channels resulting in
EDAC never knowing of the existence of some DIMMs attached to such channels (thus
sysfs is not populated).
Instead of selectively iterating from 0 to interlv_ways when looking for all the
participants in the interleave, do an exhaustive search and iterate from 0 to
KNL_MAX_CHANNELS. The algorithm is already smart enough to consider participants
only one time.
This works fine in all KNL cluster modes and even when there are missing DIMMs
as the contribution of those channels is 0.
Signed-off-by: Luis Felipe Sandoval Castro <luis.felipe.sandoval.castro@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: arozansk@redhat.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: qiuxu.zhuo@intel.com
Link: http://lkml.kernel.org/r/1506606882-90521-1-git-send-email-luis.felipe.sandoval.castro@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Yi Zhang reported the following failure on a 2-socket Haswell (E5-2603v3)
server (DELL PowerEdge 730xd):
EDAC sbridge: Some needed devices are missing
EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0
EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0
EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Failed to register device with error -19.
The refactored sb_edac driver creates the IMC1 (the 2nd memory
controller) if any IMC1 device is present. In this case only
HA1_TA of IMC1 was present, but the driver expected to find
HA1/HA1_TM/HA1_TAD[0-3] devices too, leading to the above failure.
The document [1] says the 'E5-2603 v3' CPU has 4 memory channels max. Yi
Zhang inserted one DIMM per channel for each CPU, and did random error
address injection test with this patch:
4024 addresses fell in TOLM hole area
12715 addresses fell in CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
12774 addresses fell in CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
12798 addresses fell in CPU_SrcID#0_Ha#0_Chan#2_DIMM#0
12913 addresses fell in CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
12674 addresses fell in CPU_SrcID#1_Ha#0_Chan#0_DIMM#0
12686 addresses fell in CPU_SrcID#1_Ha#0_Chan#1_DIMM#0
12882 addresses fell in CPU_SrcID#1_Ha#0_Chan#2_DIMM#0
12934 addresses fell in CPU_SrcID#1_Ha#0_Chan#3_DIMM#0
106400 addresses were injected totally.
The test result shows that all the 4 channels belong to IMC0 per CPU, so
the server really only has one IMC per CPU.
In the 1st page of chapter 2 in datasheet [2], it also says 'E5-2600 v3'
implements either one or two IMCs. For CPUs with one IMC, IMC1 is not
used and should be ignored.
Thus, do not create a second memory controller if the key HA1 is absent.
[1] http://ark.intel.com/products/83349/Intel-Xeon-Processor-E5-2603-v3-15M-Cache-1_60-GHz
[2] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf
Reported-and-tested-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: e2f747b1f4 ("EDAC, sb_edac: Assign EDAC memory controller per h/w controller")
Link: http://lkml.kernel.org/r/20170913104214.7325-1-qiuxu.zhuo@intel.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
The ghes_edac driver was introduced in 2013 [1], but it has not been
enabled by any distro yet. This driver obtains error info from firmware
interfaces (APEI), which are not properly implemented on many platforms,
as the driver says on load:
This EDAC driver relies on BIOS to enumerate memory and get error
reports. Unfortunately, not all BIOSes reflect the memory layout
correctly. So, the end result of using this driver varies from vendor
to vendor. If you find incorrect reports, please contact your hardware
vendor to correct its BIOS.
To get out from this situation, add a platform check to selectively
enable the driver on platforms that are known to have proper APEI
firmware implementation.
"ghes_edac.force_load=1" skips this platform check.
[1]: https://lkml.kernel.org/r/cover.1360931635.git.mchehab@redhat.com
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170823225447.15608-4-toshi.kani@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>