Borislav Petkov
8c203dbb78
x86/RAS: Add TSC timestamp to the injected MCE
...
The MCE injection code does not provide the time stamp information for the
injected MCE. Add it.
Signed-off-by: Borislav Petkov <bp@suse.de >
Link: http://lkml.kernel.org/r/20161101120911.13163-3-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de >
2016-11-08 17:10:13 +01:00
Borislav Petkov
b199ac6c49
x86/RAS/mce_amd_inj: Remove debugfs dir recursively on exit
...
Simplify exit_mce_inject() by using debugfs_remove_recursive() and do
away with the noodling over the dentry elements.
Signed-off-by: Borislav Petkov <bp@suse.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Link: http://lkml.kernel.org/r/20160926083152.30848-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2016-09-26 11:13:17 +02:00
Colin Ian King
8b44f00f8c
x86/RAS/mce_amd_inj: Fix signed wrap around when decrementing index 'i'
...
Change predecrement compare to post-decrement compare to avoid an
unsigned integer wrap-around comparison when decrementing in the while
loop.
For example, if debugfs_create_file() fails when 'i' is zero, the
current code will predecrement 'i' in the while loop, wrapping 'i' to
the maximum unsigned integer and causing multiple out-of-bounds reads on
dfs_fls[i].d as the loop iterates down to zero.
Also, as Borislav Petkov suggested, return -ENODEV rather than -ENOMEM
on the error condition.
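A minimal userspace sketch of the post-decrement unwind loop described above. It is not the driver code: NFILES, removed[] and unwind() are hypothetical stand-ins for the dfs_fls table and the error path.

```c
#define NFILES 4

/* Hypothetical stand-in for the dentry table: only a "removed" flag is kept. */
static int removed[NFILES];

/*
 * Undo the first 'failed_at' successful creations (failed_at <= NFILES).
 * The post-decrement test compares the old value of 'i' against zero, so
 * the body never runs with a wrapped index. With an unsigned 'i', a
 * predecrement test such as (--i >= 0) would always be true.
 */
static unsigned int unwind(unsigned int failed_at)
{
	unsigned int i = failed_at;
	unsigned int visited = 0;

	while (i-- > 0) {
		removed[i] = 1;
		visited++;
	}

	return visited;
}
```

Calling unwind(0) models a failure on the very first file: the loop body is never entered and nothing is touched.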
Signed-off-by: Colin Ian King <colin.king@canonical.com >
Signed-off-by: Borislav Petkov <bp@suse.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com >
Link: http://lkml.kernel.org/r/20160926083152.30848-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2016-09-26 11:13:17 +02:00
Borislav Petkov
7cc4ef8ed1
x86/RAS/mce_amd_inj: Fix some W= warnings
...
In particular:
  arch/x86/ras/mce_amd_inj.c: In function ‘prepare_msrs’:
  arch/x86/ras/mce_amd_inj.c:249:13: warning: declaration of ‘i_mce’ shadows a global declaration [-Wshadow]
    struct mce i_mce = *(struct mce *)info;
               ^~~~~
  arch/x86/ras/mce_amd_inj.c: In function ‘init_mce_inject’:
  arch/x86/ras/mce_amd_inj.c:453:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (i = 0; i < ARRAY_SIZE(dfs_fls); i++) {
Signed-off-by: Borislav Petkov <bp@suse.de >
Link: http://lkml.kernel.org/r/20160912075941.24699-16-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de >
2016-09-13 15:23:14 +02:00
Yazen Ghannam
a884675b87
x86/MCE/AMD, EDAC: Handle reserved bank 4 on Fam17h properly
...
Bank 4 is reserved on family 0x17 and shouldn't generate any MCE
records. However, broken hardware and software are not unheard of, so
warn about bank 4 errors. They shouldn't be coming from bank 4
naturally, but users can still use mce_amd_inj to simulate errors from it
for testing purposes.
Also, avoid the special handling in the injector mce_amd_inj that is
done on the older families.
[ bp: Rewrite commit message and merge into one patch. Use boot_cpu_data. ]
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com >
Signed-off-by: Borislav Petkov <bp@suse.de >
Reviewed-by: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com >
Link: http://lkml.kernel.org/r/1473384591-5323-1-git-send-email-Yazen.Ghannam@amd.com
Link: http://lkml.kernel.org/r/1473384591-5323-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de >
2016-09-13 15:23:14 +02:00
Yazen Ghannam
bad744b7f2
x86/RAS: Add syndrome support to mce_amd_inj
...
Add a debugfs file which holds the error syndrome (written into
MCA_SYND) of an injected error. Only write it on SMCA systems. Update
README file, while at it.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com >
Signed-off-by: Borislav Petkov <bp@suse.de >
Link: http://lkml.kernel.org/r/1467633035-32080-3-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de >
2016-09-13 15:23:07 +02:00
Yazen Ghannam
340e983ab8
x86/RAS/AMD: Reduce the number of IPIs when prepping error injection
...
We currently use wrmsr_on_cpu() 4 times when prepping for an error
injection. This generates 4 IPIs, one for each MSR write. We can reduce
the number of IPIs to 1 by grouping the MSR writes and executing them
serially on the appropriate CPU.
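The grouping idea can be modelled in plain C. This is a sketch under stated assumptions, not the kernel implementation: run_on_cpu(), fake_msr[], ipi_count and the two inject_*() helpers are hypothetical names, and each cross-CPU call is simply counted as one IPI.

```c
#include <stdint.h>

#define NR_MSRS 4

/* Hypothetical model state: MSR values and a count of cross-CPU calls. */
static uint64_t fake_msr[NR_MSRS];
static unsigned int ipi_count;

struct msr_batch {
	uint64_t val[NR_MSRS];
};

struct one_write {
	unsigned int idx;
	uint64_t val;
};

/* Stand-in for running a callback on a remote CPU: one call, one IPI. */
static void run_on_cpu(unsigned int cpu, void (*fn)(void *), void *info)
{
	(void)cpu;
	ipi_count++;
	fn(info);
}

static void write_one(void *info)
{
	struct one_write *w = info;

	fake_msr[w->idx] = w->val;
}

/* All writes happen serially inside a single callback. */
static void write_all(void *info)
{
	struct msr_batch *b = info;
	unsigned int i;

	for (i = 0; i < NR_MSRS; i++)
		fake_msr[i] = b->val[i];
}

/* Old scheme: one cross-CPU call per MSR. */
static void inject_per_msr(unsigned int cpu, const struct msr_batch *b)
{
	unsigned int i;

	for (i = 0; i < NR_MSRS; i++) {
		struct one_write w = { i, b->val[i] };

		run_on_cpu(cpu, write_one, &w);
	}
}

/* New scheme: one cross-CPU call for the whole batch. */
static void inject_batched(unsigned int cpu, struct msr_batch *b)
{
	run_on_cpu(cpu, write_all, b);
}
```

In this model inject_per_msr() bumps ipi_count four times, while inject_batched() bumps it once and leaves the same values in fake_msr[].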
Suggested-by: Borislav Petkov <bp@suse.de >
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com >
Signed-off-by: Borislav Petkov <bp@suse.de >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Brian Gerst <brgerst@gmail.com >
Cc: Denys Vlasenko <dvlasenk@redhat.com >
Cc: H. Peter Anvin <hpa@zytor.com >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tony Luck <tony.luck@intel.com >
Cc: linux-edac <linux-edac@vger.kernel.org >
Link: http://lkml.kernel.org/r/1467968983-4874-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2016-07-08 11:29:26 +02:00
Yazen Ghannam
754a923059
x86/RAS: Add SMCA support to AMD Error Injector
...
Use SMCA MSRs when writing to MCA_{STATUS,ADDR,MISC} and
MCA_DE{STAT,ADDR} when injecting Deferred Errors on SMCA platforms.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com >
Signed-off-by: Borislav Petkov <bp@suse.de >
Cc: Andy Lutomirski <luto@amacapital.net >
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Brian Gerst <brgerst@gmail.com >
Cc: Denys Vlasenko <dvlasenk@redhat.com >
Cc: H. Peter Anvin <hpa@zytor.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tony Luck <tony.luck@intel.com >
Cc: linux-edac <linux-edac@vger.kernel.org >
Link: http://lkml.kernel.org/r/1462971509-3856-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2016-05-12 09:08:23 +02:00
Borislav Petkov
69385f8879
x86/RAS: Rename AMD MCE injector config item
...
... to be the same as the file name of the injection module itself, to
avoid confusion when grepping.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tony Luck <tony.luck@intel.com >
Link: http://lkml.kernel.org/r/1459929916-12852-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2016-04-13 10:54:23 +02:00
Peter Zijlstra
ee6825c80e
x86/topology: Fix AMD core count
...
It turns out AMD gets x86_max_cores wrong when there are compute
units.
The issue is that Linux assumes:
nr_logical_cpus = nr_cores * nr_siblings
However, AMD reports each compute unit (CU) as 2 cores, and then sets
num_smp_siblings to 2 as well.
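A small worked example of the double counting, with hypothetical numbers: a package with 4 compute units of 2 threads each. The helper name logical_cpus() is an assumption made for illustration.

```c
/* The assumption quoted above: nr_logical_cpus = nr_cores * nr_siblings. */
static unsigned int logical_cpus(unsigned int nr_cores, unsigned int nr_siblings)
{
	return nr_cores * nr_siblings;
}

/*
 * Hypothetical package: 4 compute units, 2 threads per unit, 8 threads total.
 *
 * Each CU reported as 2 cores, siblings also 2:
 *   logical_cpus(8, 2) is 16, twice the real thread count.
 *
 * Each CU counted as 1 core with 2 siblings:
 *   logical_cpus(4, 2) is 8, which matches.
 */
```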
Boris: fixup ras/mce_amd_inj.c too, to compute the Node Base Core
properly, according to the new nomenclature.
Fixes: 1f12e32f4c ("x86/topology: Create logical package id")
Reported-by: Xiong Zhou <jencce.kernel@gmail.com >
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org >
Signed-off-by: Borislav Petkov <bp@suse.de >
Cc: Andreas Herrmann <aherrmann@suse.com >
Cc: Andy Lutomirski <luto@kernel.org >
Link: http://lkml.kernel.org/r/20160317095220.GO6344@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de >
2016-03-29 10:45:04 +02:00
Aravind Gopalakrishnan
fa20a2ed6f
x86/ras/mce_amd_inj: Inject bank 4 errors on the NBC
...
Bank 4 MCEs are logged and reported only on the node base core
(NBC) in a socket. Refer to the D18F3x44[NbMcaToMstCpuEn] field
in Fam10h and later BKDGs. The node base core (NBC) is the
lowest numbered core in the node.
This patch ensures that we inject the error on the NBC for bank
4 errors. Otherwise, triggering #MC or APIC interrupts on a core
which is not the NBC would not have any effect on the system,
i.e. we would not see any relevant output in the kernel logs for the
error we just injected.
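A sketch of the target selection described above. It assumes cores are numbered contiguously within a node and that every node has the same number of cores; node_base_core() and injection_target() are hypothetical helpers, not the driver's functions.

```c
/*
 * Lowest numbered core in the node that 'cpu' belongs to.
 * Assumption: contiguous numbering, 'cores_per_node' cores in every node.
 */
static unsigned int node_base_core(unsigned int cpu, unsigned int cores_per_node)
{
	return cpu - (cpu % cores_per_node);
}

/* Bank 4 errors are redirected to the NBC; other banks keep the requested CPU. */
static unsigned int injection_target(unsigned int bank, unsigned int cpu,
				     unsigned int cores_per_node)
{
	if (bank == 4)
		return node_base_core(cpu, cores_per_node);

	return cpu;
}
```

With 4 cores per node, a bank 4 injection requested on CPU 7 would be redirected to CPU 4, while the same request for bank 1 stays on CPU 7.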
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com >
[ Cleanup comments. ]
[ Add a missing dependency on AMD_NB caught by Randy Dunlap. ]
Signed-off-by: Borislav Petkov <bp@suse.de >
Acked-by: Randy Dunlap <rdunlap@infradead.org >
Cc: Borislav Petkov <bp@alien8.de >
Cc: H. Peter Anvin <hpa@zytor.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tony Luck <tony.luck@intel.com >
Link: http://lkml.kernel.org/r/1443190851-2172-4-git-send-email-Aravind.Gopalakrishnan@amd.com
Link: http://lkml.kernel.org/r/1444641762-9437-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2015-10-12 16:15:48 +02:00
Aravind Gopalakrishnan
a1300e5052
x86/ras/mce_amd_inj: Trigger deferred and thresholding errors interrupts
...
Add the capability to trigger deferred error interrupts and
threshold interrupts in order to test the APIC interrupt handler
functionality for these types of errors.
Update the README section accordingly.
Reported-by: kbuild test robot <fengguang.wu@intel.com >
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com >
[ Cleanup comments. ]
[ Include asm/irq_vectors.h directly so that misc randbuilds don't fail. ]
Signed-off-by: Borislav Petkov <bp@suse.de >
Cc: Borislav Petkov <bp@alien8.de >
Cc: H. Peter Anvin <hpa@zytor.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tony Luck <tony.luck@intel.com >
Link: http://lkml.kernel.org/r/1443190851-2172-3-git-send-email-Aravind.Gopalakrishnan@amd.com
Link: http://lkml.kernel.org/r/1444641762-9437-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2015-10-12 16:15:47 +02:00
Aravind Gopalakrishnan
85c9306d44
x86/ras/mce_amd_inj: Return early on invalid input
...
Invalid inputs such as these are currently reported in dmesg as
failing:
$> echo sweet > flags
[ 122.079139] flags_write: Invalid flags value: et
even though the 'flags' attribute has been updated correctly:
$> cat flags
sw
This is because userspace keeps writing the remaining buffer
until it encounters an error.
However, the input as a whole is wrong and we should not be
writing anything to the file. Therefore, correct flags_write()
to return -EINVAL immediately on bad input strings.
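One way to get the "reject the whole input" behaviour is to validate the complete string before touching any state. The sketch below is a userspace model, not the flags_write() code: set_flag(), flag_names[] and cur_flag are hypothetical, and the "hw" entry is an assumption added only so the table has more than one element.

```c
#include <errno.h>
#include <string.h>

/* "sw" appears in the example above; "hw" is an illustrative assumption. */
static const char * const flag_names[] = { "sw", "hw" };

/* Index of the currently selected flag in flag_names[]. */
static int cur_flag;

/*
 * Accept the input only if the whole string (minus one trailing newline,
 * as produced by echo) equals a known flag name. On any mismatch, return
 * -EINVAL and leave cur_flag unchanged.
 */
static int set_flag(const char *buf, size_t len)
{
	size_t i;

	if (len && buf[len - 1] == '\n')
		len--;

	for (i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
		if (len == strlen(flag_names[i]) &&
		    !memcmp(buf, flag_names[i], len)) {
			cur_flag = (int)i;
			return 0;
		}
	}

	return -EINVAL;
}
```

In this model, writing "sweet\n" fails with -EINVAL and the previously selected flag is kept, instead of the leading "sw" being accepted and the remainder rejected.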
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com >
Signed-off-by: Borislav Petkov <bp@suse.de >
Cc: Borislav Petkov <bp@alien8.de >
Cc: H. Peter Anvin <hpa@zytor.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tony Luck <tony.luck@intel.com >
Link: http://lkml.kernel.org/r/1443190851-2172-2-git-send-email-Aravind.Gopalakrishnan@amd.com
Link: http://lkml.kernel.org/r/1444641762-9437-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2015-10-12 16:15:47 +02:00
Borislav Petkov
6c36dfe949
x86/ras: Move AMD MCE injector to arch/x86/ras/
...
This is an x86-specific module and would benefit from being
closer to the arch code. Move it there. Update copyright while
at it.
Signed-off-by: Borislav Petkov <bp@suse.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tony Luck <tony.luck@intel.com >
Link: http://lkml.kernel.org/r/1439396985-12812-14-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2015-08-13 10:12:54 +02:00