| Commit message (Collapse) | Author | Age |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A small set of fixes for x86:
- Prevent X2APIC ID 0xFFFFFFFF from being treated as valid, which
causes the possible CPU count to be wrong.
- Prevent 32bit truncation in calc_hpet_ref() which causes the TSC
calibration to fail
- Fix the page table setup for temporary text mappings in the resume
code which causes resume failures
- Make the page table dump code handle HIGHPTE correctly instead of
oopsing
- Support for topologies where NUMA nodes share an LLC to prevent a
invalid topology warning and further malfunction on such systems.
- Remove the now unused pci-nommu code
- Remove stale function declarations"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/power/64: Fix page-table setup for temporary text mapping
x86/mm: Prevent kernel Oops in PTDUMP code with HIGHPTE=y
x86,sched: Allow topologies where NUMA nodes share an LLC
x86/processor: Remove two unused function declarations
x86/acpi: Prevent X2APIC id 0xffffffff from being accounted
x86/tsc: Prevent 32bit truncation in calc_hpet_ref()
x86: Remove pci-nommu.c
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
early_trap_init() and cpu_set_gdt() have been removed, so remove the stale
declarations as well.
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Cc: luto@kernel.org
Cc: hpa@zytor.com
Cc: bp@suse.de
Cc: kirill.shutemov@linux.intel.com
Link: https://lkml.kernel.org/r/20180404064527.10562-1-douly.fnst@cn.fujitsu.com
|
|\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"A larger set of updates for perf.
Kernel:
- Handle the SBOX uncore monitoring correctly on Broadwell CPUs which
do not have SBOX.
- Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE]. The
percentage of preempting and non-preempting context switches help
understanding the nature of workloads (CPU or IO bound) that are
running on a machine. This adds the kernel facility and userspace
changes needed to show this information in 'perf script' and 'perf
report -D' (Alexey Budankov)
- Remove a WARN_ON() in the trace/kprobes code which is pointless
because the return error code is already telling the caller what's
wrong.
- Revert a fugly workaround for clang BPF targets.
- Fix sample_max_stack maximum check and do not proceed when an error
has been detect, return them to avoid misidentifying errors (Jiri
Olsa)
- Add SPDX idenitifiers and get rid of GPL boilderplate.
Tools:
- Synchronize kernel ABI headers, v4.17-rc1 (Ingo Molnar)
- Support MAP_FIXED_NOREPLACE, noticed when updating the
tools/include/ copies (Arnaldo Carvalho de Melo)
- Add '\n' at the end of parse-options error messages (Ravi Bangoria)
- Add s390 support for detailed/verbose PMU event description (Thomas
Richter)
- perf annotate fixes and improvements:
* Allow showing offsets in more than just jump targets, use the
new 'O' hotkey in the TUI, config ~/.perfconfig
annotate.offset_level for it and for --stdio2 (Arnaldo Carvalho
de Melo)
* Use the resolved variable names from objdump disassembled lines
to make them more compact, just like was already done for some
instructions, like "mov", this eventually will be done more
generally, but lets now add some more to the existing mechanism
(Arnaldo Carvalho de Melo)
- perf record fixes:
* Change warning for missing topology sysfs entry to debug, as not
all architectures have those files, s390 being one of those
(Thomas Richter)
* Remove old error messages about things that unlikely to be the
root cause in modern systems (Andi Kleen)
- perf sched fixes:
* Fix -g/--call-graph documentation (Takuya Yamamoto)
- perf stat:
* Enable 1ms interval for printing event counters values in
(Alexey Budankov)
- perf test fixes:
* Run dwarf unwind on arm32 (Kim Phillips)
* Remove unused ptrace.h include from LLVM test, sidesteping older
clang's lack of support for some asm constructs (Arnaldo
Carvalho de Melo)
* Fixup BPF test using epoll_pwait syscall function probe, to cope
with the syscall routines renames performed in this development
cycle (Arnaldo Carvalho de Melo)
- perf version fixes:
* Do not print info about HAVE_LIBAUDIT_SUPPORT in 'perf version
--build-options' when HAVE_SYSCALL_TABLE_SUPPORT is true, as
libaudit won't be used in that case, print info about
syscall_table support instead (Jin Yao)
- Build system fixes:
* Use HAVE_..._SUPPORT used consistently (Jin Yao)
* Restore READ_ONCE() C++ compatibility in tools/include (Mark
Rutland)
* Give hints about package names needed to build jvmti (Arnaldo
Carvalho de Melo)"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs
perf/x86/intel/uncore: Revert "Remove SBOX support for Broadwell server"
coresight: Move to SPDX identifier
perf test BPF: Fixup BPF test using epoll_pwait syscall function probe
perf tests mmap: Show which tracepoint is failing
perf tools: Add '\n' at the end of parse-options error messages
perf record: Remove suggestion to enable APIC
perf record: Remove misleading error suggestion
perf hists browser: Clarify top/report browser help
perf mem: Allow all record/report options
perf trace: Support MAP_FIXED_NOREPLACE
perf: Remove superfluous allocation error check
perf: Fix sample_max_stack maximum check
perf: Return proper values for user stack errors
perf list: Add s390 support for detailed/verbose PMU event description
perf script: Extend misc field decoding with switch out event type
perf report: Extend raw dump (-D) out with switch out event type
perf/core: Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE]
tools/headers: Synchronize kernel ABI headers, v4.17-rc1
trace_kprobe: Remove warning message "Could not insert probe at..."
...
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull tooling improvements and fixes from Arnaldo Carvalho de Melo:
perf annotate fixes and improvements:
- Allow showing offsets in more than just jump targets, use the new
'O' hotkey in the TUI, config ~/.perfconfig annotate.offset_level
for it and for --stdio2 (Arnaldo Carvalho de Melo)
- Use the resolved variable names from objdump disassembled lines to
make them more compact, just like was already done for some instructions,
like "mov", this eventually will be done more generally, but lets now add
some more to the existing mechanism (Arnaldo Carvalho de Melo)
perf record fixes:
- Change warning for missing topology sysfs entry to debug, as not all
architectures have those files, s390 being one of those (Thomas Richter)
perf sched fixes:
- Fix -g/--call-graph documentation (Takuya Yamamoto)
perf stat:
- Enable 1ms interval for printing event counters values in (Alexey Budankov)
perf test fixes:
- Run dwarf unwind on arm32 (Kim Phillips)
- Remove unused ptrace.h include from LLVM test, sidesteping older
clang's lack of support for some asm constructs (Arnaldo Carvalho de Melo)
perf version fixes:
- Do not print info about HAVE_LIBAUDIT_SUPPORT in 'perf version --build-options'
when HAVE_SYSCALL_TABLE_SUPPORT is true, as libaudit won't be used in that
case, print info about syscall_table support instead (Jin Yao)
Build system fixes:
- Use HAVE_..._SUPPORT used consistently (Jin Yao)
- Restore READ_ONCE() C++ compatibility in tools/include (Mark Rutland)
- Give hints about package names needed to build jvmti (Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
target"
This reverts commit ca26cffa4e4aaeb09bb9e308f95c7835cb149248.
Newer clang versions accept that asm(_ASM_SP) construct, and now that
the bpf-script-test-kbuild.c script, used in one of the 'perf test LLVM'
subtests doesn't include ptrace.h, which ended up including
arch/x86/include/asm/asm.h, we can revert this patch.
Suggested-by: Yonghong Song <yhs@fb.com>
Link: https://lkml.kernel.org/r/613f0a0d-c433-8f4d-dcc1-c9889deae39e@fb.com
Acked-by: Yonghong Song <yhs@fb.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-nqozcv8loq40tkqpfw997993@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|\ \ \
| |/ /
|/| |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull kvm fixes from Paolo Bonzini:
"Bug fixes, plus a new test case and the associated infrastructure for
writing nested virtualization tests"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: selftests: add vmx_tsc_adjust_test
kvm: x86: move MSR_IA32_TSC handling to x86.c
X86/KVM: Properly update 'tsc_offset' to represent the running guest
kvm: selftests: add -std=gnu99 cflags
x86: Add check for APIC access address for vmentry of L2 guests
KVM: X86: fix incorrect reference of trace_kvm_pi_irte_update
X86/KVM: Do not allow DISABLE_EXITS_MWAIT when LAPIC ARAT is not available
kvm: selftests: fix spelling mistake: "divisable" and "divisible"
X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Update 'tsc_offset' on vmentry/vmexit of L2 guests to ensure that it always
captures the TSC_OFFSET of the running guest whether it is the L1 or L2
guest.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
[AMD changes, fix update_ia32_tsc_adjust_msr. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes and updates for x86:
- Address a swiotlb regression which was caused by the recent DMA
rework and made driver fail because dma_direct_supported() returned
false
- Fix a signedness bug in the APIC ID validation which caused invalid
APIC IDs to be detected as valid thereby bloating the CPU possible
space.
- Fix inconsisten config dependcy/select magic for the MFD_CS5535
driver.
- Fix a corruption of the physical address space bits when encryption
has reduced the address space and late cpuinfo updates overwrite
the reduced bit information with the original value.
- Dominiks syscall rework which consolidates the architecture
specific syscall functions so all syscalls can be wrapped with the
same macros. This allows to switch x86/64 to struct pt_regs based
syscalls. Extend the clearing of user space controlled registers in
the entry patch to the lower registers"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Fix signedness bug in APIC ID validity checks
x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption
x86/olpc: Fix inconsistent MFD_CS5535 configuration
swiotlb: Use dma_direct_supported() for swiotlb_ops
syscalls/x86: Adapt syscall_wrapper.h to the new syscall stub naming convention
syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()
syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention
syscalls/core, syscalls/x86: Clean up syscall stub naming convention
syscalls/x86: Extend register clearing on syscall entry to lower registers
syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64
syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32
syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls
syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls
syscalls/core: Introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
x86/syscalls: Don't pointlessly reload the system call number
x86/mm: Fix documentation of module mapping range with 4-level paging
x86/cpuid: Switch to 'static const' specifier
|
| |\ \ \
| | | | |
| | | | |
| | | | | |
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Make the code in syscall_wrapper.h more readable by naming the stub macros
similar to the stub they provide. While at it, fix a stray newline at the
end of the __IA32_COMPAT_SYS_STUBx macro.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180409105145.5364-5-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
__x64_sys_*()
This rename allows us to have a coherent syscall stub naming convention on
64-bit x86 (0xffffffff prefix removed):
810f0af0 t kernel_waitid # common (32/64) kernel helper
<inline> __do_sys_waitid # inlined helper doing actual work
810f0be0 t __se_sys_waitid # C func calling inlined helper
<inline> __do_compat_sys_waitid # inlined helper doing actual work
810f0d80 t __se_compat_sys_waitid # compat C func calling inlined helper
810f2080 T __x64_sys_waitid # x64 64-bit-ptregs -> C stub
810f20b0 T __ia32_sys_waitid # ia32 32-bit-ptregs -> C stub[*]
810f2470 T __ia32_compat_sys_waitid # ia32 32-bit-ptregs -> compat C stub
810f2490 T __x32_compat_sys_waitid # x32 64-bit-ptregs -> compat C stub
[*] This stub is unused, as the syscall table links
__ia32_compat_sys_waitid instead of __ia32_sys_waitid as we need
a compat variant here.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180409105145.5364-4-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Tidy the naming convention for compat syscall subs. Hints which describe
the purpose of the stub go in front and receive a double underscore to
denote that they are generated on-the-fly by the COMPAT_SYSCALL_DEFINEx()
macro.
For the generic case, this means:
t kernel_waitid # common C function (see kernel/exit.c)
__do_compat_sys_waitid # inlined helper doing the actual work
# (takes original parameters as declared)
T __se_compat_sys_waitid # sign-extending C function calling inlined
# helper (takes parameters of type long,
# casts them to unsigned long and then to
# the declared type)
T compat_sys_waitid # alias to __se_compat_sys_waitid()
# (taking parameters as declared), to
# be included in syscall table
For x86, the naming is as follows:
t kernel_waitid # common C function (see kernel/exit.c)
__do_compat_sys_waitid # inlined helper doing the actual work
# (takes original parameters as declared)
t __se_compat_sys_waitid # sign-extending C function calling inlined
# helper (takes parameters of type long,
# casts them to unsigned long and then to
# the declared type)
T __ia32_compat_sys_waitid # IA32_EMULATION 32-bit-ptregs -> C stub,
# calls __se_compat_sys_waitid(); to be
# included in syscall table
T __x32_compat_sys_waitid # x32 64-bit-ptregs -> C stub, calls
# __se_compat_sys_waitid(); to be included
# in syscall table
If only one of IA32_EMULATION and x32 is enabled, __se_compat_sys_waitid()
may be inlined into the stub __{ia32,x32}_compat_sys_waitid().
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180409105145.5364-3-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Tidy the naming convention for compat syscall subs. Hints which describe
the purpose of the stub go in front and receive a double underscore to
denote that they are generated on-the-fly by the SYSCALL_DEFINEx() macro.
For the generic case, this means (0xffffffff prefix removed):
810f08d0 t kernel_waitid # common C function (see kernel/exit.c)
<inline> __do_sys_waitid # inlined helper doing the actual work
# (takes original parameters as declared)
810f1aa0 T __se_sys_waitid # sign-extending C function calling inlined
# helper (takes parameters of type long;
# casts them to the declared type)
810f1aa0 T sys_waitid # alias to __se_sys_waitid() (taking
# parameters as declared), to be included
# in syscall table
For x86, the naming is as follows:
810efc70 t kernel_waitid # common C function (see kernel/exit.c)
<inline> __do_sys_waitid # inlined helper doing the actual work
# (takes original parameters as declared)
810efd60 t __se_sys_waitid # sign-extending C function calling inlined
# helper (takes parameters of type long;
# casts them to the declared type)
810f1140 T __ia32_sys_waitid # IA32_EMULATION 32-bit-ptregs -> C stub,
# calls __se_sys_waitid(); to be included
# in syscall table
810f1110 T sys_waitid # x86 64-bit-ptregs -> C stub, calls
# __se_sys_waitid(); to be included in
# syscall table
For x86, sys_waitid() will be re-named to __x64_sys_waitid in a follow-up
patch.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180409105145.5364-2-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Removing CONFIG_SYSCALL_PTREGS from arch/x86/Kconfig and simply selecting
ARCH_HAS_SYSCALL_WRAPPER unconditionally on x86-64 allows us to simplify
several codepaths.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180405095307.3730-7-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
and x32
Extend ARCH_HAS_SYSCALL_WRAPPER for i386 emulation and for x32 on 64-bit
x86.
For x32, all we need to do is to create an additional stub for each
compat syscall which decodes the parameters in x86-64 ordering, e.g.:
asmlinkage long __compat_sys_x32_xyzzy(struct pt_regs *regs)
{
return c_SyS_xyzzy(regs->di, regs->si, regs->dx);
}
For i386 emulation, we need to teach compat_sys_*() to take struct
pt_regs as its only argument, e.g.:
asmlinkage long __compat_sys_ia32_xyzzy(struct pt_regs *regs)
{
return c_SyS_xyzzy(regs->bx, regs->cx, regs->dx);
}
In addition, we need to create additional stubs for common syscalls
(that is, for syscalls which have the same parameters on 32-bit and
64-bit), e.g.:
asmlinkage long __sys_ia32_xyzzy(struct pt_regs *regs)
{
return c_sys_xyzzy(regs->bx, regs->cx, regs->dx);
}
This approach avoids leaking random user-provided register content down
the call chain.
This patch is based on an original proof-of-concept
| From: Linus Torvalds <torvalds@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
and was split up and heavily modified by me, in particular to base it on
ARCH_HAS_SYSCALL_WRAPPER.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180405095307.3730-6-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | |/
| | |/|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
64-bit syscalls
Let's make use of ARCH_HAS_SYSCALL_WRAPPER=y on pure 64-bit x86-64 systems:
Each syscall defines a stub which takes struct pt_regs as its only
argument. It decodes just those parameters it needs, e.g:
asmlinkage long sys_xyzzy(const struct pt_regs *regs)
{
return SyS_xyzzy(regs->di, regs->si, regs->dx);
}
This approach avoids leaking random user-provided register content down
the call chain.
For example, for sys_recv() which is a 4-parameter syscall, the assembly
now is (in slightly reordered fashion):
<sys_recv>:
callq <__fentry__>
/* decode regs->di, ->si, ->dx and ->r10 */
mov 0x70(%rdi),%rdi
mov 0x68(%rdi),%rsi
mov 0x60(%rdi),%rdx
mov 0x38(%rdi),%rcx
[ SyS_recv() is automatically inlined by the compiler,
as it is not [yet] used anywhere else ]
/* clear %r9 and %r8, the 5th and 6th args */
xor %r9d,%r9d
xor %r8d,%r8d
/* do the actual work */
callq __sys_recvfrom
/* cleanup and return */
cltq
retq
The only valid place in an x86-64 kernel which rightfully calls
a syscall function on its own -- vsyscall -- needs to be modified
to pass struct pt_regs onwards as well.
To keep the syscall table generation working independent of
SYSCALL_PTREGS being enabled, the stubs are named the same as the
"original" syscall stubs, i.e. sys_*().
This patch is based on an original proof-of-concept
| From: Linus Torvalds <torvalds@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
and was split up and heavily modified by me, in particular to base it on
ARCH_HAS_SYSCALL_WRAPPER, to limit it to 64-bit-only for the time being,
and to update the vsyscall to the new calling convention.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180405095307.3730-4-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The APIC ID as parsed from ACPI MADT is validity checked with the
apic->apic_id_valid() callback, which depends on the selected APIC type.
For non X2APIC types APIC IDs >= 0xFF are invalid, but values > 0x7FFFFFFF
are detected as valid. This happens because the 'apicid' argument of the
apic_id_valid() callback is type 'int'. So the resulting comparison
apicid < 0xFF
evaluates to true for all unsigned int values > 0x7FFFFFFF which are handed
to default_apic_id_valid(). As a consequence, invalid APIC IDs in !X2APIC
mode are considered valid and accounted as possible CPUs.
Change the apicid argument type of the apic_id_valid() callback to u32 so
the evaluation is unsigned and returns the correct result.
[ tglx: Massaged changelog ]
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: jgross@suse.com
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1523322966-10296-1-git-send-email-lirongqing@baidu.com
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti updates from Thomas Gleixner:
"Another series of PTI related changes:
- Remove the manual stack switch for user entries from the idtentry
code. This debloats entry by 5k+ bytes of text.
- Use the proper types for the asm/bootparam.h defines to prevent
user space compile errors.
- Use PAGE_GLOBAL for !PCID systems to gain back performance
- Prevent setting of huge PUD/PMD entries when the entries are not
leaf entries otherwise the entries to which the PUD/PMD points to
and are populated get lost"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
x86/pti: Leave kernel text global for !PCID
x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image
x86/pti: Enable global pages for shared areas
x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init
x86/mm: Comment _PAGE_GLOBAL mystery
x86/mm: Remove extra filtering in pageattr code
x86/mm: Do not auto-massage page protections
x86/espfix: Document use of _PAGE_GLOBAL
x86/mm: Introduce "default" kernel PTE mask
x86/mm: Undo double _PAGE_PSE clearing
x86/mm: Factor out pageattr _PAGE_GLOBAL setting
x86/entry/64: Drop idtentry's manual stack switch for user entries
x86/uapi: Fix asm/bootparam.h userspace compilation errors
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Global pages are bad for hardening because they potentially let an
exploit read the kernel image via a Meltdown-style attack which
makes it easier to find gadgets.
But, global pages are good for performance because they reduce TLB
misses when making user/kernel transitions, especially when PCIDs
are not available, such as on older hardware, or where a hypervisor
has disabled them for some reason.
This patch implements a basic, sane policy: If you have PCIDs, you
only map a minimal amount of kernel text global. If you do not have
PCIDs, you map all kernel text global.
This policy effectively makes PCIDs something that not only adds
performance but a little bit of hardening as well.
I ran a simple "lseek" microbenchmark[1] to test the benefit on
a modern Atom microserver. Most of the benefit comes from applying
the series before this patch ("entry only"), but there is still a
signifiant benefit from this patch.
No Global Lines (baseline ): 6077741 lseeks/sec
88 Global Lines (entry only): 7528609 lseeks/sec (+23.9%)
94 Global Lines (this patch): 8433111 lseeks/sec (+38.8%)
[1.] https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205518.E3D989EB@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
A PTE is constructed from a physical address and a pgprotval_t.
__PAGE_KERNEL, for instance, is a pgprot_t and must be converted
into a pgprotval_t before it can be used to create a PTE. This is
done implicitly within functions like pfn_pte() by massage_pgprot().
However, this makes it very challenging to set bits (and keep them
set) if your bit is being filtered out by massage_pgprot().
This moves the bit filtering out of pfn_pte() and friends. For
users of PAGE_KERNEL*, filtering will be done automatically inside
those macros but for users of __PAGE_KERNEL*, they need to do their
own filtering now.
Note that we also just move pfn_pte/pmd/pud() over to check_pgprot()
instead of massage_pgprot(). This way, we still *look* for
unsupported bits and properly warn about them if we find them. This
might happen if an unfiltered __PAGE_KERNEL* value was passed in,
for instance.
- printk format warning fix from: Arnd Bergmann <arnd@arndb.de>
- boot crash fix from: Tom Lendacky <thomas.lendacky@amd.com>
- crash bisected by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reported-and-fixed-by: Arnd Bergmann <arnd@arndb.de>
Fixed-by: Tom Lendacky <thomas.lendacky@amd.com>
Bisected-by: Mike Galbraith <efault@gmx.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205509.77E1D7F6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The __PAGE_KERNEL_* page permissions are "raw". They contain bits
that may or may not be supported on the current processor. They need
to be filtered by a mask (currently __supported_pte_mask) to turn them
into a value that we can actually set in a PTE.
These __PAGE_KERNEL_* values all contain _PAGE_GLOBAL. But, with PTI,
we want to be able to support _PAGE_GLOBAL (have the bit set in
__supported_pte_mask) but not have it appear in any of these masks by
default.
This patch creates a new mask, __default_kernel_pte_mask, and applies
it when creating all of the PAGE_KERNEL_* masks. This makes
PAGE_KERNEL_* safe to use anywhere (they only contain supported bits).
It also ensures that PAGE_KERNEL_* contains _PAGE_GLOBAL on PTI=n
kernels but clears _PAGE_GLOBAL when PTI=y.
We also make __default_kernel_pte_mask a non-GPL exported symbol
because there are plenty of driver-available interfaces that take
PAGE_KERNEL_* permissions.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205506.030DB6B6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI bootup fixlet from Thomas Gleixner:
"A single fix for an early boot warning caused by invoking
this_cpu_has() before SMP initialization"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix bogus warning during EFI bootup, use boot_cpu_has() instead of this_cpu_has() in build_cr3_noflush()
|
| | |_|/ /
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
this_cpu_has() in build_cr3_noflush()
Linus reported the following boot warning:
WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/tlbflush.h:134 load_new_mm_cr3+0x114/0x170
[...]
Call Trace:
switch_mm_irqs_off+0x267/0x590
switch_mm+0xe/0x20
efi_switch_mm+0x3e/0x50
efi_enter_virtual_mode+0x43f/0x4da
start_kernel+0x3bf/0x458
secondary_startup_64+0xa5/0xb0
... after merging:
03781e40890c: x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3
When the platform supports PCID and if CONFIG_DEBUG_VM=y is enabled,
build_cr3_noflush() (called via switch_mm()) does a sanity check to see
if X86_FEATURE_PCID is set.
Presently, build_cr3_noflush() uses "this_cpu_has(X86_FEATURE_PCID)" to
perform the check but this_cpu_has() works only after SMP is initialized
(i.e. per cpu cpu_info's should be populated) and this happens to be very
late in the boot process (during rest_init()).
As efi_runtime_services() are called during (early) kernel boot time
and run time, modify build_cr3_noflush() to use boot_cpu_has() all the
time. As suggested by Dave Hansen, this should be OK because all CPU's have
same capabilities on x86.
With this change the warning is fixed.
( Dave also suggested that we put a warning in this_cpu_has() if it's used
early in the boot process. This is still work in progress as it affects
MCE. )
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Lee Chun-Yi <jlee@suse.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1522870459-7432-1-git-send-email-sai.praneeth.prakhya@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
As arch_kexec_kernel_image_{probe,load}(),
arch_kimage_file_post_load_cleanup() and arch_kexec_kernel_verify_sig()
are almost duplicated among architectures, they can be commonalized with
an architecture-defined kexec_file_ops array. So let's factor them out.
Link: http://lkml.kernel.org/r/20180306102303.9063-3-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Juergen Gross noticed that commit f7f99100d8d ("mm: stop zeroing memory
during allocation in vmemmap") broke XEN PV domains when deferred struct
page initialization is enabled.
This is because the xen's PagePinned() flag is getting erased from
struct pages when they are initialized later in boot.
Juergen fixed this problem by disabling deferred pages on xen pv
domains. It is desirable, however, to have this feature available as it
reduces boot time. This fix re-enables the feature for pv-dmains, and
fixes the problem the following way:
The fix is to delay setting PagePinned flag until struct pages for all
allocated memory are initialized, i.e. until after free_all_bootmem().
A new x86_init.hyper op init_after_bootmem() is called to let xen know
that boot allocator is done, and hence struct pages for all the
allocated memory are now initialized. If deferred page initialization
is enabled, the rest of struct pages are going to be initialized later
in boot once page_alloc_init_late() is called.
xen_after_bootmem() walks page table's pages and marks them pinned.
Link: http://lkml.kernel.org/r/20180226160112.24724-2-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Jinbum Park <jinb.park7@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jia Zhang <zhang.jia@linux.alibaba.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \ \ \
| | |_|_|/
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull kvm updates from Paolo Bonzini:
"ARM:
- VHE optimizations
- EL2 address space randomization
- speculative execution mitigations ("variant 3a", aka execution past
invalid privilege register access)
- bugfixes and cleanups
PPC:
- improvements for the radix page fault handler for HV KVM on POWER9
s390:
- more kvm stat counters
- virtio gpu plumbing
- documentation
- facilities improvements
x86:
- support for VMware magic I/O port and pseudo-PMCs
- AMD pause loop exiting
- support for AMD core performance extensions
- support for synchronous register access
- expose nVMX capabilities to userspace
- support for Hyper-V signaling via eventfd
- use Enlightened VMCS when running on Hyper-V
- allow userspace to disable MWAIT/HLT/PAUSE vmexits
- usual roundup of optimizations and nested virtualization bugfixes
Generic:
- API selftest infrastructure (though the only tests are for x86 as
of now)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits)
kvm: x86: fix a prototype warning
kvm: selftests: add sync_regs_test
kvm: selftests: add API testing infrastructure
kvm: x86: fix a compile warning
KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"
KVM: X86: Introduce handle_ud()
KVM: vmx: unify adjacent #ifdefs
x86: kvm: hide the unused 'cpu' variable
KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig
Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown"
kvm: Add emulation for movups/movupd
KVM: VMX: raise internal error for exception during invalid protected mode state
KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending
KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
KVM: x86: Fix misleading comments on handling pending exceptions
KVM: x86: Rename interrupt.pending to interrupt.injected
KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt
x86/kvm: use Enlightened VMCS when running on Hyper-V
x86/hyper-v: detect nested features
x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
...
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
For exceptions & NMIs events, KVM code use the following
coding convention:
*) "pending" represents an event that should be injected to guest at
some point but it's side-effects have not yet occurred.
*) "injected" represents an event that it's side-effects have already
occurred.
However, interrupts don't conform to this coding convention.
All current code flows mark interrupt.pending when it's side-effects
have already taken place (For example, bit moved from LAPIC IRR to
ISR). Therefore, it makes sense to just rename
interrupt.pending to interrupt.injected.
This change follows logic of previous commit 664f8e26b00c ("KVM: X86:
Fix loss of exception which has not yet been injected") which changed
exception to follow this coding convention as well.
It is important to note that in case !lapic_in_kernel(vcpu),
interrupt.pending usage was and still incorrect.
In this case, interrrupt.pending can only be set using one of the
following ioctls: KVM_INTERRUPT, KVM_SET_VCPU_EVENTS and
KVM_SET_SREGS. Looking at how QEMU uses these ioctls, one can see that
QEMU uses them either to re-set an "interrupt.pending" state it has
received from KVM (via KVM_GET_VCPU_EVENTS interrupt.pending or
via KVM_GET_SREGS interrupt_bitmap) or by dispatching a new interrupt
from QEMU's emulated LAPIC which reset bit in IRR and set bit in ISR
before sending ioctl to KVM. So it seems that indeed "interrupt.pending"
in this case is also suppose to represent "interrupt.injected".
However, kvm_cpu_has_interrupt() & kvm_cpu_has_injectable_intr()
is misusing (now named) interrupt.injected in order to return if
there is a pending interrupt.
This leads to nVMX/nSVM not be able to distinguish if it should exit
from L2 to L1 on EXTERNAL_INTERRUPT on pending interrupt or should
re-inject an injected interrupt.
Therefore, add a FIXME at these functions for handling this issue.
This patch introduce no semantics change.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
TLFS 5.0 says: "Support for an enlightened VMCS interface is reported with
CPUID leaf 0x40000004. If an enlightened VMCS interface is supported,
additional nested enlightenments may be discovered by reading the CPUID
leaf 0x4000000A (see 2.4.11)."
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The definitions are according to the Hyper-V TLFS v5.0. KVM on Hyper-V will
use these.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Virtual Processor Assist Pages usage allows us to do optimized EOI
processing for APIC, enable Enlightened VMCS support in KVM and more.
struct hv_vp_assist_page is defined according to the Hyper-V TLFS v5.0b.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The assist page has been used only for the paravirtual EOI so far, hence
the "APIC" in the MSR name. Renaming to match the Hyper-V TLFS where it's
called "Virtual VP Assist MSR".
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
mshyperv.h now only contains fucntions/variables we define in kernel, all
definitions from TLFS should go to hyperv-tlfs.h.
'enum hv_cpuid_function' is removed as we already have this info in
hyperv-tlfs.h, code in mshyperv.c is adjusted accordingly.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
hyperv.h is not part of uapi, there are no (known) users outside of kernel.
We are making changes to this file to match current Hyper-V Hypervisor
Top-Level Functional Specification (TLFS, see:
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs)
and we don't want to maintain backwards compatibility.
Move the file renaming to hyperv-tlfs.h to avoid confusing it with
mshyperv.h. In future, all definitions from TLFS should go to it and
all kernel objects should go to mshyperv.h or include/linux/hyperv.h.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This patch adds the support for pause filtering threshold. This feature
support is indicated by CPUID Fn8000_000A_EDX. See AMD APM Vol 2 Section
15.14.4 Pause Intercept Filtering for more details.
In this mode, a 16-bit pause filter threshold field is added in VMCB.
The threshold value is a cycle count that is used to reset the pause
counter. As with simple pause filtering, VMRUN loads the pause count
value from VMCB into an internal counter. Then, on each pause instruction
the hardware checks the elapsed number of cycles since the most recent
pause instruction against the pause Filter Threshold. If the elapsed cycle
count is greater than the pause filter threshold, then the internal pause
count is reloaded from VMCB and execution continues. If the elapsed cycle
count is less than the pause filter threshold, then the internal pause
count is decremented. If the count value is less than zero and pause
intercept is enabled, a #VMEXIT is triggered. If advanced pause filtering
is supported and pause filter threshold field is set to zero, the filter
will operate in the simpler, count only mode.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add struct kvm_svm, which is analagous to struct vcpu_svm, along with
a helper to_kvm_svm() to retrieve kvm_svm from a struct kvm *. Move
the SVM specific variables and struct definitions out of kvm_arch
and into kvm_svm.
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add struct kvm_vmx, which wraps struct kvm, and a helper to_kvm_vmx()
that retrieves 'struct kvm_vmx *' from 'struct kvm *'. Move the VMX
specific variables out of kvm_arch and into kvm_vmx.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add kvm_x86_ops->set_identity_map_addr and set ept_identity_map_addr
in VMX specific code so that ept_identity_map_addr can be moved out
of 'struct kvm_arch' in a future patch.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Define kvm_arch_[alloc|free]_vm in x86 as pass through functions
to new kvm_x86_ops vm_alloc and vm_free, and move the current
allocation logic as-is to SVM and VMX. Vendor specific alloc/free
functions set the stage for SVM/VMX wrappers of 'struct kvm',
which will allow us to move the growing number of SVM/VMX specific
member variables out of 'struct kvm_arch'.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When L1 IOAPIC redirection-table is written, a request of
KVM_REQ_SCAN_IOAPIC is set on all vCPUs. This is done such that
all vCPUs will now recalc their IOAPIC handled vectors and load
it to their EOI-exitmap.
However, it could be that one of the vCPUs is currently running
L2. In this case, load_eoi_exitmap() will be called which would
write to vmcs02->eoi_exit_bitmap, which is wrong because
vmcs02->eoi_exit_bitmap should always be equal to
vmcs12->eoi_exit_bitmap. Furthermore, at this point
KVM_REQ_SCAN_IOAPIC was already consumed and therefore we will
never update vmcs01->eoi_exit_bitmap. This could lead to remote_irr
of some IOAPIC level-triggered entry to remain set forever.
Fix this issue by delaying the load of EOI-exitmap to when vCPU
is running L1.
One may wonder why not just delay entire KVM_REQ_SCAN_IOAPIC
processing to when vCPU is running L1. This is done in order to handle
correctly the case where LAPIC & IO-APIC of L1 is pass-throughed into
L2. In this case, vmcs12->virtual_interrupt_delivery should be 0. In
current nVMX implementation, that results in
vmcs02->virtual_interrupt_delivery to also be 0. Thus,
vmcs02->eoi_exit_bitmap is not used. Therefore, every L2 EOI cause
a #VMExit into L0 (either on MSR_WRITE to x2APIC MSR or
APIC_ACCESS/APIC_WRITE/EPT_MISCONFIG to APIC MMIO page).
In order for such L2 EOI to be broadcasted, if needed, from LAPIC
to IO-APIC, vcpu->arch.ioapic_handled_vectors must be updated
while L2 is running. Therefore, patch makes sure to delay only the
loading of EOI-exitmap but not the update of
vcpu->arch.ioapic_handled_vectors.
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
vmx_save_host_state() is only called from kvm_arch_vcpu_ioctl_run() so
the context is pretty well defined and as we're past 'swapgs' MSR_GS_BASE
should contain kernel's GS base which we point to irq_stack_union.
Add new kernelmode_gs_base() API, irq_stack_union needs to be exported
as KVM can be build as module.
Acked-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
vmx_save_host_state() is only called from kvm_arch_vcpu_ioctl_run() so
the context is pretty well defined. Read MSR_{FS,KERNEL_GS}_BASE from
current->thread after calling save_fsgs() which takes care of
X86_BUG_NULL_SEG case now and will do RD[FG,GS]BASE when FSGSBASE
extensions are exposed to userspace (currently they are not).
Acked-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Allow to disable pause loop exit/pause filtering on a per VM basis.
If some VMs have dedicated host CPUs, they won't be negatively affected
due to needlessly intercepted PAUSE instructions.
Thanks to Jan H. Schönherr's initial patch.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
If host CPUs are dedicated to a VM, we can avoid VM exits on HLT.
This patch adds the per-VM capability to disable them.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Allowing a guest to execute MWAIT without interception enables a guest
to put a (physical) CPU into a power saving state, where it takes
longer to return from than what may be desired by the host.
Don't give a guest that power over a host by default. (Especially,
since nothing prevents a guest from using MWAIT even when it is not
advertised via CPUID.)
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Access to VMware backdoor ports is done by one of the IN/OUT/INS/OUTS
instructions. These ports must be allowed access even if TSS I/O
permission bitmap don't allow it.
To handle this, VMX/SVM will be changed in future commits
to intercept #GP which was raised by such access and
handle it by calling x86 emulator to emulate instruction.
If it was one of these instructions, the x86 emulator already handles
it correctly (Since commit "KVM: x86: Always allow access to VMware
backdoor I/O ports") by not checking these ports against TSS I/O
permission bitmap.
One may wonder why checking for specific instructions is necessary
as we can just forward all #GPs to the x86 emulator.
There are multiple reasons for doing so:
1. We don't want the x86 emulator to be reached easily
by guest by just executing an instruction that raises #GP as that
exposes the x86 emulator as a bigger attack surface.
2. The x86 emulator is incomplete and therefore certain instructions
that can cause #GP cannot be emulated. Such an example is "INT x"
(opcode 0xcd) which reaches emulate_int() which can only emulate
the instruction if vCPU is in real-mode.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Next commits are going introduce support for accessing VMware backdoor
ports even though guest's TSS I/O permissions bitmap doesn't allow
access. This mimic VMware hypervisor behavior.
In order to support this, next commits will change VMX/SVM to
intercept #GP which was raised by such access and handle it by calling
the x86 emulator to emulate instruction. Since commit "KVM: x86:
Always allow access to VMware backdoor I/O ports", the x86 emulator
handles access to these I/O ports by not checking these ports against
the TSS I/O permission bitmap.
However, there could be cases that CPU rasies a #GP on instruction
that fails to be disassembled by the x86 emulator (Because of
incomplete implementation for example).
In those cases, we would like the #GP intercept to just forward #GP
as-is to guest as if there was no intercept to begin with.
However, current emulator code always queues #UD exception in case
emulator fails (including disassembly failures) which is not what is
wanted in this flow.
This commit addresses this issue by adding a new emulation_type flag
that will allow the #GP intercept handler to specify that it wishes
to be aware when instruction emulation fails and doesn't want #UD
exception to be queued.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add kvm_fast_pio() to consolidate duplicate code in VMX and SVM.
Unexport kvm_fast_pio_in() and kvm_fast_pio_out().
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Nested Hyper-V/Windows guest running on top of KVM will use TSC page
clocksource in two cases:
- L0 exposes invariant TSC (CPUID.80000007H:EDX[8]).
- L0 provides Hyper-V Reenlightenment support (CPUID.40000003H:EAX[13]).
Exposing invariant TSC effectively blocks migration to hosts with different
TSC frequencies, providing reenlightenment support will be needed when we
start migrating nested workloads.
Implement rudimentary support for reenlightenment MSRs. For now, these are
just read/write MSRs with no effect.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add the EventSelect and Counter MSRs for AMD Core Perf Extension.
Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This patch introduces kvm_para_has_hint() to query for hints about
the configuration of the guests. The first hint KVM_HINTS_DEDICATED,
is set if the guest has dedicated physical CPUs for each vCPU (i.e.
pinning and no over-commitment). This allows optimizing spinlocks
and tells the guest to avoid PV TLB flush.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|