diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2018-01-30 11:15:14 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2018-01-30 11:15:14 -0800 |
commit | d8b91dde38f4c43bd0bbbf17a90f735b16aaff2c (patch) | |
tree | bd72dabf6e4b23e060fce429c87e60504f69de54 /tools/perf/bench/futex-wake-parallel.c | |
parent | 5e7481a25e90b661d1dbbba18be3fd3dfe12ec6f (diff) | |
parent | e4c1091cb495d9cbec8956d642644a71a1689958 (diff) | |
download | linux-d8b91dde38f4c43bd0bbbf17a90f735b16aaff2c.tar.gz linux-d8b91dde38f4c43bd0bbbf17a90f735b16aaff2c.tar.xz |
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- Clean up the x86 instruction decoder (Masami Hiramatsu)
- Add new uprobes optimization for PUSH instructions on x86 (Yonghong
Song)
- Add MSR_IA32_THERM_STATUS to the MSR events (Stephane Eranian)
- Fix misc bugs, update documentation, plus various cleanups (Jiri
Olsa)
There's a large number of tooling side improvements:
- Intel-PT/BTS improvements (Adrian Hunter)
- Numerous 'perf trace' improvements (Arnaldo Carvalho de Melo)
- Introduce an errno code to string facility (Hendrik Brueckner)
- Various build system improvements (Jiri Olsa)
- Add support for CoreSight trace decoding by making the perf tools
use the external openCSD (Mathieu Poirier, Tor Jeremiassen)
- Add ARM Statistical Profiling Extensions (SPE) support (Kim
Phillips)
- libtraceevent updates (Steven Rostedt)
- Intel vendor event JSON updates (Andi Kleen)
- Introduce 'perf report --mmaps' and 'perf report --tasks' to show
info present in 'perf.data' (Jiri Olsa, Arnaldo Carvalho de Melo)
- Add infrastructure to record first and last sample time to the
perf.data file header, so that when processing all samples in a
'perf record' session, such as when doing build-id processing, or
when specifically requesting that that info be recorded, use that
in 'perf report --time', that also got support for percent slices
in addition to absolute ones.
I.e. now it is possible to ask for the samples in the 10%-20% time
slice of a perf.data file (Jin Yao)
- Allow system wide 'perf stat --per-thread', sorting the result (Jin
Yao)
E.g.:
[root@jouet ~]# perf stat --per-thread --metrics IPC
^C
Performance counter stats for 'system wide':
make-22229 23,012,094,032 inst_retired.any # 0.8 IPC
cc1-22419 692,027,497 inst_retired.any # 0.8 IPC
gcc-22418 328,231,855 inst_retired.any # 0.9 IPC
cc1-22509 220,853,647 inst_retired.any # 0.8 IPC
gcc-22486 199,874,810 inst_retired.any # 1.0 IPC
as-22466 177,896,365 inst_retired.any # 0.9 IPC
cc1-22465 150,732,374 inst_retired.any # 0.8 IPC
gcc-22508 112,555,593 inst_retired.any # 0.9 IPC
cc1-22487 108,964,079 inst_retired.any # 0.7 IPC
qemu-system-x86-2697 21,330,550 inst_retired.any # 0.3 IPC
systemd-journal-551 20,642,951 inst_retired.any # 0.4 IPC
docker-containe-17651 9,552,892 inst_retired.any # 0.5 IPC
dockerd-current-9809 7,528,586 inst_retired.any # 0.5 IPC
make-22153 12,504,194,380 inst_retired.any # 0.8 IPC
python2-22429 12,081,290,954 inst_retired.any # 0.8 IPC
<SNIP>
python2-22429 15,026,328,103 cpu_clk_unhalted.thread
cc1-22419 826,660,193 cpu_clk_unhalted.thread
gcc-22418 365,321,295 cpu_clk_unhalted.thread
cc1-22509 279,169,362 cpu_clk_unhalted.thread
gcc-22486 210,156,950 cpu_clk_unhalted.thread
<SNIP>
5.638075538 seconds time elapsed
[root@jouet ~]#
- Improve shell auto-completion of perf events (Jin Yao)
- 'perf probe' improvements (Masami Hiramatsu)
- Improve PMU infrastructure to support amp64's ThunderX2
implementation defined core events (Ganapatrao Kulkarni)
- Various annotation related improvements and fixes (Thomas Richter)
- Clarify usage of 'overwrite' and 'backward' in the evlist/mmap
code, removing the 'overwrite' parameter from several functions as
it was always used it as 'false' (Wang Nan)
- Fix/improve 'perf record' reverse recording support (Wang Nan)
- Improve command line options documentation (Sihyeon Jang)
- Optimize sample parsing for ordering events, where we don't need to
parse all the PERF_SAMPLE_ bits, just the ones leading to the
timestamp needed to reorder events (Jiri Olsa)
- Generalize the annotation code to support other source information
besides objdump/DWARF obtained ones, starting with python scripts,
that will is slated to be merged soon (Jiri Olsa)
- ... and a lot more that I failed to list, see the shortlog and
changelog for details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (262 commits)
perf trace beauty flock: Move to separate object file
perf evlist: Remove fcntl.h from evlist.h
perf trace beauty futex: Beautify FUTEX_BITSET_MATCH_ANY
perf trace: Do not print from time delta for interrupted syscall lines
perf trace: Add --print-sample
perf bpf: Remove misplaced __maybe_unused attribute
MAINTAINERS: Adding entry for CoreSight trace decoding
perf tools: Add mechanic to synthesise CoreSight trace packets
perf tools: Add full support for CoreSight trace decoding
pert tools: Add queue management functionality
perf tools: Add functionality to communicate with the openCSD decoder
perf tools: Add support for decoding CoreSight trace data
perf tools: Add decoder mechanic to support dumping trace data
perf tools: Add processing of coresight metadata
perf tools: Add initial entry point for decoder CoreSight traces
perf tools: Integrating the CoreSight decoding library
perf vendor events intel: Update IvyTown files to V20
perf vendor events intel: Update IvyBridge files to V20
perf vendor events intel: Update BroadwellDE events to V7
perf vendor events intel: Update SkylakeX events to V1.06
...
Diffstat (limited to 'tools/perf/bench/futex-wake-parallel.c')
-rw-r--r-- | tools/perf/bench/futex-wake-parallel.c | 46 |
1 files changed, 35 insertions, 11 deletions
diff --git a/tools/perf/bench/futex-wake-parallel.c b/tools/perf/bench/futex-wake-parallel.c index b4732dad9f89..69d8fdc87315 100644 --- a/tools/perf/bench/futex-wake-parallel.c +++ b/tools/perf/bench/futex-wake-parallel.c @@ -7,7 +7,17 @@ * for each individual thread to service its share of work. Ultimately * it can be used to measure futex_wake() changes. */ +#include "bench.h" +#include <linux/compiler.h> +#include "../util/debug.h" +#ifndef HAVE_PTHREAD_BARRIER +int bench_futex_wake_parallel(int argc __maybe_unused, const char **argv __maybe_unused) +{ + pr_err("%s: pthread_barrier_t unavailable, disabling this test...\n", __func__); + return 0; +} +#else /* HAVE_PTHREAD_BARRIER */ /* For the CLR_() macros */ #include <string.h> #include <pthread.h> @@ -15,12 +25,11 @@ #include <signal.h> #include "../util/stat.h" #include <subcmd/parse-options.h> -#include <linux/compiler.h> #include <linux/kernel.h> #include <linux/time64.h> #include <errno.h> -#include "bench.h" #include "futex.h" +#include "cpumap.h" #include <err.h> #include <stdlib.h> @@ -42,8 +51,9 @@ static bool done = false, silent = false, fshared = false; static unsigned int nblocked_threads = 0, nwaking_threads = 0; static pthread_mutex_t thread_lock; static pthread_cond_t thread_parent, thread_worker; +static pthread_barrier_t barrier; static struct stats waketime_stats, wakeup_stats; -static unsigned int ncpus, threads_starting; +static unsigned int threads_starting; static int futex_flag = 0; static const struct option options[] = { @@ -64,6 +74,8 @@ static void *waking_workerfn(void *arg) struct thread_data *waker = (struct thread_data *) arg; struct timeval start, end; + pthread_barrier_wait(&barrier); + gettimeofday(&start, NULL); waker->nwoken = futex_wake(&futex, nwakes, futex_flag); @@ -84,6 +96,8 @@ static void wakeup_threads(struct thread_data *td, pthread_attr_t thread_attr) pthread_attr_setdetachstate(&thread_attr, PTHREAD_CREATE_JOINABLE); + pthread_barrier_init(&barrier, NULL, nwaking_threads + 1); + /* create and block all threads */ for (i = 0; i < nwaking_threads; i++) { /* @@ -96,9 +110,13 @@ static void wakeup_threads(struct thread_data *td, pthread_attr_t thread_attr) err(EXIT_FAILURE, "pthread_create"); } + pthread_barrier_wait(&barrier); + for (i = 0; i < nwaking_threads; i++) if (pthread_join(td[i].worker, NULL)) err(EXIT_FAILURE, "pthread_join"); + + pthread_barrier_destroy(&barrier); } static void *blocked_workerfn(void *arg __maybe_unused) @@ -119,19 +137,20 @@ static void *blocked_workerfn(void *arg __maybe_unused) return NULL; } -static void block_threads(pthread_t *w, pthread_attr_t thread_attr) +static void block_threads(pthread_t *w, pthread_attr_t thread_attr, + struct cpu_map *cpu) { - cpu_set_t cpu; + cpu_set_t cpuset; unsigned int i; threads_starting = nblocked_threads; /* create and block all threads */ for (i = 0; i < nblocked_threads; i++) { - CPU_ZERO(&cpu); - CPU_SET(i % ncpus, &cpu); + CPU_ZERO(&cpuset); + CPU_SET(cpu->map[i % cpu->nr], &cpuset); - if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpu)) + if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset)) err(EXIT_FAILURE, "pthread_attr_setaffinity_np"); if (pthread_create(&w[i], &thread_attr, blocked_workerfn, NULL)) @@ -205,6 +224,7 @@ int bench_futex_wake_parallel(int argc, const char **argv) struct sigaction act; pthread_attr_t thread_attr; struct thread_data *waking_worker; + struct cpu_map *cpu; argc = parse_options(argc, argv, options, bench_futex_wake_parallel_usage, 0); @@ -217,9 +237,12 @@ int bench_futex_wake_parallel(int argc, const char **argv) act.sa_sigaction = toggle_done; sigaction(SIGINT, &act, NULL); - ncpus = sysconf(_SC_NPROCESSORS_ONLN); + cpu = cpu_map__new(NULL); + if (!cpu) + err(EXIT_FAILURE, "calloc"); + if (!nblocked_threads) - nblocked_threads = ncpus; + nblocked_threads = cpu->nr; /* some sanity checks */ if (nwaking_threads > nblocked_threads || !nwaking_threads) @@ -259,7 +282,7 @@ int bench_futex_wake_parallel(int argc, const char **argv) err(EXIT_FAILURE, "calloc"); /* create, launch & block all threads */ - block_threads(blocked_worker, thread_attr); + block_threads(blocked_worker, thread_attr, cpu); /* make sure all threads are already blocked */ pthread_mutex_lock(&thread_lock); @@ -297,3 +320,4 @@ int bench_futex_wake_parallel(int argc, const char **argv) free(blocked_worker); return ret; } +#endif /* HAVE_PTHREAD_BARRIER */ |