<a id="top"></a>
# Authoring benchmarks
> [Introduced](https://github.com/catchorg/Catch2/issues/1616) in Catch2 2.9.0.

Writing benchmarks is not easy. Catch simplifies certain aspects, but you'll
always need to take care of others. Understanding a few things about
the way Catch runs your code will be very helpful when writing your benchmarks.

First off, let's go over some terminology that will be used throughout this
guide.
- *User code*: user code is the code that the user provides to be measured.
- *Run*: one run is one execution of the user code. Sometimes also referred
to as an _iteration_.
- *Sample*: one sample is one data point obtained by measuring the time it takes
to perform a certain number of runs. One sample can consist of more than one
run if the clock available does not have enough resolution to accurately
measure a single run. All samples for a given benchmark execution are obtained
with the same number of runs.
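
To make the relationship between runs and samples concrete, here is a minimal sketch that uses only the standard library. It is not how Catch implements sampling; `time_sample` and `mean_run_time_ns` are hypothetical helpers introduced for this illustration.

```c++
#include <cassert>
#include <chrono>

// One sample: the time taken by `runs` consecutive runs of `user_code`.
// Hypothetical helper, not part of Catch.
template <typename Fn>
std::chrono::nanoseconds time_sample(Fn&& user_code, int runs) {
    assert(runs > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < runs; ++i) {
        user_code();  // one run (iteration)
    }
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}

// Average time of a single run within one sample, in nanoseconds.
inline double mean_run_time_ns(std::chrono::nanoseconds sample, int runs) {
    assert(runs > 0);
    return static_cast<double>(sample.count()) / runs;
}
```

Timing many runs in one sample and dividing by the run count gives a per-run estimate even when a single run is too short for the clock to resolve.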
## Execution procedure
Now we can explain how a benchmark is executed in Catch. There are three main
steps, though the first does not need to be repeated for every benchmark.
1. *Environmental probe*: before any benchmarks can be executed, the clock's
resolution is estimated. A few other environmental artifacts are also estimated
at this point, like the cost of calling the clock function, but they almost
never have any impact on the results.
2. *Estimation*: the user code is executed a few times to obtain an estimate of
the number of runs that should be in each sample. This also has the potential
effect of bringing relevant code and data into the caches before the actual
measurement starts.
3. *Measurement*: all the samples are collected sequentially by performing the
number of runs estimated in the previous step for each sample.
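
The three steps above can be modelled roughly as follows. This is a simplified sketch under stated assumptions, not Catch's actual algorithm: the doubling strategy, the `min_sample_time` parameter, and every name in the block are assumptions made for this example.

```c++
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using bench_clock = std::chrono::steady_clock;
using nanos = std::chrono::duration<double, std::nano>;

// Step 1 (environmental probe): approximate the clock resolution as the
// smallest non-zero difference between two consecutive clock readings.
inline nanos estimate_resolution(int probes = 1000) {
    assert(probes > 0);
    nanos best = nanos::max();
    for (int i = 0; i < probes; ++i) {
        const auto t0 = bench_clock::now();
        auto t1 = bench_clock::now();
        while (t1 == t0) t1 = bench_clock::now();  // spin until the clock ticks
        best = std::min(best, nanos(t1 - t0));
    }
    return best;
}

// Step 2 (estimation): double the run count until one sample lasts at least
// `min_sample_time`. The user code is executed an unknown number of times here.
template <typename Fn>
int estimate_runs_per_sample(Fn&& user_code, nanos min_sample_time) {
    int runs = 1;
    for (;;) {
        const auto start = bench_clock::now();
        for (int i = 0; i < runs; ++i) user_code();
        const nanos elapsed = bench_clock::now() - start;
        if (elapsed >= min_sample_time || runs >= (1 << 30)) return runs;
        runs *= 2;
    }
}

// Step 3 (measurement): collect `samples` samples of `runs` runs each and
// record the average time per run for every sample.
template <typename Fn>
std::vector<nanos> collect_samples(Fn&& user_code, int samples, int runs) {
    assert(samples > 0 && runs > 0);
    std::vector<nanos> per_run;
    per_run.reserve(static_cast<std::size_t>(samples));
    for (int s = 0; s < samples; ++s) {
        const auto start = bench_clock::now();
        for (int i = 0; i < runs; ++i) user_code();
        const nanos elapsed = bench_clock::now() - start;
        per_run.push_back(elapsed / runs);
    }
    return per_run;
}
```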

This already gives us one important rule for writing benchmarks for Catch: the
benchmarks must be repeatable. The user code will be executed several times, and
the number of times it will be executed during the estimation step cannot be
known beforehand since it depends on the time it takes to execute the code.
User code that cannot be executed repeatedly will lead to bogus results or
crashes.
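
As a small illustration of what "repeatable" means here, compare the two hypothetical functions below. Neither comes from Catch; they only contrast user code that consumes its input with user code that leaves it intact.

```c++
#include <cassert>
#include <vector>

// NOT repeatable: every run shrinks `data`, so runs are not comparable and
// the precondition fails once the input has been used up.
inline int take_last(std::vector<int>& data) {
    assert(!data.empty());
    const int value = data.back();
    data.pop_back();
    return value;
}

// Repeatable: every run reads the same input and leaves it unchanged.
inline int read_last(const std::vector<int>& data) {
    assert(!data.empty());
    return data.back();
}
```

Benchmarking something like `take_last` directly would fail as soon as the number of runs exceeds the number of elements, which cannot be predicted in advance.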
## Benchmark specification
Benchmarks can be specified anywhere inside a Catch test case.
The `BENCHMARK` macro comes in a simple version and a slightly more advanced one.
Let's look at how a naive Fibonacci implementation could be benchmarked:
```c++
#include <cstdint>

std::uint64_t Fibonacci(std::uint64_t number) {
    return number < 2 ? 1 : Fibonacci(number - 1) + Fibonacci(number - 2);
}
```
The most straightforward way to benchmark this function is to add a `BENCHMARK` macro to our test case:
```c++
TEST_CASE("Fibonacci") {
    CHECK(Fibonacci(0) == 1);
    // some more asserts..
    CHECK(Fibonacci(5) == 8);
    // some more asserts..

    // now let's benchmark:
    BENCHMARK("Fibonacci 20") {
        return Fibonacci(20);
    };

    BENCHMARK("Fibonacci 25") {
        return Fibonacci(25);
    };

    BENCHMARK("Fibonacci 30") {
        return Fibonacci(30);
    };

    BENCHMARK("Fibonacci 35") {
        return Fibonacci(35);
    };
}
```
There are a few things to note:
- As `BENCHMARK` expands to a lambda expression, it is necessary to add a semicolon after
the closing brace (as opposed to the first experimental version).
- The `return` is a handy way to avoid the compiler optimizing away the benchmark code.

Running the test case executes the benchmarks and outputs something similar to:
```
-------------------------------------------------------------------------------
Fibonacci
-------------------------------------------------------------------------------
C:\path\to\Catch2\Benchmark.tests.cpp(10)
...............................................................................
benchmark name                       samples       iterations    est run time
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Fibonacci 20                                   100       416439    83.2878 ms
                                              2 ns         2 ns          2 ns
                                              0 ns         0 ns          0 ns
Fibonacci 25                                   100       400776    80.1552 ms
                                              3 ns         3 ns          3 ns
                                              0 ns         0 ns          0 ns
Fibonacci 30                                   100       396873    79.3746 ms
                                             17 ns        17 ns         17 ns
                                              0 ns         0 ns          0 ns
Fibonacci 35                                   100       145169    87.1014 ms
                                            468 ns       464 ns        473 ns
                                             21 ns        15 ns         34 ns
```
### Advanced benchmarking
The simplest use case shown above takes no arguments and just runs the user code that needs to be measured.
However, if you use the `BENCHMARK_ADVANCED` macro and add a `Catch::Benchmark::Chronometer` argument after
the macro, some advanced features become available. The contents of the simple benchmarks are invoked once per run,
while the blocks of the advanced benchmarks are invoked exactly twice:
once during the estimation phase, and once more during the execution phase.
```c++
BENCHMARK("simple"){ return long_computation(); };

BENCHMARK_ADVANCED("advanced")(Catch::Benchmark::Chronometer meter) {
    set_up();
    meter.measure([] { return long_computation(); });
};
```
These advanced benchmarks no longer consist entirely of user code to be measured.
In these cases, the code to be measured is provided via the
`Catch::Benchmark::Chronometer::measure` member function. This allows you to set up any
kind of state that might be required for the benchmark but is not to be included
in the measurements, like making a vector of random integers to feed to a
sorting algorithm.

A single call to `Catch::Benchmark::Chronometer::measure` performs the actual measurements
by invoking the callable object passed in as many times as necessary. Anything
that needs to be done outside the measurement can be done outside the call to
`measure`.
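
The split between untimed set-up and timed code can be illustrated without Catch at all. The sketch below uses only the standard library; `make_random_ints` and `time_sort_only` are hypothetical helpers, and the single timed call merely stands in for what the chronometer would measure. Sorting the same vector a second time would operate on already-sorted data, so this sketch times exactly one run.

```c++
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Untimed set-up: build the input outside the measured region.
inline std::vector<int> make_random_ints(std::size_t count, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 1'000'000);
    std::vector<int> values(count);
    std::generate(values.begin(), values.end(), [&] { return dist(rng); });
    return values;
}

// Timed region: only the sort itself falls between the two clock readings.
inline std::chrono::nanoseconds time_sort_only(std::vector<int>& values) {
    const auto start = std::chrono::steady_clock::now();
    std::sort(values.begin(), values.end());
    const auto stop = std::chrono::steady_clock::now();
    assert(std::is_sorted(values.begin(), values.end()));  // checked after timing
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```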
The callable object passed in to `measure` can optionally accept an `int`
parameter.
```c++
meter.measure([](int i) { return long_computation(i); });
```
If it accepts an `int` parameter, the sequence number of each run will be passed
in, starting with 0. This is useful if you want to measure some mutating code,
for example. The number of runs can be known beforehand by calling
`Catch::Benchmark::Chronometer::runs`; with this one can set up a different instance to be
mutated by each run.
```c++
std::vector<std::string> v(meter.runs());
std::fill(v.begin(), v.end(), test_string());
meter.measure([&v](int i) { in_place_escape(v[i]); });
```
Note that it is not possible to simply use the same instance for different runs
and reset it between each run, since that would pollute the measurements with
the resetting code.
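
The behaviour described in this section can be mimicked with a toy class. `ToyChronometer` below is a stand-in written for this illustration only; the real `Catch::Benchmark::Chronometer` is implemented differently, and the dispatch via `std::is_invocable_v` (C++17) is an assumption of this sketch rather than a description of Catch's internals.

```c++
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Toy stand-in for a chronometer, for illustration only.
class ToyChronometer {
public:
    explicit ToyChronometer(int runs) : runs_(runs) { assert(runs > 0); }

    // Number of runs that measure() will perform.
    int runs() const { return runs_; }

    // Calls `fn` once per run, passing the run index if `fn` accepts an int.
    template <typename Fn>
    void measure(Fn&& fn) {
        const auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < runs_; ++i) {
            if constexpr (std::is_invocable_v<Fn&, int>) {
                fn(i);
            } else {
                fn();
            }
        }
        elapsed_ = std::chrono::steady_clock::now() - start;
    }

    std::chrono::steady_clock::duration elapsed() const { return elapsed_; }

private:
    int runs_;
    std::chrono::steady_clock::duration elapsed_{};
};

// Each run mutates its own string, so no run sees another run's changes
// and no resetting code ends up inside the measured region.
inline std::vector<std::string> append_suffix_per_run(ToyChronometer& meter) {
    std::vector<std::string> inputs(static_cast<std::size_t>(meter.runs()), "text");
    meter.measure([&inputs](int i) { inputs[static_cast<std::size_t>(i)] += "!"; });
    return inputs;
}
```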
It is also possible to just provide an argument name to the simple `BENCHMARK` macro to get
the same semantics as providing a callable with an `int` argument to `meter.measure`:
```c++
BENCHMARK("indexed", i){ return long_computation(i); };
```
### Constructors and destructors
All of these tools give you a lot of mileage, but there are two things that still
need special handling: constructors and destructors. The problem is that if you
use automatic objects they get destroyed by the end of the scope, so you end up
measuring the time for construction and destruction together. And if you use
dynamic allocation instead, you end up including the time to allocate memory in
the measurements.

To solve this conundrum, Catch provides class templates that let you manually
construct and destroy objects without dynamic allocation and in a way that lets
you measure construction and destruction separately.
```c++
BENCHMARK_ADVANCED("construct")(Catch::Benchmark::Chronometer meter) {
    std::vector<Catch::Benchmark::storage_for<std::string>> storage(meter.runs());
    meter.measure([&](int i) { storage[i].construct("thing"); });
};

BENCHMARK_ADVANCED("destroy")(Catch::Benchmark::Chronometer meter) {
    std::vector<Catch::Benchmark::destructable_object<std::string>> storage(meter.runs());
    for(auto&& o : storage)
        o.construct("thing");
    meter.measure([&](int i) { storage[i].destruct(); });
};
```
`Catch::Benchmark::storage_for<T>` objects are just pieces of raw storage suitable for `T`
objects. You can use the `Catch::Benchmark::storage_for::construct` member function to call a constructor and
create an object in that storage. So if you want to measure the time it takes
for a certain constructor to run, you can just measure the time it takes to run
this function.

When the lifetime of a `Catch::Benchmark::storage_for<T>` object ends, if an actual object was
constructed there it will be automatically destroyed, so nothing leaks.

If you want to measure a destructor, though, you need to use
`Catch::Benchmark::destructable_object<T>`. These objects are similar to
`Catch::Benchmark::storage_for<T>` in that construction of the `T` object is manual, but
they do not destroy anything automatically. Instead, you are required to call
the `Catch::Benchmark::destructable_object::destruct` member function, which is what you
can use to measure the destruction time.
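
The idea behind such storage types can be sketched with placement new. `ManualStorage` below is a hypothetical class written for this illustration and is not Catch's implementation; it only shows how construction and destruction can be separated from memory allocation (it relies on `std::launder`, so C++17 is assumed).

```c++
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Illustrative only: uninitialised storage for one T with manual lifetime
// control. Not a Catch type.
template <typename T>
class ManualStorage {
public:
    ManualStorage() = default;
    ManualStorage(const ManualStorage&) = delete;
    ManualStorage& operator=(const ManualStorage&) = delete;

    // Runs T's constructor inside the pre-existing buffer; no memory is
    // allocated for the T object itself.
    template <typename... Args>
    void construct(Args&&... args) {
        assert(!alive_);
        ::new (static_cast<void*>(buffer_)) T(std::forward<Args>(args)...);
        alive_ = true;
    }

    // Runs T's destructor without releasing the buffer.
    void destruct() {
        assert(alive_);
        get().~T();
        alive_ = false;
    }

    T& get() {
        assert(alive_);
        return *std::launder(reinterpret_cast<T*>(buffer_));
    }

    bool alive() const { return alive_; }

    // Destroys a still-living object automatically so nothing leaks.
    ~ManualStorage() {
        if (alive_) destruct();
    }

private:
    alignas(T) unsigned char buffer_[sizeof(T)];
    bool alive_ = false;
};
```

Timing only the call to `construct` or only the call to `destruct` then isolates the cost of the constructor or destructor respectively.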
### The optimizer
Sometimes the optimizer will optimize away the very code that you want to
measure. There are several ways to use results that will prevent the optimizer
from removing them. You can use the `volatile` keyword, or you can output the
value to standard output or to a file, both of which force the program to
actually generate the value somehow.
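
The `volatile` approach might look like the following sketch. The helper names `keep` and `run_and_keep` are made up for this example, and `fibonacci` simply repeats the naive definition used earlier in this guide. Note that the compiler may still evaluate the computation at compile time when the input is a constant; the volatile write only ensures that the value has to be produced.

```c++
#include <cassert>
#include <cstdint>

// Naive Fibonacci with the same definition as earlier in this guide.
inline std::uint64_t fibonacci(std::uint64_t n) {
    return n < 2 ? 1 : fibonacci(n - 1) + fibonacci(n - 2);
}

// Hand-rolled sink: the write to a volatile object must be performed,
// so the value being written cannot simply be discarded.
inline void keep(std::uint64_t value) {
    volatile std::uint64_t sink = value;
    (void)sink;
}

// Computes the value and forces it to be used.
inline std::uint64_t run_and_keep(std::uint64_t n) {
    assert(n <= 92);  // larger inputs would overflow 64 bits
    const std::uint64_t result = fibonacci(n);
    keep(result);
    return result;
}
```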

Catch adds a third option. The values returned by any function provided as user
code are guaranteed to be evaluated and not optimized out. This means that if
your user code consists of computing a certain value, you don't need to bother
with using `volatile` or forcing output. Just `return` it from the function.
This helps keep the code natural.

Here's an example:
```c++
// may measure nothing at all by skipping the long calculation since its
// result is not used
BENCHMARK("no return"){ long_calculation(); };
// the result of long_calculation() is guaranteed to be computed somehow
BENCHMARK("with return"){ return long_calculation(); };
```
However, there's no other form of control over the optimizer whatsoever. It is
up to you to write a benchmark that actually measures what you want and doesn't
just measure the time to do a whole bunch of nothing.

To sum up, there are two simple rules: whatever you would do in handwritten code
to control optimization still works in Catch; and Catch makes return values
from user code into observable effects that can't be optimized away.

<i>Adapted from nonius' documentation.</i>