Recommended talks¶
I watch a lot of talks about programming languages, paradigms, tools, compilers, etc., and decided I would start keeping track of the talks I thought were particularly good.
Table of Contents¶
Using Modern CMake Patterns to Enforce a Good Modular Design - Mathieu Ropert [CppCon 2017]
Monoids, Monads and Applicative Functors: Repeated Software Patterns - David Sankel [CppCon 2020]
What Everyone Should Know About How Amazing Compilers Are - Matt Godbolt [C++ on Sea 2019]
When a Microsecond is an Eternity: High Performance Trading Systems in C++ - Carl Cook [CppCon 2017]
C / C++¶
Effective CMake - Daniel Pfeifer [C++Now 2017]¶
Using Modern CMake Patterns to Enforce a Good Modular Design - Mathieu Ropert [CppCon 2017]¶
Monoids, Monads and Applicative Functors: Repeated Software Patterns - David Sankel [CppCon 2020]¶
What Everyone Should Know About How Amazing Compilers Are - Matt Godbolt [C++ on Sea 2019]¶
YouTube link. Introduces and demos the Compiler Explorer website, with examples of how the compiler optimizes (in architecture-dependent ways) even seemingly esoteric C++ code, and of how to check that the compiler still optimizes a non-trivial implementation. Provides a great overview of basic x86-64 registers and operations.
Key notes:
compilers are cleverer than we are
do not compromise readability for e.g. performance
be aware of compiler limitations:
aliasing
use typing system
pass by value
avoid “raw” loops
visibility
“unknown” calls inhibit optimizations
[[gnu::pure]] and [[gnu::const]] (see this SO question)
speculative devirtualisation
turn on link time optimization
structure layout
algorithms
compiler cannot save you from bad data layout or algorithms
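The aliasing note can be illustrated with a minimal sketch. The function names `AddByRef` and `AddByValue` are hypothetical and not taken from the talk; the idea is only that a reference parameter may alias the data being written, so the compiler has to reload it on every iteration, while a by-value copy cannot alias anything.

```cpp
#include <cstddef>
#include <vector>

// By reference: `delta` may refer to an element of `v`, so the compiler
// must assume it can change whenever `v[i]` is written.
void AddByRef(std::vector<int>& v, const int& delta) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] += delta;
    }
}

// By value: `delta` is a private copy, so it is known to stay constant
// for the whole loop.
void AddByValue(std::vector<int>& v, int delta) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] += delta;
    }
}
```

Calling `AddByRef(v, v[0])` changes `delta` partway through the loop, which is exactly the case the compiler has to allow for; `AddByValue(v, v[0])` adds the original value to every element.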
When a Microsecond is an Eternity: High Performance Trading Systems in C++ - Carl Cook [CppCon 2017]¶
YouTube link. Discusses fast coding practices for trading applications, although the idioms are applicable anywhere; also notes the importance of measurement and of cache vs. cores. Also touches on the cache warming technique, summarized as “Keep the hot path hot”.
Slowpath Removal:
Avoid
if (checkErrorA())
    handleErrorA();
else if (checkErrorB())
    handleErrorB();
else if (checkErrorC())
    handleErrorC();
else
    sendOrderToExchange();
Aim for
int64_t errorFlags;
// ...
if (!errorFlags)
    sendOrderToExchange();
else
    HandleError(errorFlags);
Ensure that error handling code is not inlined. Use the __attribute__ keyword with always_inline and noinline when appropriate:
__attribute__((noinline))
void ComplexLoggingFunction() {
    // ...
}
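A minimal sketch of how the two ideas above might fit together, assuming GCC or Clang for `__attribute__((noinline))`. The `Order` and `Counters` types, the flag constants, and `CheckErrors`/`ProcessOrder` are hypothetical names introduced for illustration; the counters stand in for the real send and error-handling work.

```cpp
#include <cstdint>

// Hypothetical order type and error bits, for illustration only.
struct Order {
    double price;
    int quantity;
};

struct Counters {
    int sent = 0;
    int errors = 0;
};

constexpr std::int64_t kBadPrice = 1 << 0;
constexpr std::int64_t kBadQuantity = 1 << 1;

// Every check sets a bit instead of selecting its own handler.
inline std::int64_t CheckErrors(const Order& order) {
    std::int64_t flags = 0;
    flags |= (order.price <= 0.0) ? kBadPrice : 0;
    flags |= (order.quantity <= 0) ? kBadQuantity : 0;
    return flags;
}

// Cold path: kept out of line so it does not bloat the hot path.
__attribute__((noinline))
void HandleError(std::int64_t flags, Counters& counters) {
    (void)flags;  // a real handler would decode the individual bits here
    ++counters.errors;
}

// Hot path: a single test on the combined flags.
inline void ProcessOrder(const Order& order, Counters& counters) {
    const std::int64_t errorFlags = CheckErrors(order);
    if (!errorFlags)
        ++counters.sent;  // stands in for sendOrderToExchange()
    else
        HandleError(errorFlags, counters);
}
```

The hot path then contains one predictable branch, and all the detail of decoding which error occurred lives in the out-of-line handler.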
Template based configurations:
Virtual functions and even simple branches can be expensive; a possible solution is to use templates. This removes the branches and eliminates code that will never be executed. Example:
// 1st implementation
struct OrderSenderA {
    void SendOrder() {
        // ...
    }
};
// 2nd implementation
struct OrderSenderB {
    void SendOrder() {
        // ...
    }
};
template <typename T>
struct OrderManager : public IOrderManager {
    void MainLoop() final {
        // ...
        mOrderSender.SendOrder();
    }
    T mOrderSender;
};
Then use factories to parse configurations:
std::unique_ptr<IOrderManager> Factory(const Config& config) {
    if (config.UseOrderSenderA())
        return std::make_unique<OrderManager<OrderSenderA>>();
    else
        return std::make_unique<OrderManager<OrderSenderB>>();
}
int main(int argc, char* argv[]) {
    // config is assumed to have been loaded before this point
    auto manager = Factory(config);
    manager->MainLoop();
}
Lambda functions are fast and convenient
If you know at compile time which function will be executed, then prefer lambdas:
template <typename T>
void SendMessage(T&& lambda) {
    Msg msg = PrepareMessage();
    lambda(msg);
    send(msg);
}
With example lambda
SendMessage([&](auto& msg) {
    msg.instrument = x;
    msg.price = z;
});
Memory allocation
Allocations are costly; prefer a pool of preallocated objects. Reuse instead of deallocating. If you must delete large objects, consider using another thread (glibc free has 400 lines of book-keeping code).
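A pool of preallocated objects could look roughly like the following. This is a minimal single-threaded sketch, not something shown in the talk: `ObjectPool` and its member names are hypothetical, and it assumes `T` is default-constructible and that each pointer is released at most once.

```cpp
#include <cstddef>
#include <vector>

// Fixed-capacity pool: all objects are allocated up front, and the hot
// path only pushes and pops pointers on a free list.
template <typename T>
class ObjectPool {
public:
    explicit ObjectPool(std::size_t capacity) : storage_(capacity) {
        free_.reserve(capacity);
        for (auto& object : storage_) {
            free_.push_back(&object);
        }
    }

    // Returns nullptr when the pool is exhausted instead of allocating.
    T* Acquire() {
        if (free_.empty()) {
            return nullptr;
        }
        T* object = free_.back();
        free_.pop_back();
        return object;
    }

    // Hands the object back for reuse; nothing is deallocated.
    void Release(T* object) { free_.push_back(object); }

    std::size_t Available() const { return free_.size(); }

private:
    std::vector<T> storage_;  // never resized after construction
    std::vector<T*> free_;    // capacity reserved once, so no reallocation
};
```

Because the free list is reserved to full capacity at construction, neither `Acquire` nor `Release` touches the allocator.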
Use templates over branches
Instead of using if/else or ternary operators, prefer a templated approach:
template <Side T>
void Strategy<T>::RunStrategy() {
    const float orderPrice = CalcPrice(fairValue, credit);
    // ...
}
template<>
float Strategy<Side::Buy>::CalcPrice(float value, float credit) {
    return value - credit;
}
template<>
float Strategy<Side::Sell>::CalcPrice(float value, float credit) {
    return value + credit;
}
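The fragment above leaves out the `Side` and `Strategy` declarations. A self-contained variant might look like this; the class layout, the static `CalcPrice`, and the `RunStrategy` signature are assumptions made so that the sketch compiles on its own, not the declarations from the talk.

```cpp
enum class Side { Buy, Sell };

template <Side S>
struct Strategy {
    // Declared once, defined separately for each side below.
    static float CalcPrice(float value, float credit);

    // The side is fixed at compile time, so no runtime branch is needed.
    float RunStrategy(float fairValue, float credit) const {
        return CalcPrice(fairValue, credit);
    }
};

template <>
inline float Strategy<Side::Buy>::CalcPrice(float value, float credit) {
    return value - credit;
}

template <>
inline float Strategy<Side::Sell>::CalcPrice(float value, float credit) {
    return value + credit;
}
```

`Strategy<Side::Buy>` and `Strategy<Side::Sell>` are distinct types, so each instantiation contains only the arithmetic for its own side.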
Multithreading
Avoid multithreading in latency-sensitive code: synchronizing data via locks gets expensive, and even lock-free code may require locks at the hardware level.
If using multiple threads, keep shared data to an absolute minimum. Consider passing copies of data rather than sharing it. If data must be shared, and out-of-sequence updates are acceptable, consider not synchronizing.
Data lookups
If the cache line is 64 bytes, arrange data structures so that values commonly read together sit next to each other. In the following, looking up price gives quantityMultiplier for free: denormalized data is not a sin.
struct Instrument {
    float price;
    int16_t quantityMultiplier;
    // ...
};
Storing the same value in two places is not always bad practice.
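One way to check such a layout at compile time is sketched below. The 64-byte line size is an assumption (typical for x86-64, but not guaranteed), and the `description` field and `SharesCacheLine` helper are hypothetical additions for illustration.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLineSize = 64;  // assumed, not queried

// Aligning the struct to the line size keeps the fields that are read
// together from straddling two cache lines.
struct alignas(kCacheLineSize) Instrument {
    float price;
    std::int16_t quantityMultiplier;
    char description[48];  // hypothetical, rarely read field
};

static_assert(sizeof(Instrument) == kCacheLineSize,
              "Instrument should occupy exactly one cache line");
static_assert(offsetof(Instrument, quantityMultiplier) < kCacheLineSize,
              "hot fields should sit in the first cache line");

// Runtime check that both hot fields fall in the same 64-byte block.
inline bool SharesCacheLine(const Instrument& instrument) {
    const auto price =
        reinterpret_cast<std::uintptr_t>(&instrument.price);
    const auto multiplier =
        reinterpret_cast<std::uintptr_t>(&instrument.quantityMultiplier);
    return price / kCacheLineSize == multiplier / kCacheLineSize;
}
```

The static assertions fail the build if a later field change pushes the struct past one line, which is cheaper than discovering it in a profiler.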
Keep the cache hot
Always execute to the last point of the hot path to keep the cache warm and, as a bonus, to train the hardware branch predictor correctly. If possible, don’t share L3: disable all but a single core (or lock the cache). If you do have multiple cores, choose neighbours carefully, and move noisy neighbours to a different physical CPU if possible.
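Cache warming could be sketched as follows: dummy orders run through the same code as real ones and stop just before the final hand-off. The `Order` and `Gateway` types, the `live` flag, and the toy `Encode` step are all hypothetical; the counters stand in for the real network write.

```cpp
#include <cstdint>

// Hypothetical order and gateway types, for illustration only.
struct Order {
    std::int64_t price;
    std::int32_t quantity;
};

class Gateway {
public:
    // Runs the whole hot path. When `live` is false everything except the
    // final hand-off still executes, so the same code and data are touched.
    void Submit(const Order& order, bool live) {
        lastEncoded_ = Encode(order);
        if (live)
            ++sent_;    // stands in for the real network write
        else
            ++warmed_;  // warm-up pass: stop just before sending
    }

    int Sent() const { return sent_; }
    int Warmed() const { return warmed_; }
    std::int64_t LastEncoded() const { return lastEncoded_; }

private:
    // Toy stand-in for message preparation on the hot path.
    static std::int64_t Encode(const Order& order) {
        return order.price * 31 + order.quantity;
    }

    std::int64_t lastEncoded_ = 0;
    int sent_ = 0;
    int warmed_ = 0;
};
```

In use, the idle loop would call `Submit(dummyOrder, false)` repeatedly between real orders, so that the occasional `Submit(order, true)` finds the path already warm.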
Measurement
Tools such as the sampling profiler gprof, the instrumentation profiler callgrind, and microbenchmark libraries like Google Benchmark all have their limitations; they are useful, but not for micro-optimization. Instead, model your production server configuration as closely as you can and measure on that.
Summary:
Aim for very simple runtime logic; compilers optimize simple code best. Prefer approximations over precision where appropriate, and do the expensive work only when you have spare time. Conduct accurate measurements.
MongoDB¶
Data Modeling with MongoDB - Yulia Genkina [MongoDB 2020]¶
YouTube link. I’ve covered this talk in detail in my MongoDB notes.
Go¶
Building a container from scratch in Go - Liz Rice [Container Camp 2016]¶
YouTube link. How container runtimes work, at a practical level.
Demonstrates how you can use system flags to create namespaces, virtual directory structures, and process trees
Live coding examples in Go
A very good talk for a solid understanding of how container runtimes work.
Paradigms¶
The Forgotten Art of Structured Programming - Kevlin Henney [C++ on Sea 2019]¶
YouTube link. On keeping code clean, factorisable, and the importance of intelligent control flow.
A cool (but legacy) code example, known as Duff’s device, demonstrating multiple entry points into a do-while loop:
send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}
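The listing above uses pre-ANSI C syntax that current compilers reject. A sketch of the same control flow in standard C++ follows; unlike the original, which writes every value to a single fixed destination, this hypothetical `DuffCopy` advances the destination pointer so the result can be checked, and it guards against a zero count.

```cpp
#include <cstddef>

// Copies `count` shorts from `from` to `to`, unrolled eight at a time.
// The switch jumps into the middle of the do-while body to handle the
// remainder, then the loop runs the full body for the rest.
void DuffCopy(short* to, const short* from, std::size_t count) {
    if (count == 0) {
        return;  // the unguarded device would copy eight elements here
    }
    std::size_t n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

Compilers may warn about the implicit fallthrough between cases, which is intentional here and is precisely the kind of control flow the talk argues against.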
Key notes:
don’t use goto
block structure is a great organizational tool