Introducing the /forceInterlockedFunctions switch for ARM64

In Visual Studio 2022 version 17.14, we are introducing the /forceInterlockedFunctions[-] switch. When enabled, the compiler generates calls to, and links with, out-of-line atomic functions that use Armv8.1+ Large System Extensions (LSE) atomic instructions when the CPU supports them.

This switch is on by default when targeting Armv8.0 and off by default for Armv8.1+. Outlining is necessary on Armv8.0 because that version’s interlocked intrinsics compile to exclusive instructions (LoadExcl/StoreExcl) that do not guarantee forward progress, which can cause performance issues due to intermittent livelocks. See the Arm Architecture Reference Manual for A-profile architecture, section “B2.17.5 Load-Exclusive and Store-Exclusive instruction usage restrictions”, for examples of when a LoadExcl/StoreExcl loop may not make forward progress.

Below is an example of code that was previously generated when using the _InterlockedAdd64 intrinsic. You can see the ldaxr and stlxr instructions being used.

Main.cpp:
#include <intrin.h>
#include <stdio.h>
#include <Windows.h>

void main() {
    volatile __int64 Addend = 5;
    __int64 Value = 1;
    _InterlockedAdd64(&Addend, Value);
}

Main.asm snippet:
; _InterlockedAdd64(&Addend, Value);
ldr x10,[sp]
add x9,sp,#8
|$LN3@main|
ldaxr x8,[x9]
add x8,x8,x10
stlxr wip0,x8,[x9]
cbnz wip0,|$LN3@main|
dmb ish
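
To make the shape of that loop easier to see, here is a minimal portable C++ sketch of the same load, add, store-if-unchanged, retry pattern. It is an illustration only, not what the compiler emits: `add_with_retry_loop` and its `retries` parameter are hypothetical names, and `std::atomic` stands in for the MSVC intrinsic.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not an MSVC API): atomically adds `value` to `target`
// using a retry loop and returns the new value, like _InterlockedAdd64 does.
// If `retries` is non-null, it receives the number of failed store attempts.
inline std::int64_t add_with_retry_loop(std::atomic<std::int64_t>& target,
                                        std::int64_t value,
                                        unsigned* retries = nullptr) {
    unsigned failed = 0;
    std::int64_t observed = target.load(std::memory_order_relaxed);
    // compare_exchange_weak can fail spuriously or because another thread
    // changed the value. On failure it reloads `observed`, and we try again.
    while (!target.compare_exchange_weak(observed, observed + value,
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed)) {
        ++failed;
    }
    if (retries != nullptr) {
        *retries = failed;
    }
    return observed + value;
}
```

Nothing in this loop bounds the number of retries. Under heavy contention, each attempt can keep failing, which is the forward-progress concern described above.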

With the /forceInterlockedFunctions option, the ldaxr and stlxr instructions are gone, replaced by a call to the out-of-line function: bl _InterlockedAdd64.

Main.cpp:
#include <intrin.h>
#include <stdio.h>
#include <Windows.h>

void main() {
    volatile __int64 Addend = 5;
    __int64 Value = 1;
    _InterlockedAdd64(&Addend, Value);
}

Main.asm snippet:
; _InterlockedAdd64(&Addend, Value);
ldr x1,[sp,#0x10]
add x0,sp,#0x18
bl _InterlockedAdd64
nop
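
The out-of-line function is where the choice between instruction sequences can be made at run time. The sketch below shows one way such a dispatch could look in portable C++. It is an assumption-laden illustration, not the actual runtime implementation: `outlined_add64`, `select_add`, `add_lse_style`, `add_llsc_style`, and `cpu_has_lse` are all hypothetical names, and `std::atomic` operations stand in for the real instruction sequences.

```cpp
#include <atomic>
#include <cstdint>

using AddFn = std::int64_t (*)(std::atomic<std::int64_t>&, std::int64_t);

// Stand-in for the Armv8.0 path: a retry loop in the style of ldaxr/stlxr.
std::int64_t add_llsc_style(std::atomic<std::int64_t>& target,
                            std::int64_t value) {
    std::int64_t observed = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(observed, observed + value)) {
        // `observed` was refreshed by the failed attempt; retry.
    }
    return observed + value;
}

// Stand-in for the LSE path: one read-modify-write operation, no retry loop.
std::int64_t add_lse_style(std::atomic<std::int64_t>& target,
                           std::int64_t value) {
    return target.fetch_add(value) + value;
}

// Hypothetical CPU check, stubbed so the sketch stays portable. On Windows, a
// real check might use IsProcessorFeaturePresent with
// PF_ARM_V81_ATOMIC_INSTRUCTIONS_AVAILABLE (an assumption about how one could
// write it, not a statement about what the runtime does).
bool cpu_has_lse() { return false; }

// Picks the implementation for the given capability.
AddFn select_add(bool has_lse) {
    return has_lse ? add_lse_style : add_llsc_style;
}

// Hypothetical outlined entry point: the selection happens once, and every
// later call goes through the cached function pointer.
std::int64_t outlined_add64(std::atomic<std::int64_t>& target,
                            std::int64_t value) {
    static const AddFn impl = select_add(cpu_has_lse());
    return impl(target, value);
}
```

The point of the design is that the call site stays the same (a single bl), while the function behind it can use the better instruction sequence on hardware that has it.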

The /forceInterlockedFunctions option applies only to ARM64 targets and is ignored for all other targets. Additionally, enabling the LSE feature overrides the default outlining behavior on Armv8.0.

Note that the option is on by default for all ARM64EC versions. We do not recommend turning it off for ARM64EC, because outlining helps address the memory model differences between ARM64 and x64.

 

This flag impacts the following interlocked intrinsics:

Key:

  • Full: supports the plain, _acq, _rel, and _nf forms.
  • None: not supported.
Operation         8     16    32    64    128   Pointer
Add               None  None  Full  Full  None  None
And               Full  Full  Full  Full  None  None
CompareExchange   Full  Full  Full  Full  Full  Full
Decrement         None  Full  Full  Full  None  None
Exchange          Full  Full  Full  Full  None  Full
ExchangeAdd       Full  Full  Full  Full  None  None
Increment         None  Full  Full  Full  None  None
Or                Full  Full  Full  Full  None  None
Xor               Full  Full  Full  Full  None  None
bittestandset     None  None  Full  Full  None  None
bittestandreset   None  None  Full  Full  None  None
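
For readers more familiar with standard C++ atomics, the plain, acquire, release, and no-fence (_nf) forms can be loosely compared to `std::atomic` memory orders. The sketch below is only an analogy under that assumption, not a definition of the intrinsics' semantics, and the four function names are hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Rough analogue of the plain form: the strongest ordering.
std::int64_t add_full_barrier(std::atomic<std::int64_t>& target,
                              std::int64_t value) {
    return target.fetch_add(value, std::memory_order_seq_cst) + value;
}

// Rough analogue of the acquire form.
std::int64_t add_acquire(std::atomic<std::int64_t>& target,
                         std::int64_t value) {
    return target.fetch_add(value, std::memory_order_acquire) + value;
}

// Rough analogue of the release form.
std::int64_t add_release(std::atomic<std::int64_t>& target,
                         std::int64_t value) {
    return target.fetch_add(value, std::memory_order_release) + value;
}

// Rough analogue of the _nf ("no fence") form: atomic, but no ordering.
std::int64_t add_no_fence(std::atomic<std::int64_t>& target,
                          std::int64_t value) {
    return target.fetch_add(value, std::memory_order_relaxed) + value;
}
```

All four perform the same atomic addition and return the new value. They differ only in how surrounding memory accesses may be reordered around the operation.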

See also

/forceInterlockedFunctions | Microsoft Learn

ARM64 intrinsics | Microsoft Learn

/feature (ARM64) | Microsoft Learn

Introduction to Large System Extensions | Arm Learning Paths

Feedback

That’s all about this new compiler option and default setting that you can find starting in Visual Studio 2022 version 17.14. Please give it a try and let us know how it goes! We always welcome feedback, questions, or concerns from the community, as it helps make Visual Studio better.

Please share your thoughts, comments and questions with us through Developer Community. You can also reach us on X @VisualC, or via email at visualcpp@microsoft.com.

 

The post Introducing the /forceInterlockedFunctions switch for ARM64 appeared first on C++ Team Blog.
