AArch64 LLVM Crash: Illegal Type Copying With __SVCount_t

by Admin

Introduction

Hey guys! Today, we're diving into a tricky crash in the LLVM compiler when targeting AArch64, specifically with the Neoverse-V1 CPU selected. The compiler dies during code generation with an assertion failure about copying to an illegal type. If you're working with LLVM and AArch64, especially with Scalable Vector Extension (SVE) types like __SVCount_t, this might be relevant to you. Let's break down the error, understand the context, and explore potential causes.

The Problem: Assertion Failure

The core of the issue is an assertion failure within LLVM's code generation pipeline. The error message is quite explicit:

Assertion `DAG.getTargetLoweringInfo().isTypeLegal(PartVT) && "Copying to an illegal type!"` failed.

This assertion fires in getCopyToParts, a helper in SelectionDAGBuilder.cpp that splits a value into the legal parts used to pass it around. The SelectionDAG (Selection Directed Acyclic Graph) is a crucial data structure in LLVM's backend, representing the program's operations in a form suitable for instruction selection. The assertion means that the compiler is trying to copy data into a part type (PartVT) that the target (AArch64 in this case) deems illegal. But what does "illegal" mean here?

In LLVM, a type is legal if the target can hold it directly in a register, which in practice means the target's lowering code has registered a register class for it. If a type is not legal, the compiler has to promote it to a larger legal type or split it into smaller legal parts and operate on those. The assertion tells us that getCopyToParts was asked to copy a value into parts of a type that the target does not consider legal, so that conversion never produced something usable.
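To make the idea concrete, here is a minimal sketch of legality checking and splitting. It's a toy model, not LLVM code: ToyType, ToyTarget and legalize are hypothetical names invented for this illustration, and real LLVM also promotes, widens and scalarizes types, which the sketch leaves out.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a value type: a name plus a width in bits.
// This is NOT llvm::EVT or llvm::MVT; it only illustrates the idea.
struct ToyType {
    std::string name;
    unsigned bits;  // 0 marks an opaque type that cannot be split
};

// Hypothetical target description: the type names it can keep in registers.
struct ToyTarget {
    std::set<std::string> legalNames;
    bool isTypeLegal(const ToyType &t) const {
        return legalNames.count(t.name) != 0;
    }
};

// Split `t` into legal parts by repeated halving.
// Returns std::nullopt when no legal decomposition exists.
std::optional<std::vector<ToyType>> legalize(const ToyTarget &target,
                                             const ToyType &t) {
    if (target.isTypeLegal(t))
        return std::vector<ToyType>{t};
    if (t.bits < 2 || t.bits % 2 != 0)
        return std::nullopt;  // opaque or too small to halve
    ToyType half{"i" + std::to_string(t.bits / 2), t.bits / 2};
    auto parts = legalize(target, half);
    if (!parts)
        return std::nullopt;
    // Two halves, each already broken into legal parts.
    std::vector<ToyType> result = *parts;
    result.insert(result.end(), parts->begin(), parts->end());
    return result;
}
```

With a toy target whose legal types are i32, i64 and f64, an i128 splits into two i64 parts, while an opaque type the target has never heard of has no decomposition at all. That second case is roughly the situation the assertion is complaining about.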

Context: __SVCount_t and Scalable Vector Extension (SVE)

The provided code snippet gives us a vital clue:

void f(__SVCount_t, __SVCount_t);

void foo() { f(__SVCount_t(), __SVCount_t()); }

This code uses the __SVCount_t type, Clang's builtin type behind the ACLE svcount_t. It belongs to the Scalable Vector Extension (SVE) family, where vector lengths are determined at runtime, making code more adaptable to different hardware configurations. Despite the name, __SVCount_t is not a vector of counts: it is an opaque predicate-as-counter value held in a predicate register, and in LLVM IR it shows up as target("aarch64.svcount"). The predicate-as-counter instructions that use it arrived with SVE2.1 and SME2, while Neoverse-V1 implements only the original SVE. So here's where the problem likely lies: the front end accepts the type, but the backend might not be handling it correctly for Neoverse-V1, particularly when it is passed as an argument to a function.
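One quick way to see what a given -mcpu setting actually enables is to ask the preprocessor. The sketch below collects whichever SVE-family feature macros are defined for the current compilation. definedSveFeatureMacros is a hypothetical helper name, and the macro names are the ACLE spellings as I understand them, so double-check them against the ACLE document for your toolchain.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the names of the SVE-family ACLE feature macros that are defined
// for this compilation. On non-AArch64 targets the list is simply empty.
// The macro spellings are assumptions taken from ACLE; a misspelled name
// would only be skipped silently, not cause an error.
std::vector<std::string> definedSveFeatureMacros() {
    std::vector<std::string> found;
#ifdef __ARM_FEATURE_SVE
    found.push_back("__ARM_FEATURE_SVE");
#endif
#ifdef __ARM_FEATURE_SVE2
    found.push_back("__ARM_FEATURE_SVE2");
#endif
#ifdef __ARM_FEATURE_SVE2p1
    found.push_back("__ARM_FEATURE_SVE2p1");
#endif
#ifdef __ARM_FEATURE_SME2
    found.push_back("__ARM_FEATURE_SME2");
#endif
    return found;
}
```

Print the returned names from a small main built with the same -mcpu flag as the reproducer. If neither the SVE2.1 nor the SME2 macro shows up, that would be consistent with the backend having no legal home for the type, though it doesn't prove it.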

The Backtrace: A Journey Through the Compiler

The backtrace provides a call stack that shows the sequence of function calls leading to the crash. Let's highlight the key parts:

  1. getCopyToParts: This is where the assertion fails, as we already know.
  2. llvm::TargetLowering::LowerCallTo: This function is responsible for lowering a call instruction into a sequence of SelectionDAG nodes that represent the actual machine instructions needed to perform the call. This involves handling argument passing, return value handling, and other call-related details.
  3. llvm::SelectionDAGBuilder::lowerInvokable and llvm::SelectionDAGBuilder::LowerCallTo: These functions are part of the process of building the SelectionDAG for a function. They handle the details of lowering call instructions into the DAG.
  4. llvm::SelectionDAGBuilder::visitCall: This function is called when the SelectionDAGBuilder encounters a call instruction in the LLVM Intermediate Representation (IR).
  5. llvm::SelectionDAGISel::SelectBasicBlock and llvm::SelectionDAGISel::SelectAllBasicBlocks: These functions are part of the instruction selection phase, where the SelectionDAG is transformed into machine instructions.

In essence, the backtrace tells us that the crash occurs during the process of lowering a call instruction that involves __SVCount_t arguments. The compiler is unable to find a legal way to copy the __SVCount_t values, leading to the assertion failure.

Reproducing the Bug

The reproducer provided is invaluable:

void f(__SVCount_t, __SVCount_t);

void foo() { f(__SVCount_t(), __SVCount_t()); }

And the command to trigger the crash:

clang++ -c -mcpu=neoverse-v1 bug.cpp

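This command compiles the bug.cpp file into an object file, targeting the Neoverse-V1 CPU. The -c flag tells clang++ to compile but not link. The -mcpu=neoverse-v1 flag is crucial, as it selects the target CPU and with it the set of architecture features the backend treats as available. Note that the assertion only fires in a build of LLVM with assertions enabled; a release build may fail differently. Either way, a compiler that crashes instead of emitting code or a diagnostic has a bug, here in how the AArch64 backend handles __SVCount_t with Neoverse-V1 selected.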

Possible Causes and Debugging Strategies

So, what could be causing this issue? Here are some possibilities:

  1. Missing or Incorrect TargetLowering Information: The TargetLowering class (AArch64TargetLowering for this backend) describes how LLVM IR constructs are lowered to machine code, including which types are legal. There is no separate class per CPU; the legal types depend on the features the selected CPU enables. It's possible that, with Neoverse-V1's feature set, nothing registers the type behind __SVCount_t as legal.
  2. Incorrect Type Legalization: The compiler might be failing to correctly legalize the __SVCount_t type. This could be due to a bug in the type legalization code or missing information about the type's properties.
  3. ABI Issues: The Application Binary Interface (ABI) defines how arguments are passed to functions. It's possible that the calling-convention code for AArch64 does not account for a __SVCount_t argument when the features it depends on are missing, leading to the compiler trying to copy it in an illegal way.
  4. Missing Diagnostic: The front end might be accepting a type that the selected CPU cannot support. In that case the real fix would be an error message for the user rather than a change in code generation.

To debug this issue, you could try the following:

  • Examine the TargetLowering setup: Inspect AArch64TargetLowering to see under which subtarget features the type behind __SVCount_t is registered as legal, and compare that with what Neoverse-V1 enables.
  • Trace Type Legalization: Trace the type legalization process to see how the compiler is trying to decompose __SVCount_t. Look for any errors or unexpected behavior.
  • Analyze the Generated DAG: Examine the SelectionDAG generated for the foo function to see how the __SVCount_t arguments are being handled. Look for any nodes that seem incorrect or out of place.
  • Simplify the Code: Try simplifying the reproducer code to see if you can isolate the specific part of the code that's causing the crash.
  • Update LLVM: Check whether the crash still reproduces with a recent release or a trunk build, since it may already have been fixed.

Reporting the Bug

The error message explicitly asks you to submit a bug report, and that's definitely the right thing to do. When reporting the bug, be sure to include:

  • The reproducer code.
  • The command to trigger the crash.
  • The full backtrace.
  • The version of LLVM you're using.
  • Any other relevant information, such as the target architecture and operating system.

This information will help the LLVM developers to quickly identify and fix the bug.

Conclusion

This AArch64 LLVM crash highlights the complexities of compiler development, especially when dealing with advanced features like SVE. The assertion failure related to illegal type copying points to a potential issue in how the compiler handles __SVCount_t on the Neoverse-V1 CPU. By understanding the error, examining the backtrace, and exploring possible causes, we can gain valuable insights into the inner workings of the compiler and contribute to fixing the bug. If you encounter this issue, be sure to report it to the LLVM developers with as much detail as possible. Happy debugging!