AutoVectorization (SuperWord) Status
I present this graphical overview of the C2 AutoVectorizer (SuperWord). The goal is to help myself and others understand the different improvements and their dependencies.
Related blog post: C2 AutoVectorizer Improvement Ideas.
January 2025
- Integrated: JDK-8343685 C2 SuperWord: refactor VPointer with MemPointer
- It generalizes the pointer parsing, and allows more patterns to be analyzed for static aliasing analysis (adjacency and overlap queries).
- In Review: JDK-8323582 C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory
- Adds a runtime alignment check for -XX:+AlignVector (used on platforms that require strict alignment). Arrays, like all Java objects, are ObjectAlignmentInBytes aligned (usually 8-byte alignment). But native memory (e.g. accessed through MemorySegment) does not give us any such alignment guarantee. So I’m introducing a Predicate version (i.e. deoptimize when the runtime check fails) and a Multiversioning approach (if the check passes, enter the fast loop where we assume alignment; otherwise take the slow loop, where we make no alignment assumption and may not be able to vectorize as a result).
- The Predicate and Multiversioning infrastructure can then be reused for JDK-8324751 C2 SuperWord: Aliasing Analysis runtime check.
- WIP: JDK-8340093 C2 SuperWord: implement cost model
- The goal is a more accurate heuristic for deciding whether to vectorize reductions, which can add extra cost to the loop body.
- It will also allow the vectorization of other shapes: shuffle, pack, unpack, etc. inside the loop. We need to know whether the vectorized loop body is expected to be faster (overall fewer / cheaper instructions) than the scalar loop.
- If-conversion would also require us to perform such cost-modeling.
- WIP: JDK-8343597 C2 SuperWord: RelaxedMath for faster float reductions
- WIP: Investigate RangeCheck elimination and other issues for vectorization of MemorySegment loops (JDK-8331659 and others).
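To make the alignment problem from JDK-8323582 more concrete, here is a minimal sketch of the arithmetic behind such a runtime check. It is not the code C2 emits: the class `AlignmentSketch`, its methods, and the 32-byte vector width in the usage note are hypothetical, and it assumes the element size and vector width are powers of two. The point is that a base address that is not a multiple of the element size can never be brought to vector alignment by running extra scalar iterations, which is the situation unaligned native memory can put us in.

```java
public class AlignmentSketch {
    /**
     * The access in iteration i is at (base + i * elemBytes).
     * With power-of-two sizes, some iteration can reach a vector-aligned
     * address only if base is already a multiple of elemBytes.
     */
    static boolean canReachAlignment(long base, int elemBytes) {
        return (base & (elemBytes - 1)) == 0;
    }

    /**
     * Number of scalar iterations needed before the first vector-aligned
     * access, or -1 if alignment can never be reached (the "slow loop" case).
     */
    static int iterationsUntilAligned(long base, int elemBytes, int vectorBytes) {
        if (!canReachAlignment(base, elemBytes)) {
            return -1;
        }
        long misalignment = base & (vectorBytes - 1);
        long bytesToGo = (vectorBytes - misalignment) & (vectorBytes - 1);
        return (int) (bytesToGo / elemBytes);
    }
}
```

For example, with 4-byte elements and an assumed 32-byte vector, a base address of 16 needs 4 scalar iterations, while a base address of 0x1001 can never be aligned, so only a loop version without the alignment assumption is safe.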
May 2025
- Integrated: JDK-8355094: Performance drop in auto-vectorized kernel due to split store
- The PR is recommended reading: a thorough investigation of the impact of aligning loads vs. stores, and of splitting memory ops across cache-line boundaries.
- Integrated: JDK-8354477: C2 SuperWord: make use of memory edges more explicit
- Refactoring as preparation for JDK-8324751, see below.
- Integrated: a few follow-ups from JDK-8323582:
- JDK-8354477: C2 SuperWord: make use of memory edges more explicit
- JDK-8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears
- JDK-8352587: C2 SuperWord: we must avoid Multiversioning for PeelMainPost loops
- JDK-8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning
- WIP: JDK-8324751: C2 SuperWord: Aliasing Analysis runtime check
- This has been a big project, using the Multiversioning implemented for JDK-8323582. I have to refactor parts of VPointer so that pointers can be reconstructed from a VPointer, which I need in order to build the runtime checks.
- Suspended for lack of time: JDK-8340093 Cost Modeling, will get back to it after AliasingAnalysis runtime checks.
- Testing:
- Integrated: JDK-8352020 CompileFramework: enable compilation for VectorAPI
- Integrated: JDK-8351952: IR Framework: allow ignoring methods that are not compilable
- Integrated: JDK-8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects
- WIP: JDK-8344942: Template-Based Testing Framework
- This took a lot of time: lots of experiments, rounds of feedback, and reviewing. But it is also very exciting, and it will save us a lot of time in the future. I already found a number of bugs during prototyping.
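To illustrate why the Aliasing Analysis runtime check (JDK-8324751) needs Multiversioning, here is a hand-written sketch of the idea. It is only an analogy in plain Java, not what C2 generates: `AliasingSketch`, `mayOverlap`, the 4-lane chunking that stands in for a vector, and the conservative overlap test are all assumptions made for illustration, and `offset` is assumed to be non-negative.

```java
public class AliasingSketch {
    // Scalar semantics of the loop: a[i + offset] = b[i] + 1.
    static void slowLoop(int[] a, int[] b, int offset, int n) {
        for (int i = 0; i < n; i++) {
            a[i + offset] = b[i] + 1;
        }
    }

    // Stand-in for a vectorized loop: load a whole chunk, then store it.
    static void fastLoop(int[] a, int[] b, int offset, int n) {
        final int LANES = 4; // hypothetical vector length
        int[] tmp = new int[LANES];
        int i = 0;
        for (; i + LANES <= n; i += LANES) {
            for (int l = 0; l < LANES; l++) tmp[l] = b[i + l] + 1;    // "vector load + add"
            for (int l = 0; l < LANES; l++) a[i + offset + l] = tmp[l]; // "vector store"
        }
        for (; i < n; i++) {
            a[i + offset] = b[i] + 1; // scalar tail
        }
    }

    // Conservative runtime check: Java arrays can only alias if they are the same object.
    static boolean mayOverlap(int[] a, int[] b, int offset, int n) {
        return a == b && offset != 0 && offset < n;
    }

    // Multiversioning by hand: pick the loop version based on the runtime check.
    static void multiversioned(int[] a, int[] b, int offset, int n) {
        if (mayOverlap(a, b, offset, n)) {
            slowLoop(a, b, offset, n);
        } else {
            fastLoop(a, b, offset, n);
        }
    }
}
```

Calling `fastLoop(x, x, 1, 8)` on a zero-filled array of length 9 produces a different result than `slowLoop`, because each chunk is loaded before the previous stores of the same chunk become visible. `multiversioned` avoids this by falling back to the scalar loop whenever the check cannot rule out overlap.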
Outlook / Priorities
- Using Template Framework
- Aliasing Analysis runtime Checks
- Getting back to Cost Model / Reductions etc.
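As background for the reduction work (JDK-8343597 RelaxedMath, and the cost model's reduction heuristic), the sketch below shows why float reductions are harder to vectorize than int reductions: float addition is not associative, so summing in independent lanes can change the result. The class `ReductionOrderSketch` and the two-lane split are hypothetical and only mimic what a reordered vector reduction would do. My understanding of the RFE title is that relaxing the ordering requirement would permit this kind of reordering, but that is an assumption, not a description of the implementation.

```java
public class ReductionOrderSketch {
    // Strict left-to-right order, as required by Java semantics for a plain loop.
    static float sequentialSum(float[] a) {
        float sum = 0f;
        for (float v : a) {
            sum += v;
        }
        return sum;
    }

    // Reordered sum: two independent partial sums ("lanes"), combined at the end.
    static float twoLaneSum(float[] a) {
        float lane0 = 0f;
        float lane1 = 0f;
        int i = 0;
        for (; i + 1 < a.length; i += 2) {
            lane0 += a[i];
            lane1 += a[i + 1];
        }
        float sum = lane0 + lane1;
        for (; i < a.length; i++) {
            sum += a[i]; // scalar tail for odd lengths
        }
        return sum;
    }
}
```

For the input `{1e8f, 1f, -1e8f, 1f}` the sequential sum is `1.0f` (the first `1f` is lost to rounding when added to `1e8f`), while the two-lane sum is `2.0f`. Neither is wrong as a sum, but only the first matches the scalar loop bit for bit.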
To Add Below
- JDK-8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability
- Visualization of the different JDK versions -> see progression
- JDK-8358235: Generators: extend for byte, short, char
- https://bugs.openjdk.org/browse/JDK-8342692
- https://bugs.openjdk.org/browse/JDK-8356176
Graphical Overview
Legend:
- Blue: RFE (light blue: integrated, dark blue: WIP).
- Red: Bug (light red: integrated, dark red: open/WIP).
- Gray: Future Work (RFE).
- Orange: Priority.