Java: Improve a couple of join-orders #20127

Open. Wants to merge 3 commits into base: main

Conversation

aschackmull
Contributor

3 commits fixing 3 join-orders:

  1. The closeCalled code was quite tangled, which caused the optimiser to evaluate NOT #prev before joining with #prev_delta. That order is always bad, so the optimiser only picks it when it has been pushed sufficiently into a corner, usually by excessive nesting. A little untangling helped.
  2. The code in getErasedRepr inherently contains cartesian products. The optimiser can't avoid them, and they mess with its estimates. A slight refactor ensures that the CPs are with predicates of size 1, as they should be.
  3. The fastTC was being nicely transformed to doublyBoundedFastTC by the optimiser. However, the source(t, n) join represents a step with a huge fanout, so including that step in the graph makes the resulting bounded TC orders of magnitude smaller.

Tuple counts:
1. Before:

Pipeline standard for CloseType::closeCalled/1#f50b3d7b@41cf68nf was evaluated in 7 iterations totaling 185ms (delta sizes total: 1708).
                         {2} r1 = `_#Variable::LocalVariableDecl#9c787ed3Merge_#Variable::Parameter#f557a41dMerge_#Variable::Variable.g__#loop_invariant_prefix` AND NOT `CloseType::closeCalled/1#f50b3d7b#prev`(FIRST 1)
        6348863   ~3%    {2}    | SCAN OUTPUT In.1, In.0
        1791494   ~1%    {3}    | JOIN WITH `#Expr::MethodCall.getArgument/1#dispred#06f9ed20Merge_201#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Rhs.2
        1791494  ~20%    {3}    | JOIN WITH `Expr::MethodCall.getMethod/0#dispred#41989dc9` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2
        3270023   ~0%    {3}    | JOIN WITH `#Variable::Parameter.getCallable/0#dispred#d0614045Merge_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2
                     
           2584   ~0%    {3} r2 = JOIN r1 WITH `CloseType::closeCalled/1#f50b3d7b#prev_delta` ON FIRST 1 OUTPUT Lhs.0, Lhs.2, Lhs.1
           1708   ~0%    {3}    | JOIN WITH `Variable::Parameter.getPosition/0#dispred#437804ac` ON FIRST 2 OUTPUT Lhs.0, Lhs.1, Lhs.2
                     
          79316   ~6%    {3} r3 = JOIN r1 WITH isVarargsParam ON FIRST 1 OUTPUT Lhs.0, Lhs.1, Lhs.2
          79182   ~0%    {4}    | JOIN WITH `Variable::Parameter.getPosition/0#dispred#437804ac` ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.0, Rhs.1
                         {4}    | REWRITE WITH TEST InOut.3 <= InOut.1
          49989   ~6%    {3}    | SCAN OUTPUT In.2, In.0, In.3
          31735   ~0%    {4}    | JOIN WITH `Variable::Variable.getAnAccess/0#dispred#6504c76d` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.0, Lhs.2
           2247   ~0%    {4}    | JOIN WITH `#Statement::EnhancedForStmt.getExpr/0#dispred#5b0debb1Merge_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3
           2247   ~0%    {4}    | JOIN WITH `Statement::EnhancedForStmt.getVariable/0#dispred#29ffc87e` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3
           2247   ~0%    {4}    | JOIN WITH `Expr::LocalVariableDeclExpr.getVariable/0#dispred#15a2dda3` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3
              0   ~0%    {3}    | JOIN WITH `CloseType::closeCalled/1#f50b3d7b#prev_delta` ON FIRST 1 OUTPUT Lhs.2, Lhs.3, Lhs.1
                     
           1708   ~0%    {3} r4 = r2 UNION r3
           1708   ~0%    {1}    | JOIN WITH `Variable::Parameter.getPosition/0#dispred#437804ac` ON FIRST 2 OUTPUT Lhs.2
                         return r4

After:

Pipeline standard for CloseType::closeCalled/1#f50b3d7b@512617qc was evaluated in 7 iterations totaling 0ms (delta sizes total: 1697).
          708   ~0%    {2} r1 = JOIN `CloseType::closeCalled/1#f50b3d7b#prev_delta` WITH `Variable::Parameter.getCallable/0#dispred#d0614045` ON FIRST 1 OUTPUT Rhs.1, Lhs.0
          698   ~0%    {1}    | JOIN WITH Member::Method#b088329b ON FIRST 1 OUTPUT Lhs.1
         5596   ~0%    {1}    | JOIN WITH `Variable::Parameter.getAnArgument/0#dispred#ec2f3f49` ON FIRST 1 OUTPUT Rhs.1
         3300  ~25%    {1}    | JOIN WITH `#Variable::Variable.getAnAccess/0#dispred#6504c76dMerge_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1
         2439   ~1%    {1}    | JOIN WITH @localscopevariable ON FIRST 1 OUTPUT Lhs.0
                   
        10139   ~0%    {1} r2 = JOIN `CloseType::closeCalled/1#f50b3d7b#prev_delta` WITH `#Expr::LocalVariableDeclExpr.getVariable/0#dispred#15a2dda3Merge_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1
           54   ~1%    {1}    | JOIN WITH `#Statement::EnhancedForStmt.getVariable/0#dispred#29ffc87eMerge_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1
           54   ~1%    {1}    | JOIN WITH `Statement::EnhancedForStmt.getExpr/0#dispred#5b0debb1` ON FIRST 1 OUTPUT Rhs.1
           41   ~9%    {1}    | JOIN WITH `#Variable::Variable.getAnAccess/0#dispred#6504c76dMerge_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1
            2   ~0%    {1}    | JOIN WITH isVarargsParam ON FIRST 1 OUTPUT Lhs.0
            2   ~0%    {2}    | JOIN WITH `Variable::Parameter.getPosition/0#dispred#437804ac` ON FIRST 1 OUTPUT Lhs.0, Rhs.1
            2   ~0%    {2}    | JOIN WITH `Variable::Parameter.getCallable/0#dispred#d0614045` ON FIRST 1 OUTPUT Rhs.1, Lhs.1
            2   ~0%    {2}    | JOIN WITH `#Member::Method.getSourceDeclaration/0#dispred#93e6cdf8Merge_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1
           20   ~4%    {2}    | JOIN WITH `#Expr::MethodCall.getMethod/0#dispred#41989dc9Merge_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1
           48   ~0%    {3}    | JOIN WITH `Expr::MethodCall.getArgument/1#dispred#06f9ed20` ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Rhs.2
                       {3}    | REWRITE WITH TEST InOut.0 <= InOut.1
           45   ~2%    {1}    | SCAN OUTPUT In.2
           41   ~9%    {1}    | JOIN WITH `#Variable::Variable.getAnAccess/0#dispred#6504c76dMerge_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1
           23   ~0%    {1}    | JOIN WITH @localscopevariable ON FIRST 1 OUTPUT Lhs.0
                   
         2462   ~1%    {1} r3 = r1 UNION r2
         1697   ~0%    {1}    | AND NOT `CloseType::closeCalled/1#f50b3d7b#prev`(FIRST 1)
                       return r3
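
The difference between the two closeCalled pipelines can be illustrated outside CodeQL. The following is a minimal Python sketch, not the evaluator's actual algorithm: `semi_naive`, `edges`, and `seeds` are hypothetical names, and the toy relation is a simple chain. It counts the tuples each join order touches when a recursive relation is evaluated semi-naively, either starting from the delta or filtering a full scan against the previous results before joining with the delta.

```python
def semi_naive(edges, seeds, delta_first=True):
    """Reachability from `seeds`, evaluated semi-naively.

    Returns (reached, work), where `work` counts intermediate tuples
    produced before the final result of each iteration.
    All names here are illustrative, not taken from the CodeQL evaluator.
    """
    by_src = {}
    for x, y in edges:
        by_src.setdefault(x, []).append(y)

    prev, delta, work = set(seeds), set(seeds), 0
    while delta:
        new = set()
        if delta_first:
            # Start from the small delta, follow edges, then drop known tuples.
            for x in delta:
                for y in by_src.get(x, ()):
                    work += 1
                    if y not in prev:
                        new.add(y)
        else:
            # Scan the whole base relation, apply NOT prev, then join with delta.
            for x, y in edges:
                if y in prev:
                    continue
                work += 1
                if x in delta:
                    new.add(y)
        prev |= new
        delta = new
    return prev, work


# Toy data: a chain 0 -> 1 -> ... -> 100.
edges = [(i, i + 1) for i in range(100)]
fast_result, fast_work = semi_naive(edges, {0}, delta_first=True)
slow_result, slow_work = semi_naive(edges, {0}, delta_first=False)
print(fast_work, slow_work)  # 100 5050
```

Both orders compute the same set. Only the amount of intermediate work differs, and the gap grows with the size of the base relation and the number of iterations.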

2. Before:

[2025-07-24 14:41:41] Evaluated non-recursive predicate DataFlowPrivate::getErasedRepr/1#dd2a956f#fb@ed3fd13r in 772ms (size: 3919533).
Evaluated relational algebra for predicate DataFlowPrivate::getErasedRepr/1#dd2a956f#fb@ed3fd13r with tuple counts:
               1   ~0%    {1} r1 = JOIN JDK::TypeObject#5026c17b WITH @reftype ON FIRST 1 OUTPUT Lhs.0
               1   ~0%    {2}    | JOIN WITH Type::NullType#ceb001ce CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0
                      
         3919524   ~0%    {2} r2 = JOIN `Type::erase/1#afa87d84_10#join_rhs` WITH @reftype ON FIRST 1 OUTPUT Lhs.0, Lhs.1
         3919510   ~0%    {2}    | AND NOT Type::NumericOrCharType#edfeee0d(FIRST 1)
                          {2}    | AND NOT Type::BooleanType#79717bc5(FIRST 1)
         3919508   ~0%    {2}    | SCAN OUTPUT In.1, In.0
                      
              21   ~0%    {1} r3 = JOIN `Type::erase/1#afa87d84_10#join_rhs` WITH Type::NumericOrCharType#edfeee0d ON FIRST 1 OUTPUT Lhs.1
        82310109   ~0%    {2}    | JOIN WITH @reftype CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0
             168   ~4%    {4}    | JOIN WITH `Type::BoxedType.getPrimitiveType/0#dispred#f15a496a` ON FIRST 1 OUTPUT Rhs.1, _, Lhs.1, Lhs.0
             168   ~0%    {4}    | REWRITE WITH Out.1 := "double"
              21   ~0%    {2}    | JOIN WITH `Element::Element.getName/0#dispred#fb2dc94a` ON FIRST 2 OUTPUT Lhs.2, Lhs.3
                      
               3   ~0%    {2} r4 = JOIN `Type::erase/1#afa87d84_10#join_rhs` WITH Type::BooleanType#79717bc5 ON FIRST 1 OUTPUT Lhs.0, Lhs.1
                          {2}    | AND NOT Type::NumericOrCharType#edfeee0d(FIRST 1)
               3   ~0%    {1}    | SCAN OUTPUT In.1
        11758587   ~0%    {2}    | JOIN WITH @reftype CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0
              24   ~0%    {4}    | JOIN WITH `Type::BoxedType.getPrimitiveType/0#dispred#f15a496a` ON FIRST 1 OUTPUT Rhs.1, _, Lhs.1, Lhs.0
              24   ~0%    {4}    | REWRITE WITH Out.1 := "boolean"
               3   ~0%    {2}    | JOIN WITH `Element::Element.getName/0#dispred#fb2dc94a` ON FIRST 2 OUTPUT Lhs.2, Lhs.3
                      
         3919533   ~0%    {2} r5 = r1 UNION r2 UNION r3 UNION r4
                          return r5

After:

[2025-07-24 15:05:06] Evaluated non-recursive predicate DataFlowPrivate::getErasedRepr/1#dd2a956f#fb@57d5cfng in 124ms (size: 3919533).
Evaluated relational algebra for predicate DataFlowPrivate::getErasedRepr/1#dd2a956f#fb@57d5cfng with tuple counts:
              1   ~0%    {1} r1 = JOIN JDK::TypeObject#5026c17b WITH @reftype ON FIRST 1 OUTPUT Lhs.0
              1   ~0%    {2}    | JOIN WITH Type::NullType#ceb001ce CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0
                     
             21   ~0%    {1} r2 = JOIN `Type::erase/1#afa87d84_10#join_rhs` WITH Type::NumericOrCharType#edfeee0d ON FIRST 1 OUTPUT Lhs.1
             21   ~0%    {2}    | JOIN WITH `DataFlowPrivate::numericRepresentative/1#4cfa88e7` CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0
             21   ~0%    {2}    | JOIN WITH @reftype ON FIRST 1 OUTPUT Lhs.1, Lhs.0
                     
        3919524   ~0%    {2} r3 = JOIN `Type::erase/1#afa87d84_10#join_rhs` WITH @reftype ON FIRST 1 OUTPUT Lhs.0, Lhs.1
        3919510   ~0%    {2}    | AND NOT Type::NumericOrCharType#edfeee0d(FIRST 1)
                         {2}    | AND NOT Type::BooleanType#79717bc5(FIRST 1)
        3919508   ~0%    {2}    | SCAN OUTPUT In.1, In.0
                     
              3   ~0%    {2} r4 = JOIN `Type::erase/1#afa87d84_10#join_rhs` WITH Type::BooleanType#79717bc5 ON FIRST 1 OUTPUT Lhs.0, Lhs.1
                         {2}    | AND NOT Type::NumericOrCharType#edfeee0d(FIRST 1)
              3   ~0%    {1}    | SCAN OUTPUT In.1
              3   ~0%    {2}    | JOIN WITH `DataFlowPrivate::booleanRepresentative/1#5d5d49f6` CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0
              3   ~0%    {2}    | JOIN WITH @reftype ON FIRST 1 OUTPUT Lhs.1, Lhs.0
                     
        3919533   ~0%    {2} r5 = r1 UNION r2 UNION r3 UNION r4
                         return r5
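
The cartesian-product sizes in the first getErasedRepr log factor exactly: 82310109 = 21 × 3919529 and 11758587 = 3 × 3919529. That is consistent with each primitive erased type being paired with every reference type before the filter on the boxed type's name was applied. The sketch below is a loose Python illustration of the refactoring idea, under the assumption that the representative is a single value. The function names and toy data are hypothetical and do not reflect how `numericRepresentative` or `booleanRepresentative` are written.

```python
def filter_late(prims, reftypes, is_representative):
    """Pair every primitive with every reference type, then filter."""
    pairs = [(p, r) for p in prims for r in reftypes]
    result = [(p, r) for p, r in pairs if is_representative(r)]
    return result, len(pairs)


def singleton_first(prims, reftypes, is_representative):
    """Compute the size-1 representative first, then take the product."""
    representative = [r for r in reftypes if is_representative(r)]
    pairs = [(p, r) for p in prims for r in representative]
    return pairs, len(pairs)


# Toy data: 1000 unrelated reference types plus one representative.
reftypes = ["T%d" % i for i in range(1000)] + ["java.lang.Double"]
prims = ["int", "long", "double"]
is_double_box = lambda r: r == "java.lang.Double"

late_result, late_size = filter_late(prims, reftypes, is_double_box)
early_result, early_size = singleton_first(prims, reftypes, is_double_box)
print(late_size, early_size)  # 3003 3
```

The product is still cartesian in both versions. In the second it is taken against a relation of size 1, so the intermediate result is no larger than the input.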

3. Before:

[2025-07-25 14:05:49] Evaluated non-recursive predicate ObjFlow::objType/2#dc0bcb48@bb871090 in 205ms (size: 801187).
Evaluated relational algebra for predicate ObjFlow::objType/2#dc0bcb48@bb871090 with tuple counts:
             891     ~0%    {2} r1 = SCAN `ObjFlow::sink/1#fb63de68` OUTPUT In.0, In.0
                        
        50259694     ~0%    {2} r2 = SCAN `fastTC@ObjFlow::objStepPruned/2#4f27d039#2#bounded#swapped` OUTPUT In.1, In.0
                        
        50260585     ~0%    {2} r3 = r1 UNION r2
        53496733  ~6712%    {2}    | JOIN WITH `ObjFlow::source/2#8042545e_10#join_rhs` ON FIRST 1 OUTPUT Lhs.1, Rhs.1
                            return r3

After:

[2025-07-25 14:28:55] Evaluated non-recursive predicate ObjFlow::objectToStringQualType/2#0f6753f3@249d5an8 in 5ms (size: 738214).
Evaluated relational algebra for predicate ObjFlow::objectToStringQualType/2#0f6753f3@249d5an8 with tuple counts:
           381  ~0%    {2} r1 = JOIN `ObjFlow::objectToString/1#142998fd` WITH num#ObjFlow::TObjNode#89fac439 ON FIRST 1 OUTPUT Rhs.1, Lhs.1
        738214  ~0%    {2}    | JOIN WITH `doublyBoundedFastTC:ObjFlow::objStepPruned/2#4f27d039_10#higher_order_body:_ObjFlow::flowSink/1#eb4848fa_ObjFlow::objectToString/1#142998fd_num#ObjFlow::TObjNode#89fac439#higher_order_body:ObjFlow::flowSrc/1#b7235d33` ON FIRST 1 OUTPUT Rhs.1, Lhs.1
        738214  ~0%    {2}    | JOIN WITH num#ObjFlow::TObjType#7bc00a19_10#join_rhs ON FIRST 1 OUTPUT Lhs.1, Rhs.1
                       return r1
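
One way to picture why folding the source step into the graph shrinks the closure: if many nodes share a type, a closure keyed by node materialises one pair per (node, sink), while a closure that starts from the type only materialises one pair per (type, sink). The Python sketch below is an analogy under that assumption, not a description of how `doublyBoundedFastTC` is implemented. All names and the toy graph are hypothetical.

```python
from collections import deque


def reachable(succ, start):
    """All nodes reachable from `start` in one or more steps (BFS)."""
    seen, queue = set(), deque(succ.get(start, ()))
    while queue:
        n = queue.popleft()
        if n not in seen:
            seen.add(n)
            queue.extend(succ.get(n, ()))
    return seen


def closure_then_join(succ, source_type, sinks):
    """Compute (node, sink) pairs first, then map each node to its type."""
    node_pairs = {
        (n, s)
        for n in source_type
        for s in (reachable(succ, n) | {n}) & sinks
    }
    typed = {(source_type[n], s) for n, s in node_pairs}
    return typed, len(node_pairs)


def closure_with_source_step(succ, source_type, sinks):
    """Add type -> node edges to the graph and start the closure at types."""
    extended = {k: list(v) for k, v in succ.items()}
    for n, t in source_type.items():
        extended.setdefault(("type", t), []).append(n)
    types = set(source_type.values())
    typed = {
        (t, s)
        for t in types
        for s in reachable(extended, ("type", t)) & sinks
    }
    return typed, len(typed)


# Toy graph: 50 nodes of type A and 50 of type B all step to one hub,
# which steps to three sinks.
succ = {"hub": ["s0", "s1", "s2"]}
source_type = {}
for i in range(50):
    succ["a%d" % i] = ["hub"]
    succ["b%d" % i] = ["hub"]
    source_type["a%d" % i] = "A"
    source_type["b%d" % i] = "B"
sinks = {"s0", "s1", "s2"}

joined, joined_size = closure_then_join(succ, source_type, sinks)
folded, folded_size = closure_with_source_step(succ, source_type, sinks)
print(joined_size, folded_size)  # 300 6
```

Both functions return the same (type, sink) pairs. The first materialises 300 intermediate pairs to get there, the second only the 6 that end up in the result.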

aschackmull added the no-change-note-required label (This PR does not need a change note) on Jul 25, 2025
Copilot AI review requested due to automatic review settings on July 25, 2025 12:53
aschackmull requested a review from a team as a code owner on July 25, 2025 12:53
github-actions bot added the Java label on Jul 25, 2025
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR improves performance of CodeQL query evaluation by optimizing join orders in three specific predicates through refactoring to better guide the query optimizer's decision-making.

  • Restructures closeCalled predicate to prevent inefficient join ordering
  • Refactors getErasedRepr to minimize cartesian product impact on optimizer estimates
  • Modifies objType predicate to use bounded transitive closure for better performance

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File                  Description
CloseType.qll         Restructures closeCalled predicate logic to improve join order optimization
ObjFlow.qll           Introduces new type system and bounded transitive closure to optimize objType predicate
DataFlowPrivate.qll   Extracts helper predicates to reduce cartesian products in getErasedRepr

Labels: Java, no-change-note-required (This PR does not need a change note)
1 participant