
C++: Expose SSA definitions from dataflow #20149


Merged: 11 commits, merged on Aug 1, 2025


MathiasVP (Contributor) commented Jul 31, 2025

This PR makes it possible to use the dataflow SSA library outside the internal dataflow directory.

People have requested this for many years, but I've pushed back because we didn't feel the API was stable enough to expose publicly. However, the API has barely changed since we switched to the "new" dataflow library more than two years ago, so I think it's safe to say that it's stable enough at this point.

I don't think we need a DCA run for this since:

  • No queries actually use these classes at this point, and
  • No functionality is changing. We are just exposing already existing classes which we've been using for years.

cc @bdrodes since you've been quite noisy about wanting this.

While I was here, I also made a small drive-by fix in commit 1: Schack left a TODO for us in #18942, which I've now addressed. This makes the predicate identical to its Java and C# counterparts.

Copilot AI review requested due to automatic review settings July 31, 2025 12:49
MathiasVP requested a review from a team as a code owner July 31, 2025 12:49
github-actions bot added the C++ label Jul 31, 2025
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR exposes the SSA definitions from the internal dataflow library to make them publicly available for external use. The SSA API has been stable for over two years, making it safe to expose these classes to users who have been requesting this functionality.

  • Adds a getAUse() method to the Definition class to retrieve nodes that use a definition
  • Introduces new SSA definition classes: ExplicitDefinition, DirectExplicitDefinition, and IndirectExplicitDefinition
  • Exposes the SSA classes through type aliases in DataFlowUtil.qll

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Files reviewed:

  • SsaInternals.qll: adds new SSA definition classes and enhances the Definition class with use tracking
  • DataFlowUtil.qll: creates public type aliases to expose the internal SSA classes

MathiasVP and others added 2 commits July 31, 2025 13:52
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
aschackmull (Contributor)

I don't think any other language exposes SSA in terms of data flow nodes. I can see the motivation, though: it's a much shorter path to impact than, say, getting a good SSA on the AST nodes. But it's important to highlight that this is not really pointing in the direction of language API unification (the glorious future).
I don't like the name DataFlow::Definition. I think Ssa should be in the name somewhere: either DataFlow::Ssa::Definition or DataFlow::SsaDefinition (and the same goes for the other class names).

MathiasVP (Contributor Author)

I'm happy to rename it to SsaDefinition. As far as I can tell, C# calls them Definitions and Java calls them SsaVariables, so using Definition was my attempt at being consistent with at least one other language 😅

Using DataFlow::Nodes for the API makes it a lot easier to work with in C/C++, since the alternative would be to define them in terms of (Expr, int) pairs (or (Instruction/Operand, int) pairs) to correctly handle indirections. Because DataFlow::Nodes are already defined basically as (Instruction/Operand, int) pairs, we get a much cleaner API, at least from the C/C++ point of view.
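To make "indirections" concrete, here is a hypothetical C++ snippet. It is not from this PR and does not use the CodeQL API; the function `update` and its parameters are made up for illustration. It shows why a pointer variable and the memory reachable through it need to be tracked as separate SSA source variables, which is what the `int` in the (Expr, int) or (Instruction/Operand, int) pairs distinguishes:

```cpp
#include <cassert>

// Illustration only (hypothetical function, not part of CodeQL):
// writes through a pointer-to-pointer `p` touch three distinct storage
// locations, so an SSA for C/C++ must pair `p` with an indirection index.
//   indirection 0: p itself   (the int** parameter)
//   indirection 1: *p         (the int* that p points to)
//   indirection 2: **p        (the int that *p points to)
void update(int** p, int* q) {
  **p = 42;  // defines indirection 2; p and *p are unchanged
  *p = q;    // defines indirection 1; p is unchanged, but **p now names *q
}
```

`p` itself is never assigned in `update`, so an SSA that tracked only the variable `p` would see no definitions at all. Both writes are instead definitions of deeper indirections of `p`.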

But I can see the value in trying to align with other languages on this. I'll experiment a bit and see if I can find a balance that I like 🤔

MathiasVP (Contributor Author)

Bahh! We can't even name it SsaDefinition because of this stupid old library that we only use in SimpleRangeAnalysis 🤯 So I'll go with DataFlow::Ssa::Definition instead, and we can always move it (with a long deprecation cycle) once we get rid of SimpleRangeAnalysis.

aschackmull (Contributor)

> But I can see the value in trying to align with other languages on this. I'll experiment a bit and see if I can find a balance that I like

Ultimately, I think a canonical SSA for C++ could be based on a new class that encapsulates the (Expr, int) pair. Ideally this would provide the same benefits as the data flow SSA, but without a strong tie to the data flow nodes (because there may well be use-cases for SSA on which we'd like data flow nodes to depend - or at least leave us that option). In general, I think it makes sense to have a SourceVariable class (that's what it's called in the SSA input signature) that covers more than just surface-level local variables. Java (and C#), for example, include (qualified) fields, so that x.y.z gets SSA with suitable updates when x or x.y changes. Putting pointer variables plus their indirections into that setting makes sense, I think. In general, for a given language, these SSA source variables should cover whatever surface-level constructs are likely to occur repeatedly in a function and whose value programmers expect to be unchanged across those occurrences in most cases. We should perhaps draw a little inspiration from our experience with value numbering (as that's in some ways the ad-hoc approximation). For instance, C# likely wants to include property accesses.

aschackmull (Contributor)

> As far as I can tell C# calls them Definitions

C# generally refers to them as Ssa::Definition. Unqualified use of the Definition name is restricted to within the SSA libraries.

MathiasVP (Contributor Author)

> C# generally refers to them as Ssa::Definition. Unqualified use of the Definition name is restricted to within the SSA libraries.

Ah, right. I'll follow that direction, then.

> Ultimately, I think a canonical SSA for C++ could be based on a new class that encapsulates the (Expr, int) pair. [...]

Indeed, that makes a lot of sense! I'll keep the API minimal for now and leave open the option of providing an (Expr, int) pair in the future.

MathiasVP (Contributor Author) commented Aug 1, 2025

> I don't think any other language exposes SSA in terms of data flow nodes. [...] I don't like the name DataFlow::Definition - I think Ssa should be in there somewhere in the name: Either DataFlow::Ssa::Definition or DataFlow::SsaDefinition (same of course goes for the other class names).

I've renamed the classes in 33d0598, and modified the API to not expose dataflow nodes (but rather Instructions and Operands) in b70836e.

I do agree that we ultimately would like a new abstraction around (Expr, int) pairs, but I'll leave that for the future.

I also did some drive-by cleanups in 7ede3aa and 32e6d09. Nothing exciting is happening there (it's just renaming a few files and some private import aliases).

MathiasVP force-pushed the expose-definition-from-dataflow-ssa branch from 889925b to b70836e August 1, 2025 10:34

aschackmull (Contributor) left a comment


LGTM

MathiasVP merged commit 1fab97b into github:main Aug 1, 2025
16 checks passed