
JS: QL-side type/name resolution for TypeScript and JSDoc #19078

Merged
merged 47 commits into from
Jun 11, 2025

Conversation

Contributor

@asgerf asgerf commented Mar 20, 2025

Prepares for disabling TypeScript type extraction by recovering the relevant type information on the QL side.

Actually disabling type extraction will happen in a separate PR, so that it's easier to validate in isolation and roll back if needed.

The code is divided into three main components:

  • Name resolution: resolve a type name or variable to its definition, looking through imports and (re-)exports.
  • Type resolution: determine the type of an expression. Here we use a TypeExpr as a placeholder for an actual type, which is good enough for our needs. Does not support generics.
  • Underlying types: determine the "underlying types" of a type. The concept of "underlying types" is meant to give library models a simple way of reasoning about named types without having to deal with unions, intersections, and subtyping relationships. For example, (Request & { x: T }) | null has Request as an underlying type.
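
As a rough illustration of the "underlying types" idea, here is a minimal TypeScript sketch. The TypeNode model and the underlyingTypes function are hypothetical and are not CodeQL's TypeExpr or the QL predicates from this PR; the sketch only shows how looking through unions and intersections yields Request for the example type.

```typescript
// Hypothetical, simplified model of a type expression (not CodeQL's TypeExpr).
type TypeNode =
  | { kind: "named"; name: string }
  | { kind: "union"; members: TypeNode[] }
  | { kind: "intersection"; members: TypeNode[] }
  | { kind: "object" } // anonymous object type such as { x: T }
  | { kind: "null" };

// Collects the named types reachable by looking through unions and intersections.
function underlyingTypes(t: TypeNode): string[] {
  switch (t.kind) {
    case "named":
      return [t.name];
    case "union":
    case "intersection": {
      const out: string[] = [];
      for (const member of t.members) {
        for (const name of underlyingTypes(member)) {
          if (out.indexOf(name) === -1) out.push(name);
        }
      }
      return out;
    }
    default:
      // Anonymous object types and null contribute no named type.
      return [];
  }
}

// Models (Request & { x: T }) | null
const exampleType: TypeNode = {
  kind: "union",
  members: [
    {
      kind: "intersection",
      members: [{ kind: "named", name: "Request" }, { kind: "object" }],
    },
    { kind: "null" },
  ],
};

const underlying = underlyingTypes(exampleType);
```

A library model written against such a result only needs to ask whether Request is among the underlying types, without inspecting the union or intersection structure itself.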

One of the complexities of TypeScript is the fact that there are three "declaration spaces" in which variables can exist: value, type, and namespace. For example, for a declaration like class C {}, the value C refers to the class itself (i.e. its constructor), whereas the type C describes an instance of the class (not the class itself). In practice, it seems that values and namespaces can be merged without problems so I decided to simplify things by doing that.
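
A small TypeScript sketch of the value and type declaration spaces may help here. The names C, X, ctor and the other variables are illustrative only; the namespace space is left out for brevity.

```typescript
class C {
  greet(): string {
    return "hi";
  }
}

// Value space: `C` is the class itself, i.e. its constructor.
const ctor: typeof C = C;

// Type space: `C` describes an instance of the class.
const instance: C = new ctor();

// The same name can be declared independently in each space.
const X = 42; // value X
type X = string; // type X

const viaType: X = "a string"; // the annotation refers to the type X
const viaValue: number = X; // the expression refers to the value X
```

Because an annotation position can only mention types and an expression position can only mention values, the compiler always knows which X is meant; a name resolver has to make the same distinction.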

Effects on call graph

  • Disabling type extraction on main would lose us 40k call edges on the default benchmark suite.
  • The new QL-based implementation recovers 98% of those edges; only 746 edges were not recovered.
  • On top of that, it discovers 30k new call edges that were missed in the original extractor-based implementation.
    • One reason for this is better support for JSDoc type annotations in .js files. The new solution has unified support for the two kinds of type annotations, which was not possible previously since we didn't want to run the TypeScript compiler on .js files.
    • Another reason is better tolerance for missing dependencies, as something like KnownClass | UnresolvedClass will always propagate information about KnownClass even if UnresolvedClass could not be resolved.
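
The tolerance for missing dependencies can be pictured with the following TypeScript sketch. It is an assumption-laden toy, not the QL implementation: knownClasses, resolveUnion and callTargets are hypothetical names, and the point is only that each union member is resolved independently.

```typescript
// Hypothetical table of class declarations the analysis was able to find.
const knownClasses: Record<string, { methods: string[] }> = {
  KnownClass: { methods: ["run"] },
};

// Resolves each member of a union independently. Members that cannot be
// resolved are skipped instead of discarding the whole union.
function resolveUnion(memberNames: string[]): string[] {
  return memberNames.filter((name) => knownClasses[name] !== undefined);
}

// Possible targets of `receiver.method()` when the receiver has the union type.
function callTargets(memberNames: string[], method: string): string[] {
  return resolveUnion(memberNames)
    .filter((name) => {
      const decl = knownClasses[name];
      return decl !== undefined && decl.methods.indexOf(method) !== -1;
    })
    .map((name) => `${name}.${method}`);
}

// KnownClass | UnresolvedClass still yields a call edge to KnownClass.run.
const targets = callTargets(["KnownClass", "UnresolvedClass"], "run");
```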

@github-actions github-actions bot added the JS label Mar 20, 2025
@asgerf asgerf force-pushed the js/name-resolution branch from cee75ae to fa3e5ed Compare March 20, 2025 15:31
}

/** Helps track flow from a particular set of source nodes. */
module Track<nodeSig/1 isSource> {

Check warning

Code scanning / CodeQL

Dead code Warning

This code is never used, and it's not publicly exported.
@asgerf asgerf force-pushed the js/name-resolution branch 2 times, most recently from fe9b23d to c07cc6e Compare March 27, 2025 20:20
@asgerf asgerf force-pushed the js/name-resolution branch from 2d928f2 to 3b395af Compare April 3, 2025 09:22
@asgerf asgerf force-pushed the js/name-resolution branch from 3b395af to d92247c Compare April 11, 2025 11:37
@asgerf asgerf force-pushed the js/name-resolution branch from 87454f7 to f289592 Compare May 2, 2025 11:43
@asgerf asgerf force-pushed the js/name-resolution branch 4 times, most recently from 45b09df to ae0aeb9 Compare May 19, 2025 10:00
asgerf added 5 commits May 20, 2025 13:20
…nt declarations

This test enforced the opinion that ambient declarations should have no impact on data flow, which is no longer the case. For now I'm just updating the test output.
@asgerf asgerf force-pushed the js/name-resolution branch from aab67e7 to 2b208d6 Compare May 20, 2025 13:57
@asgerf asgerf force-pushed the js/name-resolution branch from 2b208d6 to d644f80 Compare May 20, 2025 14:21
@asgerf asgerf added the no-change-note-required This PR does not need a change note label May 22, 2025
@asgerf asgerf marked this pull request as ready for review May 22, 2025 09:54
@asgerf asgerf requested a review from a team as a code owner May 22, 2025 09:54
Contributor

@Napalys Napalys left a comment

Extraordinary... 👏

Co-authored-by: Napalys Klicius <napalys@github.com>
Contributor

@erik-krogh erik-krogh left a comment

I still haven't looked closely at the three big commits, but here are some small comments.

)
or
exists(ImportTypeExpr imprt |
node1 = imprt.getPathExpr() and // TODO: ImportTypeExpr does not seem to be resolved to a Module
Contributor

Are you planning to leave this TODO?

Contributor Author

Not at this time. Since its path string is not an Expr it requires a bit more refactoring.

Comment on lines +348 to +352
predicate step(Node node1, Node node2) {
  commonStep(node1, node2)
  or
  specificStep(node1, node2)
}
Contributor

I feel like you made commonStep and specificStep at a point where the code looked different.

Because both of those predicates are only ever used here, commonStep is not reused anywhere else.

Additionally the specificStep predicate has a QLDoc that mentions it being part of a configuration.
But it's not part of a configuration, it's part of a parameterized module that can be instantiated with a config.

Contributor Author

I just thought it would add clarity to separate them into commonStep and specificStep. It also helps make clear that commonStep can be materialised once and used twice (once per instantiation of the module).

> commonStep is not reused anywhere else.

It is used in two different instantiations of the module. We could inline it in the module, but then again, putting it inside a parameterised module where it does not depend on any of the module parameters is also confusing in its own way.

> Additionally the specificStep predicate has a QLDoc that mentions it being part of a configuration.

I thought it would be clear that we're talking about the configuration that was passed to the module?

Contributor

Ahh, now I see. Yeah, I was wrong.

The code for specificStep is constant, but it depends on predicates defined by the module-parameter.
And you can't move that code to commonStep, because materializing all of that would blow up (because specificStep follows store-read pairs, but commonStep does not).
Is that about right?

Because if it doesn't blow up without the specialization used by specificStep, couldn't you then just compute all possible steps outside this module?

Contributor Author

> The code for specificStep is constant, but it depends on predicates defined by the module-parameter.

I'm not sure what this means? It depends on the module parameter, both directly via S::isRelevantVariable and indirectly via its dependency on getModuleExports which also depends on the module parameter.

> And you can't move that code to commonStep, because materializing all of that would blow up

It's not meant as a performance thing at all; it's about separating types from values.

This example illustrates the need to keep them as two different graphs, because both graphs need to step through X without forgetting if a value or type is being tracked:

// lib.ts
class C1 {}
class C2 {}

const X = C1;
type X = C2;

export { X }

// use.ts
import { X } from "./lib"

var x1 = X // should refer to C1
var x2: X; // should refer to C2
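
To make the two-graph point concrete, here is a toy reachability sketch in TypeScript. It is only loosely analogous to the QL code: the Edge model, the node names and reachableFrom are hypothetical. Edges without a space stand in for steps shared by both graphs (in the spirit of commonStep), and edges with a space stand in for steps that apply only when tracking that space.

```typescript
type Space = "value" | "type";

// Hypothetical edge list for the lib.ts / use.ts example above.
interface Edge {
  from: string;
  to: string;
  space?: Space; // undefined means the edge applies in both graphs
}

const edges: Edge[] = [
  { from: "C1", to: "lib.X", space: "value" }, // const X = C1
  { from: "C2", to: "lib.X", space: "type" }, // type X = C2
  { from: "lib.X", to: "use.X" }, // export { X } and import { X }
];

// Nodes reachable from `source` when tracking only the given space.
function reachableFrom(source: string, space: Space): string[] {
  const seen: string[] = [source];
  const work: string[] = [source];
  while (work.length > 0) {
    const node = work.pop() as string;
    for (const edge of edges) {
      const applies = edge.space === undefined || edge.space === space;
      if (applies && edge.from === node && seen.indexOf(edge.to) === -1) {
        seen.push(edge.to);
        work.push(edge.to);
      }
    }
  }
  return seen;
}

const valueFlow = reachableFrom("C1", "value"); // reaches use.X
const typeFlow = reachableFrom("C2", "type"); // reaches use.X
const wrongSpace = reachableFrom("C2", "value"); // stays at C2
```

If the space filter were dropped and everything lived in one graph, both C1 and C2 would reach use.X and the distinction between x1 and x2 would be lost.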

Contributor

Oh, OK, now I think I get it.
Could you add a comment, e.g. on FlowImpl, saying that the module is specialized to work only on types or only on values, and that this is needed to keep the separation you just explained?

commonStep(...) includes steps for both values and types, which might have caused some of my confusion.
Maybe add to the doc for that predicate that it contains type-specific steps, but that those are harmless when applied to values.

Contributor Author

@asgerf asgerf Jun 11, 2025

Added a comment to FlowImpl and expanded on the comment for commonStep

Comment on lines 11 to 19
Node trackFunctionType(Function fun) {
  result = fun
  or
  exists(Node mid | mid = trackFunctionType(fun) |
    TypeFlow::step(mid, result)
    or
    UnderlyingTypes::underlyingTypeStep(mid, result)
  )
}
Contributor

It seems you're using a specialized predicate (instead of TypeFlow::TrackNode) because you additionally need the steps from UnderlyingTypes::underlyingTypeStep.

Why can't those steps be part of TypeFlow::TrackNode? Or why can't functions use TypeFlow::TrackNode?

Contributor Author

Good question. I've renamed the predicate and added a clarifying qldoc comment in 18f9133

tausbn previously approved these changes Jun 10, 2025
Contributor

@tausbn tausbn left a comment

Only a few minor comments from me. I think this looks really solid!

/**
 * Holds if values/namespaces/types in `node1` can flow to values/namespaces/types in `node2`.
 */
private predicate commonStep(Node node1, Node node2) {
Contributor

Minor nit: Could we rename node1 and node2 to, say, source and target (or src and tgt)? It would make it slightly less likely that a typo messed things up.

Contributor Author

I'd rather not. In fact I'd say we should standardise on node1,node2 everywhere we define edges in a graph.

This is the convention used in the data flow library, and it generally works out better when you add in things like state1 and state2.

private string normalizeModuleName(string name) {
  result =
    name.regexpReplaceAll("^node:", "")
        .regexpReplaceAll("\\.[jt]sx?$", "")
Contributor

Should this also account for things like .cjs and .mjs?
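
For reference, the question can be explored with a TypeScript sketch of the two replacements visible in the excerpt. This uses JavaScript regex syntax rather than QL's regexpReplaceAll, covers only the part of the predicate shown above, and normalizeModuleNameExtended is a hypothetical variant, not something from the PR.

```typescript
// Mirrors only the two replacements visible in the QL excerpt.
function normalizeModuleName(name: string): string {
  return name.replace(/^node:/, "").replace(/\.[jt]sx?$/, "");
}

// Hypothetical variant that also strips .cjs, .mjs, .cts and .mts.
function normalizeModuleNameExtended(name: string): string {
  return name
    .replace(/^node:/, "")
    .replace(/\.(?:[cm]?[jt]s|[jt]sx)$/, "");
}

const plain = normalizeModuleName("./lib.mjs"); // extension is kept
const extended = normalizeModuleNameExtended("./lib.mjs"); // extension is stripped
```

With the pattern from the excerpt, a name ending in .mjs or .cjs passes through unchanged, which is the case the question is about.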

erik-krogh previously approved these changes Jun 10, 2025
@asgerf asgerf merged commit 423ffc7 into github:main Jun 11, 2025
18 checks passed
Labels
JS no-change-note-required This PR does not need a change note
4 participants