JS: QL-side type/name resolution for TypeScript and JSDoc #19078

asgerf · 2025-03-20T13:49:55Z

Prepares for disabling TypeScript type extraction by recovering the relevant type information on the QL side.

Actually disabling type extraction will happen in a separate PR so it's easier to validate in isolation and rollback if needed.

The code is divided into three main components:

- Name resolution: resolves a type name or variable to its definition, looking through imports and (re-)exports.
- Type resolution: determines the type of an expression. Here we use a TypeExpr as a placeholder for an actual type, which is good enough for our needs. Does not support generics.
- Underlying types: determines the 'underlying types' of a type. The concept of "underlying types" is meant to give library models a simple way of reasoning about named types without having to deal with unions, intersections, and subtyping relationships. For example, (Request & { x: T }) | null has Request as an underlying type.
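As a rough illustration of the "underlying types" idea, here is a minimal TypeScript sketch over a made-up type AST. The `TypeNode` shape and the `underlyingTypes` function are assumptions for illustration only; they are not the QL classes or predicates added in this PR, and the sketch ignores subtyping entirely.

```typescript
// Hypothetical mini type AST; not the CodeQL TypeExpr classes.
type TypeNode =
  | { kind: "named"; typeName: string }
  | { kind: "union"; members: TypeNode[] }
  | { kind: "intersection"; members: TypeNode[] }
  | { kind: "other" }; // object literals, null, primitives, ...

// Collects every named type reachable through unions and intersections.
function underlyingTypes(t: TypeNode): string[] {
  const out: string[] = [];
  const visit = (n: TypeNode): void => {
    switch (n.kind) {
      case "named":
        if (out.indexOf(n.typeName) < 0) out.push(n.typeName);
        break;
      case "union":
      case "intersection":
        n.members.forEach(visit);
        break;
      case "other":
        break;
    }
  };
  visit(t);
  return out;
}

// (Request & { x: T }) | null
const exampleType: TypeNode = {
  kind: "union",
  members: [
    {
      kind: "intersection",
      members: [{ kind: "named", typeName: "Request" }, { kind: "other" }],
    },
    { kind: "other" },
  ],
};

const underlying = underlyingTypes(exampleType);
```

With this encoding, `underlying` contains only `"Request"`, which is the kind of answer a library model would want to ask about.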

One of the complexities of TypeScript is the fact that there are three "declaration spaces" in which variables can exist: value, type, and namespace. For example, for a declaration like class C {}, the value C refers to the class itself (i.e. its constructor), whereas the type C describes an instance of the class (not the class itself). In practice, it seems that values and namespaces can be merged without problems so I decided to simplify things by doing that.
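The value/type split can be seen in plain TypeScript. The snippet below is only an illustration of the language behaviour described above; the names `ctor`, `instance`, `label` and `doubled` are made up for the example.

```typescript
class C {
  greet(): string {
    return "hello";
  }
}

// Value space: `C` is the class itself, i.e. its constructor.
const ctor: typeof C = C;

// Type space: `C` describes an instance of the class.
const instance: C = new ctor();

// The same identifier can be declared independently in the value and type spaces.
const X = 42;
type X = string;
const label: X = "forty-two"; // `X` here is the type alias
const doubled: number = X * 2; // `X` here is the constant
```

A resolver therefore has to know which space it is looking in before it can say what `X` refers to.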

Effects on call graph

Disabling type extraction on main would lose us 40k call edges on the default benchmark suite.
The new QL-based implementation recovers 98% of those edges; only 746 edges were not recovered.
On top of that, it discovers 30k new call edges that were missed in the original extractor-based implementation.
- One of the reasons for this is better support for JSDoc type annotations in .js files. The new solution has unified support for the two kinds of type annotations, which was not possible previously since we didn't want to run the TypeScript compiler on .js files.
- Another reason is better tolerance for missing dependencies, as something like KnownClass | UnresolvedClass will always propagate information about KnownClass even if UnresolvedClass could not be resolved.
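The tolerance for missing dependencies can be pictured with a small sketch. It assumes a hypothetical lookup table `resolvedClasses` and a helper `resolveUnionMembers`; neither exists in the PR, they only mimic the behaviour described in the last bullet.

```typescript
// Hypothetical table of class names that could be resolved to a definition.
const resolvedClasses: { [className: string]: string } = {
  KnownClass: "lib/known.ts",
};

// Returns the definitions of all union members that could be resolved,
// silently skipping members whose definition is missing.
function resolveUnionMembers(members: string[]): string[] {
  const found: string[] = [];
  for (const member of members) {
    if (Object.prototype.hasOwnProperty.call(resolvedClasses, member)) {
      found.push(resolvedClasses[member]);
    }
  }
  return found;
}

// KnownClass | UnresolvedClass
const unionDefs = resolveUnionMembers(["KnownClass", "UnresolvedClass"]);
```

Here `unionDefs` still carries the definition of `KnownClass`, even though `UnresolvedClass` contributed nothing.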

javascript/ql/lib/semmle/javascript/internal/NameResolution.qll

+    }
+
+    /** Helps track flow from a particular set of source nodes. */
+    module Track<nodeSig/1 isSource> {


Napalys

Extraordinary... 👏

Co-authored-by: Napalys Klicius <napalys@github.com>

erik-krogh

I still haven't looked closely at the three big commits, but here are some small comments.

erik-krogh · 2025-06-10T12:17:23Z

javascript/ql/lib/semmle/javascript/internal/NameResolution.qll

+    )
+    or
+    exists(ImportTypeExpr imprt |
+      node1 = imprt.getPathExpr() and // TODO: ImportTypeExpr does not seem to be resolved to a Module


Are you planning to leave this TODO?

Not at this time. Since its path string is not an Expr it requires a bit more refactoring.

erik-krogh · 2025-06-10T12:33:40Z

javascript/ql/lib/semmle/javascript/internal/NameResolution.qll

+    predicate step(Node node1, Node node2) {
+      commonStep(node1, node2)
+      or
+      specificStep(node1, node2)
+    }


I feel like you made commonStep and specificStep at a point where the code looked different.

Because both of those predicates are only ever used here, commonStep is not reused anywhere else.

Additionally the specificStep predicate has a QLDoc that mentions it being part of a configuration.
But it's not part of a configuration, it's part of a parameterized module that can be instantiated with a config.

I just thought it would add clarity to separate them into commonStep and specificStep. It also helps make clear that commonStep can be materialised once and used twice (once per instantiation of the module).

> commonStep is not reused anywhere else.

It is used in two different instantiations of the module. We could inline it in the module, but then again, putting it inside a parameterised module where it does not depend on any of the module parameters is also confusing in its own way.

> Additionally the specificStep predicate has a QLDoc that mentions it being part of a configuration.

I thought it would be clear that we're talking about the configuration that was passed to the module?

Ahh, now I see. Yeah, I was wrong.

The code for specificStep is constant, but it depends on predicates defined by the module-parameter.
And you can't move that code to commonStep, because materializing all of that would blow up (because specificStep follows store-read pairs, but commonStep does not).
Is that about right?

Because if it doesn't blow up without the specialization used by specificStep, couldn't you then just compute all possible steps outside this module?

> The code for specificStep is constant, but it depends on predicates defined by the module-parameter.

I'm not sure what this means? It depends on the module parameter, both directly via S::isRelevantVariable and indirectly via its dependency on getModuleExports which also depends on the module parameter.

> And you can't move that code to commonStep, because materializing all of that would blow up

It's not meant as a performance thing at all; it's about separating types from values.

This example illustrates the need to keep them as two different graphs, because both graphs need to step through X without forgetting if a value or type is being tracked:

    // lib.ts
    class C1 {}
    class C2 {}
    const X = C1;
    type X = C2;
    export { X }

    // use.ts
    import { X } from "./lib"
    var x1 = X // should refer to C1
    var x2: X; // should refer to C2
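To make the "two different graphs" point concrete, here is a small TypeScript sketch that encodes the example as labelled edges and compares per-space reachability with a single merged graph. The `Edge` shape, the node names such as `lib.X`, and the `reachable` helper are all assumptions made for this illustration; the actual QL code instantiates a parameterised module rather than filtering edges like this.

```typescript
type Space = "value" | "type";

interface Edge {
  from: string;
  to: string;
  spaces: Space[]; // declaration spaces in which this edge applies
}

const edges: Edge[] = [
  { from: "C1", to: "lib.X", spaces: ["value"] }, // const X = C1
  { from: "C2", to: "lib.X", spaces: ["type"] }, // type X = C2
  { from: "lib.X", to: "use.X", spaces: ["value", "type"] }, // export / import
  { from: "use.X", to: "x1", spaces: ["value"] }, // var x1 = X
  { from: "use.X", to: "x2", spaces: ["type"] }, // var x2: X
];

// Nodes reachable from `start`, following only edges accepted by `keep`.
function reachable(start: string, keep: (e: Edge) => boolean): string[] {
  const seen: string[] = [start];
  const work: string[] = [start];
  while (work.length > 0) {
    const cur = work.pop() as string;
    for (const e of edges) {
      if (e.from === cur && keep(e) && seen.indexOf(e.to) < 0) {
        seen.push(e.to);
        work.push(e.to);
      }
    }
  }
  return seen;
}

const inSpace = (s: Space) => (e: Edge) => e.spaces.indexOf(s) >= 0;

const valueFromC1 = reachable("C1", inSpace("value")); // value graph only
const typeFromC2 = reachable("C2", inSpace("type")); // type graph only
const mergedFromC1 = reachable("C1", () => true); // one merged graph
```

In the per-space graphs, `C1` reaches `x1` but not `x2`, and `C2` reaches `x2` but not `x1`. In the merged graph, `C1` also reaches `x2`, which is exactly the confusion the separation avoids.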

Oh, OK, now I think I get it.
Could you add a comment, e.g. on FlowImpl, saying that the module is specialized to work only on types or only on values, and that this is needed to keep the separation you just explained?

commonStep(...) includes steps for both values and types, which might have caused some of my confusion.
Maybe add to the doc for that predicate that it contains type-specific steps, but that those when applied to values.

Added a comment to FlowImpl and expanded on the comment for commonStep

erik-krogh · 2025-06-10T12:48:53Z

javascript/ql/lib/semmle/javascript/internal/TypeResolution.qll

+  Node trackFunctionType(Function fun) {
+    result = fun
+    or
+    exists(Node mid | mid = trackFunctionType(fun) |
+      TypeFlow::step(mid, result)
+      or
+      UnderlyingTypes::underlyingTypeStep(mid, result)
+    )
+  }


It seems you're using a specialized predicate (instead of TypeFlow::TrackNode) because you additionally need the steps from UnderlyingTypes::underlyingTypeStep.

Why can't those steps be part of TypeFlow::TrackNode? Or why can't functions use TypeFlow::TrackNode?

Good question. I've renamed the predicate and added a clarifying qldoc comment in 18f9133
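For readers less familiar with recursive QL predicates: the quoted predicate computes everything reachable from the function by repeatedly applying either of two step relations. A minimal TypeScript sketch of that shape follows. The `fromTable` helper, the node names, and the two tables are hypothetical stand-ins for TypeFlow::step and UnderlyingTypes::underlyingTypeStep, not their real contents.

```typescript
// A step relation maps a node to its direct successors.
type StepRelation = (from: string) => string[];

function fromTable(table: { [from: string]: string[] }): StepRelation {
  return (from) => table[from] || [];
}

// Hypothetical stand-ins for TypeFlow::step and UnderlyingTypes::underlyingTypeStep.
const typeFlowStep = fromTable({ fun: ["alias"], inner: ["use"] });
const underlyingTypeStep = fromTable({ alias: ["inner"] });

// Reflexive-transitive closure of the union of the given step relations.
function track(start: string, steps: StepRelation[]): string[] {
  const seen: string[] = [start];
  const work: string[] = [start];
  while (work.length > 0) {
    const mid = work.pop() as string;
    for (const step of steps) {
      for (const next of step(mid)) {
        if (seen.indexOf(next) < 0) {
          seen.push(next);
          work.push(next);
        }
      }
    }
  }
  return seen;
}

const typeFlowOnly = track("fun", [typeFlowStep]);
const withUnderlying = track("fun", [typeFlowStep, underlyingTypeStep]);
```

In this toy graph, tracking with only the first relation stops at `alias`, while adding the second relation lets the search continue through `inner` to `use`. That is the difference the reviewer is asking about.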

Co-authored-by: Erik Krogh Kristensen <erik-krogh@github.com>

tausbn

Only a few minor comments from me. I think this looks really solid!

tausbn · 2025-06-05T14:07:40Z

javascript/ql/lib/semmle/javascript/internal/NameResolution.qll

+  /**
+   * Holds if values/namespaces/types in `node1` can flow to values/namespaces/types in `node2`.
+   */
+  private predicate commonStep(Node node1, Node node2) {


Minor nit: Could we rename node1 and node2 to, say, source and target (or src and tgt)? It would make it slightly less likely that a typo messed things up.

I'd rather not. In fact I'd say we should standardise on node1,node2 everywhere we define edges in a graph.

This is the convention used in the data flow library, and it generally works out better when you add in things like state1 and state2.

tausbn · 2025-06-05T14:08:24Z

javascript/ql/lib/semmle/javascript/internal/NameResolution.qll

+  private string normalizeModuleName(string name) {
+    result =
+      name.regexpReplaceAll("^node:", "")
+          .regexpReplaceAll("\\.[jt]sx?$", "")


Should this also account for things like .cjs and .mjs?
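As a sketch of what accounting for those extensions could look like, here is a TypeScript analogue of the two replacements visible in the snippet, with the extension pattern widened by an optional `c` or `m`. This is an assumption for illustration: the quoted QL is cut off after the second replacement, and the pattern actually used in the follow-up commit is not shown in this thread.

```typescript
// Strips a leading "node:" prefix and a trailing JS/TS file extension.
// The `[cm]?` part is an assumed extension of the original `\.[jt]sx?$` pattern.
function normalizeModuleName(moduleName: string): string {
  return moduleName
    .replace(/^node:/, "")
    .replace(/\.[cm]?[jt]sx?$/, "");
}

const normalized = [
  normalizeModuleName("node:fs"),
  normalizeModuleName("./lib.ts"),
  normalizeModuleName("./view.jsx"),
  normalizeModuleName("./lib.mjs"),
  normalizeModuleName("./lib.cjs"),
  normalizeModuleName("lodash"),
];
```

Note that the widened pattern also accepts combinations such as `.mtsx` that do not occur in practice; a stricter alternation would avoid that at the cost of a longer regex.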

github-actions bot added the JS label Mar 20, 2025

github-advanced-security bot found potential problems Mar 20, 2025

asgerf force-pushed the js/name-resolution branch from cee75ae to fa3e5ed Compare March 20, 2025 15:31

github-advanced-security bot found potential problems Mar 20, 2025

asgerf force-pushed the js/name-resolution branch 2 times, most recently from fe9b23d to c07cc6e Compare March 27, 2025 20:20

github-advanced-security bot found potential problems Apr 1, 2025

asgerf force-pushed the js/name-resolution branch from 2d928f2 to 3b395af Compare April 3, 2025 09:22

asgerf force-pushed the js/name-resolution branch from 3b395af to d92247c Compare April 11, 2025 11:37

asgerf force-pushed the js/name-resolution branch from 87454f7 to f289592 Compare May 2, 2025 11:43

asgerf force-pushed the js/name-resolution branch 4 times, most recently from 45b09df to ae0aeb9 Compare May 19, 2025 10:00

asgerf added 16 commits May 20, 2025 13:19

JS: Exclude externs from CallGraph meta-query

5064cd5

JS: Add ImportSpecifier.getImportDeclaration()

9fc0b8c

JS: Do not ignore variables from ambient declarations

50e4ac8

JS: Make Closure concepts based on AST instead

b5a4fc0

JS: Avoid accidental recursion with API graphs

4cd6f45

JS: Add helper for getting local type names

9566265

JS: Resolve JSDocLocalTypeAccess to a variable in scope

4bfb048

JS: Add test

1051136

JS: Add NameResolution.qll

1533e13

JS: Add UnderlyingTypes.qll

d61f576

JS: Add TypeResolution.qll

fc580a5

JS: Use underlying types in DataFlow::Node

b923eac

JS: Use in TypeAnnotation.getClass and hasUnderlyingType predicates

cca48c0

JS: Update jQuery model

9fd85c9

JS: Use sanitizing primitive types in ViewComponentInput

2d21074

JS: Use sanitizing primitive type in Nest model

6fdd7fe

asgerf added 5 commits May 20, 2025 13:20

JS: Accept change in handling of variable resolution in face of ambient declarations

b610e10

This test enforced the opinion that ambient declarations should have no impact on data flow, which is no longer the case. For now I'm just updating the test output.

JS: Add regression tests for declared globals

27979c6

JS: Fix regression from global declare vars

9bcc620

JS: Update TRAP after extractor change

11607e5

JS: Add test for missing type flow through generics

b698b4e

asgerf force-pushed the js/name-resolution branch from aab67e7 to 2b208d6 Compare May 20, 2025 13:57

JS: Remove obsolete meta query

d644f80

asgerf force-pushed the js/name-resolution branch from 2b208d6 to d644f80 Compare May 20, 2025 14:21

asgerf added the no-change-note-required This PR does not need a change note label May 22, 2025

asgerf marked this pull request as ready for review May 22, 2025 09:54

asgerf requested a review from a team as a code owner May 22, 2025 09:54

Napalys reviewed Jun 3, 2025

Update javascript/ql/lib/semmle/javascript/internal/TypeResolution.qll

853ba49

Co-authored-by: Napalys Klicius <napalys@github.com>

erik-krogh reviewed Jun 4, 2025

asgerf added 4 commits June 4, 2025 22:17

JS: Add test with type casts

79101fd

JS: Add SatisfiesExpr

57fad7e

JS: Nicer jump-to-def for function declarations

691fdb1

JS: Update test output now that 'satisfies' is a SourceNode

42f762a

erik-krogh reviewed Jun 10, 2025

asgerf and others added 2 commits June 10, 2025 16:06

Update javascript/ql/lib/semmle/javascript/internal/NameResolution.qll

a6488cb

Co-authored-by: Erik Krogh Kristensen <erik-krogh@github.com>

JS: Rename and clarify comment for trackFunctionType

18f9133

tausbn previously approved these changes Jun 10, 2025

JS: Normalize a few more extensions

72cc439

asgerf dismissed tausbn’s stale review via 72cc439 June 10, 2025 15:37

erik-krogh previously approved these changes Jun 10, 2025

JS: Add comment and examples in FlowImpl doc

2aa5fa1

asgerf dismissed erik-krogh’s stale review via 2aa5fa1 June 11, 2025 08:21

JS: Clarifying comment on commonStep

e848aa7

erik-krogh approved these changes Jun 11, 2025

asgerf merged commit 423ffc7 into github:main Jun 11, 2025
18 checks passed
