fix(gazelle) Delete python targets with invalid srcs #3046

Open
wants to merge 7 commits into
base: main

yushan26
Contributor

@yushan26 yushan26 commented Jul 1, 2025

When running Gazelle, it generated the following target:

py_binary(
    name = "remove_py_binary",
    srcs = ["__main__.py"],
    main = "__main__.py",
    visibility = ["//visibility:public"],
)

After __main__.py was deleted and the change committed, re-running Gazelle did not remove the file from the srcs list.
This change introduces logic to check whether all entries in a Python target's srcs attribute correspond to valid files. If none of them exist, the target is added to result.Empty to signal that it should be cleaned up. This cleanup applies when python_generation_mode is package or file, as all srcs are expected to reside directly within the current directory.
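
A minimal, self-contained sketch of the check described above, not the PR's actual code. The `existingRule` type and the `rulesWithNoExistingSrcs` helper are hypothetical stand-ins for Gazelle's rule type and for whatever collects rules into result.Empty. The sketch also assumes that a rule with an empty srcs list counts as having no existing srcs.

```go
package main

import "fmt"

// existingRule is a hypothetical stand-in for a rule read from a BUILD file.
type existingRule struct {
	Name string
	Srcs []string
}

// rulesWithNoExistingSrcs returns the names of rules for which no entry in
// srcs names a file present in the current directory. In the PR, such rules
// would be added to result.Empty; here we only return their names.
func rulesWithNoExistingSrcs(rules []existingRule, files []string) []string {
	present := make(map[string]struct{}, len(files))
	for _, f := range files {
		present[f] = struct{}{}
	}
	var stale []string
	for _, r := range rules {
		found := false
		for _, src := range r.Srcs {
			if _, ok := present[src]; ok {
				found = true
				break
			}
		}
		if !found {
			stale = append(stale, r.Name)
		}
	}
	return stale
}

func main() {
	rules := []existingRule{
		{Name: "remove_py_binary", Srcs: []string{"__main__.py"}},
		{Name: "keep_py_binary", Srcs: []string{"foo.py"}},
	}
	// __main__.py was deleted; only foo.py remains on disk.
	fmt.Println(rulesWithNoExistingSrcs(rules, []string{"foo.py"})) // [remove_py_binary]
}
```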

@yushan26 yushan26 force-pushed the cleanup-python-targets branch from f0f7a1b to c573eb3 Compare July 1, 2025 22:11
@dougthor42
Collaborator

IMO this should be handled by properly supporting fix.

Specifically (emphasis mine):

update performs a safe set of tranformations [sic], while fix performs some additional transformations that may delete or rename rules.

We should not be deleting rules when running in update mode.

I know it's a PITA - my users complain about it too - but there are too many legitimate cases where a py_library rule exists with an empty srcs value, or where a label in srcs doesn't point to anything (for example, maybe that src is created by a genrule? Or maybe it's created during CI outside of Bazel?)

@jayconrod

@linzhp tagged me on this.

update should delete rules with empty srcs, if that makes sense for the language. That's intended. For Go, such a library cannot be built. If all sources are gone (deleted by the user?), then it makes sense for update to remove the rules.

I don't know Python well enough to say whether that same behavior makes sense here. What does a py_library do if srcs are empty?

The purpose of fix was to help migrate through incompatible changes in the ruleset or in Bazel. For example, Go used to have a cgo_library macro that was used on cgo code, which needed to be separate from the go_library. Eventually, that was deprecated and removed, so fix helped by squashing the cgo_library and go_library targets together. That had potential to break anything not managed by Gazelle that depended on cgo_library targets directly, so the distinction was made between fix and update.

These days, rule sets are a lot more stable, so fix doesn't need to do very much.

@dougthor42
Collaborator

Ah, well I stand corrected then! Thanks for the background and info - I misunderstood the difference between update and fix (if I remember to, I'll update the bazel-gazelle docs).

What does a py_library do if srcs are empty?

It's valid. For example, we use it to handle circular dependencies:

# ########## START Autogenerated cycle targets and directives ##########
# See go/qos-doc-2024-24 for more info.
# cycle_c63218a5 (2 targets):
# gazelle:resolve py labrad.client //src:cycle_c63218a5
# gazelle:resolve py labrad.proto_over_labrad //src:cycle_c63218a5
py_library(
    name = "cycle_c63218a5",
    srcs = [],
    imports = ["."],
    tags = ["cycle"],
    visibility = ["//visibility:public"],
    deps = [
        "//src/labrad:client",
        "//src/labrad:proto_over_labrad",
    ],
)

In the above case, :client depends on :proto_over_labrad which depends on :client.

Why do we put the cycles in deps and not in srcs? With srcs, all our internal dependencies are pulled in, but anything from @pypi// doesn't get pulled in to the target's dependencies. With deps, everything gets pulled in.

So Gazelle deleting py_library with empty srcs would break things for us. It might be easy enough to # gazelle:keep them though.

@linzhp
Contributor

linzhp commented Jul 2, 2025

Gazelle already cleans up py_library targets when srcs is empty in some cases:

// If we're doing per-file generation, srcs could be empty at this point, meaning we shouldn't make a py_library.
// If there is already a package named py_library target before, we should generate an empty py_library.
if srcs.Empty() {

The situation we are trying to fix is the per-file py_binary targets (no entry point): when the source file is removed, we would like the py_binary to be removed too; otherwise the build fails.

There are two ways to move forward with this PR:

  1. keep the current approach and ask people to # keep those py_library targets with empty sources, like you suggested.
  2. scope it down to py_binary only for now (maybe think about py_test in the future)

Let us know which approach you prefer.

@linzhp
Contributor

linzhp commented Jul 13, 2025

@dougthor42 friendly ping on this ☝️

Collaborator

@dougthor42 dougthor42 left a comment

This causes nontrivial issues with our QuantumAI repositories. Sorry, I'll have to block this until I can figure out a solution.

Can you try down-scoping to just py_binary as mentioned in #3046 (comment)? That might go a long way.

Collaborator

@dougthor42 dougthor42 left a comment

Thanks, I ran the "binary only" code on our codebase and no unexpected changes were made, yay!

Please update CHANGELOG.md with a short description of the PR.

@@ -32,6 +32,7 @@ import (
"github.com/emirpasic/gods/sets/treeset"
godsutils "github.com/emirpasic/gods/utils"


Collaborator

nit: remove

@@ -485,6 +488,44 @@ func (py *Python) GenerateRules(args language.GenerateArgs) language.GenerateRes
	return result
}

// getRulesWithInvalidSrcs checks existing Python rules in the BUILD file and return the rules with invalid srcs.
Collaborator

nit: please describe what "invalid srcs" means in the comment.

# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Collaborator

nit: please remove copyright header - they're no longer needed.

@@ -0,0 +1,5 @@
py_binary(
    name = "keep_target_binary",
    srcs = ["//test/binary:__main__.py"],
Collaborator

nit: please refer to a binary found in the local dir (e.g., make keep_binary/foo.py with an if __name__ == "__main__": block in it)

@@ -0,0 +1 @@
workspace(name = "remove_invalid_binary")
Collaborator

optional nit: this file can (should?) be empty.

if existingRule.Kind() != pyBinaryKind {
	continue
}
allInvalidSrcs := true
Contributor

I would avoid double negatives in the code. Instead of "allInvalidSrcs == false", "hasValidSrcs == true" is easier to understand.

Comment on lines 497 to 499
for _, file := range args.RegularFiles {
	filesMap[file] = struct{}{}
}
Contributor

Sorry, I should have caught this earlier, but I don't think we should read args.RegularFiles here. There is a lot of filtering logic at the beginning of GenerateRules that should also be applied here:

if cfg.IgnoresFile(filepath.Base(f)) {
	continue
}
ext := filepath.Ext(f)
if ext == ".py" {
	pyFileNames.Add(f)
	if !hasPyBinaryEntryPointFile && f == pyBinaryEntrypointFilename {
		hasPyBinaryEntryPointFile = true
	} else if !hasPyTestEntryPointFile && f == pyTestEntrypointFilename {
		hasPyTestEntryPointFile = true
	} else if f == conftestFilename {
		hasConftestFile = true
	} else if matchesAnyGlob(f, testFileGlobs) {
		pyTestFilenames.Add(f)
	} else {
		pyLibraryFilenames.Add(f)
	}

For example, if a py_binary only has excluded srcs, it should be cleaned up too.
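
To illustrate the point about filtering, here is a rough sketch of building the lookup set from filtered files instead of raw args.RegularFiles. The `validPyFiles` helper and the `ignored` callback are hypothetical; `ignored` merely stands in for the cfg.IgnoresFile check quoted above, and the real filtering in GenerateRules does more than this.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// validPyFiles returns the set of .py files that survive filtering.
// ignored is a hypothetical stand-in for cfg.IgnoresFile.
func validPyFiles(regularFiles []string, ignored func(base string) bool) map[string]struct{} {
	valid := make(map[string]struct{})
	for _, f := range regularFiles {
		if ignored(filepath.Base(f)) {
			continue // excluded files must not keep a py_binary alive
		}
		if filepath.Ext(f) != ".py" {
			continue
		}
		valid[f] = struct{}{}
	}
	return valid
}

func main() {
	ignored := func(base string) bool { return base == "generated.py" }
	valid := validPyFiles([]string{"__main__.py", "generated.py", "README.md"}, ignored)
	_, kept := valid["__main__.py"]
	_, excluded := valid["generated.py"]
	fmt.Println(kept, excluded) // true false
}
```

With a set built this way, a py_binary whose only srcs entry is an excluded file finds no match and becomes a candidate for cleanup, which is the behavior asked for above.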

There is also logic to collect py files from subdirs that is needed by project mode:

for _, d := range args.Subdirs {

Since we scope this PR down to py_binary only, the only "regular files" we care about are __main__.py and mainModules here:

allDeps, mainModules, annotations, err := parser.parse(srcs)

The "GenFiles" and "isTarget" logic can stay, because it is not handled anywhere else.

@yushan26 yushan26 requested a review from rickeylev as a code owner July 15, 2025 20:38
Comment on lines +514 to +524
hasValidSrcs := true
for _, src := range existingRule.AttrStrings("srcs") {
	if isTarget(src) {
		continue
	}
	if _, ok := filesMap[src]; ok {
		continue
	}
	hasValidSrcs = false
	break
}
Contributor

Suggested change
-	hasValidSrcs := true
-	for _, src := range existingRule.AttrStrings("srcs") {
-		if isTarget(src) {
-			continue
-		}
-		if _, ok := filesMap[src]; ok {
-			continue
-		}
-		hasValidSrcs = false
-		break
-	}
+	var hasValidSrcs bool
+	for _, src := range existingRule.AttrStrings("srcs") {
+		if isTarget(src) {
+			hasValidSrcs = true
+			break
+		}
+		if _, ok := filesMap[src]; ok {
+			hasValidSrcs = true
+			break
+		}
+	}

It should be something like this.
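
The two loops differ in meaning: the original computes "every src is valid", while the suggestion computes "at least one src is valid". The sketch below reproduces both as standalone functions so the difference can be observed. The function names `allSrcsValid` and `anySrcValid` are made up for this illustration, and `isTarget` here is an assumed implementation (a label starting with `//`, `:` or `@`); the PR's real helper may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// isTarget is an assumed implementation: it treats strings that look like
// Bazel labels as targets rather than files.
func isTarget(src string) bool {
	return strings.HasPrefix(src, "//") || strings.HasPrefix(src, ":") || strings.HasPrefix(src, "@")
}

// allSrcsValid mirrors the original loop: false as soon as one src is missing.
func allSrcsValid(srcs []string, files map[string]struct{}) bool {
	for _, src := range srcs {
		if isTarget(src) {
			continue
		}
		if _, ok := files[src]; ok {
			continue
		}
		return false
	}
	return true
}

// anySrcValid mirrors the suggested loop: true as soon as one src is valid.
func anySrcValid(srcs []string, files map[string]struct{}) bool {
	for _, src := range srcs {
		if isTarget(src) {
			return true
		}
		if _, ok := files[src]; ok {
			return true
		}
	}
	return false
}

func main() {
	files := map[string]struct{}{"a.py": {}}

	// One src exists, one was deleted.
	mixed := []string{"a.py", "gone.py"}
	fmt.Println(allSrcsValid(mixed, files), anySrcValid(mixed, files)) // false true

	// Empty srcs list.
	fmt.Println(allSrcsValid(nil, files), anySrcValid(nil, files)) // true false
}
```

Under the original loop, a rule with one surviving file and one deleted file would be flagged for removal. Under the suggested loop it is kept, and a rule is flagged only when nothing in srcs resolves. Note that the two versions also disagree on an empty srcs list.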

	return
}
filesMap := make(map[string]struct{})
for _, file := range args.RegularFiles {
Contributor

What about the Python files in subdirs? Any thoughts on my previous comment about reading mainModules instead of args.RegularFiles?

5 participants