perf: Replace expensive len() call with PandasBatches.total_rows in anywidget TableWidget #1937

Open
wants to merge 8 commits into main


shuoweil
Contributor

perf: Replace expensive len() call with PandasBatches.total_rows in anywidget TableWidget
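To illustrate the idea behind this change, here is a minimal, self-contained sketch that uses hypothetical stand-ins (FakeDataFrame, FakePandasBatches, row_count_with_len, row_count_from_batches) instead of the real bigframes classes. It assumes, as the PR title and the widget's code comment suggest, that len(dataframe) runs its own SELECT COUNT(*) query, while the batches object returned by to_pandas_batches() already carries the row count as total_rows.

```python
from typing import Iterator, List, Optional


class FakePandasBatches:
    """Hypothetical stand-in for a batches iterator that also exposes
    the total row count reported by the query that produced it."""

    def __init__(self, pages: List[list], total_rows: Optional[int]) -> None:
        self._pages = iter(pages)
        self.total_rows = total_rows

    def __iter__(self) -> Iterator[list]:
        return self

    def __next__(self) -> list:
        return next(self._pages)


class FakeDataFrame:
    """Hypothetical DataFrame that counts how many queries it issues."""

    def __init__(self, rows: list) -> None:
        self._rows = rows
        self.queries_run = 0

    def __len__(self) -> int:
        # Models len() running a separate SELECT COUNT(*) query.
        self.queries_run += 1
        return len(self._rows)

    def to_pandas_batches(self, page_size: int) -> FakePandasBatches:
        # Models a single query that yields both the pages and the row count.
        self.queries_run += 1
        pages = [
            self._rows[i : i + page_size]
            for i in range(0, len(self._rows), page_size)
        ]
        return FakePandasBatches(pages, total_rows=len(self._rows))


def row_count_with_len(df: FakeDataFrame, page_size: int) -> int:
    # Old approach: fetch the batches, then call len() as well.
    df.to_pandas_batches(page_size)
    return len(df)


def row_count_from_batches(df: FakeDataFrame, page_size: int) -> int:
    # New approach: reuse the count carried by the batches object.
    batches = df.to_pandas_batches(page_size)
    return batches.total_rows or 0
```

With 25 rows and page_size=10, both helpers return 25, but in this model the len()-based version issues two queries and the total_rows-based version issues one.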

@shuoweil shuoweil self-assigned this Jul 24, 2025
@shuoweil shuoweil requested review from a team as code owners July 24, 2025 23:22
@shuoweil shuoweil requested a review from GarrettWu July 24, 2025 23:22

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Jul 24, 2025
@shuoweil shuoweil requested review from tswast and removed request for GarrettWu July 24, 2025 23:23
@shuoweil shuoweil force-pushed the shuowei-anywidget-remove-len-call branch from aee37a7 to 303c4af Compare July 24, 2025 23:23
@tswast
Collaborator

tswast commented Jul 29, 2025

Please also update the benchmarks to use the total_rows parameter.

@shuoweil shuoweil force-pushed the shuowei-anywidget-remove-len-call branch from fc38cf3 to f643cfb Compare July 30, 2025 03:28
@shuoweil
Contributor Author

> Please also update the benchmarks to use the total_rows parameter.

Let's use a separate PR for this request.

@shuoweil shuoweil requested a review from tswast July 30, 2025 03:29
@shuoweil shuoweil force-pushed the shuowei-anywidget-remove-len-call branch 2 times, most recently from e12c8ff to f8ab27b Compare July 30, 2025 22:07
@shuoweil shuoweil added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 31, 2025
@bigframes-bot bigframes-bot removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 31, 2025
@shuoweil shuoweil force-pushed the shuowei-anywidget-remove-len-call branch from f8ab27b to df85824 Compare July 31, 2025 04:32
Comment on lines 87 to 90
if self._batches:
    self.row_count = self._batches.total_rows or 0
else:
    self.row_count = 0
Collaborator

Did you encounter this case? If so, could you reproduce it? It could indicate a bug, since even with empty results we should still have a PandasBatches object. Or was this just for the type checker?

Since batches should never be None, I suggest using batches directly instead of self._batches.

Suggested change
- if self._batches:
-     self.row_count = self._batches.total_rows or 0
- else:
-     self.row_count = 0
+ self.row_count = batches.total_rows or 0
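A short sketch of why the suggested one-liner is enough, assuming (as the reviewer states) that batches is never None and that total_rows itself may be None. The "or 0" fallback already maps a missing count to 0. In addition, a Python object that defines neither __bool__ nor __len__ is always truthy, so for such an object the removed "if self._batches:" branch could only ever guard against None. SimpleBatches and row_count are hypothetical names for illustration, not the bigframes classes.

```python
from typing import Iterator, Optional


class SimpleBatches:
    """Hypothetical iterator-like object with an optional row count.

    It defines neither __bool__ nor __len__, so Python treats every
    instance as truthy."""

    def __init__(self, total_rows: Optional[int]) -> None:
        self.total_rows = total_rows

    def __iter__(self) -> Iterator[object]:
        return iter(())


def row_count(batches: SimpleBatches) -> int:
    # Mirrors the suggested change: fall back to 0 when the count is unknown.
    return batches.total_rows or 0
```

For example, row_count(SimpleBatches(None)) and row_count(SimpleBatches(0)) both return 0, while row_count(SimpleBatches(42)) returns 42.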

Contributor Author

I initially thought the check was needed to avoid a mypy error, but after the change everything passes. I will take the suggestion.

    self.row_count = self._batches.total_rows or 0
else:
    self.row_count = 0
self.page_size = initial_page_size
Collaborator

I assume this assignment is delayed so that self._batches is available when the change trigger fires? If so, please add a comment explaining that.

Contributor Author

Added a comment.
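The ordering concern about page_size can be sketched with a minimal stand-in for a traitlets-style change observer. Everything here is hypothetical (OrderedWidget, _on_page_size_change, and the use of a plain property instead of a real traitlets trait); it only shows why page_size should be assigned after _batches exists when assigning it triggers a handler that reads _batches.

```python
from typing import List


class OrderedWidget:
    """Hypothetical widget whose page_size setter fires a change handler,
    loosely modeling an observer on a traitlets trait."""

    def __init__(
        self,
        pages: List[list],
        initial_page_size: int,
        set_page_size_first: bool = False,
    ) -> None:
        self.handler_calls = 0
        if set_page_size_first:
            # Wrong order: the handler runs before _batches exists.
            self.page_size = initial_page_size
        self._batches = pages
        # Right order: assign page_size only after _batches is available.
        self.page_size = initial_page_size

    @property
    def page_size(self) -> int:
        return self._page_size

    @page_size.setter
    def page_size(self, value: int) -> None:
        self._page_size = value
        self._on_page_size_change()

    def _on_page_size_change(self) -> None:
        # Reads _batches, so it raises AttributeError if called too early.
        self.handler_calls += 1
        self.first_page = self._batches[0] if self._batches else []
```

Constructing OrderedWidget([[1, 2], [3]], initial_page_size=2) runs the handler once and sets first_page to [1, 2]; passing set_page_size_first=True raises AttributeError because the handler reads _batches before it is assigned.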

# SELECT COUNT(*) query. It is a must have however.
# TODO(b/428238610): Start iterating over the result of `to_pandas_batches()`
# before we get here so that the count might already be cached.
self.row_count = len(dataframe)
Collaborator

Please make the same change to the tests in https://github.com/googleapis/python-bigquery-dataframes/tree/main/tests/benchmark/read_gbq_colab in this PR so that our benchmarks reflect the "real" logic.

Contributor Author

I modified the following two files:
(1) tests/benchmark/read_gbq_colab/first_page.py
(2) tests/benchmark/read_gbq_colab/last_page.py
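The contents of those benchmark files are not shown in this conversation. As a rough, hypothetical illustration of reading the first and last pages with total_rows, the sketch below takes the row count from the batches object and derives the number of pages with ceiling division instead of calling len(). ListBatches, first_page, and last_page are illustrative names, not the actual benchmark code.

```python
import math
from typing import Iterator, List, Optional, Tuple


class ListBatches:
    """Hypothetical batches iterator carrying an optional total row count."""

    def __init__(self, pages: List[list], total_rows: Optional[int]) -> None:
        self._pages = iter(pages)
        self.total_rows = total_rows

    def __iter__(self) -> Iterator[list]:
        return self

    def __next__(self) -> list:
        return next(self._pages)


def first_page(batches: ListBatches) -> Tuple[int, list]:
    # The row count comes from the batches object, not from len(dataframe).
    total_rows = batches.total_rows or 0
    return total_rows, next(batches, [])


def last_page(batches: ListBatches, page_size: int) -> Tuple[int, list]:
    total_rows = batches.total_rows or 0
    # Number of pages = ceil(total_rows / page_size).
    num_pages = math.ceil(total_rows / page_size)
    page: list = []
    # Advance through the iterator until the last page is reached.
    for _ in range(num_pages):
        page = next(batches, [])
    return total_rows, page
```

With 23 rows split into pages of 10, ceil(23 / 10) = 3, so last_page advances three times and returns the final page of 3 rows.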

@shuoweil shuoweil force-pushed the shuowei-anywidget-remove-len-call branch from df85824 to 2756968 Compare August 1, 2025 08:06
@shuoweil shuoweil requested a review from tswast August 1, 2025 08:07
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: m Pull request size is medium.
3 participants