perf: Replace expensive len() call with PandasBatches.total_rows in anywidget TableWidget #1937
Please also update the benchmarks to use the
Let's use a separate PR for this request.
bigframes/display/anywidget.py
Outdated
    if self._batches:
        self.row_count = self._batches.total_rows or 0
    else:
        self.row_count = 0
Did you encounter this case? If so, could you reproduce it? It could indicate a bug, since even with empty results we should still have a PandasBatches object. Or was this just for the type checker?
Since batches should never be None, I suggest using that instead of self._batches.
    - if self._batches:
    -     self.row_count = self._batches.total_rows or 0
    - else:
    -     self.row_count = 0
    + self.row_count = batches.total_rows or 0
I thought I had a mypy error, but after the change everything passes. I will take the suggestion.
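The simplification discussed in this thread can be sketched as follows. This is a minimal illustration, not the bigframes implementation: `FakeBatches` and `row_count_from` are hypothetical names, and the sketch assumes only what the diff implies, namely that `total_rows` may be None when the count is unknown.

```python
from typing import Optional


class FakeBatches:
    """Hypothetical stand-in for a PandasBatches-like object (not the bigframes class)."""

    def __init__(self, total_rows: Optional[int]) -> None:
        # Assumption: total_rows is None when the row count is not known.
        self.total_rows = total_rows


def row_count_from(batches: FakeBatches) -> int:
    # The batches object itself is always present, so no outer `if` is needed.
    # `or 0` maps an unknown count (None) to 0 and leaves real counts unchanged.
    return batches.total_rows or 0


known = row_count_from(FakeBatches(42))
unknown = row_count_from(FakeBatches(None))
empty = row_count_from(FakeBatches(0))
```

Because the object is always constructed, the only case left to handle is a missing count, which the `or 0` fallback covers in a single expression.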
bigframes/display/anywidget.py
Outdated
        self.row_count = self._batches.total_rows or 0
    else:
        self.row_count = 0
    self.page_size = initial_page_size
I assume this is delayed so that self._batches is available when the change trigger happens? If so, please add a comment explaining that.
I added a comment.
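The ordering concern raised in this thread can be illustrated with a small sketch. It is not the TableWidget code: `TableSketch` is a hypothetical class, and a plain property setter stands in for whatever change-notification mechanism the real widget uses. It shows only why an attribute whose assignment triggers a handler should be set after the state that handler reads.

```python
class TableSketch:
    """Hypothetical widget-like class; a property setter simulates a change trigger."""

    def __init__(self, rows, initial_page_size):
        # State that the change handler reads must exist first.
        self._batches = list(rows)
        self.row_count = len(self._batches)
        # Assigned last on purpose: this assignment fires the handler below,
        # which needs self._batches to be available already.
        self.page_size = initial_page_size

    @property
    def page_size(self):
        return self._page_size

    @page_size.setter
    def page_size(self, value):
        self._page_size = value
        self._on_page_size_change()

    def _on_page_size_change(self):
        # Would raise AttributeError if page_size were set before _batches.
        self.first_page = self._batches[: self._page_size]


widget = TableSketch(range(10), 3)
```

Moving the `self.page_size = ...` line above the `self._batches = ...` line in this sketch makes the constructor fail, which is the situation the requested comment is meant to document.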
    # SELECT COUNT(*) query. It is a must have however.
    # TODO(b/428238610): Start iterating over the result of `to_pandas_batches()`
    # before we get here so that the count might already be cached.
    self.row_count = len(dataframe)
Please make the same change to the tests in https://github.com/googleapis/python-bigquery-dataframes/tree/main/tests/benchmark/read_gbq_colab in this PR so that our benchmarks reflect the "real" logic.
I modified the following two files:
(1) tests/benchmark/read_gbq_colab/first_page.py
(2) tests/benchmark/read_gbq_colab/last_page.py