Chunk download latency #634

Merged
merged 22 commits into from
Jul 21, 2025
Conversation

saishreeeee (Contributor) commented Jul 13, 2025

What type of PR is this?

  • Refactor
  • Feature
  • Bug Fix
  • Other

Description

Records chunk download latency.
Adds chunk_id to SqlExecutionEvent and ResultDownloadHandler.
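
To illustrate the idea of recording latency per chunk, here is a minimal sketch. It is not the connector's implementation: `ChunkLatencyEvent`, `timed_download`, and the in-memory `events` list are hypothetical names invented for this example, and the download is faked with a lambda rather than a real external link.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChunkLatencyEvent:
    """Hypothetical stand-in for a telemetry payload; not the connector's class."""
    statement_id: str
    chunk_id: int
    operation_latency_ms: int


def timed_download(
    statement_id: str,
    chunk_id: int,
    download: Callable[[], bytes],
    sink: List[ChunkLatencyEvent],
) -> bytes:
    """Run `download`, then append one latency event for this chunk to `sink`."""
    start = time.perf_counter()
    data = download()
    elapsed_ms = int((time.perf_counter() - start) * 1000)
    sink.append(ChunkLatencyEvent(statement_id, chunk_id, elapsed_ms))
    return data


events: List[ChunkLatencyEvent] = []
for i in range(4):
    # Fake download: returns a small payload instead of fetching a real link.
    timed_download("stmt-demo", i, lambda: b"x" * 10, events)
```

Carrying the chunk index on every event is what lets a reader of the logs match each latency value to the chunk it was measured for.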

How is this tested?

  • Unit tests
  • E2E Tests
  • Manually
  • N/A

Ran the query: SELECT * FROM RANGE(20000000)
Latency was recorded for 4 chunks.
The latency log for the 1st chunk:

{
  "frontend_log_event_id": "e684ff49-9026-49af-abb6-687a6a16e565",
  "context": {
    "client_context": {
      "timestamp_millis": 1752467452769,
      "user_agent": "PyDatabricksSqlConnector/4.0.5"
    }
  },
  "entry": {
    "sql_driver_log": {
      "session_id": "01f0606b-4825-118d-abd2-87f5748c7dd3",
      "system_configuration": {
        "driver_version": "4.0.5",
        "os_name": "Darwin",
        "os_version": "24.5.0",
        "os_arch": "arm64",
        "runtime_name": "Python 3.13.3",
        "runtime_version": "3.13.3",
        "runtime_vendor": "CPython",
        "driver_name": "Databricks SQL Python Connector",
        "char_set_encoding": "utf-8",
        "locale_name": "en_US"
      },
      "driver_connection_params": {
        "http_path": "<HTTP_PATH>",
        "mode": "THRIFT",
        "host_info": {
          "host_url": "<SERVER_HOSTNAME>",
          "port": 443
        },
        "auth_mech": "PAT"
      },
      "sql_statement_id": "01f0606b-4854-1f89-a0cb-d5860332ef81",
      "sql_operation": {
        "statement_type": "QUERY",
        "is_compressed": true,
        "execution_result": "EXTERNAL_LINKS",
        "chunk_id": 0
      },
      "operation_latency_ms": 15882
    }
  }
}
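
As a rough aid for reading such logs, the sketch below pulls the chunk index and latency out of one entry. The field paths are taken from the log above; the helper name `chunk_latency` and the trimmed-down sample string are assumptions made for this example, not part of the connector.

```python
import json
from typing import Tuple

# Trimmed copy of the log entry, keeping only the fields read below.
raw = """
{
  "entry": {
    "sql_driver_log": {
      "sql_statement_id": "01f0606b-4854-1f89-a0cb-d5860332ef81",
      "sql_operation": {
        "statement_type": "QUERY",
        "is_compressed": true,
        "execution_result": "EXTERNAL_LINKS",
        "chunk_id": 0
      },
      "operation_latency_ms": 15882
    }
  }
}
"""


def chunk_latency(log_line: str) -> Tuple[int, int]:
    """Return (chunk_id, operation_latency_ms) from one telemetry log entry."""
    driver_log = json.loads(log_line)["entry"]["sql_driver_log"]
    return driver_log["sql_operation"]["chunk_id"], driver_log["operation_latency_ms"]


chunk_id, latency_ms = chunk_latency(raw)
print(chunk_id, latency_ms)  # prints: 0 15882
```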

Related Tickets & Documents

PECOBLR-653

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>
varun-edachali-dbx and others added 2 commits July 14, 2025 09:35
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>
-
Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>
@saishreeeee saishreeeee marked this pull request as ready for review July 14, 2025 04:46
@saishreeeee saishreeeee self-assigned this Jul 14, 2025
@saishreeeee saishreeeee requested a review from jprakash-db July 14, 2025 04:46
vikrantpuppala (Contributor) commented:

@jayantsing-db can you take a look? This PR adds latency logs. It merges into sea-migration because it leverages some refactorings there.

jprakash-db (Contributor) left a comment

LGTM. Thanks for making the changes

varun-edachali-dbx (Collaborator) left a comment

LGTM

@saishreeeee saishreeeee merged commit b57c3f3 into sea-migration Jul 21, 2025
23 checks passed
varun-edachali-dbx added a commit that referenced this pull request Jul 22, 2025
@varun-edachali-dbx varun-edachali-dbx mentioned this pull request Jul 23, 2025
5 tasks
varun-edachali-dbx added a commit that referenced this pull request Aug 4, 2025
* allow empty schema bytes for alignment with SEA

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* pass is_vl_op to Sea backend ExecuteResponse

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove catalog requirement in get_tables

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* move filters.py to SEA utils

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* ensure SeaResultSet

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* prevent circular imports

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove unused imports

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove cast, throw error if not SeaResultSet

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* pass param as TSparkParameterValue

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove failing test (temp)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove SeaResultSet type assertion

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* change errors to align with spec, instead of arbitrary ValueError

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* make SEA backend methods return SeaResultSet

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* use spec-aligned Exceptions in SEA backend

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove defensive row type check

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* raise ProgrammingError for invalid id

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* make is_volume_operation strict bool

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove complex types code

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* Revert "remove complex types code"

This reverts commit 138359d.

* introduce type conversion for primitive types for JSON + INLINE

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove SEA running on metadata queries (known failures)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary docstrings

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* align expected types with databricks sdk

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* link rest api reference to validate types

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove test_catalogs_returns_arrow_table test

metadata commands not expected to pass

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix fetchall_arrow and fetchmany_arrow

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove thrift aligned test_cancel_during_execute from SEA tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary changes in example scripts

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary changes in example scripts

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* _convert_json_table -> _create_json_table

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove accidentally removed test

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove new unit tests (to be re-added based on new arch)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove changes in sea_result_set functionality (to be re-added)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* introduce more integration tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove SEA tests in parameterized queries

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove partial parameter fix changes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary timestamp tests

(pass with minor disparity)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* slightly stronger typing of _convert_json_types

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* stronger typing of json utility funcs

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* stronger typing of fetch*_json

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove unused helper methods in SqlType

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* line breaks after multi line pydocs, remove excess logs

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* line breaks after multi line pydocs, reduce diff of redundant changes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce diff of redundant changes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* mandate ResultData in SeaResultSet constructor

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove complex type conversion

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* correct fetch*_arrow

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* recover old sea tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* move queue and result set into SEA specific dir

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* pass ssl_options into CloudFetchQueue

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce diff

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove redundant conversion.py

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix type issues

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* ValueError not ProgrammingError

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce diff

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* introduce SEA cloudfetch e2e tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* allow empty cloudfetch result

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add unit tests for CloudFetchQueue and SeaResultSet

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* skip pyarrow dependent tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* simplify download process: no pre-fetching

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* correct class name in logs

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* align with old impl

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* align next_n_rows with prev impl

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* align remaining_rows with prev impl

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary Optional params

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary changes in thrift field if tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove unused imports

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* init hybrid

* run large queries

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* hybrid disposition

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary log

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting (black)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove redundant tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* multi frame decompression of lz4

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* ensure no compression (temp)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* introduce separate link fetcher

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* log time to create table

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add chunk index to table creation time log

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove custom multi-frame decompressor for lz4

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove excess logs

* remove redundant tests (temp)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add link to download manager before notifying consumer

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* move link fetching immediately before table creation so link expiry is not an issue

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* resolve merge artifacts

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove redundant methods

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting (black)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* introduce callback to handle link expiry

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix types

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix param type in unit tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting + minor type fixes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* Revert "introduce callback to handle link expiry"

This reverts commit bd51b1c.

* remove unused callback (to be introduced later)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* correct param extraction

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove common constructor for databricks client abc

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* make SEA Http Client instance a private member

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* make GetChunksResponse model more robust

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add link to doc of GetChunk response model

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* pass result_data instead of "initial links" into SeaCloudFetchQueue

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* move download_manager init into parent CloudFetchQueue

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* raise ServerOperationError for no 0th chunk

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* unused imports

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* return None in case of empty response

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* ensure table is empty on no initial links

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* account for total chunk count

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* iterate by chunk index instead of link

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* make LinkFetcher convert link static

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add helper for link addition, check for edge case to prevent inf wait

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add unit tests for LinkFetcher

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary download manager check

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary string literals around param type

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove duplicate download_manager init

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* account for empty response in LinkFetcher init

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* make get_chunk_link return mandatory ExternalLink

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* set shutdown_event instead of breaking on completion so get_chunk_link is informed

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* docstrings, logging, pydoc

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* use total_chunk_count > 0

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* clarify that link has already been submitted on getting row_offset

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* return None for out of range

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* default link_fetcher to None

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

---------

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* Chunk download latency (#634)

* chunk download latency

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* formatting

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* test fixes

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* sea-migration static type checking fixes

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* check types fix

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* fix type issues

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* type fix revert

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* -

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* statement id in get metadata functions

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* removed result set extractor

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* databricks client type

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* formatting

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* remove defaults, fix chunk id

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* added statement type to command id

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* check types fix

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* renamed chunk_id to num_downloaded_chunks

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* set statement type to query for chunk download

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* comment fix

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* removed dup check for trowset

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

---------

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* acquire lock before notif + formatting (black)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix imports

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add get_chunk_links

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* simplify description extraction

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* pass session_id_hex to ThriftResultSet

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* revert to main's extract description

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* validate row count for sync query tests as well

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* guid_hex -> hex_guid

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce diff

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce diff

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce diff

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* set .value in compression

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce diff

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting (black)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove redundant test

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* move extra_params to the back

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* is_direct_results -> has_more_rows

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* Revert "is_direct_results -> has_more_rows"

This reverts commit 0e87374.

* stop passing session_id_hex

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove redundant comment

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add extra_params param

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* pass extra_params into test_...unset...

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove excess session_id_hex

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce changes in DatabricksRetryPolicy

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce diff in DatabricksRetryPolicy

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* simple comments on proxy setting

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* link docs for getproxies()

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* rename proxy specific attrs with proxy prefix

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

---------

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
varun-edachali-dbx added a commit that referenced this pull request Aug 4, 2025
* remove redundant conversion.py

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix type issues

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* ValueError not ProgrammingError

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce diff

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* introduce SEA cloudfetch e2e tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* allow empty cloudfetch result

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add unit tests for CloudFetchQueue and SeaResultSet

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* skip pyarrow dependent tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* simplify download process: no pre-fetching

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* correct class name in logs

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* align with old impl

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* align next_n_rows with prev impl

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* align remaining_rows with prev impl

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary Optional params

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary changes in thrift field if tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove unused imports

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* init hybrid

* run large queries

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* hybrid disposition

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary log

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting (black)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove redundant tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* multi frame decompression of lz4

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* ensure no compression (temp)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* introduce separate link fetcher

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* log time to create table

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add chunk index to table creation time log

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove custom multi-frame decompressor for lz4

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove excess logs

* remove redundant tests (temp)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add link to download manager before notifying consumer

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* move link fetching immediately before table creation so link expiry is not an issue

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* resolve merge artifacts

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove redundant methods

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting (black)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* introduce callback to handle link expiry

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix types

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix param type in unit tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting + minor type fixes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* Revert "introduce callback to handle link expiry"

This reverts commit bd51b1c.

* remove unused callback (to be introduced later)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* correct param extraction

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove common constructor for databricks client abc

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* make SEA Http Client instance a private member

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* make GetChunksResponse model more robust

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add link to doc of GetChunk response model

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* pass result_data instead of "initial links" into SeaCloudFetchQueue

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* move download_manager init into parent CloudFetchQueue

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* raise ServerOperationError for no 0th chunk

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* unused imports

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* return None in case of empty response

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* ensure table is empty on no initial links

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* account for total chunk count

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* iterate by chunk index instead of link

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* make LinkFetcher convert link static

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add helper for link addition, check for edge case to prevent inf wait

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add unit tests for LinkFetcher

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary download manager check

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary string literals around param type

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove duplicate download_manager init

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* account for empty response in LinkFetcher init

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* make get_chunk_link return mandatory ExternalLink

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* set shutdown_event instead of breaking on completion so get_chunk_link is informed

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* docstrings, logging, pydoc

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* use total_chunk_count > 0

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* clarify that link has already been submitted on getting row_offset

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* return None for out of range

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* default link_fetcher to None

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

---------

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* Chunk download latency (#634)

* chunk download latency

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* formatting

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* test fixes

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* sea-migration static type checking fixes

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* check types fix

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* fix type issues

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* type fix revert

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* -

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* statement id in get metadata functions

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* removed result set extractor

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* databricks client type

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* formatting

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* remove defaults, fix chunk id

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* added statement type to command id

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* check types fix

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* renamed chunk_id to num_downloaded_chunks

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* set statement type to query for chunk download

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* comment fix

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* removed dup check for trowset

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

---------

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* acquire lock before notif + formatting (black)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix imports

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add get_chunk_links

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* simplify description extraction

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* pass session_id_hex to ThriftResultSet

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* revert to main's extract description

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* validate row count for sync query tests as well

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* guid_hex -> hex_guid

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce diff

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce diff

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce diff

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* set .value in compression

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce diff

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* is_direct_results -> has_more_rows

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* preliminary large metadata results

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* account for empty table in arrow table filter

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* align flows

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* align flow of json with arrow

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* case sensitive support for arrow table

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary comment

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix merge artifacts

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove redundant method

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove incorrect docstring

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove deepcopy

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

---------

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
4 participants