This repository was archived by the owner on Jan 28, 2021. It is now read-only.

Parallelize index creation #346

@erizocosmico

Description

Goals

Now that we have partitions, we can create indexes in parallel rather than sequentially, which should make index creation take a lot less time.

Challenges

An index is a big matrix of 1s and 0s: each column corresponds to a database row, and each row corresponds to a unique value of the indexed table field.

We also keep a mapping from each value to its row ID in the bitmap, so the row IDs have to be tracked.

The problem comes from the columns: the column ID is incremented sequentially, so work done in parallel would have to coordinate on which column ID comes next.
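To make the layout concrete, here is a minimal in-memory sketch of such an index. It is only an illustration: `bitmapIndex`, `add` and `lookup` are hypothetical names, not the actual driver code, and the bits are stored as plain Go maps rather than real bitmaps.

```go
package main

import (
	"fmt"
	"sort"
)

// bitmapIndex is a hypothetical stand-in for the index described above.
type bitmapIndex struct {
	rowIDs  map[string]uint64          // field value -> row ID in the bitmap
	bits    map[uint64]map[uint64]bool // row ID -> column IDs whose bit is 1
	nextCol uint64                     // sequentially incremented column ID
}

func newBitmapIndex() *bitmapIndex {
	return &bitmapIndex{
		rowIDs: make(map[string]uint64),
		bits:   make(map[uint64]map[uint64]bool),
	}
}

// add records one database row holding the given field value and
// returns the column ID assigned to that database row.
func (idx *bitmapIndex) add(value string) uint64 {
	rowID, ok := idx.rowIDs[value]
	if !ok {
		// First time this value is seen: allocate a new bitmap row.
		rowID = uint64(len(idx.rowIDs))
		idx.rowIDs[value] = rowID
		idx.bits[rowID] = make(map[uint64]bool)
	}
	colID := idx.nextCol
	idx.nextCol++ // this shared, sequential state is what makes parallelism hard
	idx.bits[rowID][colID] = true
	return colID
}

// lookup returns, in ascending order, the column IDs set for a value.
func (idx *bitmapIndex) lookup(value string) []uint64 {
	rowID, ok := idx.rowIDs[value]
	if !ok {
		return nil
	}
	cols := make([]uint64, 0, len(idx.bits[rowID]))
	for col := range idx.bits[rowID] {
		cols = append(cols, col)
	}
	sort.Slice(cols, func(i, j int) bool { return cols[i] < cols[j] })
	return cols
}

func main() {
	idx := newBitmapIndex()
	for _, lang := range []string{"go", "python", "go", "rust"} {
		idx.add(lang)
	}
	fmt.Println(idx.lookup("go")) // prints [0 2]
}
```

Every call to `add` reads and bumps `nextCol`, which is why indexing several partitions at once is not straightforward.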

Ideas

We could share a global counter protected by a mutex. That way, there would be no need to pass the colID around, and it would stay sequential. This has one downside, though: index creation would not be idempotent, because the order in which the rows are stored (and thus returned) would be different every time you create the same index.
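A rough sketch of the shared-counter idea, assuming one goroutine per partition. The names `colCounter`, `nextID` and `indexPartitions` are made up for illustration and do not exist in the codebase; rows are represented by plain strings.

```go
package main

import (
	"fmt"
	"sync"
)

// colCounter is a hypothetical global column ID counter guarded by a mutex.
type colCounter struct {
	mu   sync.Mutex
	next uint64
}

// nextID hands out the next sequential column ID.
func (c *colCounter) nextID() uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	id := c.next
	c.next++
	return id
}

// indexPartitions assigns a column ID to every row of every partition,
// processing each partition in its own goroutine. It returns row -> colID.
func indexPartitions(partitions [][]string) map[string]uint64 {
	var (
		counter  colCounter
		wg       sync.WaitGroup
		mu       sync.Mutex // protects assigned
		assigned = make(map[string]uint64)
	)
	for _, rows := range partitions {
		wg.Add(1)
		go func(rows []string) {
			defer wg.Done()
			for _, row := range rows {
				id := counter.nextID()
				mu.Lock()
				assigned[row] = id
				mu.Unlock()
			}
		}(rows)
	}
	wg.Wait()
	return assigned
}

func main() {
	partitions := [][]string{
		{"p0-r0", "p0-r1"},
		{"p1-r0", "p1-r1"},
		{"p2-r0", "p2-r1"},
	}
	// The IDs handed out are always 0..5, but which row receives which ID
	// depends on goroutine scheduling and may change between runs.
	fmt.Println(indexPartitions(partitions))
}
```

The counter keeps the IDs dense and sequential without passing a colID between workers, but the row-to-column assignment depends on scheduling, which is the non-idempotency described above.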

Also, we would need to stop saving batches and fields in the driver structure, as multiple operations might be taking place at the same time.

UPDATE: this can't be done, because indexes could then not be combined.

Labels

enhancement, performance
