This repository was archived by the owner on Jan 28, 2021. It is now read-only.

add experimental optional feature to use in memory joins #605

Merged
merged 2 commits into from
Jan 28, 2019

Conversation

erizocosmico
Contributor

This exposes the env var EXPERIMENTAL_IN_MEMORY_JOIN to enable the in-memory join feature. When it is enabled, inner joins are performed in memory, which is significantly faster.

This way, clients can decide whether to enable in-memory joins depending on their environment. If they can trade memory usage for speed, this is orders of magnitude faster than the regular inner joins (especially for computationally expensive inner join branches).
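A minimal sketch of how such an opt-in flag could be read. The env var name comes from this PR; the helper names `inMemoryJoinEnabled` and `joinMode`, and the rule that any non-empty value enables the feature, are assumptions made for illustration and are not taken from the diff.

```go
package main

import "os"

// experimentalInMemoryJoinKey is the env var named in this PR.
const experimentalInMemoryJoinKey = "EXPERIMENTAL_IN_MEMORY_JOIN"

// inMemoryJoinEnabled is a hypothetical helper. It reports whether the flag
// is set to any non-empty value (an assumption about the semantics).
// The lookup function is injected so the check can be exercised without
// touching the real process environment.
func inMemoryJoinEnabled(getenv func(string) string) bool {
	return getenv(experimentalInMemoryJoinKey) != ""
}

// joinMode is a hypothetical helper that reads the real environment and
// names the join strategy a planner might pick.
func joinMode() string {
	if inMemoryJoinEnabled(os.Getenv) {
		return "in-memory"
	}
	return "regular"
}
```

Under these assumptions, starting the server with `EXPERIMENTAL_IN_MEMORY_JOIN=1` in its environment would make `joinMode()` return "in-memory".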

These are the benchmarks:

goos: darwin
goarch: amd64
pkg: gopkg.in/src-d/go-mysql-server.v0/sql/plan
BenchmarkInnerJoin/inner_join-4         	   50000	     33170 ns/op	   10618 B/op	     144 allocs/op
BenchmarkInnerJoin/in_memory_inner_join-4         	  100000	     21972 ns/op	    7322 B/op	      84 allocs/op
BenchmarkInnerJoin/cross_join_with_filter-4       	   30000	     41750 ns/op	   12122 B/op	     181 allocs/op
PASS
ok  	gopkg.in/src-d/go-mysql-server.v0/sql/plan	6.112s

Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
@erizocosmico erizocosmico requested a review from a team January 28, 2019 09:20
"reflect"

opentracing "github.com/opentracing/opentracing-go"
"gopkg.in/src-d/go-mysql-server.v0/sql"
)

const experimentalInMemoryJoinKey = "EXPERIMENTAL_IN_MEMORY_JOIN"
Contributor

Maybe we could prefix it with something like SRC_D, or MYSQL, or ... WDYT?

Contributor

@ajnavarro ajnavarro left a comment

In-memory joins should be used only if the elements of one side of the join fit in memory. How do you know which side of the join should be loaded into memory? As far as I can see in the code, we always put the right branch in memory.
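To make the memory concern concrete, here is a simplified sketch of an inner join that materializes the right branch before streaming the left one. `Row`, `RowIter`, `sliceIter` and `innerJoinInMemory` are illustrative stand-ins, not the library's types or the code in this PR.

```go
package main

// Row is a simplified stand-in for a row of values (assumption).
type Row []interface{}

// RowIter yields rows one at a time; ok is false once it is exhausted.
type RowIter func() (row Row, ok bool)

// sliceIter wraps a slice of rows in a RowIter.
func sliceIter(rows []Row) RowIter {
	i := 0
	return func() (Row, bool) {
		if i >= len(rows) {
			return nil, false
		}
		r := rows[i]
		i++
		return r, true
	}
}

// innerJoinInMemory loads the whole right side into memory once, then
// streams the left side and emits left+right for every pair that
// satisfies cond. Memory use grows with the size of the right side.
func innerJoinInMemory(left, right RowIter, cond func(l, r Row) bool) []Row {
	var cached []Row
	for r, ok := right(); ok; r, ok = right() {
		cached = append(cached, r)
	}

	var out []Row
	for l, ok := left(); ok; l, ok = left() {
		for _, r := range cached {
			if cond(l, r) {
				joined := make(Row, 0, len(l)+len(r))
				joined = append(joined, l...)
				joined = append(joined, r...)
				out = append(out, joined)
			}
		}
	}
	return out
}
```

In this sketch the right side is iterated only once, so an expensive right branch is not re-evaluated for every left row. The trade-off is that every right row stays resident for the whole join.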

@erizocosmico
Contributor Author

@ajnavarro we simply can't know right now. This is why this is behind a flag. This is a "here be dragons, use at your own risk" kind of feature for cases where the regular inner join is too slow and you have lots of memory to use. I don't see this being enabled by default until we have a cost-based optimizer (which is not likely in the foreseeable future).

@ajnavarro
Contributor

We don't need a full cost-based optimizer for this. Maybe by giving tables a method that returns an estimated size, we could guess whether it is worth loading them into memory or not. But we can discuss this implementation later.
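One way the size-estimate idea could look, as a rough sketch. The `sizeEstimator` interface, its `EstimatedSizeBytes` method, the `fixedSizeTable` type and the byte budget are all hypothetical; nothing like them is defined in this PR.

```go
package main

// sizeEstimator is a hypothetical optional interface a table could implement.
type sizeEstimator interface {
	// EstimatedSizeBytes returns a rough size, and false if it is unknown.
	EstimatedSizeBytes() (size int64, known bool)
}

// fixedSizeTable is a toy table that reports a preset estimate.
type fixedSizeTable struct {
	bytes int64
	known bool
}

func (t fixedSizeTable) EstimatedSizeBytes() (int64, bool) {
	return t.bytes, t.known
}

// fitsInMemory reports whether a table advertises an estimate within the
// given budget. Tables without an estimate are never loaded into memory.
func fitsInMemory(table interface{}, budgetBytes int64) bool {
	est, ok := table.(sizeEstimator)
	if !ok {
		return false
	}
	size, known := est.EstimatedSizeBytes()
	return known && size <= budgetBytes
}
```

This heuristic only sees the size of the base table, not the size of the branch after filters are applied.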

@erizocosmico
Contributor Author

@ajnavarro then this would only be applied to small tables. If a table is big (even if the end result of the branch after filters and so on is not), it will never be in memory.

@ajnavarro
Contributor

Then maybe it would be better to allow the user to activate or deactivate this functionality per query, using a session variable (I would keep the current env variable to globally activate the functionality anyway). WDYT?

@erizocosmico
Contributor Author

@ajnavarro sounds good to me

Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
@erizocosmico
Contributor Author

Updated with the session variable
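A sketch of how the global env flag and a per-session variable might combine, following the discussion above. The variable name `inmemory_joins` is a placeholder (this page does not state the name the PR uses), and `session` is a simplified stand-in for the library's session type.

```go
package main

// sessionVarName is a placeholder name for the per-session switch.
const sessionVarName = "inmemory_joins"

// session is a simplified stand-in: a bag of session variables.
type session map[string]interface{}

// useInMemoryJoin returns true when the feature is enabled globally
// (env var) or when the current session has switched it on.
func useInMemoryJoin(envEnabled bool, s session) bool {
	if envEnabled {
		return true
	}
	v, ok := s[sessionVarName]
	if !ok {
		return false
	}
	on, isBool := v.(bool)
	return isBool && on
}
```

With this shape, the env var acts as a global switch, while a client could enable the feature only for the sessions that run expensive joins.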

@ajnavarro ajnavarro merged commit f33e6ea into src-d:master Jan 28, 2019
4 participants