add experimental optional feature to use in memory joins #605
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
```go
import (
	// ...
	"reflect"

	opentracing "github.com/opentracing/opentracing-go"
	"gopkg.in/src-d/go-mysql-server.v0/sql"
)

const experimentalInMemoryJoinKey = "EXPERIMENTAL_IN_MEMORY_JOIN"
```
Maybe we can prefix it with something like SRC_D, MYSQL, or ...? WDYT?
In-memory joins should only be used if one side of the join fits in memory. How do you know which side of the join should be loaded into memory? As far as I can see in the code, we always load the right branch into memory.
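To make the trade-off concrete, here is a simplified sketch (not the PR's code) contrasting a plain nested-loop join, which re-executes the right branch for every left row, with a variant that materializes the right branch once and keeps it in memory, the side this PR is described as always loading. `Row`, `rowSource`, `nestedLoopJoin` and `inMemoryJoin` are hypothetical stand-ins, not go-mysql-server's API.

```go
package main

import "fmt"

// Row is a hypothetical stand-in for sql.Row.
type Row []interface{}

// rowSource is a hypothetical join branch: each call re-executes the branch.
type rowSource func() []Row

// nestedLoopJoin re-executes the right branch once per left row, which is
// what makes computationally expensive right branches slow.
func nestedLoopJoin(left []Row, right rowSource, cond func(l, r Row) bool) []Row {
	var out []Row
	for _, l := range left {
		for _, r := range right() {
			if cond(l, r) {
				out = append(out, concat(l, r))
			}
		}
	}
	return out
}

// inMemoryJoin executes the right branch once and keeps all of its rows in
// memory, so memory usage grows with the size of the right side.
func inMemoryJoin(left []Row, right rowSource, cond func(l, r Row) bool) []Row {
	cached := right() // the whole right branch is materialized here
	var out []Row
	for _, l := range left {
		for _, r := range cached {
			if cond(l, r) {
				out = append(out, concat(l, r))
			}
		}
	}
	return out
}

// concat builds the joined row from a left and a right row.
func concat(l, r Row) Row {
	row := make(Row, 0, len(l)+len(r))
	row = append(row, l...)
	return append(row, r...)
}

func main() {
	left := []Row{{1}, {2}, {3}}
	evaluations := 0
	right := func() []Row {
		evaluations++ // count how often the right branch runs
		return []Row{{1, "a"}, {3, "c"}}
	}
	sameKey := func(l, r Row) bool { return l[0] == r[0] }

	joined := nestedLoopJoin(left, right, sameKey)
	fmt.Println(len(joined), evaluations)

	evaluations = 0
	joined = inMemoryJoin(left, right, sameKey)
	fmt.Println(len(joined), evaluations)
}
```

Running it prints `2 3` and then `2 1`: the same two matching rows, but the right branch runs three times in the first case and only once in the second. The price is that `cached` holds the entire right side in memory, which is why the choice of side matters.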
@ajnavarro we simply can't know right now. This is why it is behind a flag. This is a "here be dragons, use at your own risk" kind of feature for cases where a regular inner join is too slow and you have lots of memory to spare. I don't see this being enabled by default until we have a cost-based optimizer (which is not likely in the foreseeable future).
We don't need a full cost-based optimizer for this. If tables provided a method that returns an estimated size, we could guess whether it is worth fetching them into memory or not. But we can discuss this implementation later.
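One way to read this suggestion, sketched with a HYPOTHETICAL `SizeEstimator` interface (go-mysql-server does not define one) and an arbitrary memory budget: a table is loaded into memory only when it can report an estimated size that fits the budget; otherwise the regular join is used.

```go
package main

import "fmt"

// SizeEstimator is a HYPOTHETICAL interface a table could implement.
type SizeEstimator interface {
	// EstimatedBytes returns the estimated size and whether it is known.
	EstimatedBytes() (uint64, bool)
}

// fakeTable is a hypothetical table that always knows its size.
type fakeTable struct{ bytes uint64 }

func (t fakeTable) EstimatedBytes() (uint64, bool) { return t.bytes, true }

// shouldJoinInMemory returns true only when the table reports an estimate
// that fits within the budget. Tables that cannot estimate their size fall
// back to the regular join.
func shouldJoinInMemory(t interface{}, budget uint64) bool {
	est, ok := t.(SizeEstimator)
	if !ok {
		return false
	}
	size, known := est.EstimatedBytes()
	return known && size <= budget
}

func main() {
	const budget = 512 << 20 // 512 MiB, an arbitrary example budget
	fmt.Println(shouldJoinInMemory(fakeTable{bytes: 10 << 20}, budget)) // true
	fmt.Println(shouldJoinInMemory(fakeTable{bytes: 2 << 30}, budget))  // false
	fmt.Println(shouldJoinInMemory("not a table", budget))              // false
}
```

Because the estimate describes the whole table rather than the filtered branch, a large table never qualifies even when its filtered result would be small, which is the limitation raised in the next reply.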
@ajnavarro then this would only be applied to small tables. If a table is big (even if the end result of the branch after filters and so on is not), it will never be loaded into memory.
Then maybe it would be better to allow the user to activate or deactivate this functionality per query using a session variable (I would keep the current env variable to activate the functionality globally anyway). WDYT?
@ajnavarro sounds good to me
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
Updated with the session variable.
This exposes the env var `EXPERIMENTAL_IN_MEMORY_JOIN` to enable the in-memory join feature, which causes inner joins to be performed in memory and is significantly faster. This way, clients can decide whether to enable in-memory joins depending on their environment. If they can trade memory usage for speed, this is orders of magnitude faster than regular inner joins (especially for computationally expensive inner join branches).
These are the benchmarks: