Skip to content

py/qstr: Add support for sorted qstr pools. #12678

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 30, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 9 additions & 2 deletions ports/bare-arm/lib.c
Original file line number Diff line number Diff line change
Expand Up @@ -109,16 +109,23 @@ char *strchr(const char *s, int c) {
return NULL;
}

int strcmp(const char *s1, const char *s2) {
while (*s1 && *s2) {
int strncmp(const char *s1, const char *s2, size_t n) {
while (*s1 && *s2 && n-- > 0) {
int c = *s1++ - *s2++;
if (c) {
return c;
}
}
if (n == 0) {
return 0;
}
return *s1 - *s2;
}

int strcmp(const char *s1, const char *s2) {
return strncmp(s1, s2, 0x7fffffff);
}

size_t strlen(const char *s) {
const char *ss = s;
while (*ss) {
Expand Down
114 changes: 84 additions & 30 deletions py/makeqstrdata.py
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,8 @@
codepoint2name[ord("|")] = "pipe"
codepoint2name[ord("~")] = "tilde"

# static qstrs, should be sorted
# static qstrs, these must maintain a specific order for .mpy compatibility
Copy link
Contributor

@projectgus projectgus Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe more extreme version of the previous idea, but what about adding a constant lookup table (equivalent of mp_binary_op_method_name) for these? That would add 157 byte code size, but also allow this whole qstr pool to be sorted. In the next .mpy version bump the lookup table could be removed to get the code size back.

(This would also potentially allow deleting the !sorted case from the search lookup code as all pools would be sorted, wouldn't it?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea! It might be easier just to go with three sorted pools (because this list is almost sorted) as described above.

Unfortunately, any runtime qstrs are not sorted (because the pools are created incrementally), so the !sorted case needs to stay.

Copy link
Contributor

@projectgus projectgus Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing it's not viable to insertion sort the runtime pool? EDIT: Of course it's not, the addresses need to be stable. Oh well.

I'll stop proposing ever more complex changes now, I promise. 😆

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@projectgus

The main problem is that we return the qstr id at insertion time. This id is the index across all the pools (i.e. the first pool contains qstr 0 -> N, the second contains N+1 -> M, etc).

So sorting would require some way to know the final index in the pool, even though the sort ordering might change with further insertions.

There are easy ways to solve this with extra indirection of course, but that would be an extra 2*sizeof(qstr) per entry in the runtime tables. Maybe this is worth considering?

Also other ideas around some sort of hash-table-like approach. The problem with hashtables though is that the main expensive operation we perform on the pool is set-existence (i.e. we have this new string from somewhere, and we're trying to decide whether it's an existing qstr or not). And worse, the expected outcome for any given pool is negative, which is pathological for high-load-factor hashtables.

# See QSTR_LAST_STATIC at the top of py/persistentcode.c

static_qstr_list = [
"",
Expand Down Expand Up @@ -222,6 +223,71 @@
"zip",
]

# Additional QSTRs that must have index <255 because they are stored in
# `mp_binary_op_method_name` and `mp_unary_op_method_name` (see py/objtype.c).
# These are not part of the .mpy compatibility list, but we place them in the
# fixed unsorted pool (i.e. QDEF0) to ensure their indices are small.
operator_qstr_list = {
"__bool__",
"__pos__",
"__neg__",
"__invert__",
"__abs__",
"__float__",
"__complex__",
"__sizeof__",
"__lt__",
"__gt__",
"__eq__",
"__le__",
"__ge__",
"__ne__",
"__contains__",
"__iadd__",
"__isub__",
"__imul__",
"__imatmul__",
"__ifloordiv__",
"__itruediv__",
"__imod__",
"__ipow__",
"__ior__",
"__ixor__",
"__iand__",
"__ilshift__",
"__irshift__",
"__add__",
"__sub__",
"__mul__",
"__matmul__",
"__floordiv__",
"__truediv__",
"__mod__",
"__divmod__",
"__pow__",
"__or__",
"__xor__",
"__and__",
"__lshift__",
"__rshift__",
"__radd__",
"__rsub__",
"__rmul__",
"__rmatmul__",
"__rfloordiv__",
"__rtruediv__",
"__rmod__",
"__rpow__",
"__ror__",
"__rxor__",
"__rand__",
"__rlshift__",
"__rrshift__",
"__get__",
"__set__",
"__delete__",
}


# this must match the equivalent function in qstr.c
def compute_hash(qstr, bytes_hash):
Expand All @@ -244,22 +310,13 @@ def esc_char(m):
return re.sub(r"[^A-Za-z0-9_]", esc_char, qst)


static_qstr_list_ident = list(map(qstr_escape, static_qstr_list))


def parse_input_headers(infiles):
qcfgs = {}
qstrs = {}

# add static qstrs
for qstr in static_qstr_list:
# work out the corresponding qstr name
ident = qstr_escape(qstr)

# don't add duplicates
assert ident not in qstrs

# add the qstr to the list, with order number to retain original order in file
order = len(qstrs) - 300000
qstrs[ident] = (order, ident, qstr)

# read the qstrs in from the input files
for infile in infiles:
with open(infile, "rt") as f:
Expand Down Expand Up @@ -294,22 +351,12 @@ def parse_input_headers(infiles):
ident = qstr_escape(qstr)

# don't add duplicates
if ident in static_qstr_list_ident:
continue
if ident in qstrs:
continue

# add the qstr to the list, with order number to retain original order in file
order = len(qstrs)
# but put special method names like __add__ at the top of list, so
# that their id's fit into a byte
if ident == "":
# Sort empty qstr above all still
order = -200000
elif ident == "__dir__":
# Put __dir__ after empty qstr for builtin dir() to work
order = -190000
elif ident.startswith("__"):
order -= 100000
qstrs[ident] = (order, ident, qstr)
qstrs[ident] = (ident, qstr)

if not qcfgs:
sys.stderr.write("ERROR: Empty preprocessor output - check for errors above\n")
Expand Down Expand Up @@ -348,12 +395,19 @@ def print_qstr_data(qcfgs, qstrs):
print("")

# add NULL qstr with no hash or data
print('QDEF(MP_QSTRnull, 0, 0, "")')
print('QDEF0(MP_QSTRnull, 0, 0, "")')

# add static qstrs to the first unsorted pool
for qstr in static_qstr_list:
qbytes = make_bytes(cfg_bytes_len, cfg_bytes_hash, qstr)
print("QDEF0(MP_QSTR_%s, %s)" % (qstr_escape(qstr), qbytes))

# go through each qstr and print it out
for order, ident, qstr in sorted(qstrs.values(), key=lambda x: x[0]):
# add remaining qstrs to the sorted (by value) pool (unless they're in
# operator_qstr_list, in which case add them to the unsorted pool)
for ident, qstr in sorted(qstrs.values(), key=lambda x: x[1]):
qbytes = make_bytes(cfg_bytes_len, cfg_bytes_hash, qstr)
print("QDEF(MP_QSTR_%s, %s)" % (ident, qbytes))
pool = 0 if qstr in operator_qstr_list else 1
print("QDEF%d(MP_QSTR_%s, %s)" % (pool, ident, qbytes))


def do_work(infiles):
Expand Down
5 changes: 5 additions & 0 deletions py/persistentcode.c
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,11 @@

#include "py/smallint.h"

// makeqstrdata.py has a fixed list of qstrs at the start that we can assume
// are available with know indices on all MicroPython implementations, and
// avoid needing to duplicate the string data in the .mpy file. This is the
// last one in that list (anything with a qstr less than or equal to this is
// assumed to be in the list).
#define QSTR_LAST_STATIC MP_QSTR_zip

#if MICROPY_DYNAMIC_COMPILER
Expand Down
98 changes: 84 additions & 14 deletions py/qstr.c
Original file line number Diff line number Diff line change
Expand Up @@ -33,9 +33,6 @@
#include "py/gc.h"
#include "py/runtime.h"

// NOTE: we are using linear arrays to store and search for qstr's (unique strings, interned strings)
// ultimately we will replace this with a static hash table of some kind

#if MICROPY_DEBUG_VERBOSE // print debugging info
#define DEBUG_printf DEBUG_printf
#else // don't print debugging info
Expand Down Expand Up @@ -74,38 +71,94 @@ size_t qstr_compute_hash(const byte *data, size_t len) {
return hash;
}

// The first pool is the static qstr table. The contents must remain stable as
// it is part of the .mpy ABI. See the top of py/persistentcode.c and
// static_qstr_list in makeqstrdata.py. This pool is unsorted (although in a
// future .mpy version we could re-order them and make it sorted). It also
// contains additional qstrs that must have IDs <256, see operator_qstr_list
// in makeqstrdata.py.
const qstr_hash_t mp_qstr_const_hashes_static[] = {
#ifndef NO_QSTR
#define QDEF0(id, hash, len, str) hash,
#define QDEF1(id, hash, len, str)
#include "genhdr/qstrdefs.generated.h"
#undef QDEF0
#undef QDEF1
#endif
};

const qstr_len_t mp_qstr_const_lengths_static[] = {
#ifndef NO_QSTR
#define QDEF0(id, hash, len, str) len,
#define QDEF1(id, hash, len, str)
#include "genhdr/qstrdefs.generated.h"
#undef QDEF0
#undef QDEF1
#endif
};

const qstr_pool_t mp_qstr_const_pool_static = {
NULL, // no previous pool
0, // no previous pool
false, // is_sorted
MICROPY_ALLOC_QSTR_ENTRIES_INIT,
MP_QSTRnumber_of_static, // corresponds to number of strings in array just below
(qstr_hash_t *)mp_qstr_const_hashes_static,
(qstr_len_t *)mp_qstr_const_lengths_static,
{
#ifndef NO_QSTR
#define QDEF0(id, hash, len, str) str,
#define QDEF1(id, hash, len, str)
#include "genhdr/qstrdefs.generated.h"
#undef QDEF0
#undef QDEF1
#endif
},
};

// The next pool is the remainder of the qstrs defined in the firmware. This
// is sorted.
const qstr_hash_t mp_qstr_const_hashes[] = {
#ifndef NO_QSTR
#define QDEF(id, hash, len, str) hash,
#define QDEF0(id, hash, len, str)
#define QDEF1(id, hash, len, str) hash,
#include "genhdr/qstrdefs.generated.h"
#undef QDEF
#undef QDEF0
#undef QDEF1
#endif
};

const qstr_len_t mp_qstr_const_lengths[] = {
#ifndef NO_QSTR
#define QDEF(id, hash, len, str) len,
#define QDEF0(id, hash, len, str)
#define QDEF1(id, hash, len, str) len,
#include "genhdr/qstrdefs.generated.h"
#undef QDEF
#undef QDEF0
#undef QDEF1
#endif
};

const qstr_pool_t mp_qstr_const_pool = {
NULL, // no previous pool
0, // no previous pool
&mp_qstr_const_pool_static,
MP_QSTRnumber_of_static,
true, // is_sorted
MICROPY_ALLOC_QSTR_ENTRIES_INIT,
MP_QSTRnumber_of, // corresponds to number of strings in array just below
MP_QSTRnumber_of - MP_QSTRnumber_of_static, // corresponds to number of strings in array just below
(qstr_hash_t *)mp_qstr_const_hashes,
(qstr_len_t *)mp_qstr_const_lengths,
{
#ifndef NO_QSTR
#define QDEF(id, hash, len, str) str,
#define QDEF0(id, hash, len, str)
#define QDEF1(id, hash, len, str) str,
#include "genhdr/qstrdefs.generated.h"
#undef QDEF
#undef QDEF0
#undef QDEF1
#endif
},
};

// If frozen code is enabled, then there is an additional, sorted, ROM pool
// containing additional qstrs required by the frozen code.
#ifdef MICROPY_QSTR_EXTRA_POOL
extern const qstr_pool_t MICROPY_QSTR_EXTRA_POOL;
#define CONST_POOL MICROPY_QSTR_EXTRA_POOL
Expand Down Expand Up @@ -185,7 +238,24 @@ qstr qstr_find_strn(const char *str, size_t str_len) {

// search pools for the data
for (const qstr_pool_t *pool = MP_STATE_VM(last_pool); pool != NULL; pool = pool->prev) {
for (mp_uint_t at = 0, top = pool->len; at < top; at++) {
size_t low = 0;
size_t high = pool->len - 1;

// binary search inside the pool
if (pool->is_sorted) {
while (high - low > 1) {
size_t mid = (low + high) / 2;
int cmp = strncmp(str, pool->qstrs[mid], str_len);
if (cmp <= 0) {
high = mid;
} else {
low = mid;
}
}
}

// sequential search for the remaining strings
for (mp_uint_t at = low; at < high + 1; at++) {
if (pool->hashes[at] == str_hash && pool->lengths[at] == str_len
&& memcmp(pool->qstrs[at], str, str_len) == 0) {
return pool->total_prev_len + at;
Expand All @@ -194,7 +264,7 @@ qstr qstr_find_strn(const char *str, size_t str_len) {
}

// not found; return null qstr
return 0;
return MP_QSTRnull;
}

qstr qstr_from_str(const char *str) {
Expand Down
19 changes: 16 additions & 3 deletions py/qstr.h
Original file line number Diff line number Diff line change
Expand Up @@ -38,9 +38,21 @@
// first entry in enum will be MP_QSTRnull=0, which indicates invalid/no qstr
enum {
#ifndef NO_QSTR
#define QDEF(id, hash, len, str) id,
#define QDEF0(id, hash, len, str) id,
#define QDEF1(id, hash, len, str)
#include "genhdr/qstrdefs.generated.h"
#undef QDEF
#undef QDEF0
#undef QDEF1
#endif
MP_QSTRnumber_of_static,
MP_QSTRstart_of_main = MP_QSTRnumber_of_static - 1, // unused but shifts the enum counter back one

#ifndef NO_QSTR
#define QDEF0(id, hash, len, str)
#define QDEF1(id, hash, len, str) id,
#include "genhdr/qstrdefs.generated.h"
#undef QDEF0
#undef QDEF1
#endif
MP_QSTRnumber_of, // no underscore so it can't clash with any of the above
};
Expand All @@ -66,7 +78,8 @@ typedef uint16_t qstr_len_t;

typedef struct _qstr_pool_t {
const struct _qstr_pool_t *prev;
size_t total_prev_len;
size_t total_prev_len : (8 * sizeof(size_t) - 1);
size_t is_sorted : 1;
size_t alloc;
size_t len;
qstr_hash_t *hashes;
Expand Down
Loading
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy