fix(opentelemetry): trace context propagation in process-pool workers #1017

gregbrowndev · 2025-08-03T22:45:16Z

What was changed

This PR fixes a bug in the OpenTelemetry TracingInterceptor affecting sync, multi-process activities. The fix makes tracing features usable inside the user's activity implementation, e.g. creating child spans, trace events, log correlation, profile correlation, and distributed tracing with other systems.

Unlike async or sync multi-threaded activities, sync multi-process activities did not receive the OTEL trace context: the TracingInterceptor/_ActivityInboundImpl interceptors did not propagate it across the process pool.

Note: Both async and threadpool executors manage the trace context via Python's contextvars.

For the process-pool executor, any data we want to send to the child process must be extracted from contextvars and/or otherwise passed to loop.run_in_executor as picklable arguments to the target _execute_sync_activity function, and then rebuilt into contextvars on the other side.
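The extract, pass, and rebuild steps can be illustrated with the standard library alone. This is a minimal sketch, not the SDK's code: `current_trace_id` is a hypothetical stand-in for the OTEL trace context, and the process boundary is simulated with a pickle round trip plus an empty `contextvars.Context` rather than a real process pool.

```python
import contextvars
import pickle

# Hypothetical stand-in for the trace context that OTEL keeps in contextvars.
current_trace_id = contextvars.ContextVar("current_trace_id", default="<none>")


def read_trace_id():
    """Target function that relies on whatever the current context holds."""
    return current_trace_id.get()


def run_with_trace_id(trace_id):
    """Target function that rebuilds the contextvar from an explicit argument."""
    current_trace_id.set(trace_id)
    return read_trace_id()


def simulate_child_process(fn, *args):
    """Simulate the process boundary: pickle the call, run it in an empty Context."""
    restored_fn, restored_args = pickle.loads(pickle.dumps((fn, args)))
    return contextvars.Context().run(restored_fn, *restored_args)


current_trace_id.set("trace-abc")  # value set on the "parent" side

# Nothing was passed explicitly, so the "child" only sees the default value.
lost = simulate_child_process(read_trace_id)

# The value is extracted in the parent and sent as a picklable argument.
kept = simulate_child_process(run_with_trace_id, current_trace_id.get())
```

Only the second call sees the parent's value, because the value travels as an ordinary argument instead of relying on the ambient context.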

Since the trace context is created in the TracingInterceptor in the parent process, it would be difficult to get it all the way down to _ActivityInboundImpl, where it can be sent to the child process, without introducing OpenTelemetry as a core dependency. This change attempts to be as transparent as possible, but may introduce a breaking change (see the Breaking Change section below).

The TracingInterceptor's inbound activity interceptor now handles the special case of sync activities running on a non-threadpool executor. It wraps input.fn in a picklable dataclass that:

- captures the trace context for the parent span created in the interceptor / parent process
- exposes its own __call__ function that becomes the entrypoint of the subprocess task, which reattaches the trace context before delegating to the original activity function
- preserves as much of the original activity function's metadata as possible, using functools.wraps. This is because downstream interceptors, such as the SentryInterceptor in the Python examples (see feat: add example using Sentry V2 SDK samples-python#140), use reflection on the activity attributes, e.g. fn.__name__, fn.__module__.
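The three properties above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `ContextCarryingActivity`, `active_trace`, and the `carrier` dict are hypothetical names, and a plain `ContextVar` stands in for the OpenTelemetry context so the sketch runs without OTEL installed.

```python
import contextvars
import dataclasses
import functools
import pickle
from typing import Any, Callable, Dict

# Hypothetical stand-in for the OTEL context; not part of any real API.
active_trace = contextvars.ContextVar("active_trace", default=None)


@dataclasses.dataclass
class ContextCarryingActivity:
    """Picklable callable that carries a captured context to the activity."""

    fn: Callable[..., Any]
    carrier: Dict[str, str]  # context captured in the parent process

    def __post_init__(self) -> None:
        # Copy __name__, __module__, __doc__, etc. onto this instance so that
        # reflection by downstream interceptors keeps working.
        functools.wraps(self.fn)(self)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Entry point on the child side: reattach, delegate, then clean up.
        token = active_trace.set(dict(self.carrier))
        try:
            return self.fn(*args, **kwargs)
        finally:
            active_trace.reset(token)


def say_hello(name: str) -> str:
    """Example activity body that reads the reattached context."""
    trace = active_trace.get() or {}
    return f"hello {name} (trace {trace.get('trace_id', 'missing')})"


wrapped = ContextCarryingActivity(say_hello, {"trace_id": "abc123"})

# Simulate the trip to a worker process: pickle round trip, empty context.
restored = pickle.loads(pickle.dumps(wrapped))
result = contextvars.Context().run(restored, "world")
```

The instance pickles because both the dataclass and the wrapped function are defined at module level, so pickle can store them by reference. The copied attributes live in the instance `__dict__` and therefore survive the round trip.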

Tests have been added to verify the fix and I've had this patch running in our production environment for several weeks without any issues.

Breaking Change

As mentioned above, this change may break downstream interceptors that rely on receiving the original activity function handle directly.

Care has been taken to ensure common properties are preserved using functools.wraps, as you would with a decorator. However, without more significant changes to other parts of the SDK, I think this cannot be avoided, since a closure cannot be pickled and so cannot be used as the wrapper.
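The pickling limitation is easy to reproduce with the standard library. In this small sketch, `make_closure_wrapper`, `activity`, and `is_picklable` are hypothetical names used only for illustration.

```python
import pickle


def make_closure_wrapper(fn, carrier):
    """Decorator-style wrapper: the inner function is local, so pickle rejects it."""

    def wrapper(*args, **kwargs):
        # `carrier` is captured in the closure cell, which pickle cannot store.
        return fn(*args, **kwargs), carrier

    return wrapper


def activity(x):
    """Module-level function: pickle stores it by reference."""
    return x * 2


def is_picklable(obj) -> bool:
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type varies across Python versions.
        return False


closure_ok = is_picklable(make_closure_wrapper(activity, {"trace_id": "abc"}))
plain_ok = is_picklable(activity)
```

Here `closure_ok` is false while `plain_ok` is true, which is why the wrapper has to be an instance of a module-level class rather than a nested function.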

Users need to ensure that any interceptor switches on the function name, fn.__name__, rather than on a reference to the real function.
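As a rough illustration of that difference, the sketch below compares an identity check with a name check against a wrapped activity. `WrappedActivity`, `charge_card`, and both matcher functions are hypothetical and only stand in for what an upstream interceptor and a downstream interceptor might do.

```python
import functools


class WrappedActivity:
    """Hypothetical stand-in for a wrapper installed by an upstream interceptor."""

    def __init__(self, fn):
        self._fn = fn
        # Copies __name__, __module__, __doc__, etc. onto the wrapper instance.
        functools.update_wrapper(self, fn)

    def __call__(self, *args, **kwargs):
        return self._fn(*args, **kwargs)


def charge_card(amount):
    return f"charged {amount}"


def matches_by_identity(fn) -> bool:
    # Fragile: fails as soon as another interceptor wraps the function.
    return fn is charge_card


def matches_by_name(fn) -> bool:
    # Robust to wrapping, as long as the wrapper preserves __name__.
    return getattr(fn, "__name__", None) == "charge_card"


wrapped = WrappedActivity(charge_card)
identity_match = matches_by_identity(wrapped)
name_match = matches_by_name(wrapped)
```

The identity check stops matching once the function is wrapped, while the name check continues to match.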

Why?

Users of the SDK's process-pool Worker currently cannot leverage OTEL tracing capabilities inside their own activity implementation. The TracingInterceptor correctly instruments the activity's root span, but further downstream tracing is not properly linked to this parent span. The following is currently broken in sync, multiprocess activities:

- creating child spans
- attaching trace events / attributes
- log correlation (e.g. with experimental OTEL logging SDK)
- profile correlation (e.g. with Grafana Pyroscope SDK)
- propagating the trace context onwards for distributed tracing with other systems

Checklist

Closes: 669

How was this tested:

Added tests to verify:

- child spans in the activity have the correct parent
- the wrapped activity preserves the original function's metadata

Manual testing using the OTEL logging SDK in my app shows that logs emitted in the activities are injected with the correct trace_id/span_id, enabling log correlation in Grafana/Tempo/Loki. I didn't want to add this to the tests as the logging SDK is still experimental.

Note: testing this was quite tricky. I used a list proxy in the server process manager to access the spans exported from the child process's SpanExporter. I don't expect this would ever be necessary in production code (especially with OpenTelemetry), since all of the OTEL tracing exporters that I've seen use a push-based approach to export spans directly to an OTEL collector or tracing backend. (I think I remember seeing a Jaeger guide that indicated scraping traces from an endpoint, but that was for native Jaeger tooling, I think.) With a push-based exporter, e.g. OTLPSpanExporter, the child process can simply export its spans without needing to consolidate them with the parent process. This works even while the parent span created in the TracingInterceptor is yet to complete and be exported, because tracing backends expect to receive spans out of order.
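The test-only collection approach might look something like the sketch below. Everything in it is a hypothetical stand-in: `FakeSpan` is not the OTEL span type and `ListSinkExporter` does not subclass the real SpanExporter. A plain list is used as the sink so the sketch runs in a single process; in the actual test the sink would be a list proxy created by a multiprocessing manager, which also supports `append`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a finished span."""

    name: str
    trace_id: str
    parent_span_id: str


class ListSinkExporter:
    """Hypothetical exporter that appends finished spans to a shared sink."""

    def __init__(self, sink: Any) -> None:
        # `sink` only needs an append() method: a list or a manager list proxy.
        self._sink = sink

    def export(self, spans: Sequence[FakeSpan]) -> None:
        for span in spans:
            # Store plain dicts so entries are trivially picklable when the
            # sink lives in another process.
            self._sink.append(
                {
                    "name": span.name,
                    "trace_id": span.trace_id,
                    "parent_span_id": span.parent_span_id,
                }
            )


collected: List[Dict[str, str]] = []
exporter = ListSinkExporter(collected)
exporter.export([FakeSpan("child-span", "trace-abc", "span-parent")])
```

The parent-side test can then assert on `collected` to check that the child span reports the expected trace and parent identifiers.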

Any docs updates needed?

Hopefully, no change from users is necessary.


- This test implementation isn't to be taken as a reference for production. The fixed `TracingInterceptor` works in production, provided you use the `OTLPSpanExporter` or another exporter that pushes traces to a collector or backend, rather than one that pulls traces from the server (if one exists).
- Add a custom span exporter to write finished_spans to a list proxy created by the server process manager. This is because we want to test the full trace across the process pool. Again, in production, the child process can just export spans directly to a remote collector. Tracing is designed to handle distributed systems.
- Ensure the child process is initialised with its own TracerProvider to avoid different default mp_context behaviours across macOS and Linux

CLAassistant · 2025-08-03T22:45:23Z

All committers have signed the CLA.


- For some reason, the docstring comparison for the reflection check seemed to fail in Python 3.9. I shortened the docstring to make it easier to compare in the VSCode test output, and that seemed to fix the test. Maybe 3.9 doesn't strip leading spaces in the docstring (e.g. like textwrap.dedent)?

gregbrowndev and others added 5 commits July 20, 2025 16:10

fix(opentelemetry): trace propagation in process pool activities

e08802e

- Add test to show trace context is not available

wip: add impl and try to make test pass

98ed492

wip: refactor test helpers into separate modules

6d703c0

wip: test reflection isn't broken

c9b4b13

gregbrowndev requested a review from a team as a code owner August 3, 2025 22:45

gregbrowndev changed the title ~~Fix/opentelemetry trace context propagation~~ fix(opentelemetry): trace context propagation in process-pool workers Aug 3, 2025

gregbrowndev added 2 commits August 3, 2025 23:49

Merge remote-tracking branch 'upstream/main' into fix/opentelemetry-t…

a203bf7

…race-context-propagation
