
gh-134698: Hold a lock when the thread state is detached in ssl #134724


Merged: gpshead merged 10 commits into python:main on Jul 25, 2025

Conversation

ZeroIntensity
Member

@ZeroIntensity ZeroIntensity commented May 26, 2025

Inside Py_BEGIN_ALLOW_THREADS blocks, OpenSSL calls need to be synchronized to prevent crashes. I'm doing this with a per-object mutex that is held only while the GIL or critical section is released.
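
As a rough Python analogy of this pattern (the actual change is in C, in Modules/_ssl.c; the class and function names below are hypothetical and are not part of the `ssl` module), a per-object lock can serialize calls into state that is not safe to touch from several threads at once:

```python
import threading


class GuardedResource:
    """Hypothetical stand-in for an object wrapping non-thread-safe native state."""

    def __init__(self):
        self._lock = threading.Lock()  # one lock per object, not a global one
        self._calls = 0

    def _unsafe_native_call(self):
        # Read-modify-write that could lose updates if two threads interleave.
        current = self._calls
        self._calls = current + 1

    def call(self):
        # Hold the per-object lock only around the unsafe region.
        with self._lock:
            self._unsafe_native_call()

    @property
    def calls(self):
        with self._lock:
            return self._calls


def hammer(resource, iterations):
    for _ in range(iterations):
        resource.call()


def run_demo(num_threads=8, iterations=1000):
    """Call one shared object from several threads and return the call count."""
    resource = GuardedResource()
    threads = [
        threading.Thread(target=hammer, args=(resource, iterations))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return resource.calls
```

Because each object carries its own lock, threads working on different objects do not contend with each other; only concurrent calls on the same object are serialized.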

@ZeroIntensity
Member Author

@Conobi Would you mind pointing me to the downstream FastAPI issue?

@Conobi

Conobi commented May 26, 2025

@ZeroIntensity
Copy link
Member Author

FastAPI was having problems with this in their test suite, right? Is there someone I should CC?

Modules/_ssl.c Outdated

/* Make sure the SSL error state is initialized */
ERR_clear_error();

-PySSL_BEGIN_ALLOW_THREADS
+Py_BEGIN_ALLOW_THREADS
Member

Use PySSL_BEGIN_ALLOW_THREADS(ssl->ctx) here. The PySSLContext's ctx is being accessed by SSL_new so we should lock it.

Member

Would probably be good to add a test creating ssl contexts in multiple threads as well.

Member Author

@ZeroIntensity ZeroIntensity May 26, 2025

There's nothing too special about this function, though. I'm not sure it's worth the effort to add a test for every single function fixed here.
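
For reference, a test of the kind suggested in this thread might look roughly like the sketch below. It is not taken from the PR's test suite; the helper names are hypothetical, and it only checks that contexts can be created concurrently without errors.

```python
import ssl
import threading


def create_contexts(results, count):
    # Each thread builds its own contexts; no context is shared between threads.
    for _ in range(count):
        results.append(ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT))


def run_context_creation(num_threads=4, per_thread=10):
    """Create SSL contexts from several threads and return all of them."""
    results = []  # list.append is safe to call from multiple threads in CPython
    threads = [
        threading.Thread(target=create_contexts, args=(results, per_thread))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    contexts = run_context_creation()
    print(f"created {len(contexts)} contexts")
```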

Member

@emmatyping emmatyping left a comment

Great to see the improved thread safety! 🚀

Contributor

@kumaraditya303 kumaraditya303 left a comment

See the discussion on the issue. I am not convinced that adding more locks is the solution; it will degrade performance in multi-threaded "safe" workloads which work today.

@bedevere-app

bedevere-app bot commented May 26, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@ZeroIntensity
Member Author

See the discussion on the issue. I am not convinced that adding more locks is the solution; it will degrade performance in multi-threaded "safe" workloads which work today.

Wait a minute, there shouldn't be any multithreaded workloads that get hit by this, right? They'll all crash right now.

@emmatyping
Member

I asked Claude to generate a benchmark script, which I tested with wrk, a web stress-testing tool.

The script sets up SSL socket connections and sends them to a thread to operate on requests, much like I would expect servers to do in multi-threaded production workloads.

Script source

#!/usr/bin/env python3

import ssl
import socket
import argparse
import sys
import os
import threading
from concurrent.futures import ThreadPoolExecutor
import signal


class ThreadSafeCounter:
    """Thread-safe counter using threading.Lock"""
    
    def __init__(self, initial_value=0):
        self._value = initial_value
        self._lock = threading.Lock()
    
    def inc(self, amount=1):
        """Increment the counter by amount (default 1)"""
        with self._lock:
            self._value += amount
    
    def load(self):
        """Get the current value"""
        with self._lock:
            return self._value
    
    def store(self, value):
        """Set the counter to a specific value"""
        with self._lock:
            self._value = value


class SSLServer:
    """Multi-threaded SSL server for stress testing"""
    
    def __init__(self, host='localhost', port=8443, max_workers=50):
        self.host = host
        self.port = port
        self.max_workers = max_workers
        self.running = False
        self.server_socket = None
        self.executor = None
        self.stats = {
            'connections_accepted': ThreadSafeCounter(0),
            'connections_handled': ThreadSafeCounter(0),
            'errors': ThreadSafeCounter(0)
        }
        
    def create_ssl_context(self, certfile=None, keyfile=None):
        """Create SSL context with self-signed cert if none provided"""
        context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
        
        if certfile and keyfile:
            context.load_cert_chain(certfile, keyfile)
        else:
            # Create self-signed certificate for testing
            self._create_self_signed_cert()
            context.load_cert_chain('server.crt', 'server.key')
            
        context.set_ciphers('HIGH:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!SRP:!CAMELLIA')
        return context
    
    def _create_self_signed_cert(self):
        """Create a self-signed certificate for testing"""
        if os.path.exists('server.crt') and os.path.exists('server.key'):
            return
        # Generate a self-signed certificate with the openssl command-line tool
        os.system('openssl req -x509 -newkey rsa:2048 -keyout server.key -out server.crt -days 365 -nodes -subj "/C=US/ST=Test/L=Test/O=Test/CN=localhost"')
    
    def handle_client(self, client_socket, address):
        """Handle individual client connection with proper HTTP handling"""
        try:
            self.stats['connections_handled'].inc()
            client_socket.settimeout(30.0)  # Set socket timeout
            
            while True:
                try:
                    # Read HTTP request
                    data = client_socket.recv(4096)
                    if not data:
                        break
                    
                    # Parse HTTP request to check for Connection header
                    request = data.decode('utf-8', errors='ignore')
                    lines = request.split('\r\n')
                    
                    # Check if client wants to keep connection alive
                    keep_alive = False
                    content_length = 0
                    
                    for line in lines[1:]:  # Skip request line
                        if line.lower().startswith('connection:'):
                            if 'keep-alive' in line.lower():
                                keep_alive = True
                        elif line.lower().startswith('content-length:'):
                            try:
                                content_length = int(line.split(':', 1)[1].strip())
                            except ValueError:
                                # Malformed Content-Length header; treat as no body
                                pass
                        elif line == '':  # End of headers
                            break
                    
                    # Read request body if present
                    if content_length > 0:
                        body_data = b''
                        while len(body_data) < content_length:
                            chunk = client_socket.recv(min(4096, content_length - len(body_data)))
                            if not chunk:
                                break
                            body_data += chunk
                    
                    # Send HTTP response
                    response_body = b"OK"
                    response_headers = [
                        "HTTP/1.1 200 OK",
                        f"Content-Length: {len(response_body)}",
                        "Content-Type: text/plain",
                        "Server: SSL-Test-Server/1.0"
                    ]
                    
                    if keep_alive:
                        response_headers.append("Connection: keep-alive")
                        response_headers.append("Keep-Alive: timeout=30, max=100")
                    else:
                        response_headers.append("Connection: close")
                    
                    response = "\r\n".join(response_headers) + "\r\n\r\n"
                    client_socket.sendall(response.encode() + response_body)
                    
                    # If not keep-alive, break the loop
                    if not keep_alive:
                        break
                        
                except socket.timeout:
                    break
                except ssl.SSLWantReadError:
                    continue
                except ssl.SSLWantWriteError:
                    continue
                except (ssl.SSLError, ConnectionResetError, BrokenPipeError) as e:
                    # Common SSL/connection errors, not worth logging individually
                    break
                except Exception as e:
                    # Only log unexpected errors
                    if "EOF occurred in violation of protocol" not in str(e):
                        print(f"Unexpected error handling client {address}: {e}")
                    break
            
        except Exception as e:
            self.stats['errors'].inc()
            if "EOF occurred in violation of protocol" not in str(e):
                print(f"Error handling client {address}: {e}")
        finally:
            try:
                client_socket.shutdown(socket.SHUT_RDWR)
            except OSError:
                pass
            try:
                client_socket.close()
            except OSError:
                pass
    
    def start(self, certfile=None, keyfile=None):
        """Start the SSL server"""
        self.running = True
        
        # Create SSL context
        ssl_context = self.create_ssl_context(certfile, keyfile)
        
        # Enable SSL session reuse for better performance
        #ssl_context.set_session_id(b'ssl-test-server')
        ssl_context.session_stats()
        
        # Create server socket
        self.server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        
        # Set TCP_NODELAY to reduce latency
        self.server_socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        
        self.server_socket.bind((self.host, self.port))
        self.server_socket.listen(1000)  # Increased backlog
        
        # Wrap with SSL
        self.server_socket = ssl_context.wrap_socket(
            self.server_socket, 
            server_side=True,
            do_handshake_on_connect=False  # Handle handshake manually for better error handling
        )
        
        print(f"SSL Server started on {self.host}:{self.port}")
        
        # Start thread pool
        self.executor = ThreadPoolExecutor(max_workers=self.max_workers)
        
        try:
            while self.running:
                try:
                    client_socket, address = self.server_socket.accept()
                    
                    # Perform SSL handshake
                    try:
                        client_socket.do_handshake()
                        self.stats['connections_accepted'].inc()
                        
                        # Submit to thread pool
                        self.executor.submit(self.handle_client, client_socket, address)
                        
                    except Exception as e:
                        # SSL handshake failed, close the connection
                        try:
                            client_socket.close()
                        except OSError:
                            pass
                        continue
                        
                except socket.error as e:
                    if self.running:  # Only log if we're supposed to be running
                        if "Bad file descriptor" not in str(e):
                            print(f"Socket error: {e}")
                        
        except KeyboardInterrupt:
            print("\nShutting down server...")
        finally:
            self.stop()
    
    def stop(self):
        """Stop the server"""
        self.running = False
        if self.server_socket:
            try:
                self.server_socket.shutdown(socket.SHUT_RDWR)
            except OSError:
                pass
            try:
                self.server_socket.close()
            except OSError:
                pass
        if self.executor:
            self.executor.shutdown(wait=True)
    
    def get_stats(self):
        """Get current server statistics"""
        return {
            'connections_accepted': self.stats['connections_accepted'].load(),
            'connections_handled': self.stats['connections_handled'].load(),
            'errors': self.stats['errors'].load()
        }


def main():
    parser = argparse.ArgumentParser(description='SSL Socket Server Stress Test')
    parser.add_argument('--host', default='localhost', help='Host to bind/connect to')
    parser.add_argument('--port', type=int, default=8443, help='Port number')
    parser.add_argument('--connections', type=int, default=100, help='Number of connections to test')
    parser.add_argument('--concurrent', type=int, default=10, help='Concurrent connections')
    parser.add_argument('--workers', type=int, default=50, help='Server worker threads')
    parser.add_argument('--cert', help='SSL certificate file')
    parser.add_argument('--key', help='SSL private key file')
    parser.add_argument('--verify-ssl', action='store_true', help='Verify SSL certificates')
    parser.add_argument('--benchmark-type', choices=['single', 'concurrent'], 
                       default='single', help='Type of pyperf benchmark to run')
    
    args = parser.parse_args()
    

    server = SSLServer(args.host, args.port, args.workers)
    
    def signal_handler(sig, frame):
        print('\nStopping server...')
        server.stop()
        print(f"Final stats: {server.get_stats()}")
        sys.exit(0)
        
    signal.signal(signal.SIGINT, signal_handler)
    
    try:
        server.start(args.cert, args.key)
    except Exception as e:
        print(f"Server error: {e}")

if __name__ == '__main__':
    main()

main, GILful

$ wrk -d15s -t4 -c64 https://localhost:8443
Running 15s test @ https://localhost:8443
  4 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   226.20us   32.09us 758.00us   77.74%
    Req/Sec   558.89     51.37   646.00     57.12%
  33605 requests in 15.10s, 3.65MB read
Requests/sec:   2225.52
Transfer/sec:    247.76KB

main, free-threaded

Running 15s test @ https://localhost:8443
  4 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   171.38us   21.61us 740.00us   81.11%
    Req/Sec   643.79     33.56   770.00     79.47%
  38706 requests in 15.10s, 4.21MB read
Requests/sec:   2563.34
Transfer/sec:    285.37KB

This branch, GILful

Running 15s test @ https://localhost:8443
  4 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   222.78us   32.03us 718.00us   78.53%
    Req/Sec   559.22     57.84   646.00     70.76%
  33507 requests in 15.10s, 3.64MB read
Requests/sec:   2218.98
Transfer/sec:    247.03KB

This branch, free-threaded

Running 15s test @ https://localhost:8443
  4 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   175.70us   22.57us 642.00us   80.72%
    Req/Sec   629.46     47.47   727.00     88.37%
  37722 requests in 15.10s, 4.10MB read
Requests/sec:   2498.28
Transfer/sec:    278.13KB

Note that my desktop was a bit noisy, so take these results with a chunk of salt. However, I think all of these results are within the margin of error, for both requests/sec and latency.
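
To put numbers on that, the relative differences can be computed directly from the wrk summaries above. This is just a small sketch; the dictionary layout and function names are made up for illustration, and the figures are copied from the four runs.

```python
# Figures copied from the wrk runs above:
# (requests/sec, average latency in microseconds).
RESULTS = {
    "GILful": {"main": (2225.52, 226.20), "branch": (2218.98, 222.78)},
    "free-threaded": {"main": (2563.34, 171.38), "branch": (2498.28, 175.70)},
}


def percent_change(baseline, value):
    """Relative change of value versus baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def summarize(results):
    """Compare the branch against main for each build."""
    summary = {}
    for build, runs in results.items():
        main_rps, main_lat = runs["main"]
        branch_rps, branch_lat = runs["branch"]
        summary[build] = {
            "requests_per_sec": round(percent_change(main_rps, branch_rps), 2),
            "latency": round(percent_change(main_lat, branch_lat), 2),
        }
    return summary


if __name__ == "__main__":
    for build, changes in summarize(RESULTS).items():
        print(build, changes)
```

With these figures, requests/sec on the branch come out roughly 0.3% lower on the GILful build and roughly 2.5% lower on the free-threaded build, while average latency differs by about 1.5% and 2.5% respectively, in opposite directions.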

@python-cla-bot

python-cla-bot bot commented Jul 25, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

@gpshead gpshead added the needs backport to 3.13, needs backport to 3.14, and type-bug labels Jul 25, 2025
@gpshead gpshead dismissed kumaraditya303’s stale review July 25, 2025 06:05

resolved - the locks are required and are not a performance problem.

@gpshead gpshead added the 🔨 test-with-buildbots label Jul 25, 2025
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @gpshead for commit 39323ab 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F134724%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots label Jul 25, 2025
@gpshead gpshead merged commit e047a35 into python:main Jul 25, 2025
124 of 130 checks passed
@miss-islington-app

Thanks @ZeroIntensity for the PR, and @gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14.
🐍🍒⛏🤖

@miss-islington-app

Sorry, @ZeroIntensity and @gpshead, I could not cleanly backport this to 3.14 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker e047a35b23c1aa69ab8d5da56f36319cec4d36b8 3.14

@miss-islington-app

Sorry, @ZeroIntensity and @gpshead, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker e047a35b23c1aa69ab8d5da56f36319cec4d36b8 3.13

gpshead added a commit to gpshead/cpython that referenced this pull request Jul 25, 2025
… in `ssl` (pythonGH-134724)

Lock when the thread state is detached.
(cherry picked from commit e047a35)

Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
@bedevere-app

bedevere-app bot commented Jul 25, 2025

GH-137107 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 label Jul 25, 2025
@gpshead gpshead removed the needs backport to 3.13 label Jul 25, 2025
Labels
topic-SSL, type-bug
7 participants