gh-134698: Hold a lock when the thread state is detached in ssl
#134724
@Conobi Would you mind pointing me to the downstream FastAPI issue?
Didn't find any. However, here are the issues probably linked to this:
FastAPI was having problems with this in their test suite, right? Is there someone I should CC?
Modules/_ssl.c
Outdated

    /* Make sure the SSL error state is initialized */
    ERR_clear_error();

    PySSL_BEGIN_ALLOW_THREADS
    Py_BEGIN_ALLOW_THREADS
Use PySSL_BEGIN_ALLOW_THREADS(ssl->ctx) here. The PySSLContext's ctx is being accessed by SSL_new, so we should lock it.
Would probably be good to add a test creating ssl contexts in multiple threads as well.
There's nothing too special about this function, though. I'm not sure it's worth the effort to add a test for every single function fixed here.
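For reference, the kind of test discussed above could look roughly like the sketch below. It is not part of the PR. The function name and thread counts are made up for illustration, and it assumes that `SSLContext.wrap_bio()` on a shared context is a reasonable way to reach `SSL_new` from several threads without any network traffic.

```python
import ssl
import threading


def create_ssl_objects_concurrently(num_threads=8, per_thread=20):
    """Hypothetical stress helper: create contexts and SSL objects in threads.

    Returns (number_of_objects_created, list_of_exceptions).
    """
    shared_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    created = []
    errors = []
    barrier = threading.Barrier(num_threads)

    def worker():
        try:
            barrier.wait()  # start all threads at roughly the same time
            for _ in range(per_thread):
                # A fresh context per iteration, as suggested in the review.
                ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
                # An SSL object made from the shared context (no sockets needed).
                obj = shared_ctx.wrap_bio(
                    ssl.MemoryBIO(), ssl.MemoryBIO(), server_hostname="localhost"
                )
                created.append(obj)
        except Exception as exc:  # collect instead of losing it in the thread
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(created), errors
```

A test would call the helper and assert that the error list is empty. A crash of the kind described in the issue would take the whole interpreter down rather than raise an exception, so the value of such a test is mostly that it finishes at all.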
Great to see the improved thread safety! 🚀
See the discussion on the issue. I am not convinced that adding more locks is the solution; it will degrade performance in multi-threaded "safe" workloads that work today.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Wait a minute, there shouldn't be any multithreaded workloads that get hit by this, right? They'll all crash right now.
I asked Claude to generate a benchmark script that I tested with wrk, a web stress test tool. The script sets up SSL socket connections and sends them to a thread to operate on requests, much like I would expect servers to do in multi-threaded production workloads.
Script source
#!/usr/bin/env python3
import ssl
import socket
import argparse
import sys
import os
import threading
from concurrent.futures import ThreadPoolExecutor
import signal


class ThreadSafeCounter:
    """Thread-safe counter using threading.Lock"""

    def __init__(self, initial_value=0):
        self._value = initial_value
        self._lock = threading.Lock()

    def inc(self, amount=1):
        """Increment the counter by amount (default 1)"""
        with self._lock:
            self._value += amount

    def load(self):
        """Get the current value"""
        with self._lock:
            return self._value

    def store(self, value):
        """Set the counter to a specific value"""
        with self._lock:
            self._value = value


class SSLServer:
    """Multi-threaded SSL server for stress testing"""

    def __init__(self, host='localhost', port=8443, max_workers=50):
        self.host = host
        self.port = port
        self.max_workers = max_workers
        self.running = False
        self.server_socket = None
        self.executor = None
        self.stats = {
            'connections_accepted': ThreadSafeCounter(0),
            'connections_handled': ThreadSafeCounter(0),
            'errors': ThreadSafeCounter(0)
        }

    def create_ssl_context(self, certfile=None, keyfile=None):
        """Create SSL context with self-signed cert if none provided"""
        context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
        if certfile and keyfile:
            context.load_cert_chain(certfile, keyfile)
        else:
            # Create self-signed certificate for testing
            self._create_self_signed_cert()
            context.load_cert_chain('server.crt', 'server.key')
        context.set_ciphers('HIGH:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!SRP:!CAMELLIA')
        return context

    def _create_self_signed_cert(self):
        """Create a self-signed certificate for testing"""
        if os.path.exists('server.crt') and os.path.exists('server.key'):
            return
        # Fallback: use openssl command if cryptography not available
        os.system('openssl req -x509 -newkey rsa:2048 -keyout server.key -out server.crt -days 365 -nodes -subj "/C=US/ST=Test/L=Test/O=Test/CN=localhost"')

    def handle_client(self, client_socket, address):
        """Handle individual client connection with proper HTTP handling"""
        try:
            self.stats['connections_handled'].inc()
            client_socket.settimeout(30.0)  # Set socket timeout
            while True:
                try:
                    # Read HTTP request
                    data = client_socket.recv(4096)
                    if not data:
                        break
                    # Parse HTTP request to check for Connection header
                    request = data.decode('utf-8', errors='ignore')
                    lines = request.split('\r\n')
                    # Check if client wants to keep connection alive
                    keep_alive = False
                    content_length = 0
                    for line in lines[1:]:  # Skip request line
                        if line.lower().startswith('connection:'):
                            if 'keep-alive' in line.lower():
                                keep_alive = True
                        elif line.lower().startswith('content-length:'):
                            try:
                                content_length = int(line.split(':', 1)[1].strip())
                            except:
                                pass
                        elif line == '':  # End of headers
                            break
                    # Read request body if present
                    if content_length > 0:
                        body_data = b''
                        while len(body_data) < content_length:
                            chunk = client_socket.recv(min(4096, content_length - len(body_data)))
                            if not chunk:
                                break
                            body_data += chunk
                    # Send HTTP response
                    response_body = b"OK"
                    response_headers = [
                        "HTTP/1.1 200 OK",
                        f"Content-Length: {len(response_body)}",
                        "Content-Type: text/plain",
                        "Server: SSL-Test-Server/1.0"
                    ]
                    if keep_alive:
                        response_headers.append("Connection: keep-alive")
                        response_headers.append("Keep-Alive: timeout=30, max=100")
                    else:
                        response_headers.append("Connection: close")
                    response = "\r\n".join(response_headers) + "\r\n\r\n"
                    client_socket.send(response.encode() + response_body)
                    # If not keep-alive, break the loop
                    if not keep_alive:
                        break
                except socket.timeout:
                    break
                except ssl.SSLWantReadError:
                    continue
                except ssl.SSLWantWriteError:
                    continue
                except (ssl.SSLError, ConnectionResetError, BrokenPipeError) as e:
                    # Common SSL/connection errors, not worth logging individually
                    break
                except Exception as e:
                    # Only log unexpected errors
                    if "EOF occurred in violation of protocol" not in str(e):
                        print(f"Unexpected error handling client {address}: {e}")
                    break
        except Exception as e:
            self.stats['errors'].inc()
            if "EOF occurred in violation of protocol" not in str(e):
                print(f"Error handling client {address}: {e}")
        finally:
            try:
                client_socket.shutdown(socket.SHUT_RDWR)
            except:
                pass
            try:
                client_socket.close()
            except:
                pass

    def start(self, certfile=None, keyfile=None):
        """Start the SSL server"""
        self.running = True
        # Create SSL context
        ssl_context = self.create_ssl_context(certfile, keyfile)
        # Enable SSL session reuse for better performance
        #ssl_context.set_session_id(b'ssl-test-server')
        ssl_context.session_stats()
        # Create server socket
        self.server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        # Set TCP_NODELAY to reduce latency
        self.server_socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        self.server_socket.bind((self.host, self.port))
        self.server_socket.listen(1000)  # Increased backlog
        # Wrap with SSL
        self.server_socket = ssl_context.wrap_socket(
            self.server_socket,
            server_side=True,
            do_handshake_on_connect=False  # Handle handshake manually for better error handling
        )
        print(f"SSL Server started on {self.host}:{self.port}")
        # Start thread pool
        self.executor = ThreadPoolExecutor(max_workers=self.max_workers)
        try:
            while self.running:
                try:
                    client_socket, address = self.server_socket.accept()
                    # Perform SSL handshake
                    try:
                        client_socket.do_handshake()
                        self.stats['connections_accepted'].inc()
                        # Submit to thread pool
                        self.executor.submit(self.handle_client, client_socket, address)
                    except Exception as e:
                        # SSL handshake failed, close the connection
                        try:
                            client_socket.close()
                        except:
                            pass
                        continue
                except socket.error as e:
                    if self.running:  # Only log if we're supposed to be running
                        if "Bad file descriptor" not in str(e):
                            print(f"Socket error: {e}")
        except KeyboardInterrupt:
            print("\nShutting down server...")
        finally:
            self.stop()

    def stop(self):
        """Stop the server"""
        self.running = False
        if self.server_socket:
            try:
                self.server_socket.shutdown(socket.SHUT_RDWR)
            except:
                pass
            try:
                self.server_socket.close()
            except:
                pass
        if self.executor:
            self.executor.shutdown(wait=True)

    def get_stats(self):
        """Get current server statistics"""
        return {
            'connections_accepted': self.stats['connections_accepted'].load(),
            'connections_handled': self.stats['connections_handled'].load(),
            'errors': self.stats['errors'].load()
        }


def main():
    parser = argparse.ArgumentParser(description='SSL Socket Server Stress Test')
    parser.add_argument('--host', default='localhost', help='Host to bind/connect to')
    parser.add_argument('--port', type=int, default=8443, help='Port number')
    parser.add_argument('--connections', type=int, default=100, help='Number of connections to test')
    parser.add_argument('--concurrent', type=int, default=10, help='Concurrent connections')
    parser.add_argument('--workers', type=int, default=50, help='Server worker threads')
    parser.add_argument('--cert', help='SSL certificate file')
    parser.add_argument('--key', help='SSL private key file')
    parser.add_argument('--verify-ssl', action='store_true', help='Verify SSL certificates')
    parser.add_argument('--benchmark-type', choices=['single', 'concurrent'],
                        default='single', help='Type of pyperf benchmark to run')
    args = parser.parse_args()
    server = SSLServer(args.host, args.port, args.workers)

    def signal_handler(sig, frame):
        print('\nStopping server...')
        server.stop()
        print(f"Final stats: {server.get_stats()}")
        sys.exit(0)

    signal.signal(signal.SIGINT, signal_handler)
    try:
        server.start(args.cert, args.key)
    except Exception as e:
        print(f"Server error: {e}")


if __name__ == '__main__':
    main()

main, GILful
main, free-threaded
This branch, GILful
This branch, free-threaded
Note that my desktop was a bit noisy, so take these results with a chunk of salt. I think, however, that all of these results are within the margin of error, for both requests/sec and latency.
resolved - the locks are required and are not a performance problem.
🤖 New build scheduled with the buildbot fleet by @gpshead for commit 39323ab 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F134724%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
Thanks @ZeroIntensity for the PR, and @gpshead for merging it 🌮🎉. I'm working now to backport this PR to: 3.13, 3.14.
Sorry, @ZeroIntensity and @gpshead, I could not cleanly backport this to
Sorry, @ZeroIntensity and @gpshead, I could not cleanly backport this to
… in `ssl` (pythonGH-134724) Lock when the thread state is detached. (cherry picked from commit e047a35) Co-authored-by: Peter Bierma <zintensitydev@gmail.com> Co-authored-by: Gregory P. Smith <greg@krypto.org>
GH-137107 is a backport of this pull request to the 3.14 branch.
Inside of Py_BEGIN_ALLOW_THREADS blocks, OpenSSL calls need to be synchronized to prevent crashes. I'm doing this with a per-object mutex that is held only while the GIL or critical section is released.
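As a rough illustration of that locking discipline, here is a small Python model. It is not the C code from this PR. `GuardedHandle`, `guarded_call`, and `hammer` are hypothetical names, and a `threading.Lock` stands in for the per-object mutex. The point is only that calls on the same object are serialized while they are inside the guarded region, and that separate objects do not contend with each other.

```python
import threading
import time


class GuardedHandle:
    """Hypothetical stand-in for an object wrapping a non-thread-safe handle."""

    def __init__(self):
        self._mutex = threading.Lock()  # models the per-object mutex
        self._active = 0      # threads currently inside the guarded call
        self.max_active = 0   # highest overlap ever observed
        self.calls = 0

    def guarded_call(self):
        # In the C code this corresponds to the region where the thread state
        # is detached. The mutex is held only for the duration of that region.
        with self._mutex:
            self._active += 1
            self.max_active = max(self.max_active, self._active)
            time.sleep(0)  # invite a thread switch while "inside" the call
            self.calls += 1
            self._active -= 1


def hammer(handle, num_threads=8, per_thread=200):
    """Call guarded_call() on one shared handle from several threads."""
    def worker():
        for _ in range(per_thread):
            handle.guarded_call()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handle
```

With the mutex in place, `hammer(GuardedHandle()).max_active` stays at 1 no matter how many threads call in, which is the property the per-object mutex is meant to provide for the underlying OpenSSL object.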