• src/sbbs3/websrvr.cpp

    From Rob Swindell (on Debian Linux)@1:103/705 to Git commit to main/sbbs/master on Wednesday, May 06, 2026 19:41:53
    https://gitlab.synchro.net/main/sbbs/-/commit/3ad3f0282bcff37fa1926121
    Modified Files:
    src/sbbs3/websrvr.cpp
    Log Message:
    websrvr: cast away two best-effort unchecked returns (CIDs 639932, 639941)

    CID 639932: remove(cleanup_file[i]) in close_request: best-effort
    cleanup of temporary request files; failure is benign.
    CID 639941: setsockopt(TCP_NODELAY) in http_session_thread: latency
    hint; failure is non-fatal. Also widen the bool nodelay to int so it
    has the correct type for setsockopt().
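
    A minimal C++ sketch of both patterns, assuming POSIX sockets (the
    Windows build goes through Winsock). The function names are
    hypothetical, not the websrvr.cpp code. The (void) cast documents that
    the return value is ignored on purpose, and the option value is an int
    because that is what setsockopt() reads for TCP_NODELAY on POSIX.

```cpp
#include <cassert>
#include <cstdio>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: TCP_NODELAY is only a latency hint, so a failure
// is deliberately ignored. The value is an int, not a bool.
void set_nodelay_best_effort(int sock)
{
    int nodelay = 1;
    (void)setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &nodelay, sizeof(nodelay));
}

// Hypothetical helper: best-effort cleanup of temporary files.
// A failed remove() (e.g. the file is already gone) is benign.
void remove_cleanup_files(const char* const files[], int count)
{
    for (int i = 0; i < count; i++)
        (void)std::remove(files[i]);
}
```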

    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell (on Debian Linux)@1:103/705 to Git commit to main/sbbs/master on Wednesday, May 06, 2026 22:36:57
    https://gitlab.synchro.net/main/sbbs/-/commit/6ad832522da440e614b8fcdf
    Modified Files:
    src/sbbs3/websrvr.cpp
    Log Message:
    websrvr: clamp tls_sent and explicit cast in sess_sendbuf return (CID 639935)

    The TLS path assigns 'result = tls_sent' where tls_sent is int and
    could theoretically be negative on cryptlib edge cases. Adding it
    to size_t 'sent' would underflow. Guard with 'if (result > 0)'.

    Also make the size_t-to-int returns explicit casts so Coverity sees
    the narrowing is intentional.
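
    A sketch of the guard, assuming a hypothetical push callback in place of
    the real cryptlib call: a non-positive int result is never added to the
    size_t total (which would wrap it around), and the final narrowing back
    to int is written as an explicit cast.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for one TLS push: returns bytes pushed, or a
// zero/negative value on an edge case. Not the cryptlib API.
typedef int (*tls_push_fn)(const char* buf, std::size_t len, void* ctx);

int sendbuf_sketch(const char* buf, std::size_t len, tls_push_fn push, void* ctx)
{
    std::size_t sent = 0;
    while (sent < len) {
        int result = push(buf + sent, len - sent, ctx);
        if (result > 0)
            sent += static_cast<std::size_t>(result);  // only positive counts
        else
            break;  // never add a negative int to the size_t total
    }
    return static_cast<int>(sent);  // explicit, intentional narrowing
}
```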

    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell (on Debian Linux)@1:103/705 to Git commit to main/sbbs/master on Wednesday, May 06, 2026 22:51:40
    https://gitlab.synchro.net/main/sbbs/-/commit/c7df44f17c494f7277ac112e
    Modified Files:
    src/sbbs3/websrvr.cpp
    Log Message:
    websrvr: skip getuserdat for anonymous sessions in http_logon

    Regression from 9e7649fe0: when http_logon is called with usr=NULL
    on an anonymous request (session->user.number == 0), getuserdat
    legitimately fails because user 0 doesn't exist, which now spams
    the log with '!ERROR reading user #0 data' on every anon hit.

    Only call getuserdat when there's an actual user number to read.
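
    A minimal sketch of the check. user_sketch and load_user_sketch() are
    hypothetical stand-ins for the user record and getuserdat(), not the
    real Synchronet types; the point is only that user number 0 is treated
    as "nothing to read" rather than as a read failure.

```cpp
#include <cassert>

// Hypothetical user record: only the user number matters here.
struct user_sketch {
    unsigned number;  // 0 means anonymous (no user record exists)
};

// Hypothetical loader standing in for getuserdat(): fails for numbers
// with no record (0, or beyond a pretend user count of 100).
static int load_calls = 0;
bool load_user_sketch(const user_sketch& u)
{
    ++load_calls;
    return u.number != 0 && u.number <= 100;
}

// Reads the user record only when there is a real user number to read.
bool logon_sketch(const user_sketch& u)
{
    if (u.number == 0)
        return true;             // anonymous: nothing to read, nothing to log
    return load_user_sketch(u);  // a failure here is a genuine error
}
```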

    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell (on Windows 11)@1:103/705 to Git commit to main/sbbs/master on Wednesday, May 06, 2026 23:04:20
    https://gitlab.synchro.net/main/sbbs/-/commit/c94f75aa58112c228a8cdce9
    Modified Files:
    src/sbbs3/websrvr.cpp
    Log Message:
    websrvr: include protocol, IP, request, and ARS in no-auth log

    The "!No authentication information" debug log line now reports the
    protocol, client address, request line, and the ARS string that triggered
    the auth requirement, so it's actionable when WEB_OPT_DEBUG_RX is on.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell (on Debian Linux)@1:103/705 to Git commit to main/sbbs/master on Saturday, May 09, 2026 14:04:17
    https://gitlab.synchro.net/main/sbbs/-/commit/f7b10a614935817ba8965ec1
    Modified Files:
    src/sbbs3/websrvr.cpp
    Log Message:
    websrvr: don't call destroy_session() with sentinel tls_sess value (-1)

    When TLS setup fails after add_private_key() returns an error, the code
    calls cryptDestroySession() directly and sets tls_sess = -1, then calls
    close_session_no_rb(), which would pass -1 to destroy_session(),
    triggering a spurious "Destroying a session (-1) that's not in
    sess_list" error.
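
    A minimal sketch of the guard, assuming -1 is the only "already
    destroyed" sentinel. The std::set registry and the *_sketch functions
    are hypothetical stand-ins for sess_list, destroy_session() and
    close_session_no_rb().

```cpp
#include <cassert>
#include <set>

// Hypothetical registry standing in for sess_list.
static std::set<int> sess_list_sketch;
static int spurious_errors = 0;

void destroy_session_sketch(int sess)
{
    if (sess_list_sketch.erase(sess) == 0)
        ++spurious_errors;  // "Destroying a session (...) that's not in sess_list"
}

// Skip the destroy call when the handle was already torn down and set to -1.
void close_session_sketch(int& tls_sess)
{
    if (tls_sess != -1)
        destroy_session_sketch(tls_sess);
    tls_sess = -1;
}
```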

    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell (on Windows 11)@1:103/705 to Git commit to main/sbbs/master on Thursday, June 04, 2026 09:44:21
    https://gitlab.synchro.net/main/sbbs/-/commit/50258e70bf63ba7a82af7515
    Modified Files:
    src/sbbs3/websrvr.cpp
    Log Message:
    websrvr: detect TLS client disconnect in session_check() (#1155)

    session_check()'s is_tls branch treated a readable socket as "connected"
    and latched session->tls_pending; once set, it returned "connected" on
    every later call without re-probing the socket. But a peer's TLS
    close_notify (and a FIN) arrive as readable bytes, so after an HTTPS
    client hung up, session_check() reported it connected forever. The
    JavaScript disconnect check in js_OperationCallback (ead5ccf16) relies
    on session_check(), so its abort never armed (offline_counter stayed 0):
    a badly formed SSJS/XJS page that loops on mswait() without checking for
    disconnection (e.g. the webv4 user/system stats) ran forever, pinning its
    http_session thread, a MaxClients slot, and a CLOSE_WAIT socket. The
    result was a pile of zombie HTTPS clients in sbbsctrl/MQTT and eventual
    MaxClients exhaustion.

    Why this only bit Windows: socket_check() (xpdev) has two paths. On
    non-Windows builds it uses poll() (CFLAGS += -DPREFER_POLL, set only in
    build/Common.gmake, i.e. the GNU-make/Unix builds). poll() reports
    POLLHUP when the peer closes its end, even while there is still buffered
    data to read, and socket_check() returns false on POLLHUP before it ever
    runs the readable/MSG_PEEK logic. So on Unix the close was detected,
    session_check() returned false, and tls_pending never latched. Windows
    (MSBuild) does not define PREFER_POLL and uses select(), which has no
    POLLHUP equivalent: a closing TLS socket simply looks "readable"
    (MSG_PEEK returns the encrypted close_notify bytes), so the latch was set
    and the disconnect masked. The session_check() bug is
    platform-independent; poll()/POLLHUP merely hid it everywhere except
    Windows.

    Fix: drop the tls_pending liveness latch. Use peeked_valid (a decrypted
    byte already buffered) as the readable fast-path, and when the raw socket
    is readable, probe via cryptPopData(1 byte), which a raw MSG_PEEK cannot
    do, to tell apart application data (connected; the byte is cached in
    session->peeked so the next sess_recv() returns it), CRYPT_ERROR_TIMEOUT
    (connected, no app data yet), and CRYPT_ERROR_COMPLETE (peer closed ->
    disconnected). The probe is non-blocking (CRYPT_OPTION_NET_READTIMEOUT
    == 0, set at session setup) and runs in the session's own thread, so
    there is no concurrent reader. Also close the socket in place in
    recvbufsocket() when session_check() reports a disconnect (it previously
    relied on the latch returning true and the following sess_recv()
    failing).

    Latch introduced in d93478b918 (famous-15-sons); the readable-as-
    connected + tls_pending set predates it (dbbfabf1b1, funky-27-foam).

    Validated on a production Windows server: CLOSE_WAIT count ~22 -> 0,
    sbbsctrl thread count 221 -> 25, and the server ran overnight with no
    zombie HTTPS clients.

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell (on Windows 11)@1:103/705 to Git commit to main/sbbs/master on Sunday, June 21, 2026 20:54:25
    https://gitlab.synchro.net/main/sbbs/-/commit/65643e6ca604c3520da18e50
    Modified Files:
    src/sbbs3/websrvr.cpp
    Log Message:
    websrvr: bound drain_outbuf() so a dead client can't wedge the server

    drain_outbuf() spun in a SLEEP(1) loop as long as the outbuf ring buffer
    held data and the socket was still valid, with no timeout and no check of
    the terminate_server flag (the "/* ToDo: This should probably timeout
    eventually... */" note acknowledged this). When a client stops reading,
    the output thread blocks in its send and the buffer never drains, so the
    session thread spins forever. Under a distributed web scrape (many
    abandoned Alibaba/Aliyun keep-alive connections) this hung web-server
    shutdown: the "Waiting for N child threads to terminate" loop never
    completed because several http_session_thread()s were stuck in
    drain_outbuf() <- send_error().

    Bound the wait: return (not break) when terminate_server is set, or once
    the buffer has stalled for max_inactivity seconds. Returning rather than
    falling through matters: the output thread can hold outbuf_write while
    blocked in a send, so the trailing pthread_mutex_lock() would just
    re-hang; returning lets the caller close the socket, which unblocks the
    output thread.
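
    A sketch of the bounded wait, assuming a hypothetical buffered()
    callback that reports bytes still queued in the outbuf ring buffer and a
    std::atomic<bool> standing in for terminate_server. The stall timer
    resets whenever the queue shrinks, and both give-up paths return
    immediately instead of breaking out to code that would take the mutex.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>

// Returns true if the buffer drained, false if it gave up (terminate or
// stall), so the caller can close the socket and unblock the output thread.
bool drain_outbuf_sketch(const std::function<std::size_t()>& buffered,
                         const std::atomic<bool>& terminate_server,
                         std::chrono::steady_clock::duration max_inactivity)
{
    using clock = std::chrono::steady_clock;
    std::size_t last = buffered();
    auto last_progress = clock::now();
    while (last > 0) {
        if (terminate_server)
            return false;  // return, don't break
        if (clock::now() - last_progress >= max_inactivity)
            return false;  // stalled: the client stopped reading
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        std::size_t now = buffered();
        if (now < last)
            last_progress = clock::now();  // progress resets the stall timer
        last = now;
    }
    return true;
}
```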

    Unbounded since the original SLEEP-based drain in 00f254912d (maker-8-money).

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell (on Debian Linux)@1:103/705 to Git commit to main/sbbs/master on Tuesday, June 23, 2026 13:40:10
    https://gitlab.synchro.net/main/sbbs/-/commit/f6d382c13949040c841d4465
    Modified Files:
    src/sbbs3/websrvr.cpp
    Log Message:
    websrvr: add debug-level timing probes to localize webv4 login stall (#1169)

    Issue #1169 reports an exactly-90-second stall on every webv4 portal
    login/logout, logged between "Initializing User Objects" and the first
    "Adding query value" line. It was initially suspected to be related to
    #1153 (Windows exclusive user.tab locking), but the reporter confirmed
    the stall persists on a current nightly that already carries the #1153
    fix, so it is unrelated.

    Tracing the path shows that js_CreateUserObjects() and its area-object
    creators only build lazy JS skeletons and take no user.tab lock, and the
    stalling request is anonymous (no user-record write at all), so the
    native "Initializing User Objects" step is an unlikely culprit. To
    localize the delay empirically, add LOG_DEBUG probes that bisect the gap
    between that log line and query-string parsing:

    - http_checkuser(): "User Objects initialized" (bounds js_CreateUserObjects)
    - check_request(): "Authorization check complete" (bounds check_ars tail)
    - respond(): "Responding to request (dynamic=%d)"
    - exec_ssjs(): "beginning JS request" / "initializing request properties"
      (brackets JS_BEGINREQUEST to catch a blocking begin-request)

    The adjacent pair of lines that straddles the 90s gap in a debug log
    localizes the offending region. Probes are tagged "#1169 timing probe"
    for easy removal once root-caused.
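
    The bisection step is easy to automate. A minimal sketch, assuming the
    debug log has already been parsed into hypothetical (seconds, text)
    pairs: find the adjacent pair with the largest timestamp gap; the code
    between those two probes is where the time goes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct log_line {
    double t;          // hypothetical parsed timestamp, in seconds
    std::string text;  // the logged message
};

// Returns i such that lines[i] -> lines[i + 1] is the widest gap
// (0 if there are fewer than two lines).
std::size_t widest_gap(const std::vector<log_line>& lines)
{
    std::size_t best = 0;
    double best_gap = -1.0;
    for (std::size_t i = 0; i + 1 < lines.size(); i++) {
        double gap = lines[i + 1].t - lines[i].t;
        if (gap > best_gap) {
            best_gap = gap;
            best = i;
        }
    }
    return best;
}
```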

    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell (on Windows 11)@1:103/705 to Git commit to main/sbbs/master on Tuesday, June 23, 2026 22:21:39
    https://gitlab.synchro.net/main/sbbs/-/commit/b0f02c4e61aa835f2b9b9e21
    Modified Files:
    src/sbbs3/websrvr.cpp
    Log Message:
    websrvr: read buffered TLS request body directly (fix #1169 login stall)

    A webv4 login/logout is an HTTPS POST whose body (credentials) often
    arrives in the same TLS record as the headers, so it sits
    decrypted-but-unread in the TLS layer with nothing left on the raw
    socket. read_post_data() -> recvbufsocket() gated each read on
    session_check(), which since 50258e70b ("detect TLS client disconnect",
    #1155) only treats a TLS session as readable when a byte has been peeked
    (peeked_valid); it no longer short-circuits on tls_pending. With the
    body buffered but no peeked byte, session_check() fell through to
    socket_check() on the raw socket and blocked for the full MaxInactivity
    timeout (60-90s) before the buffered body was finally read.
    That is the #1169 "login stalls ~90s at Initializing User Objects"
    symptom: POST-only (login/logout), duration == MaxInactivity, no wire
    traffic.

    Guard the recvbufsocket() wait with tls_pending the same way
    sockreadline() already does for header reads: when TLS data is already
    buffered, read it directly instead of waiting on the raw socket. Header
    reads were unaffected because sockreadline() kept its own tls_pending
    guard; only the body read regressed.
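
    A minimal sketch of the guard, assuming hypothetical wait_readable and
    read callbacks in place of the session_check()/socket_check() wait and
    sess_recv(); only tls_pending comes from the message.

```cpp
#include <cassert>
#include <functional>

// Hypothetical session state: tls_pending means decrypted data is
// already buffered in the TLS layer.
struct recv_session_sketch {
    bool tls_pending = false;
};

// wait_readable: blocks up to MaxInactivity on the raw socket.
// read: performs the actual receive.
int recvbuf_sketch(recv_session_sketch& s,
                   const std::function<bool()>& wait_readable,
                   const std::function<int()>& read)
{
    // Buffered TLS plaintext leaves nothing on the raw socket, so waiting
    // there would block for the full timeout: read it directly instead.
    if (!s.tls_pending && !wait_readable())
        return -1;  // timed out or disconnected
    return read();
}
```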

    The stall manifests whenever the body is TLS-buffered at read time
    (reliably on Windows, intermittently on Linux v3.22a); it is absent in
    v3.21f, which predates 50258e70b. Verified on vert: the auth POST's
    "Authorization check complete" -> "Responding to request" gap went from
    60s to 0s.

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell (on Windows 11)@1:103/705 to Git commit to main/sbbs/master on Tuesday, June 23, 2026 23:20:54
    https://gitlab.synchro.net/main/sbbs/-/commit/659de100d04037459107de30
    Modified Files:
    src/sbbs3/websrvr.cpp
    Log Message:
    websrvr: remove #1169 timing probes (issue resolved)

    Reverts the debug-level timing probes added in f6d382c13 to localize the
    webv4 login stall; #1169 is now root-caused and fixed in b0f02c4e6
    (recvbufsocket reads buffered TLS data directly instead of waiting on the
    raw socket for MaxInactivity).

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)