NVIDIA RTX Spark laptop open on a studio desk with its OLED screen glowing in a dark room

NVIDIA RTX Spark: Inside the Agentic AI PC Built to Reinvent Windows

⏱️ 30-Second Verdict: NVIDIA RTX Spark is an agentic AI PC platform unveiled at COMPUTEX 2026 that pairs a Blackwell RTX GPU with a 20 core Arm based Grace CPU and 128GB of unified memory. It delivers up to 1 petaFLOP of FP4 AI performance, letting Windows run AI agents and large language models locally instead of in the cloud.

For forty years the personal computer has waited for you to click. NVIDIA RTX Spark is the company’s bet that the next forty years will work the other way around: you state a goal, and the machine does the work. Unveiled by Jensen Huang at COMPUTEX 2026, RTX Spark is not another faster chip bolted onto the same Windows. It is a ground up attempt to turn the PC into an agentic AI personal computer, one that runs reasoning models and autonomous agents on the device in your bag rather than in a distant data center.

Slim NVIDIA RTX Spark laptop sitting open on a minimalist desk beside a coffee cup in soft daylight

That ambition is why this launch matters beyond the spec sheet. NVIDIA is pairing its Blackwell GPU architecture with an Arm CPU and a single pool of memory, then handing the result to Microsoft to build a native Windows experience around. Whether that adds up to a genuine shift or a well marketed repackaging of ideas Apple and Qualcomm already shipped is the question worth digging into.

What Is NVIDIA RTX Spark?

RTX Spark is a superchip platform, not a single product. At its core sits a two chiplet system on a chip: one Blackwell RTX GPU chiplet with 6,144 CUDA cores, joined to a 20 core Arm based Grace CPU that NVIDIA co developed with MediaTek. The two halves talk over a silicon bridge NVIDIA rates at 600 GB/s, and they share 128GB of unified LPDDR5X memory. The package is built on TSMC’s 3nm process, and NVIDIA quotes it at 1 petaFLOP of FP4 AI performance.

The word that ties it together is agentic. A traditional Windows PC waits for input through a mouse and keyboard. An agentic machine accepts a natural language goal, then plans, calls tools, checks its own output and refines it, all without a human driving each step. RTX Spark is the hardware NVIDIA designed to make that loop fast enough to feel local rather than laggy.

AGENTIC AI PC

What is an agentic AI PC?

An agentic AI PC is a computer built to run autonomous AI agents on the device, so you describe an outcome and the system plans and executes the steps itself.

Goal in, work out: You give an instruction in plain language instead of clicking through menus and apps.
Local inference: 128GB of unified memory holds the model in the machine, so prompts never leave the device.
Tool calling: The agent opens apps, queries files and chains actions, then evaluates whether the result is right.
Privacy by default: Because reasoning runs on device, sensitive documents stay off third party servers.

The shift is from an app you operate to an agent that operates the apps for you.

If you have followed the industry’s broader push toward agents, this mirrors the same agent first thinking behind tools like the Google Gemini Spark AI agent, except NVIDIA is moving the entire model down onto local silicon instead of leaning on a cloud back end.

NVIDIA RTX Spark AI PC Specifications

RTX Spark arrives in two broad shapes: thin laptops and a desktop developer box. The laptops measure 14mm thick, weigh about 3 pounds, and come in 14 inch and 16 inch sizes with OLED panels, G-SYNC support and machined aluminum chassis, according to NVIDIA’s COMPUTEX presentation. The desktop side is anchored by the already shipping DGX Spark, which uses the GB10 Grace Blackwell Superchip.

Close up of the NVIDIA RTX Spark Blackwell superchip die resting on a dark reflective surface under studio lighting

Here is how the headline numbers compare, with the caveat that consumer laptop figures are manufacturer claims from the reveal rather than independently tested results.

| Specification | RTX Spark laptop | DGX Spark desktop |
| --- | --- | --- |
| GPU | Blackwell RTX, 6,144 CUDA cores | Blackwell RTX (GB10) |
| CPU | 20 core Arm Grace (with MediaTek) | 20 core Arm Grace |
| AI performance | 1 petaFLOP FP4 (claimed) | 1 petaFLOP FP4 (claimed) |
| Unified memory | 128GB LPDDR5X | 128GB LPDDR5X |
| Memory bandwidth | up to 273 GB/s | up to 273 GB/s |
| Local model size | up to 120B parameters | up to 200B parameters |
| Form factor | 14mm, 3 lb laptop | palm sized desktop |

The most important line in that table, memory bandwidth, is also the most argued over. NVIDIA markets a 600 GB/s figure, but that number describes the internal NVLink C2C link between the chiplets, not the bandwidth to system memory. Independent reviews of DGX Spark peg usable memory bandwidth closer to 273 GB/s. That gap matters in practice: as the community section below shows, it is the single biggest point of contention around this platform.
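A rough rule of thumb shows why the distinction matters. During token generation, a dense model has to stream essentially all of its weights from memory for every token, so memory bandwidth divided by the size of the weights gives an upper bound on tokens per second. The sketch below applies that bound to the two figures in dispute. The 70 billion parameter dense model at 4 bits per weight is an illustrative assumption, the function name is made up for this example, and real throughput lands below the bound once KV cache traffic and other overheads are counted.

```python
def decode_ceiling_tokens_per_s(bandwidth_gb_s: float,
                                params_billion: float,
                                bits_per_weight: float) -> float:
    """Upper bound on decode speed for a dense model that reads every weight once per token."""
    # Billions of parameters times bytes per parameter gives gigabytes of weights.
    weights_gb = params_billion * bits_per_weight / 8
    return bandwidth_gb_s / weights_gb


# Illustrative assumption: a dense 70B model quantized to 4 bits per weight (35 GB of weights).
PARAMS_B = 70
BITS = 4

# 273 GB/s is the memory bandwidth reviewers measured; 600 GB/s is the chiplet link figure,
# included only to show what the ceiling would be if it were memory bandwidth.
measured = decode_ceiling_tokens_per_s(273, PARAMS_B, BITS)
if_marketed = decode_ceiling_tokens_per_s(600, PARAMS_B, BITS)

print(f"Ceiling at 273 GB/s: {measured:.1f} tokens/s")
print(f"Ceiling at 600 GB/s: {if_marketed:.1f} tokens/s")
```

Under these assumptions the gap between the two figures is the difference between a ceiling of under 8 tokens per second and one of about 17, which is why owners care which number describes the memory path. Mixture of experts models read only their active parameters per token, so this bound is pessimistic for them.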

NVIDIA Blackwell RTX GPU FP4 Performance

The reason RTX Spark can claim a full petaFLOP from a laptop chip comes down to one feature: FP4. NVIDIA’s fifth generation Tensor Cores can process AI math in a 4 bit floating point format, which roughly doubles inference throughput compared with the FP8 precision most current models use, while cutting the memory each parameter consumes. According to NVIDIA’s own DGX Spark specifications, that FP4 path combined with 128GB of memory is what accelerates inference of state of the art models on a device this small.

The practical payoff is what you can actually load. NVIDIA says an RTX Spark machine can run a 120 billion parameter model such as Nemotron 3 Super locally, edit 12K 4:2:2 video, and render 3D scenes larger than 90GB. For context, the weights of a 120 billion parameter model do not fit in the 32GB of memory on a consumer GPU like the RTX 5090, no matter how fast that card is. Capacity, not just raw speed, is the real unlock here, and unified memory is how NVIDIA gets it.
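The capacity argument comes down to simple arithmetic, sketched below for the 120 billion parameter case. The function name is hypothetical, the figures cover weights only, and sizes use decimal gigabytes; a real deployment also needs room for the KV cache, activations and the operating system, so a model that barely fits on paper may not fit in practice.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory needed for the model weights alone, in decimal gigabytes."""
    return params_billion * bits_per_weight / 8


PARAMS_B = 120  # the 120 billion parameter model size quoted above

footprints = {}
for label, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    gb = weight_footprint_gb(PARAMS_B, bits)
    footprints[label] = gb
    print(f"{label}: {gb:.0f} GB of weights | "
          f"fits in 32GB: {gb <= 32} | fits in 128GB: {gb <= 128}")
```

Even at 4 bits per weight the weights alone come to 60 GB, nearly double what a 32GB card holds, while 128GB leaves headroom at FP4 and very little at FP8.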

That said, FP4 is a trade. Aggressive 4 bit quantization can shave accuracy on some tasks, and not every open model ships an FP4 tuned variant. The petaFLOP headline is a peak figure under ideal precision, so treat it as a ceiling rather than a number you will hit in a typical agent workflow.
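To make the accuracy trade concrete, the sketch below rounds a handful of made up weights to a 4 bit floating point grid and reports the error. It assumes the E2M1 layout (one sign bit, two exponent bits, one mantissa bit), whose representable magnitudes are listed in the code, and a single scale factor per block. It is a simplified illustration, not NVIDIA's implementation: production schemes add calibration and store their scale factors in compact formats.

```python
import math

# Magnitudes representable in the E2M1 4 bit float layout (the sign is a separate bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block(values: list[float]) -> list[float]:
    """Scale a block so its largest magnitude maps to 6.0, round to E2M1, then scale back."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return list(values)
    scale = peak / 6.0
    restored = []
    for v in values:
        # Pick the nearest representable magnitude for the scaled value.
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(abs(v) / scale - m))
        restored.append(math.copysign(nearest * scale, v))
    return restored


# Made up example weights, chosen only to show the rounding behaviour.
weights = [0.02, -0.11, 0.33, -0.48, 0.07, 0.26, -0.31, 0.60]
restored = quantize_block(weights)
errors = [abs(w - r) for w, r in zip(weights, restored)]

for w, r, e in zip(weights, restored, errors):
    print(f"{w:+.2f} -> {r:+.2f} (error {e:.2f})")
print(f"Largest error in block: {max(errors):.2f}")
```

The grid is coarse near the top of the range, so values that fall between 4 and 6 on the scaled grid take the largest hit. Whether that matters depends on the model and the task, which is why FP4 tuned variants exist.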

How RTX Spark Runs AI Agents Locally

Hardware is only half the story. The other half is the software stack NVIDIA and Microsoft built to keep agents responsive. NVIDIA describes an on device runtime called OpenShell for executing agent actions safely, paired with its Nemotron open models, while Microsoft contributed workload profile scheduling so Windows can feed the GPU efficiently from day one. In NVIDIA’s demo, a chain of document retrieval, semantic indexing, model summarization and an Outlook action completed in under two seconds.

LOCAL VS CLOUD AGENTS

How RTX Spark runs agents without the cloud

A cloud assistant ships your prompt to a remote server and waits. RTX Spark keeps every stage of the loop on the chip, which changes the economics of latency and privacy.

1. Resident model: A Nemotron class model lives in unified memory, so there is no cold start or upload step.
2. Plan and observe: The agent breaks the goal into steps and reads local files, email and app state for context.
3. Act through OpenShell: The OpenShell runtime executes tool calls in a sandbox so the agent can touch apps safely.
4. Refine and finish: It checks its own output and retries before handing you a result, all in roughly two seconds.

Keeping the full loop on device is what turns an agent from a novelty into something you can trust with private work.
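The four stages above can be sketched as a small plan, act, check and retry loop. Everything in this example is a hypothetical stand-in: the tools are plain Python functions rather than sandboxed OpenShell calls, the planner is a fixed rule rather than a Nemotron model, and the document store is a one entry dictionary. It shows the control flow only, not NVIDIA's or Microsoft's actual APIs.

```python
from typing import Callable, Optional

# Hypothetical local "files" the agent can read.
LOCAL_DOCS = {"q3 report": "Q3 revenue rose 12 percent on laptop sales. Margins held steady."}


def search_files(query: str) -> str:
    """Stand-in tool: look up a local document by name."""
    return LOCAL_DOCS.get(query.lower(), "")


def summarize(text: str) -> str:
    """Stand-in tool: keep the first sentence as the summary."""
    return text.split(".")[0] + "." if text else ""


TOOLS: dict[str, Callable[[str], str]] = {"search_files": search_files, "summarize": summarize}


def plan(goal: str, attempt: int) -> list[tuple[str, Optional[str]]]:
    """Stand-in for the model's planning step; on a retry it cleans up the query."""
    query = goal if attempt == 0 else goal.strip(" ?!.")
    return [("search_files", query), ("summarize", None)]


def run_agent(goal: str, max_attempts: int = 2) -> str:
    for attempt in range(max_attempts):
        result = ""
        for tool_name, arg in plan(goal, attempt):
            # A None argument means "feed in the previous step's output".
            result = TOOLS[tool_name](arg if arg is not None else result)
        if result:  # self-check: accept only a non-empty answer, otherwise refine and retry
            return result
    return "no result"


print(run_agent("Q3 report?"))
```

In this toy run the first attempt fails because the query still carries a question mark, the self-check rejects the empty answer, and the second attempt succeeds with a cleaned query. A real agent would replace each stand-in with a model call or a sandboxed tool, but the loop keeps the same shape.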

This is the clearest difference between RTX Spark and a phone or laptop that simply calls a cloud assistant. Tools such as the Marvis AI assistant are useful, but they depend on a connection and a provider’s servers. An agent running entirely on RTX Spark keeps working on a plane, in a clinic, or inside a company that forbids sending data off site.

RTX Spark vs Apple Silicon for Local LLMs

Apple has quietly owned the local large language model conversation for two years, because its M series chips also use unified memory. So how does RTX Spark compare? The honest answer is that it depends on which half of inference you measure.

Developer workstation running a local AI agent on an NVIDIA RTX Spark desktop with code on a large monitor

Independent testing of DGX Spark against Apple’s Mac Studio is revealing. The Mac Studio M3 Ultra offers up to 512GB of unified memory at roughly 819 GB/s, about three times the Spark’s bandwidth, which makes it faster at the token generation, or decode, stage. In distributed inference tests run by EXO Labs, DGX Spark was about 3.8 times faster than the M3 Ultra at the compute heavy prefill stage, while the Mac was about 3.4 times faster at decode. In practice, that means the Spark hardware outperforms Apple on prompt processing but lags behind on the token generation you watch scroll by. Used together, the two delivered a 2.8 times speedup over the Mac alone. That is the real lesson: NVIDIA wins on raw compute, Apple wins on memory throughput, and neither is a clean knockout.
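One way to read the combined result is as an Amdahl style calculation, sketched below. It assumes the pairing works by moving only the prefill stage to the Spark while decode stays on the Mac, and it ignores the cost of shipping intermediate state between machines and any overlap between stages. The function names are made up for this example, and this is a back of the envelope reading of the quoted ratios, not EXO Labs' methodology.

```python
def combined_speedup(prefill_share: float, prefill_gain: float) -> float:
    """Speedup over the Mac alone when only prefill runs on the faster machine.

    prefill_share: fraction of the Mac-only run time spent in prefill (0 to 1).
    prefill_gain: how many times faster the other machine is at prefill.
    """
    return 1.0 / (prefill_share / prefill_gain + (1.0 - prefill_share))


def prefill_share_for(target_speedup: float, prefill_gain: float) -> float:
    """Invert combined_speedup: what prefill share would produce the target speedup?"""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / prefill_gain)


# Ratios quoted above: 3.8x faster prefill on the Spark, 2.8x combined speedup.
share = prefill_share_for(2.8, 3.8)
print(f"Implied prefill share of Mac-only run time: {share:.0%}")
print(f"Check: speedup at that share = {combined_speedup(share, 3.8):.2f}x")
```

Under these assumptions, a 2.8 times speedup implies that prefill made up roughly 87 percent of the Mac-only run time, which points to a prompt heavy workload. For short prompts with long outputs, the same pairing would help far less.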

Against AMD’s Strix Halo, the gap is wider. Signal65’s first look found that platform managed only about 4.6 tokens per second on a 70 billion parameter model with 4 bit quantization, and took a painful 78 seconds to produce its first token. RTX Spark’s value proposition sits between the two: more AI compute than Apple, and far more polish and software support than AMD’s early Strix Halo efforts.

What Real Users and Developers Report

The marketing is glossy, but the developer community that already received DGX Spark hardware has been refreshingly blunt, and their feedback is the best preview of what RTX Spark buyers should expect. The recurring praise is exactly what NVIDIA promised: one owner on Reddit pushed back on the bandwidth criticism, noting that the 128GB of unified memory lets you comfortably host several mid sized models at once, and that the cost to capability ratio looks strong when you stop fixating on a single number.

The recurring complaint is just as consistent. Multiple r/LocalLLaMA threads flag the memory bandwidth ceiling as a real bottleneck for token generation, and one widely shared post warned that under sustained load some units overheat and restart, a genuinely disappointing result for a premium machine. Another buyer described a launch firmware update delivered through NVIDIA Sync that nearly bricked the device, a known issue NVIDIA later addressed. None of this is fatal, but it is a reminder that a first generation platform ships with first generation rough edges, and the consumer RTX Spark laptops will be NVIDIA’s first serious attempt at Windows on Arm at scale.

Hype or a Genuine Shift in Personal Computing?

So is RTX Spark a real inflection point or a keynote flourish? Both readings have evidence. The skeptical case is strong: unified memory AI machines are not new, the bandwidth marketing is misleading, and Windows on Arm has a long history of broken promises around app compatibility and battery life.

The optimistic case, however, is harder to dismiss. What is genuinely different here is not any one specification but the alignment behind it. NVIDIA supplies the silicon and the models, Microsoft is rebuilding Windows scheduling around agents, and more than 30 laptops are committed from every major OEM for fall 2026. That is the kind of coordinated platform push that occasionally does reset a category, the way the smartphone did. Jensen Huang framed this as the company’s evolution from a GPU supplier into an AI infrastructure company, and RTX Spark is the consumer face of that strategy. The likeliest outcome is neither pure hype nor instant revolution: a strong, flawed first generation that proves the concept and a second generation that makes it mainstream.

What Comes Next: The RTX Spark Roadmap

NVIDIA was unusually candid about the future, outlining three chip generations rather than just selling the present one. The current Blackwell based RTX Spark is generation one. After it comes a Rubin based platform using faster LPDDR6 memory, which should directly attack the bandwidth weakness owners are complaining about today, followed by a generation NVIDIA referred to as Rosa Feynman.

Lineup of thin NVIDIA RTX Spark laptops from several PC brands displayed on a dark gradient studio background

That roadmap connects to NVIDIA’s larger Vera Rubin platform on the data center side, the same architecture powering its next generation AI factories. The strategic message is that the agent running on your laptop and the cluster training the model in the cloud will increasingly speak the same CUDA language. For buyers, the practical takeaway is simpler: RTX Spark is the start of a multi year line, so the honest move is to judge the first generation on what it does well today, local model capacity and a credible agentic Windows, while knowing the bandwidth gap is already on NVIDIA’s fix list. You can track availability on the official NVIDIA DGX Spark product page, and the full COMPUTEX reveal is worth watching through TechRadar’s keynote coverage and Tom’s Guide’s COMPUTEX 2026 live blog.

The Bottom Line

NVIDIA RTX Spark is the most serious attempt yet to make local AI the default rather than a hobbyist project. Its 128GB of unified memory and FP4 acceleration genuinely let you run models that simply do not fit on a normal gaming GPU, and the agentic Windows vision is more concrete than any rival’s. It is also a first generation product with a real bandwidth limitation and the usual launch bugs. If you develop AI agents or want serious local inference in a portable machine, RTX Spark is the platform to watch closely this fall. If you mainly want a fast everyday laptop, wait for the second generation, and let the early adopters find the rough edges first.

✅ Pros:

  • 128GB of unified memory runs language models up to 120 billion parameters fully on device
  • 1 petaFLOP of FP4 AI performance in a 14mm laptop or a palm sized desktop
  • Agentic Windows experience keeps prompts, files and reasoning local for privacy and low latency
  • Backed by every major PC maker, from Dell and HP to Microsoft Surface and Lenovo

❌ Cons:

  • Memory bandwidth near 273 GB/s trails Apple Silicon on raw token generation speed
  • Some early DGX Spark units reportedly overheated and restarted under sustained load, and a launch firmware update was buggy
  • Windows on Arm app and driver compatibility is still maturing
  • Consumer pricing for the fall 2026 laptops has not been confirmed

Frequently Asked Questions

What is the difference between NVIDIA RTX Spark and DGX Spark?

DGX Spark is the shipping desktop developer box built on the GB10 Grace Blackwell Superchip, aimed at AI engineers who want a personal supercomputer. RTX Spark is the consumer facing version of that idea: the same Blackwell plus Arm formula tuned for mainstream Windows laptops and desktops arriving in fall 2026. They share the 128GB unified memory design and the 1 petaFLOP FP4 ceiling.

How much will NVIDIA RTX Spark cost?

NVIDIA has not published consumer pricing for RTX Spark laptops as of the COMPUTEX 2026 reveal. For reference, the developer focused DGX Spark desktop lists at around $3,999 at retailers such as Amazon, so expect premium pricing on the first wave of laptops rather than budget Arm machines.

Can RTX Spark run large language models offline?

Yes. The whole point of the platform is local inference. With 128GB of unified memory, an RTX Spark machine can load open models up to roughly 120 billion parameters, including NVIDIA Nemotron 3 Super, without sending a single token to the cloud. That is the headline advantage over thin clients that depend on a remote API.

Which NVIDIA GPUs support FP4?

FP4 acceleration lives in NVIDIA’s fifth generation Tensor Cores, which means the Blackwell family: the data center GB200 and B200, the GeForce RTX 50 series, the GB10 in DGX Spark, and now the RTX Spark superchip. FP4 is a 4 bit floating point format that roughly doubles peak throughput versus FP8 for compatible inference workloads.

Is the NVIDIA RTX Spark worth buying for gaming?

It can game well. NVIDIA claims AAA titles at 1440p above 100fps using DLSS 4.5 and Frame Generation, and the laptops ship with G-SYNC OLED panels. That said, RTX Spark is engineered first as an AI machine. A dedicated GeForce RTX gaming laptop will likely still offer more raw raster performance per dollar.

When will RTX Spark laptops be available?

NVIDIA says more than 30 laptops and 10 desktops will arrive in fall 2026 from Acer, ASUS, Dell, GIGABYTE, HP, Lenovo, Microsoft and MSI. Developer DGX Spark hardware is already shipping, so the underlying silicon is proven before the consumer wave lands.
