For forty years the personal computer has waited for you to click. NVIDIA RTX Spark is the company’s bet that the next forty years will work the other way around: you state a goal, and the machine does the work. Unveiled by Jensen Huang at COMPUTEX 2026, RTX Spark is not another faster chip bolted onto the same Windows experience. It is a ground-up attempt to turn the PC into an agentic AI personal computer, one that runs reasoning models and autonomous agents on the device in your bag rather than in a distant data center.

That ambition is why this launch matters beyond the spec sheet. NVIDIA is pairing its Blackwell GPU architecture with an Arm CPU and a single pool of memory, then handing the result to Microsoft to build a native Windows experience around. Whether that adds up to a genuine shift or a well-marketed repackaging of ideas Apple and Qualcomm already shipped is the question worth digging into.
What Is NVIDIA RTX Spark?
RTX Spark is a superchip platform, not a single product. At its core sits a two-chiplet system on a chip built on TSMC’s 3nm process: one Blackwell RTX GPU chiplet with 6,144 CUDA cores, joined to a 20-core Arm-based Grace CPU that NVIDIA co-developed with MediaTek. The two halves talk over a silicon bridge NVIDIA rates at 600 GB/s, and they share 128GB of unified LPDDR5X memory. NVIDIA quotes the package at 1 petaFLOP of AI performance.
The word that ties it together is agentic. A traditional Windows PC waits for input through a mouse and keyboard. An agentic machine accepts a natural language goal, then plans, calls tools, checks its own output and refines it, all without a human driving each step. RTX Spark is the hardware NVIDIA designed to make that loop fast enough to feel local rather than laggy.
What is an agentic AI PC?
An agentic AI PC is a computer built to run autonomous AI agents on the device, so you describe an outcome and the system plans and executes the steps itself.
- You give an instruction in plain language instead of clicking through menus and apps.
- 128GB of unified memory holds the model in the machine, so prompts never leave the device.
- The agent opens apps, queries files and chains actions, then evaluates whether the result is right.
- Because reasoning runs on device, sensitive documents stay off third party servers.
If you have followed the industry’s broader push toward agents, this mirrors the same agent-first thinking behind tools like the Google Gemini Spark AI agent, except NVIDIA is moving the entire model down onto the silicon instead of leaning on a cloud back end.
NVIDIA RTX Spark AI PC Specifications
RTX Spark arrives in two broad shapes: thin laptops and a desktop developer box. The laptops measure 14mm thick, weigh about 3 pounds, and come in 14-inch and 16-inch sizes with OLED panels, G-SYNC support and machined aluminum chassis, according to NVIDIA’s COMPUTEX presentation. The desktop side is anchored by the already-shipping DGX Spark, which uses the GB10 Grace Blackwell Superchip.

Here is how the headline numbers compare, with the caveat that consumer laptop figures are manufacturer claims from the reveal rather than independently tested results.

| Specification | RTX Spark laptop | DGX Spark desktop |
|---|---|---|
| GPU | Blackwell RTX, 6,144 CUDA cores | Blackwell RTX (GB10) |
| CPU | 20 core Arm Grace (with MediaTek) | 20 core Arm Grace |
| AI performance | 1 petaFLOP FP4 (claimed) | 1 petaFLOP FP4 (claimed) |
| Unified memory | 128GB LPDDR5X | 128GB LPDDR5X |
| Memory bandwidth | up to 273 GB/s | up to 273 GB/s |
| Local model size | up to 120B parameters | up to 200B parameters |
| Form factor | 14mm, 3 lb laptop | palm sized desktop |

The most important line in that table is also the most argued over: memory bandwidth. NVIDIA markets a 600 GB/s figure, but that number describes the internal NVLink C2C link between the chiplets, not the bandwidth to system memory. Independent reviews of DGX Spark peg usable memory bandwidth closer to 273 GB/s, and that gap matters: as the community section below shows, it is the single biggest point of contention around this platform.
NVIDIA Blackwell RTX GPU FP4 Performance
The reason RTX Spark can claim a full petaFLOP from a laptop chip comes down to one feature: FP4. NVIDIA’s fifth-generation Tensor Cores can process AI math in a 4-bit floating-point format, which roughly doubles inference throughput compared with the FP8 precision most current models use, while halving the memory each parameter consumes. According to NVIDIA’s own DGX Spark specifications, that FP4 path combined with 128GB of memory is what accelerates inference of state-of-the-art models on a device this small.
The practical payoff is what you can actually load. NVIDIA says an RTX Spark machine can run a 120-billion-parameter model such as Nemotron 3 Super locally, edit 12K 4:2:2 video, and render 3D scenes larger than 90GB. For context, a 120-billion-parameter model does not fit in the 32GB of memory on a consumer GPU like the RTX 5090, no matter how fast that card is. Capacity, not just raw speed, is the real unlock here, and unified memory is how NVIDIA gets it.
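To see why capacity is the constraint, a back-of-the-envelope calculation helps. The sketch below estimates weight storage alone for a 120-billion-parameter model at three precisions and checks it against 32GB and 128GB of memory. It is a simplification: the helper names are illustrative, and it ignores the KV cache, activations and runtime overhead, all of which push real requirements higher.

```python
# Rough weight-memory estimate. Assumption: only the weights are counted;
# KV cache, activations and runtime overhead are ignored.
BITS_PER_PARAM = {"FP16": 16, "FP8": 8, "FP4": 4}


def weight_gb(params_billion: float, bits: int) -> float:
    """Return weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits / 8 / 1e9


def fits(params_billion: float, bits: int, memory_gb: float) -> bool:
    """True if the weights alone fit in the given memory pool."""
    return weight_gb(params_billion, bits) <= memory_gb


if __name__ == "__main__":
    for name, bits in BITS_PER_PARAM.items():
        size = weight_gb(120, bits)
        print(
            f"{name}: {size:.0f} GB of weights | "
            f"fits in 32GB: {fits(120, bits, 32)} | "
            f"fits in 128GB: {fits(120, bits, 128)}"
        )
```

At 4 bits per weight the model needs about 60GB, which overflows a 32GB card but leaves headroom in a 128GB pool.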
That said, FP4 is a trade. Aggressive 4-bit quantization can shave accuracy on some tasks, and not every open model ships an FP4-tuned variant. The petaFLOP headline is a peak figure under ideal precision, so treat it as a ceiling rather than a number you will hit in a typical agent workflow.
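The accuracy trade is easier to picture with a toy example. The sketch below rounds a small block of weights to the eight magnitudes of a 4-bit E2M1 float, using one shared scale per block. That layout is an assumption made for illustration: NVIDIA’s exact encoding, block size and calibration are not covered here, and production quantizers are more careful.

```python
# Toy FP4 rounding. Assumption: E2M1 layout (1 sign, 2 exponent, 1 mantissa bit)
# with one scale shared by the whole block. Illustrative only.
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_block(values):
    """Round floats to the FP4 grid with a shared scale; return dequantized floats."""
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return [0.0] * len(values)
    scale = peak / FP4_GRID[-1]  # map the largest magnitude onto 6.0
    out = []
    for v in values:
        # Pick the nearest representable magnitude, then restore the sign.
        mag = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))
        out.append(mag * scale if v >= 0 else -mag * scale)
    return out


def max_abs_error(values):
    """Largest absolute difference introduced by the round trip."""
    return max(abs(a - b) for a, b in zip(values, quantize_block(values)))


if __name__ == "__main__":
    block = [0.9, -0.31, 0.05, 0.6]
    print("original :", block)
    print("quantized:", quantize_block(block))
    print("max error:", max_abs_error(block))
```

With only eight magnitudes per sign, small weights that share a block with a large one get rounded coarsely, which is where the accuracy loss comes from.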
How RTX Spark Runs AI Agents Locally
Hardware is only half the story. The other half is the software stack NVIDIA and Microsoft built to keep agents responsive. NVIDIA describes an on device runtime called OpenShell for executing agent actions safely, paired with its Nemotron open models, while Microsoft contributed workload profile scheduling so Windows can feed the GPU efficiently from day one. In NVIDIA’s demo, a chain of document retrieval, semantic indexing, model summarization and an Outlook action completed in under two seconds.
How RTX Spark runs agents without the cloud
A cloud assistant ships your prompt to a remote server and waits. RTX Spark keeps every stage of the loop on the chip, which changes the economics of latency and privacy.
- A Nemotron class model lives in unified memory, so there is no cold start or upload step.
- The agent breaks the goal into steps and reads local files, email and app state for context.
- The OpenShell runtime executes tool calls in a sandbox so the agent can touch apps safely.
- It checks its own output and retries before handing you a result, all in roughly two seconds.
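The loop described above can be sketched in a few lines. This is a conceptual illustration only: the `Agent` class, its methods and the tool names are hypothetical, the planner and checker are stubs standing in for a local model, and the allow-list is a stand-in for real sandboxing. It does not represent OpenShell’s actual interface.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    """Hypothetical plan / act / check loop; not a real NVIDIA or Microsoft API."""

    tools: dict[str, Callable[[str], str]]  # allow-list of callable tools
    max_retries: int = 2
    log: list[str] = field(default_factory=list)

    def plan(self, goal: str) -> list[tuple[str, str]]:
        # Stand-in for a local model call: a fixed plan of (tool, argument) steps.
        return [("search_files", goal), ("summarize", goal)]

    def check(self, result: str) -> bool:
        # Stand-in for self-evaluation: accept any non-empty result.
        return bool(result.strip())

    def run(self, goal: str) -> str:
        for _attempt in range(1 + self.max_retries):
            result = ""
            for tool_name, arg in self.plan(goal):
                tool = self.tools.get(tool_name)
                if tool is None:  # anything off the allow-list is refused
                    self.log.append(f"blocked: {tool_name}")
                    continue
                # Feed the previous step's output forward, or the plan's
                # argument when this is the first step.
                result = tool(result or arg)
                self.log.append(f"ran: {tool_name}")
            if self.check(result):
                return result
        raise RuntimeError("no acceptable result after retries")


if __name__ == "__main__":
    agent = Agent(
        tools={
            "search_files": lambda q: f"notes matching '{q}'",
            "summarize": lambda text: f"summary of {text}",
        }
    )
    print(agent.run("Q3 budget"))
```

The point of the structure is that planning, tool execution and self-checking all happen in one process on the machine, so there is no network round trip between steps.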
This is the clearest difference between RTX Spark and a phone or laptop that simply calls a cloud assistant. Tools such as the Marvis AI assistant are useful, but they depend on a connection and a provider’s servers. An agent running entirely on RTX Spark keeps working on a plane, in a clinic, or inside a company that forbids sending data off site.
RTX Spark vs Apple Silicon for Local LLMs
Apple has quietly owned the local large language model conversation for two years, because its M series chips also use unified memory. So how does RTX Spark compare? The honest answer is that it depends on which half of inference you measure.

Independent testing of DGX Spark against Apple’s Mac Studio is revealing. The Mac Studio M3 Ultra offers up to 512GB of unified memory at roughly 819 GB/s, about three times the Spark’s bandwidth, which makes it faster at the token generation, or decode, stage. But in distributed inference tests run by EXO Labs, DGX Spark was about 3.8 times faster than the M3 Ultra at the compute-heavy prefill stage, while the Mac was about 3.4 times faster at decode. In practice, that means the Spark outperforms Apple on prompt processing but lags on the token generation you watch scroll by. Used together, the two delivered a 2.8 times speedup over the Mac alone. That is the real lesson: NVIDIA wins on raw compute, Apple wins on memory throughput, and neither is a clean knockout.
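The decode gap follows from simple arithmetic. When generating tokens one at a time, a dense model has to stream roughly all of its weights from memory for each token, so bandwidth divided by model size gives a ceiling on tokens per second. The sketch below applies that rule of thumb. The 60GB weight size (a dense 120-billion-parameter model at 4 bits) is an assumption chosen for illustration, and mixture-of-experts models, caching and batching all change the picture.

```python
def decode_tokens_per_s(bandwidth_gb_s: float, weight_gb: float) -> float:
    """Upper bound on decode speed when each token streams all weights once."""
    return bandwidth_gb_s / weight_gb


if __name__ == "__main__":
    weights = 60.0  # assumed: dense 120B-parameter model at 4 bits per weight
    machines = {"DGX Spark": 273.0, "Mac Studio M3 Ultra": 819.0}  # GB/s
    for name, bandwidth in machines.items():
        ceiling = decode_tokens_per_s(bandwidth, weights)
        print(f"{name}: at most {ceiling:.1f} tokens/s")
```

The ratio between the two ceilings is 819 / 273 = 3, matching the bandwidth gap quoted above. Measured speeds will land below both ceilings.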
Against AMD’s Strix Halo, the gap is wider. Signal65’s first look found that the AMD platform managed only about 4.6 tokens per second on a 70-billion-parameter model with 4-bit quantization, and took a painful 78 seconds to produce its first token. RTX Spark’s value proposition is that it sits between the two: more AI compute than Apple, and far more polish and software support than AMD’s early Strix Halo efforts.
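Those two Strix Halo figures combine into a wait time you can feel. Assuming a steady generation rate after the first token, which is a simplification, total response time is the time to first token plus the remaining tokens divided by tokens per second. The function name below is illustrative.

```python
def response_seconds(ttft_s: float, tokens_per_s: float, n_tokens: int) -> float:
    """Estimated wall-clock time for a reply of n_tokens.

    Assumes the first token arrives at ttft_s and the rest stream at a
    constant tokens_per_s.
    """
    if n_tokens < 1:
        return 0.0
    return ttft_s + (n_tokens - 1) / tokens_per_s


if __name__ == "__main__":
    # Figures quoted from Signal65's Strix Halo test: 78 s to first token, 4.6 tokens/s.
    seconds = response_seconds(78.0, 4.6, 500)
    print(f"500-token reply: about {seconds:.0f} s")
```

Under those assumptions a 500-token answer takes a little over three minutes, with well over a third of that spent waiting for the first token.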
What Real Users and Developers Report
The marketing is glossy, but the developer community that already received DGX Spark hardware has been refreshingly blunt, and their feedback is the best preview of what RTX Spark buyers should expect. The recurring praise is exactly what NVIDIA promised: one owner on Reddit pushed back on the bandwidth criticism, noting that the 128GB of unified memory lets you comfortably host several mid sized models at once, and that the cost to capability ratio looks strong when you stop fixating on a single number.
The recurring complaint is just as consistent. Multiple r/LocalLLaMA threads flag the memory bandwidth ceiling as a real bottleneck for token generation, and one widely shared post warned that under sustained load some units overheat and restart, a genuinely disappointing result for a premium machine. Another buyer described a launch firmware update delivered through NVIDIA Sync that nearly bricked the device, a known issue NVIDIA later addressed. None of this is fatal, but it is a reminder that a first generation platform ships with first generation rough edges, and the consumer RTX Spark laptops will be NVIDIA’s first serious attempt at Windows on Arm at scale.
Hype or a Genuine Shift in Personal Computing?
So is RTX Spark a real inflection point or a keynote flourish? Both readings have evidence. The skeptical case is strong: unified memory AI machines are not new, the bandwidth marketing is misleading, and Windows on Arm has a long history of broken promises around app compatibility and battery life.
The optimistic case, however, is harder to dismiss. What is genuinely different here is not any one specification but the alignment behind it. NVIDIA supplies the silicon and the models, Microsoft is rebuilding Windows scheduling around agents, and more than 30 laptops are committed from every major OEM for fall 2026. That is the kind of coordinated platform push that occasionally does reset a category, the way the smartphone did. Jensen Huang framed the company’s evolution from a GPU supplier into an AI infrastructure company, and RTX Spark is the consumer face of that strategy. The likeliest outcome is neither pure hype nor instant revolution: a strong, flawed first generation that proves the concept and a second generation that makes it mainstream.
What Comes Next: The RTX Spark Roadmap
NVIDIA was unusually candid about the future, outlining three chip generations rather than just selling the present one. The current Blackwell based RTX Spark is generation one. After it comes a Rubin based platform using faster LPDDR6 memory, which should directly attack the bandwidth weakness owners are complaining about today, followed by a generation NVIDIA referred to as Rosa Feynman.

That roadmap connects to NVIDIA’s larger Vera Rubin platform on the data center side, the same architecture powering its next generation AI factories. The strategic message is that the agent running on your laptop and the cluster training the model in the cloud will increasingly speak the same CUDA language. For buyers, the practical takeaway is simpler: RTX Spark is the start of a multi year line, so the honest move is to judge the first generation on what it does well today, local model capacity and a credible agentic Windows, while knowing the bandwidth gap is already on NVIDIA’s fix list. You can track availability on the official NVIDIA DGX Spark product page, and the full COMPUTEX reveal is worth watching through TechRadar’s keynote coverage and Tom’s Guide’s COMPUTEX 2026 live blog.
The Bottom Line
NVIDIA RTX Spark is the most serious attempt yet to make local AI the default rather than a hobbyist project. Its 128GB of unified memory and FP4 acceleration genuinely let you run models that simply do not fit on a normal gaming GPU, and the agentic Windows vision is more concrete than any rival’s. It is also a first generation product with a real bandwidth limitation and the usual launch bugs. If you develop AI agents or want serious local inference in a portable machine, RTX Spark is the platform to watch closely this fall. If you mainly want a fast everyday laptop, wait for the second generation, and let the early adopters find the rough edges first.
- 128GB of unified memory runs language models up to 120 billion parameters fully on device
- 1 petaFLOP of FP4 AI performance in a 14mm laptop or a palm sized desktop
- Agentic Windows experience keeps prompts, files and reasoning local for privacy and low latency
- Backed by every major PC maker, from Dell and HP to Microsoft Surface and Lenovo
- Memory bandwidth near 273 GB/s trails Apple Silicon on raw token generation speed
- Some early DGX Spark owners reported overheating under sustained load and a buggy launch firmware update
- Windows on Arm app and driver compatibility is still maturing
- Consumer pricing for the fall 2026 laptops has not been confirmed
Frequently Asked Questions
What is the difference between NVIDIA RTX Spark and DGX Spark?
DGX Spark is the shipping desktop developer box built on the GB10 Grace Blackwell Superchip, aimed at AI engineers who want a personal supercomputer. RTX Spark is the consumer facing version of that idea: the same Blackwell plus Arm formula tuned for mainstream Windows laptops and desktops arriving in fall 2026. They share the 128GB unified memory design and the 1 petaFLOP FP4 ceiling.
How much will NVIDIA RTX Spark cost?
NVIDIA has not published consumer pricing for RTX Spark laptops as of the COMPUTEX 2026 reveal. For reference, the developer-focused DGX Spark desktop lists for around $3,999 at retailers such as Amazon, so expect premium pricing on the first wave of laptops rather than budget Arm machines.
Can RTX Spark run large language models offline?
Yes. The whole point of the platform is local inference. With 128GB of unified memory, an RTX Spark machine can load open models up to roughly 120 billion parameters, including NVIDIA Nemotron 3 Super, without sending a single token to the cloud. That is the headline advantage over thin clients that depend on a remote API.
Which NVIDIA GPUs support FP4?
FP4 acceleration lives in NVIDIA’s fifth-generation Tensor Cores, which means the Blackwell family: the data center GB200 and B200, the GeForce RTX 50 series, the GB10 in DGX Spark, and now the RTX Spark superchip. FP4 is a 4-bit floating-point format that roughly doubles throughput versus FP8 for compatible inference workloads.
Is the NVIDIA RTX Spark worth buying for gaming?
It can game well. NVIDIA claims AAA titles at 1440p above 100fps using DLSS 4.5 and Frame Generation, and the laptops ship with G-SYNC OLED panels. That said, RTX Spark is engineered first as an AI machine. A dedicated GeForce RTX gaming laptop will likely still offer more raw raster performance per dollar.
When will RTX Spark laptops be available?
NVIDIA says more than 30 laptops and 10 desktops will arrive in fall 2026 from Acer, ASUS, Dell, GIGABYTE, HP, Lenovo, Microsoft and MSI. Developer DGX Spark hardware is already shipping, so the underlying silicon is proven before the consumer wave lands.




