OpenAI Open Models: Run GPT-OSS 20B Locally

Table of Contents
- What is GPT-OSS?
- GPT-OSS Table Overview
- GPT-OSS models
- Local runners
- GPT-OSS Key Features
- Model lineup and basic hardware notes
- How does GPT-OSS work?
- How to use GPT-OSS
- Try it on the web
- Run locally with Ollama (recommended for most)
- Run locally with LM Studio
- Download directly from Hugging Face (technical)
- Web features vs local features
- Suggested pick: GPT-OSS 20B on a strong laptop or desktop
- Step-by-step quickstart
- Quickstart with Ollama
- Quickstart with LM Studio
- Quickstart on the web
- FAQ
- What hardware do I need to run GPT-OSS locally?
- Is the web demo fast?
- Can I show or hide the model’s reasoning?
- Do I need to use the terminal?
- Does Ollama support web search?
- What is “Turbo” in Ollama?
- Are features like file uploads supported?
- Which model should I pick?
- Can I download from Hugging Face instead?
- Do I need an internet connection after download?
- Troubleshooting tips
- Practical notes from local testing
- Comparison with other local models
- Privacy and control
- Upgrading to the larger model
- Conclusion
OpenAI has released open-weight reasoning models you can run on your own machine. There are two models in this family, and I’m running one locally on a laptop right now. In this guide, I’ll explain what these models are, how they work, and how to install and use them in a simple, non-technical way.
I’ll show the web option for quick testing and three local options for full-speed use: Ollama, LM Studio, and direct downloads from Hugging Face. I’ll also share basic hardware notes from my setup so you can judge what will work on your system.
What is GPT-OSS?
GPT-OSS is a set of open-weight reasoning models released by OpenAI. The key idea is that you can download the model weights and run them locally, without sending prompts off your machine. The family currently includes a mid-size model suitable for modern desktops and higher-end laptops, and a much larger model aimed at high-end desktops with strong GPUs.
These models are built for reasoning tasks and include an option to show or hide intermediate reasoning steps in supported interfaces.
GPT-OSS Table Overview
GPT-OSS models
| Model | Parameters | Intended hardware | Approx. download size | Notes |
|---|---|---|---|---|
| GPT-OSS 20B | 20B | High-end laptops and desktops | ~12–13 GB | Solid local performance on strong CPUs/Apple Silicon |
| GPT-OSS 120B | 120B | High-end desktops with strong NVIDIA GPUs | Tens of GB | Not practical on most laptops |
Local runners
| Option | OS support | Difficulty | Terminal needed | Auto-download models | Extras | Best for |
|---|---|---|---|---|---|---|
| Ollama | macOS, Windows, Linux | Easy | No | Yes | Web search toggle, “Turbo” mode (paid) | Fast start, simple chat interface |
| LM Studio | macOS, Windows | Moderate | Yes (one-time CLI setup) | Yes (via CLI) | Model discovery browser | Users OK with a one-time CLI step |
| Hugging Face | All (via tooling) | Technical | Yes | Manual | Full control over files and formats | Experienced users |
GPT-OSS Key Features
- Open-weight, local-first setup: Download once and run entirely on your machine.
- Reasoning models: Built to handle stepwise thinking for complex prompts.
- Show or hide reasoning: Toggle visibility of intermediate reasoning in supported interfaces.
- Multiple run options: Start quickly with Ollama or LM Studio, or use Hugging Face for a manual approach.
- Cross-platform: macOS, Windows, and Linux are supported across the listed tools.
Model lineup and basic hardware notes
OpenAI lists two open models under the GPT-OSS name. The naming is unusual, but the main point is simple:
- GPT-OSS 20B: Best pick for most users trying local inference today. I’m running this on a MacBook Pro with an Apple M3 Max and 64 GB of RAM. The model file for this setup was around 12–13 GB, so storage was no issue.
- GPT-OSS 120B: Intended for high-end desktops with powerful NVIDIA GPUs. This one does not run well on a typical laptop.
If you have a recent desktop or a higher-end laptop, start with the 20B model. If you maintain a workstation with a strong GPU, the 120B model is an option worth exploring.
How does GPT-OSS work?
GPT-OSS provides downloadable weights and a model definition that local runners can load for inference. In practice:
- You choose a runner (Ollama or LM Studio for convenience, or Hugging Face for manual control).
- The runner pulls the model weights and sets up a local chat interface.
- Your prompts stay local. The model responds in the app, not in a browser tab tied to a remote server.
- Some interfaces include a “show reasoning” checkbox, so you can reveal or hide how the model arrives at an answer.
OpenAI also provides a simple web interface to try the models online. It’s helpful for a quick test, but it can be slow right after release due to demand. Running locally is much faster.
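One way to see this locality for yourself: runners like Ollama expose a small HTTP API on localhost. Here is a minimal Python sketch, assuming Ollama's default port (11434), that lists the models installed on your machine via its documented /api/tags endpoint:

```python
# Sanity check: list the models an Ollama install is serving locally.
# Assumes Ollama's default API address (http://localhost:11434).
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = json.load(resp)["models"]

for m in models:
    # Each entry includes the model tag and its on-disk size in bytes.
    print(m["name"], "-", m.get("size", "unknown"), "bytes")
```

Nothing in that exchange leaves your machine; the same is true of the chat itself once the weights are downloaded.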
How to use GPT-OSS
Below are step-by-step guides for the web demo and two local options. The order here mirrors how I tested them: web tryout, then Ollama for the fastest setup, then LM Studio for users who don’t mind a one-time CLI step. Hugging Face is listed last for experienced users who prefer full control.
Try it on the web
You can test GPT-OSS directly in the browser:
- Visit the official GPT-OSS web interface (gpt-oss.com).
- Enter a prompt to start a session.
- Use the checkbox to show or hide reasoning.
- You can choose among multiple reasoning model variants listed on the page.
Note: Immediately after release, the web interface may be slow due to heavy use. Local runs are significantly faster.
Run locally with Ollama (recommended for most)
Ollama has become much easier to use. You can install it, select GPT-OSS, and it will auto-download the model the first time you send a message.
Steps:
- Download and install Ollama for macOS, Windows, or Linux from the official site.
- Open the Ollama app. You’ll see a dropdown of available models, including GPT-OSS.
- Select GPT-OSS 20B. At this point, the model isn’t downloaded yet.
- Type your first message and send it. Ollama will automatically download the model.
- Once the download completes, the chat is ready for local inference.
Notes:
- A web search option appears in the Ollama UI. Early on, it may not work well, and it requires a free Ollama account to activate. You can leave it off if you just want local inference.
- Features are more limited than a full cloud assistant (no file uploads and fewer tools at launch).
- There’s a “Turbo” setting in Ollama, but it requires a paid upgrade. This is an Ollama feature, not part of GPT-OSS itself.
Performance:
- With GPT-OSS 20B on a strong laptop (Apple M3 Max, 64 GB RAM), responses felt almost instant for short prompts. This will vary by hardware.
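If you’d rather call the model from code than from the chat window, Ollama’s local HTTP API works as soon as the download finishes. A minimal Python sketch, assuming the model was pulled under the tag gpt-oss:20b (check the name shown in the app) and the default port:

```python
# Minimal local chat with GPT-OSS 20B via Ollama's HTTP API.
# Assumes the model tag "gpt-oss:20b" (check the name shown in the app)
# and Ollama's default port, 11434.
import json
import urllib.request

payload = json.dumps({
    "model": "gpt-oss:20b",
    "messages": [{"role": "user", "content": "Explain open-weight models in two sentences."}],
    "stream": False,  # ask for one complete JSON reply instead of a token stream
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)

print(reply["message"]["content"])
```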
Run locally with LM Studio
LM Studio also works well, but it adds a one-time CLI step to manage downloads.
Steps:
- Download and install LM Studio for your OS.
- Open LM Studio once so it can initialize.
- Install the LM Studio CLI:
- macOS: Run the provided install command in Terminal.
- Windows: Run the provided install command in PowerShell.
- Use the CLI to pull GPT-OSS 20B (LM Studio provides a model-specific command).
- Reopen LM Studio, go to the Discover tab, and confirm the model appears at the top.
- Open a chat session with GPT-OSS 20B and start prompting.
Notes:
- The CLI step handles the model download and registration.
- After setup, LM Studio works much like Ollama: select the model and chat.
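LM Studio can also serve the model through an OpenAI-compatible local server started from within the app, which makes scripting straightforward. Below is a minimal sketch using the openai Python package; the address reflects LM Studio’s default, and the model identifier is an assumption, so confirm both in the app:

```python
# Chat with GPT-OSS 20B through LM Studio's OpenAI-compatible local server.
# Assumes the default server address (http://localhost:1234/v1); the model
# identifier below is an assumption -- use whatever LM Studio lists for you.
from openai import OpenAI  # pip install openai

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",  # hypothetical identifier; check LM Studio's model list
    messages=[{"role": "user", "content": "Summarize what a local reasoning model is good for."}],
)
print(response.choices[0].message.content)
```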
Download directly from Hugging Face (technical)
If you prefer manual control:
- Go to the model page on Hugging Face.
- Download the weights and any required files (a sketch of this step follows the list).
- Load the model using your preferred local inference stack.
- This path requires comfort with terminal tools and model formats.
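As a sketch of the download step, the huggingface_hub library can pull a full model snapshot; the repo id here is an assumption, so check the actual model page for the exact name:

```python
# Fetch the GPT-OSS 20B weights from Hugging Face for your own inference stack.
# The repo id below is an assumption -- confirm it on the actual model page.
from huggingface_hub import snapshot_download  # pip install huggingface_hub

local_path = snapshot_download(
    repo_id="openai/gpt-oss-20b",
    local_dir="./gpt-oss-20b",  # re-running after an interruption resumes the download
)
print("Weights downloaded to:", local_path)
```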
Web features vs local features
- Web demo:
  - Quick to try.
  - May be slow under heavy load.
  - Includes a reasoning visibility toggle.
- Local (Ollama or LM Studio):
  - Fast and private on your machine.
  - Reasoning toggle supported in some interfaces.
  - Feature set is narrower than a full cloud assistant (no file uploads, limited tools).
  - Ollama includes a web search switch (early behavior may be inconsistent; free account required to enable).
Suggested pick: GPT-OSS 20B on a strong laptop or desktop
For most users, GPT-OSS 20B is the best starting point. It worked well on a high-end laptop with 64 GB of RAM. The download size of around 12–13 GB was manageable, and responses to short prompts were quick. If you have a powerful desktop GPU and want to experiment further, the larger model is available, but it’s not intended for typical laptops.
Step-by-step quickstart
Quickstart with Ollama
- Install Ollama for your OS.
- Open the app and select GPT-OSS 20B from the dropdown.
- Type any prompt and press Enter.
- Wait for the automatic download to complete.
- Start chatting locally.
- Optional: Toggle web search (requires a free Ollama account). Leave it off if you want pure local behavior.
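Optionally, once the model is pulled you can also stream responses from a script using the official ollama Python package. A minimal sketch, again assuming the gpt-oss:20b tag:

```python
# Stream tokens from GPT-OSS 20B as they are generated, rather than waiting
# for the full reply. Assumes `pip install ollama` and the "gpt-oss:20b" tag.
import ollama

stream = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "List three uses for a local LLM."}],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
print()
```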
Quickstart with LM Studio
- Install LM Studio and launch it once.
- Install the LM Studio CLI using the command provided in the app docs:
- macOS: Run the macOS command in Terminal.
- Windows: Run the Windows command in PowerShell.
- Use the CLI to download GPT-OSS 20B.
- Return to LM Studio, open Discover, and find GPT-OSS 20B at the top.
- Start a chat session with the model.
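To verify the setup from outside the app, you can query LM Studio’s OpenAI-compatible model listing once its local server is running. A minimal check, assuming the default address:

```python
# Smoke test: confirm LM Studio's local server is up and the model is listed.
# Assumes the default OpenAI-compatible endpoint (http://localhost:1234) and
# that the server has been started from within the app.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:1234/v1/models") as resp:
    data = json.load(resp)

for model in data["data"]:
    print(model["id"])
```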
Quickstart on the web
- Visit the GPT-OSS web interface.
- Choose a model variant.
- Toggle reasoning display if needed.
- Send a prompt and review the result.
FAQ
What hardware do I need to run GPT-OSS locally?
- GPT-OSS 20B: A high-end desktop or laptop with plenty of RAM works well. I ran it on an Apple M3 Max system with 64 GB RAM. The download was about 12–13 GB.
- GPT-OSS 120B: Aim for a powerful desktop with a high-end NVIDIA GPU. It’s not practical on most laptops.
Is the web demo fast?
It can be slow right after release due to heavy traffic. Local inference with Ollama or LM Studio is much faster on capable hardware.
Can I show or hide the model’s reasoning?
Yes. The web interface includes a checkbox to show or hide reasoning. Some local runners also expose a similar setting.
Do I need to use the terminal?
- Ollama: No. It handles the model download and chat interface inside the app.
- LM Studio: Yes, for a one-time CLI setup to pull the model. After that, you can stay in the app.
- Hugging Face: Yes. This is a manual route for advanced users.
Does Ollama support web search?
There’s a web search toggle in Ollama. Early on, it may be unreliable and requires a free Ollama account to activate. You can leave it off for a pure local session.
What is “Turbo” in Ollama?
Turbo is an optional setting in Ollama that requires a paid upgrade. It’s not part of GPT-OSS itself.
Are features like file uploads supported?
Local runners are basic chat interfaces at launch. Expect fewer built-in tools than a full cloud assistant. You’ll mainly be entering text prompts and reading text responses.
Which model should I pick?
Use GPT-OSS 20B if you’re on a high-end laptop or a desktop without a top-tier GPU. Consider GPT-OSS 120B only if you have a powerful desktop GPU setup.
Can I download from Hugging Face instead?
Yes. You can pull the model weights from Hugging Face, but you’ll need to handle the tooling yourself. This path is best for experienced users.
Do I need an internet connection after download?
You need a connection to download the model and any updates. After that, you can run prompts locally without sending data off your machine, unless you enable features like web search.
Troubleshooting tips
- Model not appearing in the app list:
  - In Ollama, ensure you’re on a recent version and try searching for GPT-OSS again.
  - In LM Studio, confirm you ran the CLI download command successfully before checking the Discover tab.
- Slow responses:
  - Disable any web features and keep everything local.
  - Close other heavy apps and ensure you have enough free RAM.
  - Consider a lighter quantization if available (within your chosen runner’s model list).
- Download interruptions:
  - Pause and resume the process in the app if supported.
  - Check your disk space and network connection.
  - For LM Studio CLI or Hugging Face, re-run the command; partial downloads often resume.
Practical notes from local testing
- GPT-OSS 20B ran smoothly on a high-spec laptop with 64 GB RAM.
- Short prompts returned results almost instantly.
- The web search toggle in Ollama existed but was unreliable early on; I recommend leaving it off unless you need it.
- Turbo in Ollama is behind a paid tier and is separate from the model’s core behavior.
Comparison with other local models
Both Ollama and LM Studio list other open models you can pull, such as Llama and others. GPT-OSS appears at the top of their lists when available. The steps to install and run are the same: select the model, trigger a download, and start a chat session. This consistency makes it easy to try GPT-OSS alongside other local models and compare behavior on the same hardware.
Privacy and control
Running GPT-OSS locally keeps prompts and responses on your machine. This is a major draw for local inference. If you enable web search or use the web demo, some data will travel over the network. For a purely local setup, stick to offline runners and disable network-dependent features.
Upgrading to the larger model
If you plan to test GPT-OSS 120B:
- Use a desktop with a strong NVIDIA GPU and ample VRAM.
- Expect longer downloads and heavier memory use.
- Consider containerized or specialized inference stacks if you want more control over performance.
For most users, GPT-OSS 20B remains the best balance of speed, size, and setup effort.
Conclusion
OpenAI’s GPT-OSS models are now available as open weights, and you can run them locally with minimal setup. The 20B model works well on a high-end laptop or desktop and downloads quickly enough to be practical. The 120B model targets powerful desktop GPUs.
For a simple, fast start, use Ollama. If you don’t mind a one-time CLI step, LM Studio is also solid. The web demo is helpful for quick testing but may feel slow during peak times. Toggling reasoning visibility is supported, and most users will want to leave web search off for a clean local session.
Once installed, you’ll have a responsive, local reasoning model ready for everyday prompts. As tooling improves, expect smoother search integration and broader features inside these local apps.