DeepSeek V3.2 vs GPT-5.1 Codex MAX vs Opus 4.5



    I previously compared DeepSeek v3.2 and Claude Opus 4.5 across two tests: writing a PRD and then building a space dashboard app from that PRD. Opus came out on top, and the version it built looked professional, nailed the brief, and only had one or two minor bugs. A viewer suggested testing DeepSeek again using the full Opus PRD instead of its own shorter one.

[Screenshot at 14s]

    That made sense, so I gave DeepSeek another chance using the full Opus PRD. I also added a third model to the mix, GPT-5.1 Codex Max. Both DeepSeek and Codex received the exact same PRD, and I compared their builds against the Opus reference.

[Screenshot at 60s]

Context

    In the first test, DeepSeek produced a very brief PRD with limited detail. Opus delivered a complete, production-ready PRD. That could be a big reason Opus performed better on the build.

    I wanted to see if a richer PRD would help DeepSeek build a better space dashboard. I also wanted to see how GPT-5.1 Codex Max handled the same brief. For a focused writing-tool comparison between Opus and GPT-5.1 in another domain, see this direct head-to-head on writing tasks.

The PRD we used

    The Opus PRD I used here was detailed. It included an executive summary, problem statement, goals and objectives, and target user information.

    It specified feature requirements, architecture design, UI and UX requirements, and technical specifications. It listed all required API endpoints and a recommended tech stack. Both models used the same stack: React 18 with TypeScript.

    It also included a development roadmap that broke the project into tasks. It covered risks and mitigations, future considerations, and an appendix. In short, a complete PRD built for production.
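To give a feel for how a PRD like this translates into code structure, here is a minimal sketch in the React/TypeScript stack the PRD recommends. The widget ids, titles, and endpoint paths below are illustrative assumptions, not the actual contents of the Opus PRD:

```typescript
// Hypothetical sketch of the modules a dashboard PRD like this implies.
// Widget ids and endpoint paths are assumptions, not the real PRD contents.
interface WidgetSpec {
  id: string;
  title: string;
  endpoint: string;   // API endpoint the widget reads from
  refreshMs: number;  // polling interval
}

const widgets: WidgetSpec[] = [
  { id: "iss", title: "ISS Live Tracker", endpoint: "/api/iss/position", refreshMs: 5_000 },
  { id: "apod", title: "Astronomy Picture of the Day", endpoint: "/api/apod", refreshMs: 3_600_000 },
  { id: "events", title: "Space Events Calendar", endpoint: "/api/events", refreshMs: 600_000 },
  { id: "neo", title: "Near-Earth Object Monitor", endpoint: "/api/neo", refreshMs: 600_000 },
  { id: "weather", title: "Space Weather", endpoint: "/api/space-weather", refreshMs: 600_000 },
];

// Look up the endpoint for a widget id; throwing on unknown ids makes a typo
// in the PRD-to-code translation fail fast instead of rendering a blank widget.
function endpointFor(id: string): string {
  const spec = widgets.find((w) => w.id === id);
  if (!spec) throw new Error(`unknown widget: ${id}`);
  return spec.endpoint;
}
```

A typed manifest like this is one way a detailed PRD keeps a build on track: each roadmap task maps to one entry.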

    For context on how a newer Codex model compares against Opus in a broader sense, check this Codex vs Opus analysis.

Setup

    I worked in Cursor with a split-screen setup. On the left, I connected a client to the DeepSeek v3.2 API. On the right, I used the Codex extension with GPT-5.1 Codex Max and set reasoning to extra high.

[Screenshot at 183s]

    I pasted the full Opus PRD into both workspaces. I let each model handle the build process. I then ran both dashboards in dev mode for testing.

    Build times and stability

    Both models completed the build. Codex finished in 18 minutes.

[Screenshot at 225s]

    DeepSeek took about 40 minutes and got stuck a few times. I had to cancel and prompt it to continue with the task.

    I am not sure if that was an API connection issue or a client issue. It did not happen in the first DeepSeek test. Either way, it took longer to complete the build this time.

    If you care about throughput and operating costs on newer variants, see these notes on speed, cost, and quality for GPT-5.2 vs Opus 4.5.

DeepSeek v3.2 results

    Here is the DeepSeek v3.2 version of the space dashboard built from the Opus PRD. The UI and design look a bit better than before. The background, widgets, and title feel like an improvement.

The typography still needs work: the font was too large in places, making text hard to read and causing obvious formatting issues that look like a straightforward CSS fix.

[Screenshot at 267s]

The ISS tracker was still stuck loading at the time of testing, and the astronomy picture of the day (APOD) widget showed a broken image. The upcoming events, data sources, and about sections were all present.

    Comparing this to the original DeepSeek v3.2 dashboard built from the shorter DeepSeek PRD, there were no formatting issues in the earlier one. The layout was very similar across both versions, with the map and picture of the day at the top and three widgets below. The main difference was the new design and the formatting problems here.

    It might be that the detailed PRD contributed to the formatting problem. One possible workaround is to break the PRD into tasks. You could ask Opus to break it into modules, then ask DeepSeek to build module by module.
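As a sketch of that workaround (the module names and prompt wording here are hypothetical, not from the actual PRD), you could turn the PRD's module list into an ordered queue of build prompts and feed them to DeepSeek one at a time:

```typescript
// Hypothetical sketch: turn a PRD's module list into sequential build prompts.
// Module names are illustrative, not taken from the actual Opus PRD.
function moduleBuildPrompts(modules: string[], prdTitle: string): string[] {
  return modules.map(
    (mod, i) =>
      `Step ${i + 1}/${modules.length} of "${prdTitle}": implement only the ` +
      `"${mod}" module. Do not touch other modules yet.`
  );
}

const prompts = moduleBuildPrompts(
  ["Layout shell", "ISS tracker", "APOD widget", "Events calendar"],
  "Space Dashboard"
);
// prompts[0] asks for the layout shell only; prompts[3] asks for the calendar.
```

Smaller, scoped prompts also make it easier to recover when the model stalls, since you only re-run one module instead of the whole build.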

GPT-5.1 Codex Max results

    Here is the Codex version of the space dashboard. It included a mission control section at the top showing the time, information about the crew, and the next launch. It also had the ISS live tracker with an interactive map, coordinates, and crew on board.
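For reference, the coordinate display in a tracker like this is simple to get right. Here is a minimal sketch assuming a response shaped like the public Open Notify ISS feed, which returns latitude and longitude as strings; the function names are illustrative:

```typescript
// Sketch of formatting ISS coordinates for display. Assumes a response shaped
// like the public Open Notify iss-now feed, where latitude/longitude arrive
// as strings. Function names are illustrative.
interface IssPosition {
  latitude: string;
  longitude: string;
}

// Convert a signed decimal degree into a "12.50° S" style label.
function formatCoord(value: number, axis: "lat" | "lon"): string {
  const hemisphere =
    axis === "lat" ? (value >= 0 ? "N" : "S") : value >= 0 ? "E" : "W";
  return `${Math.abs(value).toFixed(2)}° ${hemisphere}`;
}

function formatIssPosition(pos: IssPosition): string {
  const lat = formatCoord(parseFloat(pos.latitude), "lat");
  const lon = formatCoord(parseFloat(pos.longitude), "lon");
  return `${lat}, ${lon}`;
}

// formatIssPosition({ latitude: "-12.5", longitude: "45.25" })
//   → "12.50° S, 45.25° E"
```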

    The astronomy picture of the day widget appeared to have broken images. That could be an API or CORS issue to fix. The widget itself looked clean from a UI perspective.
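Broken APOD images are a common failure mode because NASA's APOD API can return a video entry or omit `hdurl`. A defensive picker like the sketch below avoids handing an `<img>` tag something undisplayable; the field names (`media_type`, `url`, `hdurl`) match NASA's public APOD API, while the fallback asset path is a hypothetical example:

```typescript
// Defensive handling for an APOD response. Field names match NASA's public
// APOD API; the local fallback path is a hypothetical asset, not a real URL.
interface ApodResponse {
  media_type: string; // "image" or "video"
  url?: string;
  hdurl?: string;
  title: string;
}

const FALLBACK_IMAGE = "/assets/apod-placeholder.png"; // hypothetical asset

// Prefer the HD image, fall back to the standard url, and never hand a
// video URL to an <img> tag.
function displayableImage(apod: ApodResponse): string {
  if (apod.media_type !== "image") return FALLBACK_IMAGE;
  return apod.hdurl ?? apod.url ?? FALLBACK_IMAGE;
}
```

If the failure is CORS rather than a missing field, proxying the APOD request through your own backend is the usual fix, since the browser then only talks to your origin.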

[Screenshot at 285s]

    Further down, there was a space events calendar, a near-Earth object monitor, and space weather. There was a clear formatting issue in the middle section where content was squeezed and text wrapped to one word per line. There was a lot of empty space caused by poor layout in that row.

    It did not nail how those three sections should work together. The ISS tracker section was solid. The mission control section at the top was also good.

Opus 4.5 reference

    The original Opus 4.5 build remains the best of all. It nailed the UI design and the API integrations. All features worked and the result looked clean and modern.

[Screenshot at 512s]

    Opus 4.5 again comes out on top when compared with the DeepSeek v3.2 version using the detailed PRD and the GPT-5.1 Codex Max build. That held true across both design quality and functional reliability.

[Screenshot at 522s]

    For estimates on usage at scale, you can review this breakdown of token costs for GPT-5.2 vs Opus 4.5.

Comparison overview

| Criteria | DeepSeek v3.2 | GPT-5.1 Codex Max | Opus 4.5 |
| --- | --- | --- | --- |
| PRD used | Full Opus PRD | Full Opus PRD | Own PRD from first test |
| Build time | ~40 minutes | ~18 minutes | N/A (reference build) |
| Build stability | Got stuck, needed continue prompts | Completed without intervention | Stable in prior test |
| UI quality | Improved look, font too large, formatting issues | Clean in parts, middle section layout broken | Clean and modern throughout |
| API integrations | ISS loading, APOD image broken | ISS solid, APOD broken, others present | All features worked |
| ISS tracker | Loading state persisted | Good, with interactive map and coordinates | Working |
| APOD widget | Image broken | Images broken | Working |
| Layout | Similar to earlier DeepSeek build | Similar overall, middle row squeezed | Well-balanced layout |
| Overall | Executed the build with CSS issues | Fast build, solid top sections, layout bugs | Best overall quality |

    For more three-way perspectives that include other families, this broader comparison with GLM is helpful: GLM 4.7 vs Opus 4.5 vs GPT-5.2.

    Use cases

DeepSeek v3.2

    Good for teams that want to feed a rich PRD but may benefit from breaking work into modules. Suits scenarios where you can iterate on CSS and layout after the first pass. Works when you can guide it with stepwise prompts.

GPT-5.1 Codex Max

    Useful for faster turnarounds and getting a working skeleton quickly. Strong for map-based sections and dashboard headers. Best when you can correct mid-page layout issues.

Opus 4.5

    Best fit for production-ready dashboards with clean UI and reliable integrations. Strong choice when you need fewer fixes after the initial build. Ideal as a reference build for quality and completeness.

[Screenshot at 498s]

    If you are comparing newer Codex variants to Opus on throughput and quality, here is another practical read on speed, cost, and quality trade-offs.

    Pros and cons

DeepSeek v3.2

    Pros: Better look than the earlier DeepSeek build, executed the full brief, and included all sections. Cons: Slower, got stuck during build, font size and layout required CSS fixes.

GPT-5.1 Codex Max

    Pros: Faster build, solid ISS tracker, and a strong top section. Cons: Middle layout broke with squeezed text and empty space, APOD images failed.

Opus 4.5

    Pros: Best design quality, all features worked, and modern UI. Cons: None surfaced in this comparison.

    If you are weighing ongoing usage, see this breakdown on token costs for GPT-5.2 vs Opus 4.5 to plan budgets.

    Step-by-step - reproduce the test

1. Open Cursor and set up a split-screen workspace.
2. Connect the left workspace to the DeepSeek v3.2 API.
3. Install and open the Codex extension on the right with GPT-5.1 Codex Max.
4. Set reasoning to extra high in Codex.
5. Paste the full Opus PRD into both sides.
6. Prompt each model to build the space dashboard app.
7. Wait for builds to complete.
8. Switch to dev mode for each project.
9. Test the ISS tracker, APOD, events, NEO, and space weather widgets.
10. Note build times and any required prompts.
11. Record formatting issues, broken images, and layout problems.
12. Compare each build to the Opus 4.5 reference for quality.
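If you want to record the widget tests consistently across runs, a small helper can tally them into a pass/fail summary. The shape and names below are illustrative; the sample data mirrors the DeepSeek run described above:

```typescript
// Sketch of tallying manual widget checks into a pass/fail summary.
// The WidgetCheck shape and widget names are illustrative.
interface WidgetCheck {
  name: string;
  ok: boolean;
  note?: string;
}

function summarize(checks: WidgetCheck[]): string {
  const failed = checks.filter((c) => !c.ok);
  if (failed.length === 0) return `All ${checks.length} widgets passed`;
  return `${failed.length}/${checks.length} widgets failed: ${failed
    .map((c) => c.name)
    .join(", ")}`;
}

const deepseekRun: WidgetCheck[] = [
  { name: "ISS tracker", ok: false, note: "stuck in loading state" },
  { name: "APOD", ok: false, note: "broken image" },
  { name: "Events", ok: true },
  { name: "Data sources", ok: true },
  { name: "About", ok: true },
];
// summarize(deepseekRun) → "2/5 widgets failed: ISS tracker, APOD"
```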

    For more context on Codex vs Opus across versions, here is a focused look at Codex vs Opus.

    Quick CSS fix - font and layout

    If your text is oversized or rows are squeezing content, a small CSS pass helps. Adjust font sizing and grid behavior to reduce overflow. Here is a minimal example.

[Screenshot at 299s]

    :root {
      --font-base: 14px;
      --font-scale: 1.1;
    }
    
    body {
      font-size: var(--font-base);
      line-height: 1.5;
      -webkit-font-smoothing: antialiased;
      -moz-osx-font-smoothing: grayscale;
    }
    
    .widget h2, .widget h3 {
      font-size: calc(var(--font-base) * var(--font-scale));
      margin: 0 0 8px;
      line-height: 1.2;
      word-break: keep-all;
      overflow-wrap: anywhere;
    }
    
    .dashboard-row {
      display: grid;
      grid-template-columns: repeat(3, minmax(0, 1fr));
      gap: 16px;
      align-items: start;
    }
    
    .widget {
      min-width: 0;
      overflow: hidden;
      padding: 12px;
      background: rgba(255, 255, 255, 0.06);
      border: 1px solid rgba(255, 255, 255, 0.1);
      border-radius: 10px;
    }
    
    .widget p {
      margin: 0 0 6px;
      font-size: 0.95rem;
    }
    
    /* Fix for one-word-per-line issues in tight columns */
    .widget .content {
      white-space: normal;
      word-break: break-word;
    }

    Final thoughts

    With the full Opus PRD, DeepSeek v3.2 delivered a better-looking dashboard than its earlier attempt, but ran into CSS and build stability issues. GPT-5.1 Codex Max built faster and shipped a solid header and ISS section, but stumbled on mid-page layout and broken APOD images. Opus 4.5 remained the clear winner on UI quality and working integrations.

    This test also suggests a practical tip for DeepSeek v3.2: break a large PRD into modules and build step by step. That can reduce stalls and formatting regressions and improve outcomes with the same input. For a broader lens that includes GLM, this three-way comparison is a good companion read: GLM 4.7 vs Opus 4.5 vs GPT-5.2.


    Sonu Sahani
