Integrating OpenAI API into Your Next.js App
25 April 2026 · by Yunmin Shin
Why Add OpenAI to a Next.js App?
OpenAI's API gives your web application access to large language models capable of generating text, summarizing content, answering questions, extracting data, and writing code. For Bangkok businesses, this opens up practical use cases: Thai/English translation, customer support automation, product description generation, and internal search tools.
Integrating OpenAI into Next.js is straightforward once you understand the key patterns around streaming, API key security, and cost control.
How Do You Set Up the OpenAI Client?
Install the official SDK:
npm install openai
Store your API key in an environment variable — never in client-side code:
OPENAI_API_KEY=sk-...
Create a singleton client in lib/openai.ts:
import OpenAI from "openai";
export const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
Only import this file from server components, API routes, or server actions. The key must never be bundled into client JavaScript.
How Do You Stream Responses?
Streaming displays the model's response token by token rather than waiting for the full response. This dramatically improves perceived performance — users see output immediately rather than waiting 5–10 seconds for a complete response.
In a Next.js Route Handler, use the openai SDK's streaming mode and return a ReadableStream. On the client side, use the Vercel AI SDK (ai package) which provides React hooks (useChat, useCompletion) that handle the streaming response and update state automatically.
npm install ai
The useChat hook handles the request lifecycle, streaming state, and message history out of the box. This is the recommended pattern for any chat-style interface.
How Do You Control Costs?
OpenAI charges per token — input and output separately. Costs can grow quickly if you are not careful:
- Set max_tokens on every completion request. Without a cap, a runaway prompt can generate thousands of tokens.
- Cache responses for identical inputs. If many users ask the same question, return the cached answer rather than calling the API each time. Redis works well here.
- Use smaller models where sufficient. GPT-4o mini is significantly cheaper than GPT-4o and handles most text generation and classification tasks adequately.
- Log token usage per request and alert when daily spend exceeds a threshold. OpenAI's dashboard also has hard spending limits you should set immediately.
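The caching idea above can be sketched as follows. The cacheKey helper is hypothetical, and the redis and callOpenAI names in the usage comment are assumed to exist in your project (the set signature shown matches ioredis):

```typescript
// Response-caching sketch: derive a stable cache key from the request.
import { createHash } from "node:crypto";

// Deterministic key from model + prompt, so identical inputs hit the cache.
export function cacheKey(model: string, prompt: string): string {
  return "oai:" + createHash("sha256").update(model + "\n" + prompt).digest("hex");
}

// Hypothetical usage inside a route handler:
//
//   const key = cacheKey("gpt-4o-mini", prompt);
//   const cached = await redis.get(key);
//   if (cached) return cached;                     // no API call, no cost
//   const answer = await callOpenAI(prompt);
//   await redis.set(key, answer, "EX", 60 * 60);   // cache for one hour
//   return answer;
```

Hashing keeps keys short and uniform regardless of prompt length; include the model name in the key so a model upgrade does not serve stale answers.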
What Rate Limits Should You Know About?
New OpenAI accounts start with low rate limits on requests per minute (RPM) and tokens per minute (TPM). For production applications serving multiple concurrent users, request a rate limit increase through OpenAI's API console. Implement exponential backoff retry logic for 429 errors.
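The retry logic is generic and small enough to write yourself. withBackoff below is a hypothetical helper that retries only on 429-style errors, doubling the delay on each attempt with a little jitter:

```typescript
// Exponential backoff with jitter for rate-limited (HTTP 429) calls.
export async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // Only retry rate-limit errors; rethrow everything else immediately.
      const isRateLimit = err?.status === 429;
      if (!isRateLimit || attempt >= maxRetries) throw err;
      // 500ms, 1s, 2s, ... scaled by random jitter to avoid thundering herds.
      const delay = baseDelayMs * 2 ** attempt * (0.5 + Math.random() * 0.5);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Wrap each OpenAI call in withBackoff so brief rate-limit spikes degrade into slightly slower responses instead of user-facing errors.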
At Bluewich, we build OpenAI integrations for Bangkok businesses with streaming UI, Redis caching, and per-user rate limiting to prevent abuse. A well-integrated AI feature should feel instant and cost predictably.
Ready to Build Something Fast?
Get a free quote on LINE. We reply within 24 hours.