Givon AI API models

One contract for every model: { type, model, input }. Each model has its own input schema, token price, and ready-to-use cURL, Python, and JS snippets. Generation runs asynchronously.
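Because every model shares the same request shape and generation is asynchronous, client code can build the body once and then wait on a job. The Python sketch below rests on several assumptions. The `"video"` type value, the `prompt` input field, the `status` values, and the `submit`/`get_status` callables are hypothetical placeholders, not the documented Givon AI API, so check each model's published schema and snippets for the real names. Transport is passed in as functions, which keeps the flow testable without network access.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical terminal job states; the real API may name them differently.
DONE_STATES = {"succeeded", "failed"}


def build_request(kind: str, model: str, model_input: Dict[str, Any]) -> Dict[str, Any]:
    """Assemble the shared request body: { type, model, input }."""
    return {"type": kind, "model": model, "input": model_input}


def run_job(
    body: Dict[str, Any],
    submit: Callable[[Dict[str, Any]], str],      # hypothetical: sends body, returns a job id
    get_status: Callable[[str], Dict[str, Any]],  # hypothetical: fetches job state by id
    poll_interval: float = 2.0,
    timeout: float = 600.0,
) -> Dict[str, Any]:
    """Submit a job, then poll until it reaches a terminal state or the timeout expires."""
    job_id = submit(body)
    deadline = time.monotonic() + timeout
    while True:
        job = get_status(job_id)
        if job.get("status") in DONE_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout} s")
        time.sleep(poll_interval)
```

For example, `build_request("video", "kling-3.0", {"prompt": "A lighthouse at dusk"})` yields the body, and `run_job` then wraps whatever HTTP client you use for submission and status checks.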


Video

Gemini Omni Flash (from 20 tokens)

gemini-omni-video (Google)

Google's multimodal model that builds video from text, images, and video, and can edit an existing clip conversationally. Use it to transform footage or mix inputs rather than generate a shot from scratch. Synchronized audio, up to 4K.

Grok Imagine Video (from 1 token/s)

grok-imagine-video (xAI)

Fast short-form video with synchronized audio and strong prompt following. It can continue from the last frame, making scene stitching easier. 480p/720p.
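Continuing from the last frame suggests a simple chaining loop: generate a scene, take its final frame, and seed the next scene with it. The sketch below assumes a hypothetical `generate(prompt, start_frame)` wrapper that returns a clip and its last frame; how the frame is actually obtained and passed (a schema field, an uploaded image, or frame extraction on your side) depends on the published grok-imagine-video schema.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical wrapper around one grok-imagine-video request:
# takes a prompt and an optional start frame, returns (clip, last_frame).
Generator = Callable[[str, Optional[bytes]], Tuple[bytes, bytes]]


def stitch_scenes(prompts: List[str], generate: Generator) -> List[bytes]:
    """Generate one clip per prompt, seeding each clip with the previous clip's last frame."""
    clips: List[bytes] = []
    last_frame: Optional[bytes] = None  # the first scene starts from text only
    for prompt in prompts:
        clip, last_frame = generate(prompt, last_frame)
        clips.append(clip)
    return clips
```

The clips come back in order, ready to be concatenated; each request after the first still runs asynchronously, so in practice `generate` would wait on its job before returning.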

Grok Imagine Video 1.5 (from 3 tokens/s)

grok-imagine-video-1.5 (xAI)

xAI image-to-video: animates a single source frame with synchronized audio and strong prompt following, in clips up to 15 seconds. Ranked in the top 3 on the image-to-video arena.
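Per-second prices make a lower bound easy to compute: a 15-second clip on grok-imagine-video-1.5 costs at least 3 × 15 = 45 tokens. The sketch below copies a few "from" rates from this catalog; because they are minimums, the real charge can be higher depending on options, and the token quote returned by the API is the authoritative figure.

```python
from typing import Dict

# "from N tokens/s" floor rates copied from this catalog (minimums, not exact prices).
FLOOR_TOKENS_PER_SECOND: Dict[str, float] = {
    "grok-imagine-video": 1.0,
    "grok-imagine-video-1.5": 3.0,
    "kling-3.0": 3.0,
    "veo-3.1": 14.0,
    "veo-3.1-fast": 2.0,
    "seedance-2.0": 4.2,
}


def min_cost(model: str, seconds: float) -> float:
    """Lower bound on tokens for a clip: floor rate multiplied by duration."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return FLOOR_TOKENS_PER_SECOND[model] * seconds
```

For instance, an 8-second veo-3.1 clip starts at 14 × 8 = 112 tokens, while the same length on veo-3.1-fast starts at 16.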

HappyHorse 1.0 (from 6 tokens/s)

happyhorse-1.0 (Alibaba)

Alibaba's top video model: produces a clip with synchronized audio and speech in one pass. Use it for cinematic multi-scene videos with a prepared voiceover, built from text, a frame, references, or source-video edits. 720p/1080p.

HappyHorse 1.1 (from 5.5 tokens/s)

happyhorse-1.1 (Alibaba)

HappyHorse 1.1 is a video generation model from Alibaba available through the Givon AI API. Use the published schema to prepare inputs and request token quotes before generation.
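Requesting a quote before generation lets a client refuse jobs that would exceed a budget. In this sketch, `quote` and `submit` are hypothetical callables standing in for the API's quote and generation calls, whose real names and response shapes come from the published schema.

```python
from typing import Any, Callable, Dict


class OverBudget(Exception):
    """Raised when a quoted price exceeds the caller's token budget."""


def generate_within_budget(
    body: Dict[str, Any],
    quote: Callable[[Dict[str, Any]], float],  # hypothetical: returns quoted tokens for body
    submit: Callable[[Dict[str, Any]], str],   # hypothetical: starts generation, returns job id
    budget: float,
) -> str:
    """Ask for a token quote first and submit only if it fits the budget."""
    tokens = quote(body)
    if tokens > budget:
        raise OverBudget(f"quoted {tokens} tokens, budget is {budget}")
    return submit(body)
```

Keeping the check on the client side means an oversized request never starts, which matters for per-second models where long clips multiply the cost.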

HeyGen Avatar IV (3 tokens/s)

heygen-photo-avatar (HeyGen)

Talking avatar from a single photo: the model reads vocal tone and rhythm, then builds lifelike expressions and hand gestures. Lip sync is driven by text or an existing voiceover.

Kling 2.6 (from 3 tokens/s)

kling-2.6 (Kling)

Synchronized audio in a single pass: speech, ambience, and effects are generated directly in-frame without separate dubbing. Use it for budget clips and talking heads when multi-scene control is not needed.

Kling 2.6 Motion (8 tokens/s)

kling-2.6-motion (Kling)

Affordable motion-control: transfers movement from a video reference to your character. Use it for simpler motion when 3.0-tier precision is not required.

Kling 3.0 (from 3 tokens/s)

kling-3.0 (Kling)

Kling flagship: up to 15 seconds and 4K, stable character identity across scenes, multi-scene direction, and synchronized multilingual audio.

Kling 3.0 Motion (7 tokens/s)

kling-3.0-motion (Kling)

Transfers recorded movement, dance, or gestures from a video sample to your full-body character while locking face identity and capturing complex motion. Use it when choreography fidelity and appearance consistency matter.

Kling 3.0 Omni (from 6.4 tokens/s)

kling-3.0-omni (Kling)

Multi-scene video with synchronized audio: transfers a character's appearance and voice from a video sample into new scenes, though audio must be disabled when that video sample is used. Use it for coherent narratives with one hero.
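The audio restriction above can be checked before a request is sent. This is a sketch only: the field names `video_reference` and `audio` are hypothetical stand-ins for whatever the published kling-3.0-omni schema calls the video sample and the audio switch, and a missing audio flag is treated conservatively as "not disabled" because the real default is not stated here.

```python
from typing import Any, Dict, List


def check_kling_omni_input(model_input: Dict[str, Any]) -> List[str]:
    """Return problems with a kling-3.0-omni input; an empty list means it passes.

    `video_reference` and `audio` are hypothetical field names.
    """
    problems: List[str] = []
    uses_video_sample = bool(model_input.get("video_reference"))
    # Require audio to be explicitly False whenever a video sample is supplied.
    if uses_video_sample and model_input.get("audio") is not False:
        problems.append("audio must be disabled when a video sample is used")
    return problems
```

Returning a list rather than raising makes it easy to collect several schema problems and report them together.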

Kling 3.0 Turbo (from 5.5 tokens/s)

kling-3.0-turbo (Kling)

Kling 3.0 Turbo is a video generation model from Kling available through the Givon AI API. Use the published schema to prepare inputs and request token quotes before generation.

Kling Avatar 2.0 (from 6 tokens/s)

kling-digital-human (Kling)

Animates a person from a photo and synchronizes speech, natural expressions, and gestures with a voiceover. Useful when you need a speaking or singing presenter from one portrait.

Kling Lip-Sync (6 tokens/s)

kling-lip-sync (Kling)

Synchronizes lip movement in an existing video with a new audio track. Use it when the video is already shot and you only need dubbing, localization, or speech replacement.

Kling O1 (from 5.5 tokens/s)

kling-o1 (Kling)

Combines up to 7 angles of one subject through Elements and keeps its appearance strictly consistent throughout the clip. Use it for character turnarounds, recurring heroes, and product demos.

MiniMax H3 (18 tokens/s)

minimax-h3 (MiniMax)

Native 2K video with stereo audio from text, a first and optional last frame, or image and audio references. Use it for expressive motion, legible in-frame text, and multimodal scenes up to 15 seconds.

Seedance 2.0 (from 4.2 tokens/s)

seedance-2.0 (BytePlus)

Follows director-style commands such as angle, camera motion, and shot changes through text, with audio generated in one pass. Use it for cinematic reference-guided shots up to 1080p.

Seedance 2.0 Fast (from 3.5 tokens/s)

seedance-2.0-fast (BytePlus)

The same cinematic feel and camera control as Seedance 2.0, but noticeably faster for iterations and volume work. Synchronized audio and references, up to 720p.

Seedance 2.0 Mini (from 1.5 tokens/s)

seedance-2.0-mini (BytePlus)

Seedance 2.0 Mini is a video generation model from BytePlus available through the Givon AI API. Use the published schema to prepare inputs and request token quotes before generation.

SwitchX Video (from 9 tokens/s)

switchx-video (Beeble)

Changes the background, object, or lighting in existing footage from text, one reference, and an optional mask while preserving the subject, shape, motion, and expressions. Duration comes from the source video; output is 720p or 1080p.

Topaz Astra (from 1 token)

topaz-astra (Topaz)

Topaz Astra Creative enhances and upscales an existing GenAI video with diffusion-based detail. It is a source-video workflow, not prompt-to-video: upload a clip, choose output resolution and creativity, and Astra generates the improved video.

Veo 3.1 (from 14 tokens/s)

veo-3.1 (Google)

Google's flagship model for premium cinematic shots: up to 4K video with synchronized audio including dialogue, sound effects, and ambience out of the box. Up to 3 references keep character and style stable.

Veo 3.1 Fast (from 2 tokens/s)

veo-3.1-fast (Google)

The same up-to-4K sharpness and synchronized audio as the flagship, but noticeably faster and cheaper. A workhorse for iterations and most production tasks.

Veo 3.1 Lite (from 3 tokens/s)

veo-3.1-lite (Google)

The most affordable Veo tier: up to 1080p (no 4K), with audio that can be turned on or off. Use it for high-volume social content when 4K is unnecessary.

Wan 2.7 R2V (from 6 tokens/s)

wan-2.7-r2v (Alibaba)

Uses up to 5 references, including images, video, or audio, to lock hero appearance and voice across scenes for episodic content with consistent characters.
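Several video models state a cap on references: 3 for veo-3.1, 5 for wan-2.7-r2v, and 7 angles for kling-o1. A client-side check can reject oversized requests before they are quoted. The table below copies only those catalog figures; models not listed are passed through unchecked, and the published schemas remain the source of truth.

```python
from typing import Dict, Sequence

# Maximum reference inputs per request, as stated in this catalog.
MAX_REFERENCES: Dict[str, int] = {
    "veo-3.1": 3,
    "wan-2.7-r2v": 5,
    "kling-o1": 7,  # up to 7 angles of one subject via Elements
}


def check_reference_count(model: str, references: Sequence[str]) -> None:
    """Raise ValueError if a request carries more references than the model accepts."""
    limit = MAX_REFERENCES.get(model)
    if limit is not None and len(references) > limit:
        raise ValueError(f"{model} accepts at most {limit} references, got {len(references)}")
```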

Wan 2.7 Video (from 6 tokens/s)

wan-2.7-video (Alibaba)

Video generation and editing in one engine: from text, from a photo, with a target final frame, or by editing an existing clip from a description. Up to 1080p.

Images

ChatGPT Images 2.0 (from 1 token)

gpt-image-2 (OpenAI)

OpenAI's flagship for complex images: it handles long instructions, multi-object compositions, and multilingual in-frame text with high precision. Use it for infographics, slides, packaging, multilingual posters, and full-image edits in 1K, 2K, or 4K.

Grok Imagine (2 tokens)

grok-imagine (xAI)

Base xAI image tier: generate and edit full images from text without masks, and compose from several references. Use it for quick concepts and conversational edits when Pro-level precision is not required.

Grok Imagine Pro (2 tokens)

grok-imagine-pro (xAI)

Higher tier of Grok Imagine: more detail, cleaner in-frame text, and stronger composition control from detailed prompts. Use it when the base tier is not sharp enough.

Nano Banana (2 tokens)

nano-banana (Google)

Entry tier in Google's image family: the most affordable 1K image generation. Conversational editing and reference blending make it useful for volume work and quick drafts.

Nano Banana 2 (from 3 tokens)

nano-banana-2 (Google)

Google's versatile Flash tier: up to 4K, clean text, low latency, and reference consistency. Use it for rapid iteration and high-volume generation when you need strong output below Pro-tier cost.

Nano Banana 2 Lite (2 tokens)

nano-banana-2-lite (Google)

Nano Banana 2 Lite is an image generation model from Google available through the Givon AI API. Use the published schema to prepare inputs and request token quotes before generation.

Nano Banana Pro (from 2 tokens)

nano-banana-pro (Google)

Google's premium tier for complex brand scenes, with strong style-guide adherence, reference handling, lighting, and material rendering. Use it for polished product and portrait visuals, multi-object compositions, and 4K finals.

Seedream 4.5 (from 2 tokens)

seedream-4.5 (BytePlus)

Cinematic lighting and stable character identity across generations. Use it for product catalogs, character sheets, and reference-guided edits; a reliable workhorse with 2K/4K output and up to 14 references.

Seedream 5.0 Lite (2 tokens)

seedream-5 (BytePlus)

Lightweight tier of Seedream 5.0: reasons over complex prompts and can search the web to assemble multi-object scenes and topical visuals. Supports example-based reference edits and output up to 3K.

Seedream 5.0 Pro (from 2 tokens)

seedream-5-pro (BytePlus)

The Seedream tier for photorealistic commercial hero images, product photography, and edits guided by annotated references. It is strong on natural lighting, skin, and materials while following explicit art direction; 1K/2K output with up to 10 references.
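Image models differ in which output sizes they list and how many references they take, so a small capability table can catch mismatches early. This sketch copies only explicit catalog figures (gpt-image-2: 1K/2K/4K; seedream-4.5: 2K/4K with up to 14 references; seedream-5-pro: 1K/2K with up to 10 references). The `"1K"`-style resolution strings are an assumption, since the real schema may express sizes differently.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class ImageCaps:
    resolutions: FrozenSet[str]    # output sizes listed in this catalog
    max_references: Optional[int]  # None where the catalog states no limit


CAPS: Dict[str, ImageCaps] = {
    "gpt-image-2": ImageCaps(frozenset({"1K", "2K", "4K"}), None),
    "seedream-4.5": ImageCaps(frozenset({"2K", "4K"}), 14),
    "seedream-5-pro": ImageCaps(frozenset({"1K", "2K"}), 10),
}


def validate_image_request(model: str, resolution: str, n_refs: int = 0) -> None:
    """Raise ValueError if the resolution or reference count is outside the listed capabilities."""
    caps = CAPS[model]
    if resolution not in caps.resolutions:
        raise ValueError(f"{model} does not list {resolution} output")
    if caps.max_references is not None and n_refs > caps.max_references:
        raise ValueError(f"{model} accepts at most {caps.max_references} references")
```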

SwitchX Image (from 6 tokens)

switchx-image (Beeble)

Beeble relighting and compositing: transfers an object, background, or light from text, one reference, and an optional mask onto the source photo with physically consistent lighting instead of generating from scratch. Output at 720p or 1080p.

Wan 2.7 Image (3 tokens)

wan-2.7-image (Alibaba)

Portrait-first image model: control facial features, makeup, and hairstyle through references. Use it for avatars, beauty assets, and consistent character series up to 2K.

Wan 2.7 Image Pro (6 tokens)

wan-2.7-image-pro (Alibaba)

Wan's 4K tier with prompt reasoning: follows complex multi-step instructions and in-frame text more accurately, including tables and formulas. Use it for demanding deliverables such as posters and packaging.