Speech-to-Text & Image Input – PurioChat Documentation

With multimodal features on, visitors can speak their questions and attach photos for the AI to look at. Both run through the AI provider you already configured, so there is nothing extra to sign up for.

Pro feature. Speech-to-Text and Image Input require PurioChat Pro. The toggles appear in the free build, but they only take effect once a valid Pro license is active.

You will find both options under PurioChat → Settings → General, in the Multimodal Features block.

PurioChat Settings General sub-section with the Multimodal Features block, highlighting the Enable Speech-to-Text and Enable Image Input toggles

Speech-to-Text (voice input)

Turn on Enable Speech-to-Text (off by default) to add a microphone button to the chat input. Visitors tap it, speak, and PurioChat sends the recording to your AI provider for transcription. The text drops into the chat box, where they can edit it or hit send.

This helps mobile users, anyone who finds typing awkward, and accessibility. The mic button has three states so people know what is happening: an idle icon, a recording state (red dot, timer, stop button), and a transcribing spinner while the text comes back.

How transcription is handled per provider

Voice transcription uses the same API key as your chat, routed to your selected provider:

Provider	Transcription engine
OpenAI	Whisper (`whisper-1`)
Mistral AI	Voxtral (`voxtral-mini-latest`)
OpenRouter	Routed via Google Gemini
Google Gemini (direct)	Not supported — use OpenAI, Mistral, or OpenRouter for voice

Heads up: With Google Gemini set directly, the microphone button will not transcribe. For voice input, switch to OpenAI, Mistral, or OpenRouter under PurioChat → Settings → API Configuration.

HTTPS is required

Browsers only allow microphone access on secure pages, so your site must be served over HTTPS for the mic button to work. On an http:// site the browser blocks microphone capture. Most hosts include free SSL, so this is usually already in place.

Privacy and limits

Audio is not stored on your server. The recording goes straight to the AI provider for transcription; PurioChat does not save it.
Maximum recording size is 3 MB, plenty for a normal spoken question.
Supported audio formats: WebM, MP4, MPEG, OGG, WAV, and M4A — the visitor’s browser picks one automatically.
Voice requests count against the same per-IP chat rate limits as typed messages.

Image Input (vision)

Turn on Enable Image Input (off by default) to add an attach button to the chat input. Visitors can then upload a photo and ask the AI about it — for example, “Which of your products looks like this?”. The image is sent alongside the question, and the AI responds based on what it sees.

Chat input bar showing the microphone button and the image-attach (paperclip) button next to the text field

Works with every provider

Image understanding works with all four AI providers (OpenAI, Google Gemini, Mistral AI, and OpenRouter) — it is not limited to a specific model. Once Image Input is on and a provider is configured, visitors can attach photos. One caveat: a very small or text-only model on OpenRouter may not read images, and PurioChat flags this with a warning in the model picker. Stick with a mainstream chat model to avoid it.

Privacy and limits

Images are not stored on your server. Like voice recordings, attached photos go to the AI provider for analysis; PurioChat does not keep them.
Accepted image types: JPEG, PNG, GIF, and WebP, up to 5 MB per image.

Tip: Both features run through your AI provider, so they consume API credits like any other chat message. On a free tier with tight limits, watch your usage after enabling them. See Choosing an AI Provider for guidance.

Speech-to-Text (voice input)

How transcription is handled per provider

HTTPS is required

Privacy and limits

Image Input (vision)

Works with every provider

Privacy and limits

Related Articles