
sup.ai.audio

The sup.ai.audio package provides AI-powered audio understanding and interpretation.

Example sup.ai.audio Usage
// Load an audio file
const audio = sup.audio("audio.mp3");

// Basic audio interpretation
const description = sup.ai.audio.interpret(audio);
console.log(description); // "A person speaking clearly about technology with background music..."

// Custom prompt for specific analysis
const instruments = sup.ai.audio.interpret(
  audio,
  "What musical instruments can you hear in this audio?"
);

// Analyze speech content
const transcript = sup.ai.audio.interpret(
  audio,
  "Please provide a detailed transcript of the speech in this audio."
);

Methods

sup.ai.audio.interpret()

(audio: SupAudio, prompt?: string) → string

const audio = sup.audio("audio.wav");
const description = sup.ai.audio.interpret(audio);

Interprets an audio file and returns the result as text. Depending on the prompt, this method can transcribe speech, describe sound effects, identify musical instruments, or provide a more detailed analysis of the audio.

Parameters:

  • audio (SupAudio): The audio file to analyze
  • prompt (optional string): Custom instructions for the AI analysis. If omitted, a default prompt is used that requests both a transcription and an audio description.

Returns: A string containing the AI’s interpretation of the audio
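
Because prompt is optional, a thin wrapper can decide whether to pass it at all. The sketch below is illustrative only: interpretAudio is a hypothetical helper, not part of sup, and it assumes the runtime's sup object is passed in as an argument. It forwards the prompt only when it is a non-empty string, so blank input falls back to the default prompt.

```javascript
// Hypothetical helper (not part of sup): forwards to sup.ai.audio.interpret,
// passing the prompt only when it is a non-empty string.
function interpretAudio(sup, audio, prompt) {
  const hasPrompt = typeof prompt === "string" && prompt.trim().length > 0;
  return hasPrompt
    ? sup.ai.audio.interpret(audio, prompt)
    : sup.ai.audio.interpret(audio); // no prompt: the default prompt applies
}

// Usage inside the sup runtime (assumed):
//   const text = interpretAudio(sup, sup.audio("audio.wav"), userPrompt);
```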

Examples:

// Basic interpretation - transcribes speech and describes audio
const audio = sup.audio("recording.mp3");
const description = sup.ai.audio.interpret(audio);

// Custom analysis for music
const musicAnalysis = sup.ai.audio.interpret(
  audio,
  "Identify the musical genre, instruments, and mood of this audio."
);

// Focus on speech transcription
const transcript = sup.ai.audio.interpret(
  audio,
  "Please provide an accurate transcript of all spoken words in this audio."
);

// Identify sound effects
const soundEffects = sup.ai.audio.interpret(
  audio,
  "What sound effects or non-speech audio can you hear? Describe them in detail."
);
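
When several prompts are run against the same audio, as in the examples above, the calls can be grouped into one helper. This is a minimal sketch, not an API that sup provides: analyzeAudio is a hypothetical name, and the sup object is assumed to be passed in from the runtime. It calls interpret once per prompt and returns the results keyed by name.

```javascript
// Hypothetical helper (not part of sup): runs each named prompt against the
// same audio and returns an object mapping each name to the returned string.
function analyzeAudio(sup, audio, prompts) {
  const results = {};
  for (const [name, prompt] of Object.entries(prompts)) {
    results[name] = sup.ai.audio.interpret(audio, prompt);
  }
  return results;
}

// Usage inside the sup runtime (assumed):
//   const report = analyzeAudio(sup, sup.audio("recording.mp3"), {
//     transcript: "Please provide an accurate transcript of all spoken words in this audio.",
//     music: "Identify the musical genre, instruments, and mood of this audio.",
//   });
//   console.log(report.transcript);
```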

Notes

  • The AI model used for audio interpretation is Gemini 3 Flash, which supports both speech transcription and general audio understanding.
  • Audio files can be loaded using sup.audio() with URLs or file paths.
  • The default behavior (when no custom prompt is provided) includes both speech transcription and detailed audio description.
  • Supported formats include MP3, WAV, M4A, and other common audio formats.
  • For best results with speech transcription, use clear audio with minimal background noise.
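
As a rough pre-check before loading a file, the extension can be compared against the formats named in the notes above. This is a sketch under assumptions: hasKnownAudioExtension and KNOWN_AUDIO_EXTENSIONS are hypothetical names, the list covers only the three formats the notes mention by name, and a false result does not mean sup.audio() would reject the file.

```javascript
// Extensions named in the notes above; other common formats may also work.
const KNOWN_AUDIO_EXTENSIONS = ["mp3", "wav", "m4a"];

// Hypothetical helper (not part of sup): returns true when a file path or URL
// ends in one of the known extensions, ignoring case.
function hasKnownAudioExtension(source) {
  // Drop any query string or fragment, then take the text after the last dot.
  const path = source.split(/[?#]/)[0];
  const dot = path.lastIndexOf(".");
  if (dot === -1) return false;
  return KNOWN_AUDIO_EXTENSIONS.includes(path.slice(dot + 1).toLowerCase());
}
```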