sup.ai.audio
The sup.ai.audio package provides AI-powered audio understanding and interpretation capabilities. It is accessed through sup.ai.audio.
// Load an audio file
const audio = sup.audio("audio.mp3");

// Basic audio interpretation
const description = sup.ai.audio.interpret(audio);
console.log(description); // "A person speaking clearly about technology with background music..."

// Custom prompt for specific analysis
const instruments = sup.ai.audio.interpret(
  audio,
  "What musical instruments can you hear in this audio?"
);

// Analyze speech content
const transcript = sup.ai.audio.interpret(
  audio,
  "Please provide a detailed transcript of the speech in this audio."
);

Methods
sup.ai.audio.interpret()
(audio: SupAudio, prompt?: string) → string
const audio = sup.audio("audio.wav");
const description = sup.ai.audio.interpret(audio);

Converts audio to text using AI audio understanding. This method can transcribe speech, describe sound effects, identify musical instruments, and provide detailed audio analysis.
Parameters:
- audio (SupAudio): The audio file to analyze.
- prompt (optional string): Custom instructions for the AI analysis. If not provided, a default prompt is used that produces both a transcription and an audio description.
Returns: A string containing the AI’s interpretation of the audio
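Because each call returns a plain string, several prompts can be run against the same audio and the answers collected under labels. The sketch below is one possible way to do that. `interpretAll` is a hypothetical helper, not part of the sup API, and it assumes that `interpret` returns synchronously, as the examples on this page suggest. The interpreter is passed in as a function so the helper can be exercised outside the sup runtime.

```javascript
// Hypothetical helper (not part of the sup API).
// interpret: a function (audio, prompt) => string, for example a thin
//            wrapper around sup.ai.audio.interpret.
// prompts:   an object mapping a label to the prompt text to send.
function interpretAll(interpret, audio, prompts) {
  const results = {};
  for (const [label, prompt] of Object.entries(prompts)) {
    // One call per prompt; each answer is stored under its label.
    results[label] = interpret(audio, prompt);
  }
  return results;
}

// Possible usage inside the sup runtime (assumed, shown as a comment
// because `sup` is not defined outside that environment):
//
// const results = interpretAll(
//   (a, p) => sup.ai.audio.interpret(a, p),
//   sup.audio("recording.mp3"),
//   {
//     transcript: "Please provide an accurate transcript of all spoken words in this audio.",
//     mood: "Identify the musical genre, instruments, and mood of this audio."
//   }
// );
```

Wrapping the call in an arrow function, rather than passing `sup.ai.audio.interpret` directly, avoids relying on how the method handles `this`.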
Examples:
// Basic interpretation - transcribes speech and describes audio
const audio = sup.audio("recording.mp3");
const description = sup.ai.audio.interpret(audio);

// Custom analysis for music
const musicAnalysis = sup.ai.audio.interpret(
  audio,
  "Identify the musical genre, instruments, and mood of this audio."
);

// Focus on speech transcription
const transcript = sup.ai.audio.interpret(
  audio,
  "Please provide an accurate transcript of all spoken words in this audio."
);

// Identify sound effects
const soundEffects = sup.ai.audio.interpret(
  audio,
  "What sound effects or non-speech audio can you hear? Describe them in detail."
);

Notes
- The AI model used for audio interpretation is Gemini 3 Flash, which supports both speech transcription and general audio understanding.
- Audio files can be loaded using sup.audio() with URLs or file paths.
- The default behavior (when no custom prompt is provided) includes both speech transcription and detailed audio description.
- Supported audio formats include MP3, WAV, M4A, and other common audio formats.
- For best results with speech transcription, use clear audio with minimal background noise.
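
The format note above can be turned into a quick pre-check on a file name or URL before loading it. This is only a sketch: `hasDocumentedExtension` and `DOCUMENTED_EXTENSIONS` are hypothetical names, not part of the sup API, and the list contains only the three formats named on this page. A `false` result therefore means "not one of the formats listed here", not "unsupported".

```javascript
// Hypothetical helper (not part of the sup API).
// Only the formats explicitly named in the notes are listed; other
// common audio formats may also be accepted.
const DOCUMENTED_EXTENSIONS = new Set(["mp3", "wav", "m4a"]);

function hasDocumentedExtension(source) {
  // Drop any query string or fragment so URLs like "a.mp3?x=1" are handled.
  const path = source.split(/[?#]/)[0];
  const dot = path.lastIndexOf(".");
  if (dot === -1) return false; // no extension at all
  const extension = path.slice(dot + 1).toLowerCase();
  return DOCUMENTED_EXTENSIONS.has(extension);
}
```

A caller might use the result to log a warning rather than to reject the file, since the documented list is not exhaustive.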