
AI Studio Speech to Text Walk-through Guide

Use the AI Studio to test Corti's Dictation Speech to Text capabilities


Overview

The AI Studio in Corti's Console provides access to speech-to-text functionality through our web component interface. This guide walks through the features and configuration options available within Corti's Dictation web component.

Accessing AI Studio

  1. Navigate to the Corti Console homepage

  2. Choose Speech-to-Text from the available AI Studio options

Navigating the Speech to Text AI Studio

The Speech to Text AI Studio consists of two primary sections.

On the left side of the page, you can interact with Corti's speech-to-text web component and review the resulting test output, including the dictated text, detected spoken commands, and all events associated with the session. The right side of the page contains configuration options and auto-generated code snippets for use within your application.

The guide below breaks down each piece of the AI Studio experience.

Speech to Text Web Component

Found in the upper left corner of the page, the web component is used to start and stop your speech-to-text session. Starting a session opens a WebSocket connection for streaming live audio.

Important: Using the speech to text web component will consume credits within your Corti Project.

To Begin a Session

  1. Click the blue microphone button in the web component

  2. Wait for the connection to establish

  3. Begin speaking to see text appear in the dictated text area

Before starting your speech-to-text session, use the settings tab to make sure your microphone and language are set correctly.

During a Session

Text will begin to appear within the 'Dictated Text' section of the page, just below the web component.

If spoken commands are detected, each individual command will appear within the 'Detected Commands' section of the page, just below the dictated text. Commands will appear the same number of times they are spoken.

Example: If "next section" is spoken twice, "Next Section" will appear twice within the 'Detected Commands' section.

Using the Event Inspector

If you wish to monitor the events of the web component and dictation stream in real-time, expand the 'Event Inspector' at the bottom of the page. All events associated with the current session will be available for review and download.

After stopping dictation, all events from the session remain visible in the Event Inspector until it is closed.

Important Note: Once a session has been ended with the red 'stop record' button, closing or collapsing the Event Inspector erases the session's logs.


The Event Inspector supports filtering by event type:

  • Network

  • Events

  • Errors

Ending a Session

  1. Click the red 'stop record' button to end the dictation session

  2. This action finalizes the transcription and closes the WebSocket connection


Once a session has ended, the credits consumed during the session are displayed in the lower right portion of the screen, in line with the Event Inspector. The value is shown in USD.


Speech to Text Configuration Settings

The right side panel contains all configuration options for the Speech-to-Text web component.

Dictation Language Selection

Select the dictation language from the available language options in the dropdown menu. Languages are shown along with the language code accepted by Corti.


Web Component Configurations

Color-scheme

Choose from one of three color scheme options:

  • System Default

  • Light Mode

  • Dark Mode

User Interface Options

Control which interface elements are visible to users.

Language Selector

  • Toggle on to display language selection in the options menu

  • Toggle off to hide language selection from users

Device Selector

  • Toggle on to display device selection in the options menu

  • Toggle off to hide device selection from users

When both options are disabled, no options menu appears next to the dictation button.


Punctuation Settings

Configure how punctuation is handled in transcriptions.

Spoken Punctuation

  • Toggle on to have punctuation verbalized during dictation (e.g., "period", "comma", "new line") rendered as punctuation marks rather than as literal text

  • Toggle off to disable spoken punctuation

Automatic Punctuation

  • Toggle on to let the speech recognition system automatically insert punctuation in the text output (recommended for conversational transcription only)

  • Toggle off to disable automatic punctuation insertion


Spoken Commands

Corti supports voice commands during dictation sessions.

Default Commands

Corti provides three built-in commands which are available for testing:

  • NextSection

  • Delete

  • InsertTemplate

Custom Commands

Create custom spoken commands by clicking "+ Add Command" and inputting the following values:

  1. Command ID: A unique identifier for the command

  2. Spoken Phrase: The exact phrase users will speak to trigger the command

  3. Variables: Any variables to include within the spoken phrase (optional)

To add a variable:

  • Insert the variable into the spoken phrase, enclosed in curly brackets (e.g., Insert my {template_name} template)

    • 'template_name' will be the value used in the variable's 'Enum' field

  • Add the value used within your curly brackets to the Enum field

  • Provide the acceptable values for the variable in the field that follows

An example of using variables can always be seen in the 'insert_template' example command.

After entering the command details, click the "Add Command" button to make the command available for testing.
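
For illustration only, a custom command with a variable might be expressed in the JavaScript SDK or JSON Config output roughly as sketched below. The property names (id, phrase, variables, enum, values) are assumptions chosen to mirror the form fields, not Corti's confirmed schema; consult the Code tab for the exact structure generated from your configuration.

    // Hypothetical sketch of a custom command definition.
    // Property names are assumptions that mirror the AI Studio form fields.
    const insertTemplateCommand = {
      id: "insert_template",                          // Command ID: unique identifier
      phrase: "Insert my {template_name} template",   // Spoken Phrase, with a variable in curly brackets
      variables: [
        {
          enum: "template_name",                      // must match the value inside the curly brackets
          values: ["radiology", "discharge", "progress note"]  // illustrative acceptable values
        }
      ]
    };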


Implementing the Web Component

After you configure the Speech-to-Text settings, the system automatically generates integration code that matches your configuration in the AI Studio.

Accessing Generated Code

  1. Configure all desired settings using the right-side panel

  2. Navigate to the Code tab


  3. Select the appropriate format for your application:

    • HTML (for web component implementation)

    • JavaScript SDK or JSON Config (for SDK- or JSON-based configuration)

    • React (for React applications)

The generated code can be copied and integrated directly into your application to implement the Corti dictation component with your specified configuration.
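
For orientation, the HTML output might look roughly like the sketch below, with the component's attributes reflecting your panel selections. The tag name, attribute names, and script URL here are placeholders rather than Corti's confirmed API; always copy the snippet generated in the Code tab.

    <!-- Illustrative sketch only: the script URL, tag name, and attribute names below are
         placeholders, not Corti's confirmed API. Copy the real snippet from the Code tab. -->
    <script type="module" src="https://example.com/corti-dictation.js"></script>

    <!-- Dictation component configured with the choices made in the panel:
         language, color scheme, and punctuation behavior. -->
    <corti-dictation
      language="en"
      color-scheme="dark"
      spoken-punctuation="true"
      automatic-punctuation="false"
    ></corti-dictation>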


Summary

The AI Studio Speech-to-Text feature provides a configurable dictation solution with real-time transcription, event monitoring, and customizable interface options. By adjusting the settings in the configuration panel, organizations can tailor the dictation experience to meet specific workflow requirements and then implement the solution using automatically generated integration code.


Have a question for our team?

Click Support in the bottom-left corner of the console to submit a ticket or reach out via email at [email protected] and we'll be happy to assist you.
