Captions and Accessibility

Captions are crucial for people who are Deaf, hard of hearing, or have auditory processing disorders. All videos at UC Berkeley must have accurate and edited captioning, regardless of the platform the video is hosted on. 

How do I make sure my captions are accurate?

Manually checking a video’s captions is the best way to ensure that they are fully accurate. Many video hosting platforms, including YouTube, provide automatic closed captioning. This is a great place to start in building your captions, but it is not enough to ensure that they are accurate and fully accessible.


  • Use correct spelling, capitalization, and punctuation.
  • Make sure that your captions accurately reflect the audio content.
  • Identify your speakers every time a new person speaks, or when a speaker isn’t shown speaking. Display the speaker’s name in all caps with a colon.

    Example: OBI-WAN: These aren't the droids you’re looking for.
  • Describe non-verbal audio. This includes things like music or someone laughing. Include non-verbal audio in brackets. Only include background sounds if they’re important to the context or meaning of the video.

    Example: [music], [laughter], [audience applause], [dog barking]
  • Sync captions with the audio. Captions must appear when the audio is heard.
  • Use no more than two lines of text per caption frame. Three or more lines are difficult to read and understand.
  • Write your captions in the language of your video’s audio. For example, if your video is in English, write your captions in English. If your video is in Spanish, write your captions in Spanish.
  • Keep captions on screen long enough for people to read them, but not too long. Aim for 2-7 seconds per caption frame.


  • Don't rely entirely on auto-captioning. Auto-captions are a great way to get started captioning your video, but they will not be accurate without additional editing.
  • Don't write captions in all caps. Only use all caps for YELLING or speaker identification.
  • Don’t use more than 42 characters per line in a caption frame. Longer lines are hard to read. Even better, aim for a maximum of 32 characters per line.
  • Don't caption background music if it will interfere with the dialog captions.
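
Several of the guidelines above are measurable, so they can be checked automatically before a manual review. The following is a minimal Python sketch, not an official tool. The CaptionFrame class and the check_frame function are hypothetical names made up for this example, and the limits are taken from the lists above (two lines, 42 characters with 32 preferred, 2-7 seconds).

```python
from dataclasses import dataclass
from typing import List

MAX_LINES = 2         # no more than two lines per caption frame
MAX_CHARS = 42        # hard limit on characters per line
PREFERRED_CHARS = 32  # preferred limit on characters per line
MIN_SECONDS = 2.0     # shortest recommended time on screen
MAX_SECONDS = 7.0     # longest recommended time on screen


@dataclass
class CaptionFrame:
    """Hypothetical container for one caption frame."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # caption text, lines separated by "\n"


def check_frame(frame: CaptionFrame) -> List[str]:
    """Return a list of readable problems found in one caption frame."""
    problems = []
    lines = frame.text.split("\n")
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (limit is {MAX_LINES})")
    for line in lines:
        if len(line) > MAX_CHARS:
            problems.append(f"line has {len(line)} characters (limit is {MAX_CHARS})")
        elif len(line) > PREFERRED_CHARS:
            problems.append(f"line has {len(line)} characters (aim for {PREFERRED_CHARS})")
    duration = frame.end - frame.start
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"on screen for {duration:.1f} seconds (aim for 2-7)")
    return problems
```

A script like this can only flag length and timing issues. Accuracy, spelling, speaker identification, and sync with the audio still need a person to review them.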

Best practices for captioning various situations

When in doubt, write what is easiest to read and understand. Two shorter lines are more readable than one very long line. Avoid putting the last word of a sentence on the next caption frame.
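
If you are starting from a transcript, a first pass at line breaks can be automated. The sketch below is one possible approach using Python's standard textwrap module. The function name split_into_frames and the default width of 32 characters are assumptions chosen for this example.

```python
import textwrap


def split_into_frames(text, width=32, lines_per_frame=2):
    """Wrap text into short lines, then group the lines into caption frames."""
    lines = textwrap.wrap(text, width=width)
    return [
        "\n".join(lines[i:i + lines_per_frame])
        for i in range(0, len(lines), lines_per_frame)
    ]
```

This only breaks on line length. It can still leave the last word of a sentence alone on the next frame, so review the result and adjust the breaks by hand.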

  • Animal Sounds: Sounds can be described and put in brackets, or spelled out (onomatopoeia), or both.

    Option 1: [dog barking] 
    Option 2: Woof, woof!
    Option 3: [dog barking] Woof, woof!
  • Music: Describe the background music style if needed, in brackets. If the background music plays for a long time, stop the caption frame after 4-5 seconds. If you include captions for song lyrics, add a musical note (♪) to the beginning and end of each line.

    Example: [ethereal classical music]
    Example: ♪ Take another little piece of my heart now, baby ♪ 
  • Punctuation: Use punctuation to indicate the speed or pace of speech or a sound effect. You can use an ellipsis for extended pauses, commas for brief breaks, and dashes for quick repetition.

    Example: Oh... my... g-g-god. Oh, Em, Gee.

  • Lists: When listing out items in a series, use the Oxford comma.

    Example: One, two, three, and four.

  • Speaker Tone: It may sometimes be appropriate to add a description of the speaker’s tone in brackets.

    Example: [whisper], [aggravated]

  • Speaker Name: If the speaker’s name is unknown, some alternatives are: STUDENT, AUDIENCE MEMBER, PROFESSOR. If there are multiple unknown speakers use numbers: STUDENT #1, STUDENT #2.

    STUDENT #1: Hello.
    STUDENT #2: Good morning.

  • Math: If transcribing math content, use only numerals. For all other topics, write out the numbers one through ten (one, two), use numerals for larger numbers (11, 53, 978), and use a combination for large, rounded numbers (3 million).
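
The number rule in the last item can be written as a small function. This is a rough Python sketch with a hypothetical name, format_number. It assumes whole numbers only, treats "large, rounded" as an exact multiple of one million, and leaves zero as a numeral because the guideline does not cover it.

```python
# Words for the numbers the guideline says to write out.
NUMBER_WORDS = {
    1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
    6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten",
}


def format_number(n, math_content=False):
    """Format a whole number for a caption, following the rule above."""
    if math_content:
        return str(n)                       # math content: numerals only
    if n in NUMBER_WORDS:
        return NUMBER_WORDS[n]              # one through ten: written out
    if n >= 1_000_000 and n % 1_000_000 == 0:
        return f"{n // 1_000_000} million"  # assumption: exact millions only
    return str(n)                           # everything else: numerals
```

For example, format_number(2) gives "two", format_number(53) gives "53", and format_number(3_000_000) gives "3 million".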


Who uses captions?

  • People who are Deaf or Deafblind
  • People who are hard of hearing
  • People with auditory processing disorders
  • Visual learners
  • People who want to improve retention of information
  • People in quiet environments with the sound off
  • People in noisy environments
  • People who are learning to speak English

My video already has auto-captions. Why isn’t this enough?

Auto-generated captions are not accurate enough to be considered accessible. They are a great place to start when you're building your captions, but additional editing is required to ensure that the captions are accurate, use correct spelling, punctuation, and capitalization, and display all important audio information.

My video already has open (or burned-in) captions. Is this enough?

Open captions may be compliant, but we recommend adding closed captions as well to improve the accessibility of your content. Closed captions give your end users more flexibility in turning the captions on and off, adjusting the text size and style, and moving the placement of the captions.

What language should my captions be in?

Captions should be written in the same language as the audio of the video. For example, if your video is in English, write your captions in English. If your video is in Spanish, write your captions in Spanish.

Are captions and subtitles different?

Yes. Subtitles are a translation of dialog and do not include background sounds.

What if my video has no sound?

Even if your video has no sound, you will still need to include a caption file that says [no sound]. This lets users know that no sound is available for the video.
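
As an illustration, here is a short Python sketch that builds a one-cue caption file for a silent video. It assumes your hosting platform accepts the WebVTT (.vtt) format, and the 5-second display time is a placeholder, not a requirement. The function names are made up for this example.

```python
def seconds_to_timestamp(seconds):
    """Convert seconds to a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def no_sound_vtt(display_seconds=5.0):
    """Build the text of a WebVTT file with a single [no sound] cue."""
    end = seconds_to_timestamp(display_seconds)
    return f"WEBVTT\n\n00:00:00.000 --> {end}\n[no sound]\n"
```

Save the returned text in a file with a .vtt extension and upload it the same way you would upload any other caption file.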

How come when I watch the news, the captions are delayed?

Live media has different requirements. When news shows are live, there’s no way for the transcriptionist to type fast enough for the text to be synchronized with the speech.

Should I include every "um" in the captions?

Usually not. Natural speech tends to be messier than scripted speech and may be difficult to follow if all filler words and false starts are transcribed. A caption style called "clean read" removes most of the filler words to improve comprehension. When captions include every um, ah, and you know, this is called "full verbatim." This approach is only used for scripted speech (plays, TV) and court reporting.
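
A first pass at a "clean read" can be partly automated. The Python sketch below removes only a few unambiguous fillers (um, uh, ah) along with the commas around them. The function name clean_read and the filler list are assumptions for this example. Phrases such as "you know" are left alone on purpose, because they are sometimes meaningful and need a human decision.

```python
import re

# Matches um/uh/ah as whole words, plus an optional comma on either side.
FILLER = re.compile(r",?\s*\b(?:um+|uh+|ah+)\b,?", re.IGNORECASE)


def clean_read(text):
    """Remove simple filler words and tidy up the leftover spacing."""
    cleaned = FILLER.sub("", text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned)  # collapse repeated spaces
    return cleaned.strip()
```

For example, clean_read("I think, um, we should start") returns "I think we should start". The function does not fix capitalization, so a caption that began with a filler word will need its first letter corrected by hand.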

Who can I contact if I have more questions?

You can always email the Digital Accessibility team if you have any questions about your video captions.

More resources

Video tutorials