How to format video captions

Captions are crucial for people who are Deaf, hard of hearing, or have auditory processing disorders. All videos at UC Berkeley must have accurate, human-edited captioning, regardless of the platform the video is hosted on.

How do I make sure my captions are accurate?

Manually checking a video’s captions is the best way to ensure that they are fully accurate. Many video hosting platforms, including YouTube, provide automatic closed captioning. Auto-captions are a great place to start when building your captions, but they are not enough to ensure that your captions are accurate and fully accessible.

Do:

Use correct spelling, capitalization, and punctuation.
Make sure that your captions accurately reflect the audio content.
Identify your speakers every time a new person speaks, or when a speaker isn’t shown speaking. Display the speaker’s name in all caps with a colon.

Example: OBI-WAN: These aren't the droids you’re looking for.
Describe non-verbal audio. This includes things like music or someone laughing. Include non-verbal audio in brackets. Only include background sounds if they’re important to the context or meaning of the video.

Example: [music], [laughter], [audience applause], [dog barking]
Sync captions with the audio. Captions must appear when the audio is heard.
Use no more than two lines of text per caption frame. Three or more lines are difficult to read and understand.
Write your captions in the language of your video’s audio. For example, if your video is in English, write your captions in English. If your video is in Spanish, write your captions in Spanish.
Keep captions on screen long enough for people to read them, but not too long. Aim for 2-7 seconds per caption frame.

Don’t:

Don't rely entirely on auto-captioning. Auto-captions are a great way to get started captioning your video, but they will not be accurate without additional editing.
Don't write captions in all caps. Only use all caps for YELLING or speaker identification.
Don’t use more than 42 characters per line in a caption frame. Longer lines are hard to read. Even better, aim for a maximum of 32 characters per line.
Don't caption background music if it will interfere with the dialog captions.
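If you keep your captions in a script or spreadsheet before uploading them, a small check can flag frames that break the measurable rules above (line count, line length, and time on screen). The following is a minimal sketch, not an official tool: the `CaptionFrame` class and `check_frame` function are hypothetical names made up for this example. It does not replace a human review of accuracy, spelling, or speaker identification.

```python
from dataclasses import dataclass

MAX_LINES = 2        # no more than two lines per caption frame
MAX_CHARS = 42       # hard limit per line; 32 is the preferred target
MIN_SECONDS = 2.0    # shortest recommended time on screen
MAX_SECONDS = 7.0    # longest recommended time on screen


@dataclass
class CaptionFrame:
    """Hypothetical container for one caption frame."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # caption text, lines separated by "\n"


def check_frame(frame: CaptionFrame) -> list[str]:
    """Return a list of readable problems found in one caption frame."""
    problems = []
    lines = frame.text.split("\n")
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (max {MAX_LINES})")
    for line in lines:
        if len(line) > MAX_CHARS:
            problems.append(f"line has {len(line)} characters (max {MAX_CHARS})")
    duration = frame.end - frame.start
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"on screen for {duration:.1f}s (aim for 2-7s)")
    return problems


frame = CaptionFrame(0.0, 3.0, "OBI-WAN: These aren't the droids\nyou're looking for.")
print(check_frame(frame))  # an empty list means no problems were found
```

An empty list only means the frame passes these three mechanical checks. You still need to read the captions against the audio.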

Best practices for captioning various situations

When in doubt, write what is easiest to read and understand. Two shorter lines are more readable than one very long line. Avoid putting the last word of a sentence on the next caption frame.
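As a rough illustration of breaking one long line into shorter ones, the sketch below uses Python's standard `textwrap` module to wrap text at 32 characters and group the result into two-line frames. The function name `split_into_frames` is hypothetical. The sketch only breaks on length, so it can still leave the last word of a sentence alone on the next frame. Review and adjust the breaks by hand.

```python
import textwrap


def split_into_frames(text: str, width: int = 32, lines_per_frame: int = 2) -> list[str]:
    """Wrap text to short lines, then group the lines into caption frames."""
    lines = textwrap.wrap(text, width=width)
    return [
        "\n".join(lines[i:i + lines_per_frame])
        for i in range(0, len(lines), lines_per_frame)
    ]


sentence = "When in doubt, write what is easiest to read and understand."
for frame in split_into_frames(sentence):
    print(frame)
    print("---")  # marks the boundary between caption frames
```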

Animal Sounds

Sounds can be described and put in brackets, or spelled out (onomatopoeia), or both.

Example 1: [dog barking]
Example 2: Woof, woof!
Example 3: [dog barking] Woof, woof!

Music

Describe the background music style if needed, in brackets. If the background music plays for a long time, stop the caption frame after 4-5 seconds. If you include captions for song lyrics, add a musical note (♪) to the beginning and end of each line.

Example: [ethereal classical music]
Example: ♪ Take another little piece of my heart now, baby ♪
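If you are preparing many lyric lines at once, the musical-note rule is easy to apply with a few lines of code. This is a small sketch with a hypothetical helper name, `format_lyrics`. It assumes each lyric line is already on its own line in the input.

```python
def format_lyrics(lyrics: str) -> str:
    """Add a musical note to the beginning and end of each lyric line."""
    lines = [line.strip() for line in lyrics.splitlines() if line.strip()]
    return "\n".join(f"♪ {line} ♪" for line in lines)


print(format_lyrics("Take another little piece of my heart now, baby"))
# ♪ Take another little piece of my heart now, baby ♪
```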

Punctuation

Use punctuation to indicate the speed or pace of a sound effect. You can use an ellipsis for extended pauses, commas for brief breaks, and dashes for quick repetition.

Example: Oh... my... g-g-god. O, M, G.

Lists

When listing out items in a series, use the Oxford comma.

Example: One, two, three, and four.

Speaker Tone

It may sometimes be appropriate to add a description of the speaker’s tone in brackets.

Example: [whisper], [aggravated]

Speaker Identification (Speaker ID)

If the speaker’s name is unknown, use a descriptive label instead, such as STUDENT, AUDIENCE MEMBER, or PROFESSOR. If there are multiple unknown speakers, number them: STUDENT #1, STUDENT #2.

Example:
STUDENT #1: Hello.
STUDENT #2: Good morning.
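The speaker ID format (all caps, followed by a colon, with a number for multiple unknown speakers) can be sketched as a small helper. The function name `speaker_id` and its parameters are hypothetical and exist only for this example. The default label is one of the alternatives suggested above.

```python
from typing import Optional


def speaker_id(name: Optional[str] = None,
               role: str = "AUDIENCE MEMBER",
               number: Optional[int] = None) -> str:
    """Build a speaker ID: the name (or a role label) in all caps with a colon."""
    if name:
        return f"{name.upper()}:"
    label = role.upper()
    if number is not None:
        label += f" #{number}"  # distinguishes multiple unknown speakers
    return f"{label}:"


print(speaker_id("Obi-Wan"), "These aren't the droids you're looking for.")
print(speaker_id(role="student", number=1), "Hello.")
print(speaker_id(role="student", number=2), "Good morning.")
```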

Math

If transcribing math content, use only numerals. For all other topics, write out the numbers 1-10 (one, two), use numerals for numbers above 10 (11, 53, 978), and use a combination for large, rounded numbers (3 million).
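The number rule can be expressed as a short function. This is a sketch under a few assumptions: the name `caption_number` is hypothetical, it handles whole numbers only, and it treats only exact multiples of one million (below one billion) as "large, rounded numbers".

```python
# Spelled-out forms for the numbers 1-10.
WORDS = {
    1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
    6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten",
}


def caption_number(n: int, math: bool = False) -> str:
    """Format a whole number for a caption."""
    if math:
        return str(n)                      # math content: numerals only
    if n in WORDS:
        return WORDS[n]                    # 1-10 are written out
    if 1_000_000 <= n < 1_000_000_000 and n % 1_000_000 == 0:
        return f"{n // 1_000_000} million"  # large, rounded numbers
    return str(n)                          # everything else stays a numeral


print(caption_number(2))           # two
print(caption_number(53))          # 53
print(caption_number(3_000_000))   # 3 million
print(caption_number(2, math=True))  # 2
```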

FAQs

Who uses captions?

People who are Deaf or Deafblind
People who are hard of hearing
People with auditory processing disorders
People who are visual learners
People who want to improve retention of information
People in quiet environments with the sound off
People in noisy environments
People who are learning to speak English

What is the difference between open (or burned-in) captions and closed captions?

Closed captions can be toggled on or off, and are added to a video as a separate file. Closed captions can be edited later if errors are found.

Open captions are burned into the video, cannot be changed by the end users, and are much more difficult to update or change if errors are found.

My video already has open (or burned-in) captions. Is this enough?

Open captions may be compliant, but we recommend adding closed captions as well to improve the accessibility of your content. Closed captions give your end users more flexibility: they can turn the captions on and off, adjust the text size and style, and move the placement of the captions.

My video already has auto-captions. Why isn't that enough?

Auto-generated captions are not accurate enough to be considered accessible. They are a great place to start when you're building your captions, but additional editing is required.

You will need to ensure that your captions are fully accurate, include speaker identification, use correct spelling, punctuation, and capitalization, and include all important background information.

Learn more about how to edit your auto-captions on YouTube.

What language should my captions be in?

Captions should be written in the same language as the audio of the video. For example, if your video is in English, write your captions in English. If your video is in Spanish, write your captions in Spanish.

Learn more about how to caption a multi-lingual video.

What is the difference between captions and subtitles?

Captions are a text representation of the audio content. Captions are presented in the same language as the audio content. For example, a lecture in English will be captioned in English.

Subtitles are a text translation of the audio content. Subtitles are translated into the language used by the intended audience. For example, a lecture in English might have Spanish subtitles for a Spanish-speaking audience.

What if my video has no sound?

All videos are required to have captions. Even if your video has no sound, you will still need to include a caption file that says [no sound]. This lets users know that no sound is available for the video.
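As one possible illustration, the sketch below builds a caption file with a single [no sound] caption. It assumes your platform accepts the WebVTT (.vtt) format, which this guide does not specify, so check which caption formats your video host supports. The function name `no_sound_vtt` is hypothetical, and displaying the caption for the full length of the video is also an assumption.

```python
def no_sound_vtt(duration_seconds: float) -> str:
    """Return WebVTT text with one [no sound] caption covering the video."""
    total_ms = round(duration_seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1000)
    end = f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"
    # A WebVTT file starts with a "WEBVTT" header line, followed by a blank line.
    return f"WEBVTT\n\n00:00:00.000 --> {end}\n[no sound]\n"


# Write a caption file for a 90-second silent video.
with open("no_sound.vtt", "w", encoding="utf-8") as f:
    f.write(no_sound_vtt(90))
```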

What if my video only has music?

All videos are required to have captions, even if there is only music playing. If this is the case, describe the music to your audience.

Examples: [music fades in], [loud drumming]

How come when I watch the news, the captions are delayed?

Live media has different requirements. When news shows are live, there’s no way for the transcriptionist to type fast enough for the text to be synchronized with the speech.

Should I include every "um" in the captions?

Usually not. Natural speech tends to be messier than scripted speech and may be difficult to follow if all filler words and false starts are transcribed. A caption style called "clean read" removes most of the filler words to improve comprehension. When captions include every um, ah, and you know, this is called "full verbatim." This approach is only used for scripted speech (plays, TV) and court reporting.

Who can I contact if I have more questions?

You can always email the Digital Accessibility team at improving-accessibility@berkeley.edu if you have any questions about your video captions.

More resources

Captioning Key by DCMP (Described and Captioned Media Program)
WCAG 2.0 success criteria for prerecorded video
3Play: Captioning Sound Effects in TV and Movies
How to edit auto captions in YouTube


Video tutorials

How to Edit Captions in YouTube

3Play Media Help Guide