Vocal Authoring



File Alignment

It is always a good idea to make sure that the vocal stem and the dry vocal file are perfectly aligned before you start authoring the vocal part. We’ve found that a difference of 30 to 40 ms is not very noticeable and doesn’t affect scoring significantly in game.

But if the difference is any larger, the timing of the dry vocal file should be adjusted to match the wet vocal file. This will ensure that the game scores properly, and will make your authoring more accurate if you’re using the dry vocal file as your authoring reference.

Authoring Rules

Vocals only have to be authored once. The different difficulties change the leniency of the pitch detection system. Unlike the other instruments, vocal authoring is not quantized to the beat of the song, which can make authoring much more time consuming.

Usually, authoring is done by ear, and by looking at the waveform of the vocal part in order to determine exactly where notes should start and end. In songs that use a lot of vocal effects or heavy reverb, it is often easier to use the dry vocal file as your authoring reference. If you do this, be sure to double check your work against the wet vocal file when you’re done, as this is the sound file played in-game.

You should disable the "Snap items to grid" option, or use a very precise grid division such as 1/64 or 1/128, when adding vocal notes so that you can place the starting and ending points of each note as accurately as possible. If you are aligning notes to the grid at 1/16 or even 1/32, you will inevitably have notes that start a bit early or late compared to the audio, possibly having an adverse effect on scoring.
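As a rough illustration of why coarse grids hurt timing, the sketch below computes the worst-case snapping error for a given grid division. It assumes a constant tempo where one beat is a quarter note, and the function name is hypothetical; it is not part of Reaper or any RBN tool.

```python
def max_snap_error_ms(bpm: float, division: int) -> float:
    """Worst-case distance (ms) between a true onset and the nearest grid line.

    division is the grid denominator: 16 for 1/16 notes, 128 for 1/128 notes.
    Assumes a constant tempo where one beat is a quarter note.
    """
    quarter_ms = 60_000.0 / bpm             # length of one quarter note
    grid_ms = quarter_ms * 4.0 / division   # length of one grid step
    return grid_ms / 2.0                    # furthest a point can sit from a grid line


for division in (16, 32, 64, 128):
    print(f"1/{division} grid at 120 BPM: up to {max_snap_error_ms(120, division)} ms off")
```

At 120 BPM, a 1/16 grid can place a note edge up to 62.5 ms away from the audio, while a 1/128 grid keeps the error under 8 ms.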

Multiple Vocal Parts and Solo Vocals

For songs that utilize harmonies, check the Harmony Authoring document if you wish to create multiple vocal lines for multiple singers. Harmonies are not required; however, there are some guidelines to follow when authoring the Solo part (PART VOCALS) if there are multiple sung parts in the audio. The general rule when choosing which part to author is to choose the part that you would sing if you were singing along with the song on the radio.

If there is a part that repeats several times, you want to be as consistent as possible. For example, you don't want to chart the lead vocal in the first chorus and the harmony in the second. This would confuse the player, especially if they do not know the song well, or at all.

If there are multiple singers that sing parts at different times, it's generally OK to include both parts. An example of this type of authoring can be found in the song "Spoonman" by Soundgarden, on the Rock Band 2 disc.

In the case of overdubbing, where the last word(s) of a phrase and the first word(s) of the following phrase overlap in the audio, you will have to decide how to best break up the phrases. Most often this means not authoring a note for the last word of the first phrase in favor of authoring all of the second phrase, but it depends on which of the overlapping words are more prominent in the audio and what makes more sense lyrically.


Pitched vocal parts are authored from note C1 (36) to C5 (84). Update: please avoid note 84, as it will not show up correctly in Blitz.
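The range above can be checked mechanically. The following is a minimal sketch that classifies a MIDI note number against that range; the constant and function names are made up for illustration and do not come from any RBN tool.

```python
LOWEST_PITCH = 36   # C1, bottom of the pitched vocal range
HIGHEST_PITCH = 84  # C5, top of the range, but discouraged because of Blitz


def check_pitch(note: int) -> str:
    """Classify a pitched vocal MIDI note as 'ok', 'avoid', or 'out_of_range'."""
    if note < LOWEST_PITCH or note > HIGHEST_PITCH:
        return "out_of_range"
    if note == HIGHEST_PITCH:
        return "avoid"  # inside the range, but reported not to display correctly in Blitz
    return "ok"
```

A script that reads the vocal track could run this on every pitched note and list anything that is not "ok" before the song goes to playtest.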

Even though Rock Band allows singers to use the octave most comfortable for them, it’s a good idea to author in the same octave as the original vocal in order to keep things clear and consistent.

Rock Band vocal charts are somewhat simplified representations of the original vocal performance. We generally avoid authoring the most minute details of the performance such as vibrato or really short lead-ins, as these add unnecessary complexity that makes the song less fun to sing in the game.

As a general rule, we don’t include consonants at the beginning and the end of a syllable in the length of a note tube. This is naturally the way people recognize the length of notes when they sing, and scoring doesn’t work correctly without pitch content, which is in vowels.

That being said, there are exceptions. If you omit a loud and long consonant (ex. "sh" "s") from the beginning of a note tube completely, it tends to feel late in game. In this type of case, we include the very end of the consonant in the note tube.

Where a note tube ends (especially a very long one) sometimes depends on how loud the vocal track sounds in game. If some note tubes look too long in game because the note becomes inaudible sooner, they should be shortened to match what you hear.

It is important to make note-on timing very accurate, as it is a big focus of attention for the player, but note-off timing is less so. Sometimes we make note tubes shorter than they actually are to give other elements of game play higher priority--for example, when there are not enough places to deploy overdrive, or there is not enough room to breathe.

If the last tube in your phrase is pitched, and the next phrase begins with a non-pitched tube, leave a 16th space between the end of the first phrase and the beginning of the next one even if it means cutting the phrase short. There is a rare bug in Rock Band 3's vocal system that makes the phrase extremely difficult to combo if such a space is not there. This mostly applies to short phrases; if your phrases contain long tubes, the bug may not manifest; still, it's best to follow this practice to ensure a bug-free vocal chart.

Much like Guitar/Bass sustains, all vocal notes need to have at least some amount of space between the end of a note and the beginning of the next note. This is especially important when you have slide notes (using the + lyric event) because the slide is drawn in the space between the notes. A small amount of space will create a quick jump from one pitch to the next, while more space will mean a slower, more gradual change in pitch. If there is no space at all, however, your slide notes will appear in-game as separate notes with no lyrics attached to them.
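A simple way to catch notes with no space between them is to compare each note's start with the previous note's end. The sketch below assumes notes are given as (start_tick, end_tick) pairs in time order; that representation and the function name are hypothetical.

```python
from typing import List, Tuple

Note = Tuple[int, int]  # (start_tick, end_tick), a hypothetical representation


def notes_without_gap(notes: List[Note], min_gap: int = 1) -> List[int]:
    """Return indexes of notes that start less than min_gap ticks after the previous note ends.

    Notes are expected in time order.
    """
    flagged = []
    for index in range(1, len(notes)):
        previous_end = notes[index - 1][1]
        start = notes[index][0]
        if start - previous_end < min_gap:
            flagged.append(index)
    return flagged
```

Raising min_gap lets you look for slides that may be drawn too abruptly, though how much space looks right is a judgment call best made in game.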

Auto generating Vocal Parts

If you really have a hard time hearing pitches, you can use the process described in Automatic_Vocal_Track to generate a vocal track. The MIDI that gets created will require a lot of work to clean up, but it may help you as a basic first pass.


In Rock Band, each MIDI note in the vocal part must have a corresponding MIDI lyric event which is placed exactly at the start of each note in the vocal part; this lyric will be shown onscreen along with that note. Each lyric should correspond to one syllable in the song, so you will need to break up multi syllable words into their component syllables.

Put your lyrics into a plain text (.txt) file, with each syllable or plus sign separated by a space. Then select the notes you have transcribed for the vocal part and import the text file into your Reaper session using the "import lyrics for selected notes from file" action (Shift+L). This action creates a lyric event for each selected note and fills it with the corresponding lyric from your text file.
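Because each note needs exactly one lyric, it can help to confirm that the text file has as many space-separated entries as there are notes before importing. This sketch only imitates the pairing described above; it is not how Reaper implements the action, and the function name is hypothetical.

```python
from typing import List, Tuple


def pair_lyrics_with_notes(lyrics_text: str, note_starts: List[int]) -> List[Tuple[int, str]]:
    """Pair each space-separated lyric token with one note start position.

    Raises ValueError when the counts differ, which usually means a syllable
    or a plus sign is missing from the text file.
    """
    tokens = lyrics_text.split()  # splits on any run of whitespace, including newlines
    if len(tokens) != len(note_starts):
        raise ValueError(
            f"{len(tokens)} lyric tokens for {len(note_starts)} notes"
        )
    return list(zip(note_starts, tokens))
```

For example, pairing the text "Thun- der- + struck" with four note positions yields one lyric per note, while pairing it with three notes raises an error.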

Multi Syllable Words

Multi syllable words are broken up with dashes:

Hello:: Hel- lo

The dash tells the game to combine the syllable with the following syllable, so a three syllable word would look like this:

Thunderstruck:: Thun- der- struck

Syllables with Multiple Notes

If a syllable is sung across multiple connected notes, then each additional note has a plus sign (+) lyric event to correspond with it. These lyric events tell the game to connect the start of that note with the end of a previous note using a straight line. Plus signs are also used for slides, bends, and trills, using a new MIDI note with a new plus sign event for each note:

Yeah (over 2 notes):: Yeah +

You can also insert multiple notes in the middle of a Multi-Syllable word. Note that the dash is included at the end of the syllable, not at the end of the plus sign.

Thunderstruck (with 2 notes on "der"):: Thun- der- + struck


Hyphens are displayed by adding an equals sign (=) to the end of a syllable. Like the dash, this combines the syllable with the following syllable, but it also inserts a visible hyphen between them:

Ex-Girlfriend:: Ex= Girl- friend

Non-Pitched Words

Non-Pitched words are words that are spoken or shouted, and don’t really have a distinct pitch to them. These words are marked with a pound sign (#) at the end of each syllable. We do not mix pitched and non pitched syllables in the same word.

All right!:: All# right!#

Non-Pitched Multi-Syllable Words

In non-pitched multi-syllable words, place the dash before the # or ^:

indefatigably:: in-# de-# fa-# ti-# ga-# bly#
cowardice:: cow-# ard-^ ice#
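The marker rules in the sections above can be summarized in a short sketch that turns a list of lyric events back into display text. It illustrates the rules as described here and is not the game's actual parser; the function name is hypothetical.

```python
from typing import List


def render_lyrics(tokens: List[str]) -> str:
    """Join lyric events into display text following the marker rules above."""
    words: List[str] = []
    join_next = False  # True when the previous syllable ended in - or =
    for token in tokens:
        if token == "+":
            continue  # extra note on the same syllable, no new text
        text = token.rstrip("#^")  # non-pitched markers are not displayed
        if text.endswith("-"):
            piece, joins = text[:-1], True        # dash: join silently
        elif text.endswith("="):
            piece, joins = text[:-1] + "-", True  # equals: join with a visible hyphen
        else:
            piece, joins = text, False
        if join_next and words:
            words[-1] += piece
        else:
            words.append(piece)
        join_next = joins
    return " ".join(words)


print(render_lyrics("Thun- der- + struck".split()))
print(render_lyrics("Ex= Girl- friend".split()))
```

Running the rendered text past a spell checker or a quick read-through is a cheap way to spot a missing dash before checking the lyrics in game.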

Lyric formatting

We capitalize the first syllable of every phrase. We also capitalize proper nouns (Ex. "Ted", "Barbur Boulevard"), words that follow a ! or ? mid-phrase, and certain acronyms that would look wrong in lower case (Ex. "CIA", "MVP"). These are the only cases in which we use capital letters; other uses of capitalization will likely be flagged during playtest/review.

If in the process of authoring your song or after having received feedback during Playtest, you make changes to the placement of any phrase markers, don't forget to edit your lyrics so that the capitalization is appropriate for the new phrase locations.

The only end-of-sentence punctuation that should be used is question marks and exclamation marks. We try to use these sparingly and only when we feel appropriate. Commas and periods should not be used for punctuation, though periods can be used for abbreviations; for example, A.M. or P.M. If there is a question mark or exclamation mark in the middle of a phrase, capitalize the next word. We avoid using quotation marks.

To keep our hyphenation consistent, we refer to Merriam-Webster Online Dictionary (http://www.merriam-webster.com/). If a word is not hyphenated on m-w, use dictionary.com (http://www.dictionary.com/) as an additional reference. In the event that a word is broken down differently between the two reference sites (for example, "eve- ry" on dictionary.com and "ev- ery" on m-w.com) either method is acceptable. Just make sure you stay consistent if the word occurs multiple times in a song.

Sometimes a singer will break up vowel sounds with a glottal stop (http://upload.wikimedia.org/wikipedia/commons/4/4d/Glottal_stop.ogg). Use hyphens to add each extra syllable created in this way. Here's an example from Blur's "Song 2": "but nothing i-is".

Since neither m-w.com nor dictionary.com has any suggested hyphenation for multi-syllable contractions, the preferred hyphenation is: It- 'd, It- 'll, Must- 've, Should- 've, Would- 've, Could- 've, Should- n't, Would- n't, Could- n't, Must- n't, etc.

Spanish has rules for syllable division that do not match the rules in English. For more information, see Spanish Syllables.

Lyric Check Tool

RBN Creators Club member "MIMZIC" has created a Web-based tool that allows you to view your lyrics outside of the game environment. The tool can be found here:


Simply browse for your MIDI file which you have exported and the tool will display your lyrics as formatted in your MIDI (with Overdrive phrases highlighted). Click "Plain Text" to view the lyrics as they will appear in-game in the Static HUD. Although you should always check your lyrics in the game, this tool is a great way to double-check for errors before submitting your song to the Creators Club.


Even though the scoring systems for non-pitched notes can handle consonants at the beginning and end of note tubes correctly, we follow the same general rule of note tube placements as with pitched syllables. This is, again, because of how people naturally feel the timing when they see note tubes. Also, we like to keep the way we author consistent to avoid confusion for the player.

We avoid making part of a word non-pitched. If one word has both pitched and non-pitched syllables, we make the entire word one way or the other, whichever makes more sense. We also try to avoid mixing pitched and non-pitched syllables in one phrase whenever possible, because it can be hard to spot a non-pitched word in the middle of a pitched phrase and adjust your voice in time. We normally try to group at least a few syllables to make the Vocal HUD easier to read. This rule applies less strictly to the first or the last word in a phrase.

There are two scoring modes for non-pitched syllables: standard and more generous. Standard scoring is marked with "#" in the lyrics and the more generous scoring with "^". By default we use "#", but we use "^" in some cases. For example, if a phrase consists of only 1 to 3 short syllables, we always mark them with "^". Also, syllables that begin with vowels or with consonants lacking a sharp attack (ex. "w" or "y") tend to be harder to register, especially if you miss the beginning of a long note tube. In these cases, we use "^".

Percussion Sections

Tambourine, cowbell and hand clap are the available percussions. Playable percussion notes are placed on MIDI note C6 (96) and non-playable percussion notes that simply trigger the percussion sample are placed on C#6 (97).

Place text events ([tambourine_start], [tambourine_end], [cowbell_start], [cowbell_end], [clap_start] and [clap_end]) on PART VOCALS for animation cues and set the type of samples you want to use for audio playback in the MAGMA tool. We can use only one type of percussion in one song.

Percussion sections need to be placed in phrases just like regular notes. This is so that the entire percussion part does not appear on one screen in the Static HUD. Two measures is generally a good length for percussion phrases, depending on the song's tempo and the amount of percussion notes authored.

Percussion sections are an addition to the artists’ original works, so we need to be cautious about what we add, and in the end, we need to make subjective decisions. We don’t need to put a percussion section in every available spot. If none of the percussion instruments fits the mood of a song, it is okay not to use any.

We tend not to put a percussion section at the beginning of a song, because the intro sets the mood of the song and we like to keep it original. A long percussion section over a guitar solo or an instrumental part can be very fun, and it is a good way to keep a vocal player engaged.


Phrases are how vocals are scored in Rock Band. The game evaluates in real-time how you perform each phrase, gives you a score on that phrase, then begins again. Phrase markers are placed on A6 (105).

A phrase marker must begin on or before the start of the first note that is to be included in the phrase, and end on or after the end of the last note of the phrase. The minimum length for a phrase marker is a quarter-note; phrase markers shorter than a quarter-note will cause the MIDI compiler to return a "Vocal Phrase Overlap" error.

  • C3 note: Be sure the end or beginning of phrase markers don't touch any note tubes, as this might cause visual glitches like OD notes not showing up correctly or lyrics being shown at the wrong time in static vocals mode. Sometimes it's unavoidable in harmony heavy songs, but in every other case it's better to have a 64th gap (or even a 128th gap) between the beginning/end of a phrase and a note tube.
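The phrase marker rules above lend themselves to an automated check. The sketch below assumes positions are in MIDI ticks with 480 ticks per quarter note (an assumption, so adjust it to your project) and uses a hypothetical (start_tick, end_tick) representation for both markers and notes.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start_tick, end_tick), a hypothetical representation


def check_phrase(phrase: Span, notes: List[Span], ticks_per_quarter: int = 480) -> List[str]:
    """Return a list of problems found for one phrase marker and the notes it should contain."""
    problems = []
    start, end = phrase
    if end - start < ticks_per_quarter:
        problems.append("phrase marker is shorter than a quarter note")
    if notes:
        first_start = min(n[0] for n in notes)
        last_end = max(n[1] for n in notes)
        if start > first_start:
            problems.append("phrase marker starts after the first note")
        elif start == first_start:
            problems.append("warning: phrase marker start touches a note tube")
        if end < last_end:
            problems.append("phrase marker ends before the last note")
        elif end == last_end:
            problems.append("warning: phrase marker end touches a note tube")
    return problems
```

The two "warning" entries reflect the C3 recommendation to leave a small gap; they are kept separate from hard errors because touching edges are sometimes unavoidable in harmony-heavy songs.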

The length of a vocal phrase should be approximately the length of one breath of an average player. In mid-tempo songs, 2 measures is a good length for one phrase. Ultimately the maximum number of lyrics in one phrase is determined by how many lyrics can fit on the screen in the static HUD. TVs with 4:3 (standard definition) resolutions cannot display as many lyrics per screen as 16:9 (high definition) TVs, so make sure to test your song on a standard definition display using the static HUD to make sure no lyrics are clipping off the edges of the screen.

An Overdrive deploy section between phrases automatically appears if the space between the end of the last note in a phrase and the beginning of the first note in the next phrase is longer than 600 ms. As long as all the note tubes are inside of phrase markers, where exactly phrase markers end and start doesn’t matter for Overdrive deploy sections.
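To estimate where deploy sections will appear, the gap between phrases can be converted from ticks to milliseconds and compared with the 600 ms threshold. This sketch assumes a constant tempo and 480 ticks per quarter note; both are simplifications, and the function names are hypothetical.

```python
DEPLOY_GAP_MS = 600.0  # threshold stated above


def ticks_to_ms(ticks: int, bpm: float, ticks_per_quarter: int = 480) -> float:
    """Convert a tick count to milliseconds at a constant tempo."""
    return ticks * 60_000.0 / (bpm * ticks_per_quarter)


def has_deploy_section(last_note_end: int, next_note_start: int,
                       bpm: float, ticks_per_quarter: int = 480) -> bool:
    """True when the gap between two phrases is longer than 600 ms."""
    gap_ms = ticks_to_ms(next_note_start - last_note_end, bpm, ticks_per_quarter)
    return gap_ms > DEPLOY_GAP_MS
```

At 120 BPM a quarter-note gap is 500 ms, which is too short for a deploy section, while a half-note gap of 1000 ms is long enough. Songs with tempo changes would need the conversion done against the tempo map instead.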

There are special rules for creating phrases when authoring harmonies. Check the Harmony Authoring document for more information.


Overdrive in vocals is attached to certain phrases which will give you energy if you earn an AWESOME score on that phrase. Overdrive phrases are marked by copying the Phrase marker up to G#7 (116). The Overdrive note and the phrase marker must line up exactly in order to work, so be sure to move your overdrive marker as well if you’re moving or expanding a phrase marker.
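Since the Overdrive note must match its phrase marker exactly, a quick comparison can catch markers that drifted after an edit. The sketch assumes both are available as (start_tick, end_tick) pairs read from notes 116 and 105; the representation and function name are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start_tick, end_tick), a hypothetical representation


def misaligned_overdrive(phrases: List[Span], overdrive: List[Span]) -> List[Span]:
    """Return Overdrive markers that do not exactly match any phrase marker."""
    phrase_set = set(phrases)
    return [marker for marker in overdrive if marker not in phrase_set]
```

An empty result means every Overdrive marker has a phrase marker with the same start and end.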

There are specific rules for authoring overdrive for vocal harmonies. Check the Harmony Authoring document for more information.

Octaves and Vocal Ranges

The vocal HUD in the RB system automatically finds the highest and the lowest note in a song (excluding notes for non-pitched lyrics) and assigns pitches between them evenly. It works for most songs, but occasionally we need to manually adjust the range.

For example, if a song has low range verses and high range choruses, we see all the notes near the bottom of the HUD in verses and near the top in choruses. This would still be okay if the entire range is within 2 to 2.5 octaves.

But if the range is wider, the part becomes hard to read. To fix this problem, we use the range divider marker, "%". By placing "%" at the end of the last lyric in a phrase, the vocal ranges of the parts before and after the marker get separated. A smaller range makes it easier for players to see pitch changes.

This function applies only to the static HUD and not to the scrolling HUD.
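A quick way to decide whether a song might need a range divider is to measure the span of its pitched notes in octaves. This is a minimal sketch using the 2.5 octave figure above as the default limit; the function names are hypothetical, and the final call is still a matter of how the HUD reads in game.

```python
from typing import Iterable


def range_in_octaves(pitches: Iterable[int]) -> float:
    """Span between the lowest and highest MIDI pitch, in octaves (12 semitones each)."""
    values = list(pitches)
    if not values:
        return 0.0
    return (max(values) - min(values)) / 12.0


def may_need_range_divider(pitches: Iterable[int], limit_octaves: float = 2.5) -> bool:
    """True when the overall range is wider than the given limit."""
    return range_in_octaves(pitches) > limit_octaves
```

Pass in only the pitched notes, since notes for non-pitched lyrics are excluded from the range calculation.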

Animation Markers

The animation markers used for vocals are:

  • [play]- the standard singing state.
  • [intense]- used for hard, fast sections of a song
  • [mellow]- used for slow, quiet sections of a song
  • [idle]- the character has their microphone down and dances to the beat.
  • [idle_realtime]- the character has their microphone down and is not synced to the beat. Use this for intros and ends of songs, or anywhere else you don’t want your singer bopping to the beat.

[idle] and [idle_realtime] markers are pretty responsive and the singer quickly goes to mic down position. Because of this, it looks more natural to put those mic down markers a beat or so after, not immediately after the last note tube. Also if the space between phrases is shorter than a measure, it’s always a good idea not to put mic down and up (markers like [idle] and [play]) one after another. The singer will not move their mouth while in [idle] or [idle_realtime] so be sure to put the singer into [play], [mellow], or [intense] for any sections where there's singing.

The percussion animation markers like [tambourine_start] mean the system switches to a set of percussion animations for the singer on tambourine. This switch occurs on the camera cut following the animation marker. The percussion animation sets have 2 states, play and idle, and the singer will continue using the last animation they were in when switching over to the percussion set.

For example, if the animation state was [play], after [tambourine_start] is read on the track, the singer will be playing the tambourine at the next camera cut after the marker. If the state was [idle], the singer stays in the idle state, but with a tambourine in their hand.

To accommodate the transition of the scoring and animation systems to and from percussion sections, and to give the player enough time to prepare, we try to leave at least one measure of space between singing and percussion sections. If the animation for the vocalist is set to [idle] or [idle_realtime], the lipsync animation turns off, so we need to make sure the animation is set to [play], [mellow], or [intense] for all the note tubes.
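One way to catch note tubes that fall inside an idle state is to look up the most recent animation marker before each note. The sketch below assumes the markers are available as time-ordered (tick, text) pairs containing only the five state markers listed above; the data layout and function name are hypothetical.

```python
import bisect
from typing import List, Tuple

SINGING_STATES = {"[play]", "[mellow]", "[intense]"}


def notes_without_lipsync(markers: List[Tuple[int, str]], note_starts: List[int]) -> List[int]:
    """Return note start ticks whose most recent animation marker is not a singing state.

    markers must be sorted by tick. A note with no marker before it is also flagged,
    which is an assumption about the default state.
    """
    marker_ticks = [tick for tick, _ in markers]
    flagged = []
    for start in note_starts:
        index = bisect.bisect_right(marker_ticks, start) - 1  # last marker at or before the note
        state = markers[index][1] if index >= 0 else None
        if state not in SINGING_STATES:
            flagged.append(start)
    return flagged
```

Any tick returned here points at a note tube where the singer's mouth would not move.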

A Few Final Tips


Blue text represents C3 changes or additions to the original RBN documentation.