What is SALSA LipSync?

SALSA LipSync v2 is a system for breathing life into game or application characters/avatars using simple ideas and workflows. The goal is to bring AAA quality facial animation features to indie and pro developers with minimal effort.

Watch our video tutorial series for SALSA LipSync Suite v2.


SALSA (Simple Automated LipSync Approximation) is a realtime system for creating lip synchronization from audio input without the need for phoneme mapping or baking. This creates a simple, fast workflow that lets the designer implement lipsync in a project without spending a lot of time or effort. Being a realtime solution affords opportunities that do not exist with phoneme mapping solutions, such as microphone input and text-to-speech input. Realtime also means it is not necessary to bake in phoneme mapping analysis. Need to re-record audio voice-overs? Just do it -- SALSA does not care.

As mentioned, SALSA is not a phoneme mapping system. Its light-weight, realtime capabilities come from an approximation algorithm and do not produce Pixar-quality lipsync results. However, SALSA LipSync v2 improves dramatically on the capabilities of SALSA v1 and is perfect for most requirements. You will have to decide whether SALSA's easy workflow and awesome capabilities are right for your project. If you are unsure, check out our YouTube channel for SALSA LipSync v2 samples and tutorials.


EmoteR is an emote randomizer utility. Some of the functionality of EmoteR was previously available in RandomEyes (SALSA v1). In SALSA v2, EmoteR is its own module and gets some new, advanced features. It provides easy configuration of emotes and several methods of firing them off. You can, of course, randomly fire emotes. Additionally, you can leverage SALSA to trigger the timing of (emphasis) emotes. You can also implement repeater emotes that run on a repetitive cycle.

EmoteR also has a simple API that gives you more control over your emoting and lets you trigger emotes via code, UI, or Timeline.


Eyes (previously RandomEyes in SALSA v1) is an eye and head tracking and random generation system. Eyes adds many new capabilities that were not available in v1 and will breathe a huge amount of life into your characters on its own. In addition to eye randomization, it is now possible to link one or more character heads into the mix to really liven things up.

Of course, just as with v1, v2 allows you to track objects in the scene (with head and eyes). Eye tracking is also performed independently (where applicable), which creates a super realistic look to eye movement and even supports a cross-eyed look when targets are close.
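
The cross-eyed look falls out of simple geometry when each eye aims at the target on its own. The following is a minimal Python sketch of that idea only, not the Eyes module's implementation or API. The function names, the head-local coordinate convention (x across the face, z forward), and the 0.032 m half eye spacing are all assumptions made for illustration.

```python
import math
from typing import Tuple

HALF_EYE_SPACING = 0.032  # assumed half distance between the eyes, in meters


def eye_yaw_degrees(eye_x: float, target_x: float, target_z: float) -> float:
    """Yaw (degrees) that points one eye at the target; positive turns toward +x."""
    return math.degrees(math.atan2(target_x - eye_x, target_z))


def both_eyes(target_x: float, target_z: float) -> Tuple[float, float]:
    """Aim each eye independently; returns (eye at -x, eye at +x)."""
    return (
        eye_yaw_degrees(-HALF_EYE_SPACING, target_x, target_z),
        eye_yaw_degrees(+HALF_EYE_SPACING, target_x, target_z),
    )


# A target on the centerline, first 10 cm and then 10 m in front of the face.
near = both_eyes(0.0, 0.10)
far = both_eyes(0.0, 10.0)
```

With the near target, the two eyes rotate in opposite directions toward the nose, which reads as cross-eyed. With the far target, both angles are close to zero and the eyes look nearly parallel.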


Development Environment

SALSA LipSync Suite v2 was written for the Unity game engine. It currently supports all stable release versions from v2017.4.1 onward (unless otherwise stated). The Suite is not supported with pre-release Unity engines or packages.

NOTE: SALSA LipSync Suite v2 only works with Unity and there are no plans to port it to other game development platforms.

Platform Support

Please see the Asset Store page for currently supported deployment platforms.

Animation Requirements

SALSA, along with EmoteR and Eyes, has very few requirements. Since it is a Unity asset, it does not care about the originating file format. SALSA Suite simply requires that the originating format be recognized and imported by Unity into one of the supported controller types. The currently supported types are:

  • Shape (blendshapes)
  • Bone (transform)
  • Sprite (single image or array of animated frames)
  • UGUI Sprite (single image or array of animated frames)
  • Texture (single image or array of animated frames)
  • Material (single image or array of animated frames)
  • UMA ExpressionPlayer Poses
  • Animator parameters (bool, float, int, trigger)
  • Events

While 2D and 3D implementations are usually very different, mixing 2D and 3D with SALSA Suite v2 is actually pretty easy. For both 2D and 3D, the model simply needs sufficient capability to individually manipulate the mouth (for lipsync), the face (for EmoteR), and the head/eyes (for Eyes).

3D Requirements

The currently supported methods for manipulating 3D model features are bones (transforms) and blendshapes (also known as morphs, shapekeys, etc. in 3D modeling software).

NOTE: It is also possible to implement some 2D workflows in conjunction with 3D models, such as performing texture swaps on face materials.

Typically, blendshapes are best for facial animations, lip/mouth movement, brow movement, etc. Bones work best for eyes and head, and may also be used to manipulate the jaw. However, either bones or blendshapes may be used to manipulate all aspects of facial animation. SALSA and EmoteR can mix and match blendshapes and bones for viseme and emote expression configurations. For that matter, they can also include 2D-centric controllers and swap out textures, etc. The Eyes module has some stricter requirements due to the specific movement controls it implements -- it uses templates for head/eyes/eyelids, corresponding to the specific animation controller types (bones, shapes, sprites, textures, etc.).

When using blendshapes, the hard and simple rule is: Unity has to be able to utilize whatever morphing system your 3D modeler implements. If Unity creates a SkinnedMeshRenderer containing sufficient blendshapes when you import your model, the model will work with SALSA LipSync Suite. If Unity does not recognize the blendshapes, SALSA Suite cannot work. If there are not enough blendshapes to create independent mouth movements, results will not be as fantastic as they could be. SALSA Suite is magical, but it is not magic: it cannot create blendshapes to move your character.

Likewise, when using bones, it is necessary for there to be a sufficient number of bones to independently manipulate the body/facial section in question. If there are no bones to manipulate the facial part you are trying to animate, SALSA cannot create them.

Blendshapes -- What is required?

As mentioned above, a sufficient selection of shapes is required to perform dynamic animations. Since SALSA, EmoteR, and Eyes each work on different parts of the face and in different ways, they have differing requirements.

SALSA's Blendshape Recommendations

As mentioned previously, SALSA configuration supports a wide variety of controller types. For 3D character models, blendshapes are probably the easiest to implement. Considering a blendshape implementation, SALSA has very loose requirements, mainly based on desired look-and-feel. SALSA v1 utilized three shapes to perform realtime lip-sync and did so to great effect. The same capability exists in SALSA v2. In fact, the minimum requirement is a single viseme configuration (one shape). But, that is not nearly as cool and exciting as it could be. Version 2 supports an unlimited number of viseme configurations, but there are practical limits and points of diminishing returns. And with the implementation of Advanced Dynamics, the amount of dynamic representation is greatly expanded with even fewer shapes implemented.

The OneClick configurations (on supported models) utilize seven shapes. The shapes we have chosen to implement are based as closely as possible on the phoneme visualisations for: w, t, f, th, ow, ee, oo (in that order). For example, the OneClick configuration for the Reallusion CC3 standard female model is demonstrated in the following images:

[Image: Reallusion CC3 SALSA OneClick viseme configuration, front view]

[Image: Reallusion CC3 SALSA OneClick viseme configuration, side view]

Of course, you are free to implement whatever scheme you wish. More or fewer visemes. Different visemes. Whatever you like. SALSA is flexible enough to meet your requirements.

EmoteR's Blendshape Recommendations

There are no set blendshape requirements for EmoteR since it is an emote driver. It is purely based on your requirements. If you only need a happy-face emote, then that is your only requirement. If you would like to implement SALSA-triggered emphasis emotes, here are the representations we have implemented as closely as possible in our OneClicks:

  • Exasperation (cheek puff, slight brow raise -- make it subtle)
  • Soften (a smile and relaxed brows)
  • BrowsUp (both brows raised)
  • BrowUp (raise a single brow more than the other)
  • Squint (eye and brow narrowing)
  • Focus (eye narrowing)
  • Flare (nose/nostril movement)
  • Scrunch (nose/nostril movement, eye/brow narrowing)

Eyes Blendshape Recommendations

Eyes can use blendshapes to move the eyes; however, this is probably better implemented with bone rotations. Blendshape "rotations" are not optimal since the overall shape of the rotating object is actually deformed during the "rotation". Blinks, however, do work well, even though they can be subject to the same deformation associated with "rotation", since they are closing around a typically round object.

The Eyes module in SALSA Suite v2 has a very cool new feature that comes with some requirements: Eyelid Tracking. To implement Eyelid Tracking, the model needs independent control over the top and bottom eyelids (upper-only control is also supported). This allows the upper and lower eyelids to move independently upward or downward to subtly follow eye movement.

NOTE: Not all OneClick implementations support Eyelid Tracking. Fuse, ACG, iClone, CC3, and UMA support Eyelid Tracking. Only UMA supports upper and lower tracking.

Bone Requirements

This is a little less cut-and-dried. A bone/transform is required to move, rotate, or scale a weighted portion of a model. Bones are recommended for eye and head movement but can become unwieldy for mouth and emote configuration due to the number of bones typically required for such movement. However, it is certainly possible to do so. Most model generation systems (other than UMA) do not use bones for facial animation. If you are creating your own setup, however, you may certainly use bones and configure the SALSA Suite to animate your character model. The suggestion is to read the section on blendshapes and configure a sufficient number of bones with specific weights to create the same representation of movement.

2D Requirements

SALSA LipSync Suite v2 currently supports sprite, texture, and material manipulation. It uses a switching mechanism to swap out element frames to create animation, much like cel-based cartoon animation techniques. SALSA v1 supported single-frame sprite configurations, swapping out one frame for each represented animation. SALSA Suite v2 supports the same single-frame configuration as well as multi-frame configurations, and adds support for textures, materials, and UGUI sprites.
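
The difference between single-frame and multi-frame configurations can be pictured as a lookup into a list of frames. This is a hypothetical Python sketch only: `frame_for`, the frame names, and the idea of indexing by a normalized 0 to 1 progress value are assumptions for illustration, not the Suite's API or its actual timing model.

```python
from typing import Sequence


def frame_for(frames: Sequence[str], progress: float) -> str:
    """Pick the frame to display for an animation progress in [0, 1]."""
    if not frames:
        raise ValueError("at least one frame is required")
    clamped = min(max(progress, 0.0), 1.0)
    # Scale progress to an index and keep progress == 1.0 on the last frame.
    index = min(int(clamped * len(frames)), len(frames) - 1)
    return frames[index]


# Hypothetical frame names: one frame versus an array of animated frames.
single = ["mouth_open"]
multi = ["mouth_rest", "mouth_half", "mouth_open"]
```

A single-frame configuration always resolves to its one frame, while a multi-frame configuration steps through the array as the animation progresses.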

Whether you are using sprites or textures, the golden rule for getting the most out of the SALSA Suite is to implement separate facial segments for animation. The main separation should be eyes animated separately from the mouth. However, if you wish to introduce emotes, you may have to get creative in your process since emotes frequently include eye and mouth movement. SALSA Suite has some mechanisms to deal with this, such as priority overrides in the QueueProcessor, which follow these basic rules:

  • Lipsync overrides everything.
  • Emotes override Eyes/Eyelids/Head.
  • Eyes/Eyelids/Head override themselves.
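
The rules above amount to a simple priority comparison. The sketch below is an illustrative Python model, not the QueueProcessor's code or API: the `Source` enum and `takes_control` function are hypothetical names. The source only states that Eyes/Eyelids/Head override themselves; treating same-level lipsync and emote requests the same way is an assumption.

```python
from enum import IntEnum
from typing import Optional


class Source(IntEnum):
    """Hypothetical priority levels; a higher value wins."""
    EYES = 1      # Eyes, eyelids, and head share the lowest level
    EMOTE = 2
    LIPSYNC = 3


def takes_control(requester: Source, current: Optional[Source]) -> bool:
    """Return True if `requester` may take over a component held by `current`."""
    if current is None:
        return True  # nothing is animating the component yet
    # Assumption: a request at the same level replaces the current one.
    return requester >= current
```

For example, `takes_control(Source.LIPSYNC, Source.EMOTE)` is true, while `takes_control(Source.EYES, Source.EMOTE)` is false, so a mouth shape driven by lipsync is not interrupted by an emote that also touches the mouth.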

2D Eyes Requirements

Eyes has some specific requirements for 2D, depending on whether you are using sector-based or bone/transform-based movement. Bone-based movement is the loosest and really just requires that your model look correct as the eye is moved around on its transform. For example, if manipulating an eye behind an eye socket, the eye should not expose itself outside of the socket area and should also not expose blank space within the socket -- unless, of course, that is the look-and-feel you are going for.

Sector-based movement is a bit different. The Eyes module has a few templates to work with such scenarios:

  • 2-way (left, right)
  • 3-way (left, center, right)
  • 5-way (center, top, right, bottom, left)
  • 9-way (center, top, top-right, right, bottom-right, bottom, bottom-left, etc.)

If using sectors, your animated cells should adhere to one of the above template scenarios.
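
To illustrate how a look direction might map onto these templates, here is a minimal Python sketch. It is not the Eyes module's algorithm or API: `pick_sector`, the normalized (x, y) look direction with +x to the right and +y up, and the 0.33 dead zone around center are all assumptions chosen for illustration.

```python
def _axis(value: float, negative: str, positive: str, dead_zone: float) -> str:
    """Name the side of one axis, or '' when the value is inside the dead zone."""
    if value < -dead_zone:
        return negative
    if value > dead_zone:
        return positive
    return ""


def pick_sector(x: float, y: float, template: int, dead_zone: float = 0.33) -> str:
    """Map a look direction in [-1, 1] x [-1, 1] to a sector name."""
    if template == 2:
        return "left" if x < 0 else "right"
    horizontal = _axis(x, "left", "right", dead_zone)
    if template == 3:
        return horizontal or "center"
    vertical = _axis(y, "bottom", "top", dead_zone)
    if template == 5:
        if not horizontal and not vertical:
            return "center"
        # No diagonals in the 5-way template: the dominant axis wins.
        return horizontal if abs(x) >= abs(y) else vertical
    if template == 9:
        if horizontal and vertical:
            return f"{vertical}-{horizontal}"
        return horizontal or vertical or "center"
    raise ValueError("template must be 2, 3, 5, or 9")
```

The same look direction resolves differently depending on the template. A gaze up and to the right selects a diagonal cell in the 9-way template but falls back to the dominant axis in the 5-way template, so the artwork only needs the cells that its template names.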