Cerence
30-60 available languages
English, French, Arabic, Bulgarian, Chinese, Czech, Danish, Dutch, Persian, Finnish, Greek, Hebrew, Hungarian, Italian, Indonesian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Slovak, Spanish, Swedish, Thai, Turkish, Basque, Bengali, Catalan, Croatian, Icelandic, Gaelic, Hindi, Malay, Romanian, Tamil, Vietnamese…
Software Development Kit
STT / TTS / WUW…
Embedded technology
Multilingual

All embedded voice technology with one solution: CSDK.

The CSDK (Cerence Software Development Kit) is an embedded voice technology in the form of a software development kit. With these tools, you will be able to integrate different voice features to create many types of interactions.

As of its latest version, the CSDK contains:

  • Cerence ASR (previously VoCon): An embedded ASR module (Automatic Speech Recognition, also called Speech-to-Text or STT) for voice transcription.
  • Cerence TTS (previously Vocalizer): An embedded TTS (Text-to-Speech) module used to produce voice synthesis.
  • Cerence NLU: An NLU (Natural Language Understanding) module for natural language comprehension in embedded systems.
  • Cerence Audio-Processing: Several tools included in the CSDK to improve and simplify the processing of microphone audio.
  • Dev Tools: A Windows-based software suite to help you develop your voice solutions.

Cerence ASR (formerly VoCon): embedded voice transcription engine (Speech-to-Text).

Cerence ASR (also known as STT for Speech-to-Text) is one of the most popular embedded voice transcription solutions. It is also the voice engine formerly known as VoCon by Nuance.

Included in the CSDK, it offers superior functionality, unmatched accuracy and high performance for a variety of applications that benefit from speech control. Designed as a modular and scalable engine, Cerence ASR can be adapted to a wide range of embedded applications in industry, logistics, transportation, etc.

The great strength of the Cerence ASR included in the CSDK lies in its extensible dictionaries. This feature allows you to directly modify the lexicons understood by the transcription engine to improve its performance in particular cases, for example with terms specific to your business. If a word is initially misrecognized, its associated phonetics can be reworked using the development tools.

Cerence ASR has several features, including:

  • Wide vocabulary support: Allows speech recognition over large corpora of up to millions of units.
  • High reliability in noisy environments: Capable of high-precision recognition with a signal-to-noise ratio as low as 5 dB.
  • Embedded Voice Dictation: Recognizes free dictated text, going beyond separate voice commands.
  • Spelling module: Acts as a back-up for the voice recognition system.

For more information about the features of Cerence ASR, you can contact us directly for a detailed presentation.

Cerence TTS (formerly Vocalizer): embedded and Cloud-based speech synthesis tool (Text-to-Speech).

Cerence TTS (previously known as Vocalizer), also a module of the CSDK, transforms the voice assistant experience by offering highly natural speech synthesis for cloud and embedded applications. Cerence offers Cerence Cloud Services and integrated SDKs for Windows, Linux, OSX, Android and iOS.

Cerence TTS is a suite of speech synthesis solutions that generates high-quality voice output from Text-to-Speech and pre-recorded audio. The software is optimized for reading long texts in a natural, human-like way. New algorithms based on deep learning models offer greater fluidity and more natural prosody, providing a unique vocal experience.

Cerence TTS also offers several features, such as:

  • Emotional voice synthesis: Choice between 4 ways of speaking (neutral, playful, authoritative and empathetic).
  • Improved expression styles: Ability to enhance text-to-speech with pre-recorded speech elements.
  • Contextual intelligence: Optimizes the reading of certain elements through an intelligent tagging system for addresses, dates, phone numbers…
  • Prosody control: Manipulation of the pitch, volume, rhythm and timbre of the synthesized voice.

For more information about the features of Cerence TTS, you can contact us directly for a detailed presentation. 

 

The technical environments for integrating the CSDK locally into your systems are as follows:

Operating-system-dependent API bindings and packaging:

  • Android: The CSDK is delivered with a Java API binding compiled into an Android archive (AAR).
  • Windows/Linux: The CSDK is delivered with a C API binding.
  • Apple iOS: The CSDK is shipped as a framework archive; it is deployed with an Objective-C binding and bridging headers to support the Swift API.

Standard ports and tools:

  • iOS (version 7.0 and up): arm64 and x86_64
  • Android (version 6.0 and up): armv7 (32-bit), arm64 and x86_64
  • Linux: armv7 (32-bit), arm64 and x86_64
  • Windows: x86_64
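
As a quick illustration, the platform list above can be captured in a small lookup table, for example in a build or packaging script. This is only a minimal sketch in Python: `SUPPORTED_PORTS`, `MIN_OS_VERSION` and `is_supported` are hypothetical names chosen for this example and are not part of the CSDK. The architectures and minimum OS versions are the ones listed above.

```python
# Hypothetical helper, not part of the CSDK: encodes the platform list above.
SUPPORTED_PORTS = {
    "ios": {"arm64", "x86_64"},
    "android": {"armv7", "arm64", "x86_64"},
    "linux": {"armv7", "arm64", "x86_64"},
    "windows": {"x86_64"},
}

# Minimum OS versions from the list above (None where no minimum is stated).
MIN_OS_VERSION = {"ios": (7, 0), "android": (6, 0), "linux": None, "windows": None}


def is_supported(os_name, arch, os_version=None):
    """Return True if the OS/architecture pair appears in the list above."""
    key = os_name.lower()
    if arch not in SUPPORTED_PORTS.get(key, set()):
        return False
    minimum = MIN_OS_VERSION[key]
    # Only compare versions when both a minimum and a target version are known.
    if minimum is not None and os_version is not None:
        return tuple(os_version) >= minimum
    return True
```

For instance, `is_supported("Android", "armv7", (6, 0))` is true, while `is_supported("Windows", "arm64")` is false because only x86_64 is listed for Windows.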

Code and data:

Feature | Code size
Basic command and control | 3.2 MB
All features, largest acoustic model | 9.5 MB

 

Data, model size:

Component | Data size per language
Acoustic model by language (Gen 4 compact / Gen 5 / Gen 6) | ~900 kB / ~4 MB / ~6 MB
CLC – Monolingual | 300-7300 kB
CLC – Multilingual | 700-3000 kB

 

Use cases: data size and total RAM usage.

Component | Data size per language | Total RAM usage
Number recognition | 4 kB | 1.25 MB
Basic C&C application (100 / 10,000 commands) | 10 / 500 kB | 1.3 / 1.8 MB
Telephony with grammar + expressions | 0.52 MB | 12.6 MB
Points of interest and addresses (USA only) | 300 MB | 56 MB
Embedded Voice Dictation | 100 MB | 100 MB
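
To get a rough storage figure for a given configuration, the numbers in the tables above can simply be added up. The Python sketch below does this under two assumptions that the tables themselves do not state: that code size, acoustic model size and use-case data size are additive, and that 1 MB equals 1000 kB. The dictionary and function names are illustrative only and are not part of the CSDK.

```python
# Figures copied from the tables above, in MB (assumption: 1 MB = 1000 kB).
CODE_SIZE_MB = {"basic_cc": 3.2, "all_features": 9.5}
ACOUSTIC_MODEL_MB = {"gen4_compact": 0.9, "gen5": 4.0, "gen6": 6.0}  # approximate
USE_CASE_DATA_MB = {
    "number_recognition": 0.004,
    "telephony": 0.52,
    "poi_addresses_usa": 300.0,
    "dictation": 100.0,
}


def asr_storage_mb(code, acoustic_model, use_case):
    """Rough storage estimate: code + acoustic model + use-case data.

    Assumes the three figures are additive, which the tables do not confirm.
    """
    return (CODE_SIZE_MB[code]
            + ACOUSTIC_MODEL_MB[acoustic_model]
            + USE_CASE_DATA_MB[use_case])


# Full-featured code, Gen 5 acoustic model, telephony use case.
print(round(asr_storage_mb("all_features", "gen5", "telephony"), 2))  # 14.02
```

Treat the result as an order of magnitude for early sizing; the detailed technical documentation remains the reference for actual figures.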

Cerence TTS: storage and RAM usage.

Component | Storage required (excluding code) | RAM used
Embedded Compact (small systems) | 10 MB average / 21 MB maximum | 6 MB average / 23 MB maximum
Embedded Pro (TTS optimized for greater capacity, e.g. navigation, SMS reading…) | 55 MB average / 131 MB maximum | 14 MB average / 38 MB maximum
Embedded High (high-quality TTS, suitable for all uses) | 120 MB average / 325 MB maximum | 24 MB average / 69 MB maximum
Embedded Premium (highest-performing TTS, based on a deep learning model) | 337 MB average / 558 MB maximum | 159 MB average / 198 MB maximum

The code size for a full-featured Cerence TTS is 10 to 13.5 MB depending on the integration platform. However, this can be reduced depending on the languages and features selected.
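
When sizing a device, the TTS figures above can be used to shortlist a configuration. The following Python sketch picks the largest TTS tier whose maximum storage and RAM figures fit a given budget. It is an illustration under stated assumptions: the tier order (Compact, Pro, High, Premium) is taken as going from smallest to largest, the maximum figures are used as a conservative bound, and the code size (10 to 13.5 MB) is not included. `TTS_TIERS` and `best_tts_tier` are hypothetical names, not CSDK identifiers.

```python
# (tier, max storage in MB, max RAM in MB), copied from the table above.
# Assumption: tiers are listed from smallest to largest footprint.
TTS_TIERS = [
    ("Compact", 21, 23),
    ("Pro", 131, 38),
    ("High", 325, 69),
    ("Premium", 558, 198),
]


def best_tts_tier(storage_budget_mb, ram_budget_mb):
    """Return the largest tier whose maximum figures fit the budget, or None."""
    fitting = [name for name, storage, ram in TTS_TIERS
               if storage <= storage_budget_mb and ram <= ram_budget_mb]
    return fitting[-1] if fitting else None
```

For example, a device with 400 MB of free storage but only 64 MB of RAM for TTS would land on the Pro tier, since the High tier can use up to 69 MB of RAM.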

The necessary documentation, as well as detailed technical information on the CSDK and its modules, is available on request. We can also walk you through these technical documents to help you understand and use them.

Would you like to try the CSDK?

We can grant you an evaluation period!

The VoiceMarket supports you throughout your projects.

The state of the art in embedded voice technologies.
A versatile, multi-purpose and complete solution.
A spin-off from Nuance, Cerence has established itself as one of the leaders in the voice technology field.

The state of the art of embedded voice.

The CSDK is the flagship solution in embedded voice technology today. Integrated into the products of some of the largest companies across many applications, the CSDK keeps advancing voice-based human-computer interaction with ever-increasing performance.

Complete and multi-purpose solution.

The CSDK comes in the form of a software development kit. This format lets its users adapt it as they see fit to carry out their voice projects. This versatility makes the CSDK a truly complete tool for creating voice applications, all the more so in embedded contexts.

Spin-off of a leader in modern voice technology.

Cerence is a spin-off of the world-renowned Nuance, a leader in speech technology. This lineage allows the company, and the CSDK in particular, to benefit from some of the best technological expertise in the field of voice, a guarantee of high quality.

What the CSDK can do for you…

A tailor-made solution.

The CSDK is a modular tool offering you different modules to be integrated according to your needs and constraints. This versatility allows you to design the most suitable solution for your project to optimize its performance.

100% embedded voice.

The main selling point of the CSDK, embedded voice technology, makes it possible to create voice use cases that do not depend on the Cloud. This independence is essential in environments where no internet connection is available.

Multilingual technology.

Depending on the module, the CSDK can handle from 30 to more than 60 different languages in a fully embedded way. The list of compatible languages can be found at the top of the page in the main information.

A single business model.

The pricing model of the CSDK is very simple: it is an annually renewable license per device and/or per user. Pricing is available on request directly from the VoiceMarket, along with a complete quotation for your needs if desired.

Would you like to talk about CSDK?

Would you like more information about this technology?