Vocaloid

Friday, February 22, 2013

Vocaloid software history

Vocaloid

Yamaha started development of Vocaloid in March 2000 and announced it for the first time at the German fair Musikmesse on March 5–9, 2003. The first Vocaloids, Leon and Lola, made their first appearance at the NAMM Show on January 15, 2004, and were released by the studio Zero-G on March 3, 2004; both were sold as a "Virtual Soul Vocalist". Leon and Lola were also demonstrated at the Zero-G Limited booth during Wired Nextfest and won the 2005 Electronic Musician Editor's Choice Award. Zero-G later released Miriam, with her voice provided by Miriam Stockley, in July 2004. Later that year, Crypton Future Media released its first Vocaloid, Meiko. In June 2005, Yamaha upgraded the engine to version 1.1. A patch was later released to update all Vocaloid engines to Vocaloid 1.1.2, adding new features to the software, although the engine versions differed in their output results. A total of five Vocaloid products were released from 2004 to 2006. Vocaloid had no rival technology to contend with at the time of its release, and the English version only had to face the later release of VirSyn's Cantor software during its original run. Although the software supported Japanese phonetics, the interface had no Japanese version, so both Japanese and English vocals used an English interface. The only differences between versions were the color and logo, which changed per template. As of 2011, this version of the software is no longer supported by Yamaha and will no longer be updated.

Vocaloid 2

Vocaloid 2 was announced in 2007. Due to time constraints, and unlike the previous engine version, it did not have a public beta test; instead, the software was updated as users reported issues with it. The synthesis engine and the user interface were completely revamped, with Japanese Vocaloids possessing a Japanese interface. New features such as note auditioning, a transparent control track, toggling between playback and rendering, and expression control were implemented. A singer's breath noise and husky voice can be recorded into the library to make the output sound more realistic. This version is not backward compatible, and its editor cannot load a library built for the previous version. Aside from the PC software, NetVocaloid services are offered. The software was not localized, however: a Vocaloid of either language was available only in that language's version, so although Megurine Luka had an English library included, as a Japanese Vocaloid she only had access to the Japanese version of the software. In total, 17 packages were produced for Vocaloid 2 in the Japanese version of the software and five in the English version; these packages offered 35 voicebanks between them in either English or Japanese.

Yamaha announced a version of the Vocaloid 2 software for the iPhone and iPad, which was exhibited at the Y2 Autumn 2010 Digital Content Expo in Japan. This version of the software was later released using the voice of Yamaha's own Vocaloid, VY1.

Vocaloid 3

Vocaloid 3 launched on October 21, 2011, along with several products in Japanese and a Korean product, the first of its kind. Several studios are providing updates to allow Vocaloid 2 vocal libraries to carry over to Vocaloid 3. Vocaloid 3 will also include the software "Vocalistener", which adjusts parameters iteratively from a user's singing to create natural synthesized singing. It supports additional languages including Chinese, Korean, and Spanish. It can also use plug-ins for the software itself and switch between normal and "classic" mode for less realistic vocal results. Unlike previous versions, the vocal libraries and the main editing software are sold as two separate items; the vocal libraries themselves contain only a "tiny" version of the Vocaloid 3 editing software. Yamaha will also be licensing plug-ins and the use of the Vocaloid software for additional mediums such as video games. Unlike Vocaloid 2, Vocaloid 3 also supports triphones, which improves its language capabilities. The first Spanish Vocaloids, Clara and Bruno, were released in 2011.

New technology is also being used to bring back the voice of the singer Hitoshi Ueki, who died in 2007. This is the first attempt to bring back a singer whose voice had been lost, although it had been considered a possibility since the software was first released in 2004. However, this is only being done for private use.

Vocaloid Cultural Impact in Japan

Hatsune Miku is mostly responsible for Vocaloid's success.

The software became very popular in Japan upon the release of Crypton Future Media's Hatsune Miku Vocaloid 2 software, and her success has led to the popularity of the Vocaloid software in general. Within Japan, the software has proven popular overall, with thousands of original songs by artists across the country. The Japanese video sharing website Nico Nico Douga played a fundamental role in the recognition and popularity of the software. A user of Hatsune Miku and an illustrator released a much-viewed video on Nico Nico Douga in which "Hachune Miku", a super deformed Miku, held a Welsh onion (negi in Japanese) and sang the Finnish song "Ievan Polkka" in the manner of the flash animation "Loituma Girl". According to Crypton, they knew that users of Nico Nico Douga had started posting videos with songs created by the software before Hatsune Miku, but the video presented multifarious possibilities of applying the software in multimedia content creation, notably in the dōjin culture. As the recognition and popularity of the software grew, Nico Nico Douga became a place for collaborative content creation. Popular original songs written by one user would generate illustrations, animation in 2D and 3D, and remixes by other users. Other creators would show their unfinished work and ask for ideas. The software has also been used to tell stories using song and verse, and the Story of Evil series has become so popular that a manga, six books, and two theatre works were produced by the series creator. Another theater production based on "Cantarella", a song sung by Kaito and produced by Kurousa-P, was also set to run at Shibuya's Space Zero theater in Tokyo from August 3 to August 7, 2011. The website has become so influential that studios often post demos on Nico Nico Douga, as well as on other websites such as YouTube, as part of the promotional effort for their Vocaloid products.

The important role Nico Nico Douga has played in promoting the Vocaloids also sparked interest in the software. Kentaro Miura, the artist of Gakupo's mascot design, offered his services for free because of his love for the website.

In September 2009, three figurines based on the derivative character "Hachune Miku" were launched in a rocket from the Black Rock Desert in the U.S. state of Nevada, though the rocket did not reach outer space. In late November 2009, a petition was launched to have a custom-made Hatsune Miku aluminum plate (8 cm x 12 cm, 3.1" x 4.7") used as a balancing weight for the Japanese Venus space probe Akatsuki. Started by Hatsune Miku fan Sumio Morioka, who goes by chodenzi-P, the project received the backing of Dr. Seiichi Sakamoto of the Japan Aerospace Exploration Agency (JAXA). The petition website, written in Japanese, was translated into other languages such as English, Russian, Chinese and Korean, and on December 22, 2009, the petition exceeded the 10,000 signatures needed to have the plates made. On May 21, 2010 at 06:58:22 (JST), Akatsuki was launched on the rocket H-IIA 202 Flight 17 from the Japanese spaceport Tanegashima Space Center, carrying three plates depicting Hatsune Miku.

The Vocaloid software has also had a great influence on the character Black Rock Shooter, who looks like Hatsune Miku but is not linked to her by design. The character was made famous by the song "Black Rock Shooter", and a number of figurines have been made. An original video animation made by Ordet was streamed for free as part of a promotional campaign running from June 25 to August 31, 2010. The virtual idols "Meaw", aimed at the Vocaloid culture, have also been released. The twin Thai virtual idols released two singles, "Meaw Left ver." and "Meaw Right ver.", sung in Japanese.

A one-day-only cafe based on Hatsune Miku was opened in Tokyo on August 31, 2010. A second event was arranged for all Japanese Vocaloids. "Snow Miku" was also featured at an event held as part of the 62nd Sapporo Snow Festival in February 2011. A Vocaloid-themed TV show on the Japanese Vocaloids called Vocalo Revolution began airing on Kyoto Broadcasting System on January 3, 2011. The show is part of a bid to make the Vocaloid culture more widely accepted and features a mascot known as "Cul", who is also the mascot of the "Cul Project". The show's first success story is a collaboration between Vocalo Revolution and the school fashion line "Cecil McBee", Music x Fashion x Dance. Piapro also held a competition with famous fashion brands, with the winners seeing their Lolita-based designs reproduced for sale by the company Putumayo. A radio station set up a one-hour program containing nothing but Vocaloid-based music.

The Vocaloid software had a great influence on the development of the freeware Utau. Several products were produced for the Macne series (Mac音シリーズ), intended for use with the programs Reason 4 and GarageBand. These products were sold by Act2 and, once their file format was converted, could also work with the Utau program. The program Maidloid, developed for the character Acme Iku (阿久女イク), works in a similar way to Vocaloid, except that it produces erotic sounds rather than an actual singing voice. Other than Vocaloid, AH Software also developed Tsukuyomi Ai and Shouta for the software Voiceroid, and the sale of their Vocaloids gave AH Software the chance to promote Voiceroid at the same time. Voiceroid is aimed at speaking rather than singing. Both AH Software's Vocaloids and Voiceroids went on sale on December 4, 2009. Crypton Future Media has been reported to openly welcome these additional software developments, as they expand the market for synthesized voices.

Following the 2011 Tōhoku earthquake and tsunami, a number of Vocaloid-related donation drives were organized. Crypton Future Media joined several other companies in a donation drive, with money spent on music from Crypton Future Media's KarenT label being donated to the Japanese Red Cross. In addition, a special Nendoroid of Hatsune Miku, Nendoroid Hatsune Miku: Support ver., was announced with a donation of 1,000 yen per sale to the Japanese Red Cross.

Vocaloid

Vocaloid 2 Editor (English version)
Developer(s)	Yamaha Corporation
Initial release	January 2004
Stable release	Vocaloid 3
Development status	Active
Operating system	Windows XP / Vista / 7; Apple iOS (iVocaloid, Japan only)
Available in	English, Japanese, Spanish, Korean, Chinese
Type	Musical Synthesizer Application
License	proprietary
Website	www.vocaloid.com/en/

Vocaloid (ボーカロイド Bōkaroido) is a singing voice synthesizer. Its signal processing part was developed through a joint research project led by Kenmochi Hideki at the Pompeu Fabra University in Spain in 2000 and was not originally intended to be a full commercial project. Backed by the Yamaha Corporation, the project developed the software into the commercial product "Vocaloid". The software enables users to synthesize singing by typing in lyrics and melody, using synthesizing technology with specially recorded vocals of voice actors or singers. To create a song, the user enters the melody in a piano roll type interface and enters the lyrics on each note. The software can change the stress of the pronunciations, add effects such as vibrato, or change the dynamics and tone of the voice. Each Vocaloid is sold as "a singer in a box" designed to act as a replacement for an actual singer. The software was originally available only in English, starting with the first Vocaloids Leon and Lola, and in Japanese with Meiko, but Vocaloid 3 has added support for Spanish for the new Spanish Vocaloids Bruno and Clara, Chinese for Luo Tianyi, and Korean for SeeU.

The software is intended for professional musicians as well as light computer music users and has so far sold on the idea that the only limits are the users' own skills. The Japanese musical groups Livetune of Victor Entertainment and Supercell of Sony Music Entertainment Japan have released songs featuring Vocaloid vocals. The Japanese record label Exit Tunes of Quake Inc. has also released compilation albums featuring Vocaloids. Artists such as Mike Oldfield have also used Vocaloids in their work for backup vocals and sound samples.

Technology

The Vocaloid singing synthesizer technology is categorized as concatenative synthesis: it splices and processes, in the frequency domain, vocal fragments extracted from human singing voices. In singing synthesis, the system produces realistic voices by adding information about vocal expressions such as vibrato to the score information. The Vocaloid synthesis technology was initially called "Frequency-domain Singing Articulation Splicing and Shaping" (周波数ドメイン歌唱アーティキュレーション接続法 Shūhasū-domain Kashō Articulation Setsuzoku-hō), although Yamaha no longer uses this name on its websites. "Singing Articulation" is explained as the "vocal expressions", such as vibrato, and the vocal fragments necessary for singing. The Vocaloid and Vocaloid 2 synthesis engines are designed for singing, not for reading text aloud, though software such as Vocaloid-flex and Voiceroid has been developed for that. They cannot naturally replicate singing expressions like hoarse voices or shouts, but Appends are made to create different tones such as "whisper" and "power".

System architecture

The main parts of the Vocaloid 2 system are the Score Editor (Vocaloid 2 Editor), the Singer Library, and the Synthesis Engine. The Synthesis Engine receives score information from the Score Editor, selects appropriate samples from the Singer Library, and concatenates them to output synthesized voices. The Score Editor and the Synthesis Engine provided by Yamaha are essentially the same across Vocaloid 2 products. If a Vocaloid 2 product is already installed, the user can enable another Vocaloid 2 product by adding its library. The system supports two languages, Japanese and English, although other languages may become options in the future. It works standalone (playback and export to WAV) and as a ReWire application or a VSTi accessible from a DAW.

Score Editor

The Score Editor is a piano roll style editor for entering notes, lyrics, and some expressions. For a Japanese Singer Library, the user can input gojūon lyrics in hiragana, katakana or romaji writing. For an English library, the Editor automatically converts the lyrics into IPA phonetic symbols using the built-in pronunciation dictionary. The user can directly edit the phonetic symbols of unregistered words. A Japanese library and an English library differ in the lyrics input method but share the same platform, so the Japanese editor can load an English library and vice versa. Because the lyrics input method depends on the library, the Japanese and English editors differ only in their menus. The Score Editor offers various parameters for adding expression to singing voices, and the user is expected to tune these parameters to best fit the synthesized tune. The editor supports ReWire and can be synchronized with a DAW. Real-time "playback" of songs with predefined lyrics using a MIDI keyboard is also supported.

Singer Library

Each Vocaloid licensee develops its own Singer Library, a database of vocal fragments sampled from real people. The database must contain all possible combinations of phonemes of the target language, including diphones (a chain of two different phonemes) and sustained vowels, as well as polyphones with more than two phonemes if necessary. For example, the voice corresponding to the word "sing" ([sIN]) can be synthesized by concatenating the sequence of diphones "#-s, s-I, I-N, N-#" (# indicating a voiceless phoneme) with the sustained vowel ī. The Vocaloid system changes the pitch of these fragments so that they fit the melody. To get more natural sounds, three or four different pitch ranges must be stored in the library. Japanese requires 500 diphones per pitch, whereas English requires 2,500. Japanese has fewer diphones because it has fewer phonemes and most syllabic sounds are open syllables ending in a vowel. In Japanese, there are basically three patterns of diphones containing a consonant: voiceless-consonant, vowel-consonant, and consonant-vowel. English, on the other hand, has many closed syllables ending in a consonant, and so has consonant-consonant and consonant-voiceless diphones as well. Thus, more diphones need to be recorded into an English library than into a Japanese one. Due to this linguistic difference, a Japanese library is not suitable for singing in English.
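The "sing" example can be sketched in a few lines of Python. This is only an illustration of how a diphone sequence follows from a phoneme list, not Yamaha's implementation: the function name diphone_sequence and its sustained parameter are hypothetical, and the sustained vowel is written with the same symbol used in the diphone names.

```python
def diphone_sequence(phonemes, sustained=None):
    """Return the fragment names needed to sing one word.

    phonemes:  phoneme symbols in order, e.g. ["s", "I", "N"]
    sustained: optional phoneme to hold; a sustained fragment is
               inserted right after the diphone that ends in it.
    Both parameter names are assumptions made for this sketch.
    """
    # "#" marks the boundary before and after the word, as in "#-s" and "N-#".
    padded = ["#"] + list(phonemes) + ["#"]
    fragments = []
    for left, right in zip(padded, padded[1:]):
        fragments.append(f"{left}-{right}")
        if right == sustained:
            fragments.append(right)  # the held vowel between two diphones
    return fragments


print(diphone_sequence(["s", "I", "N"], sustained="I"))
# ['#-s', 's-I', 'I', 'I-N', 'N-#']
```

A word with n phonemes always needs n + 1 diphones here, which is why the number of distinct phoneme pairs in a language drives the size of its library.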

Synthesis Engine

The Synthesis Engine receives score information contained in dedicated MIDI messages, called Vocaloid MIDI, sent by the Score Editor. It adjusts the pitch and timbre of the selected samples in the frequency domain and splices them to synthesize singing voices. When Vocaloid runs as a VSTi accessible from a DAW, the bundled VST plug-in bypasses the Score Editor and sends these messages directly to the Synthesis Engine.

Timing adjustment: In singing voices, the consonant onset of a syllable is uttered before the vowel onset. The starting position of a note, called "Note-On", must coincide with the vowel onset, not with the start of the syllable. Vocaloid keeps the "synthesized score" in memory so that it can adjust sample timing and place the vowel onset strictly on the "Note-On" position. Without this timing adjustment, the vowel would sound late.
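The timing rule can be illustrated with a small Python sketch. It assumes, purely for illustration, that each note is described by its Note-On time and the length of its leading consonant in milliseconds; the names fragment_start and schedule are hypothetical and do not come from the Vocaloid software.

```python
def fragment_start(note_on_ms, consonant_ms):
    """Time at which the syllable must start so that its vowel
    onset falls exactly on the Note-On position."""
    return note_on_ms - consonant_ms


def schedule(notes):
    """notes: list of (note_on_ms, consonant_ms) pairs (assumed format).

    Returns (fragment_start_ms, vowel_onset_ms) for every note.
    """
    return [(fragment_start(on, cons), on) for on, cons in notes]


# Three notes; the second and third begin with a consonant.
notes = [(0, 0), (500, 80), (1000, 120)]
for start, vowel_onset in schedule(notes):
    print(f"fragment starts at {start} ms, vowel onset at {vowel_onset} ms")
```

Because every fragment that opens with a consonant has to begin before its own Note-On, the engine needs to know upcoming notes in advance, which is consistent with the score being held in memory.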

Pitch conversion: Since the samples are recorded at different pitches, pitch conversion is required when concatenating them. The engine calculates the desired pitch from the notes, attack time, and vibrato parameters, and then selects the necessary samples from the library.
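As a rough sketch of this step, the Python below derives a target frequency from a MIDI note number plus a sinusoidal vibrato and picks the closest recorded pitch. The MIDI-to-frequency formula is the standard equal-temperament one; the sinusoidal vibrato model, the default rate, the nearest-pitch rule, and all function names are assumptions for illustration, and attack time is left out.

```python
import math


def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


def target_pitch(note, t, vibrato_depth_cents=0.0, vibrato_rate_hz=5.5):
    """Desired pitch in Hz at time t (seconds) into the note.

    Vibrato is modelled as a sine-shaped deviation in cents; this
    model and its default rate are assumptions of the sketch.
    """
    offset_cents = vibrato_depth_cents * math.sin(2 * math.pi * vibrato_rate_hz * t)
    return midi_to_hz(note) * 2.0 ** (offset_cents / 1200)


def nearest_recorded_pitch(target_hz, recorded_hz):
    """Pick the recorded pitch closest to the target on a log scale."""
    return min(recorded_hz, key=lambda hz: abs(math.log2(hz / target_hz)))


def pitch_shift_ratio(target_hz, source_hz):
    """Factor by which the chosen sample's pitch must be scaled."""
    return target_hz / source_hz


recorded = [220.0, 392.0, 660.0]      # hypothetical pitches stored in a library
wanted = target_pitch(69, t=0.0)      # A4, at the very start of the note
source = nearest_recorded_pitch(wanted, recorded)
print(wanted, source, pitch_shift_ratio(wanted, source))
```

Storing several pitch ranges per fragment keeps this shift ratio close to 1, which fits the source's remark that three or four ranges are needed for natural results.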

Timbre manipulation: The engine smooths the timbre around the junction of the samples. The timbre of a sustained vowel is generated by interpolating the spectral envelopes of the surrounding samples. For example, when concatenating the sequence of diphones "s-e, e, e-t" for the English word "set", the spectral envelope of the sustained ē at each frame is generated by interpolating between the ē at the end of "s-e" and the ē at the beginning of "e-t".
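The interpolation for the "set" example might look like the following Python sketch. The source only says the envelopes are interpolated; linear interpolation, the representation of an envelope as a list of magnitudes per frequency bin, and the name interpolate_envelopes are all assumptions here.

```python
def interpolate_envelopes(start_env, end_env, n_frames):
    """Generate n_frames spectral envelopes for a sustained vowel.

    start_env: envelope of the vowel at the end of the preceding
               diphone (e.g. the end of "s-e")
    end_env:   envelope of the vowel at the beginning of the following
               diphone (e.g. the start of "e-t")
    Each envelope is a list of magnitudes, one per frequency bin.
    Linear interpolation is an assumption of this sketch.
    """
    if len(start_env) != len(end_env):
        raise ValueError("envelopes must have the same number of bins")
    frames = []
    for i in range(n_frames):
        # Weight moves from 0.0 (first frame) to 1.0 (last frame).
        w = i / (n_frames - 1) if n_frames > 1 else 0.0
        frames.append([(1 - w) * a + w * b for a, b in zip(start_env, end_env)])
    return frames


for frame in interpolate_envelopes([0.0, 2.0], [4.0, 0.0], 3):
    print(frame)
# [0.0, 2.0]
# [2.0, 1.0]
# [4.0, 0.0]
```

The first and last frames match the neighbouring diphones exactly, so the held vowel joins both sides without a jump in timbre.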

Transforms: After pitch conversion and timbre manipulation, the engine applies transforms such as the inverse fast Fourier transform (IFFT) to output the synthesized voices.
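To show what this last step computes, here is a naive inverse discrete Fourier transform in plain Python. It gives the same result an IFFT would for one frame, but it is the slow O(n²) textbook form rather than a fast algorithm, and it says nothing about how Vocaloid windows or overlaps its frames; the function name inverse_dft is an assumption.

```python
import cmath


def inverse_dft(spectrum):
    """Turn one frame of complex frequency bins into time-domain samples.

    Naive O(n^2) inverse DFT; a real engine would use an FFT algorithm.
    """
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


# A spectrum with energy in bins 1 and 3 (a conjugate pair for n = 4)
# corresponds to one cycle of a cosine over the four samples.
samples = inverse_dft([0, 2, 0, 2])
print([round(s.real, 6) for s in samples])
```

The imaginary parts come out as zero up to rounding error because the spectrum is conjugate-symmetric, which is what a real-valued audio frame requires.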