Leveraging Synthesized Data for Guitar Tablature Transcription

Yongyi Zang*, Yi Zhong* (Equal contribution), Frank Cwitkowitz, Zhiyao Duan

Accepted at ICASSP 2024

Demo Renderings from SynthTab

We select several rendering excerpts from SynthTab for a quick overview of the rendering quality.

Excerpt from ZZ Top - La Grange, Rendered using Semi-Hollow Timbre, Played with Pick

Excerpt from Skyfire - By God Forsaken, Rendered using Martin Acoustic Guitar Timbre, Played with Fingerpicking

Excerpt from Pachelbel, Johann - Canon in D Major, Rendered using Taylor Acoustic Guitar Timbre, Played with Fingerpicking

We created a large-scale synthesized guitar tablature dataset to address the low-resource problem in guitar tablature transcription.

Existing guitar tablature datasets are limited in size because human playing and annotation, even with automatic tools like hexaphonic pickups, do not scale well.

As a result, current guitar tablature transcription models overfit strongly to the dataset they are trained on.

Tab. F1 (%)          Train on
Test on GuitarSet    78.3    18.9    40.7
Test on IDMT         67.1    64.4    20.5
Test on EGDB         53.3    27.7    71.0

Results of the TabCNN baseline model trained on each dataset, then evaluated on all three datasets.
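To make the metric concrete, here is a minimal sketch of a tablature F1 score. It assumes, for illustration only, that a tablature is represented as a set of (frame, string, fret) triples and that a prediction counts as correct only when all three match; the function name `tab_f1` and this representation are hypothetical and may differ from the evaluation code behind the numbers above.

```python
def tab_f1(reference, estimate):
    """F1 over (frame, string, fret) triples (hypothetical representation)."""
    ref, est = set(reference), set(estimate)
    true_positives = len(ref & est)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(est)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# String 4 fret 2 and string 5 fret 7 sound the same pitch in standard tuning,
# but they are different tablature positions, so the estimate is penalized.
reference = {(0, 6, 3), (0, 4, 2), (1, 6, 3)}
estimate = {(0, 6, 3), (0, 5, 7), (1, 6, 3)}
score = tab_f1(reference, estimate)
```

In this example two of three predicted positions are correct, so precision and recall are both 2/3. A pitch-only metric would score the same prediction as perfect, which is why tablature transcription is evaluated on string and fret rather than pitch alone.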

Tab. F1 (%)          Train on
Test on GuitarSet    79.1    16.9    35.8
Test on IDMT         61.3    61.5    27.1
Test on EGDB         55.8    27.9    72.4

Results of the TabCNN+ model trained on each dataset, then evaluated on all three datasets. TabCNN+ is defined as TabCNN with four times more filters per convolutional layer.

When we first pre-train on SynthTab and then fine-tune on each dataset, we see substantial improvements in both same-dataset and cross-dataset settings.

We achieved this by creating a diverse, accurate and realistic rendering pipeline for guitar tablature.

To promote diversity, we used a portion of DadaGP, which contains 26,181 songs across 739 musical genres. DadaGP stores tablature in GuitarPro formats; we first convert them to JAMS, then to MIDI. This pipeline is also open-sourced.
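The core of any tablature-to-MIDI conversion is mapping a (string, fret) position to a pitch. The sketch below illustrates only that step and is not the open-sourced pipeline itself: it assumes standard tuning (individual tracks may use other tunings), and the names `STANDARD_TUNING`, `fret_to_midi_pitch` and `tab_notes_to_midi` are hypothetical.

```python
# Open-string MIDI pitches in standard tuning, keyed by string number
# (6 = low E2, 1 = high E4). Alternate tunings would need a different table.
STANDARD_TUNING = {6: 40, 5: 45, 4: 50, 3: 55, 2: 59, 1: 64}


def fret_to_midi_pitch(string, fret, tuning=STANDARD_TUNING):
    """Return the MIDI pitch sounded at a given string and fret."""
    if string not in tuning:
        raise ValueError(f"unknown string number: {string}")
    if fret < 0:
        raise ValueError(f"fret must be non-negative, got {fret}")
    return tuning[string] + fret


def tab_notes_to_midi(notes, tuning=STANDARD_TUNING):
    """Convert (onset_sec, duration_sec, string, fret) tuples to note dicts.

    The string number is kept alongside the pitch so that later stages can
    still tell which string each note belongs to.
    """
    return [
        {
            "onset": onset,
            "duration": duration,
            "string": string,
            "pitch": fret_to_midi_pitch(string, fret, tuning),
        }
        for onset, duration, string, fret in notes
    ]


midi_notes = tab_notes_to_midi([(0.0, 0.5, 6, 0), (0.5, 0.5, 5, 3)])
```

Keeping the string number in the output matters because pitch alone is ambiguous on a guitar: the same pitch can usually be played at several string and fret positions.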

To achieve string accuracy, we categorize guitar note samples by string and strictly follow the string specifications in each tablature during synthesis. This is made possible by our MIDI rendering pipeline that allows for string-specific samples through specific MIDI channels.
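A rough sketch of string-specific routing is shown below. The mapping of string n to MIDI channel n - 1 is an assumption made for illustration; the actual channel layout used by the rendering pipeline is not stated here, and `NoteEvent`, `string_to_channel` and `route_by_string` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteEvent:
    channel: int     # MIDI channel carrying one string's notes
    pitch: int       # MIDI pitch number
    onset: float     # seconds
    duration: float  # seconds


def string_to_channel(string):
    """Map string 1..6 to channel 0..5 (assumed layout, not the real one)."""
    if not 1 <= string <= 6:
        raise ValueError(f"string must be in 1..6, got {string}")
    return string - 1


def route_by_string(notes):
    """Turn (string, pitch, onset, duration) tuples into per-channel events."""
    events = [
        NoteEvent(string_to_channel(string), pitch, onset, duration)
        for string, pitch, onset, duration in notes
    ]
    return sorted(events, key=lambda e: (e.onset, e.channel))


# The same pitch (E4 = 64) played on the open 1st string and at fret 5 of
# the 2nd string ends up on different channels, so a sampler can pick
# samples recorded on the matching string.
events = route_by_string([(1, 64, 0.0, 1.0), (2, 64, 1.0, 1.0)])
```

Because the two events share a pitch but not a channel, the string choice written in the tablature survives all the way to the synthesized audio.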

To improve realism, we integrate humanization effects like vibrato into the synthesis process using MIDI CC information perturbation.
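One way such a perturbation could look is sketched below: a sinusoidal control-change curve with a little random jitter, clipped to the valid 7-bit MIDI range. The vibrato rate, depth, jitter, time step and the function name `vibrato_cc_curve` are all illustrative assumptions and are not taken from the SynthTab pipeline.

```python
import math
import random


def vibrato_cc_curve(duration, rate_hz=5.5, depth=20.0, center=64,
                     step=0.01, jitter=2.0, seed=None):
    """Return (time_sec, cc_value) pairs for one note's vibrato.

    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_steps = round(duration / step)
    curve = []
    for i in range(n_steps + 1):
        t = i * step
        value = (center
                 + depth * math.sin(2 * math.pi * rate_hz * t)
                 + rng.gauss(0.0, jitter))
        # MIDI control-change values are 7-bit integers in 0..127.
        curve.append((round(t, 6), max(0, min(127, round(value)))))
    return curve


curve = vibrato_cc_curve(duration=0.5, seed=0)
```

Seeding the random generator keeps a rendering reproducible, while leaving `seed=None` gives each note a slightly different curve.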

In total, SynthTab contains around 6,700 hours of audio across 15,211 tracks and 23 timbres.

The acoustic portion is synthesized with 4 guitars, 3 of which are rendered with both fingerpicking and pick playing, for a total of 7 timbres.

The electric portion is synthesized with 7 different electric guitars with varying pickup positions, yielding 16 timbres.

Instrument Type               Inst. #  Instrument Name        # Tracks       # Rendered Audio  Total Hours
Acoustic Guitar (7 timbres)   24       Acoustic Nylon Guitar  5501 (36.16%)  38507 (26.10%)    1510
Acoustic Guitar (7 timbres)   25       Acoustic Steel Guitar  5149 (33.85%)  36043 (24.43%)    1690
Electric Guitar (16 timbres)  26       Electric Clean Guitar  2989 (19.65%)  47824 (32.42%)    1162
Electric Guitar (16 timbres)  27       Electric Jazz Guitar   1572 (10.33%)  25152 (17.05%)    2338

SynthTab track distribution, grouped by the original MIDI instrument specified in each track's tablature.
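The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated above; the reading that every track is rendered once per timbre of its instrument group is inferred from those numbers, and the variable names are hypothetical.

```python
# Timbre counts: 3 acoustic guitars with two playing styles plus 1 with one.
ACOUSTIC_TIMBRES = 3 * 2 + 1
ELECTRIC_TIMBRES = 16

# (number of tracks, timbres rendered per track, total hours) per instrument.
table = {
    "Acoustic Nylon Guitar": (5501, ACOUSTIC_TIMBRES, 1510),
    "Acoustic Steel Guitar": (5149, ACOUSTIC_TIMBRES, 1690),
    "Electric Clean Guitar": (2989, ELECTRIC_TIMBRES, 1162),
    "Electric Jazz Guitar": (1572, ELECTRIC_TIMBRES, 2338),
}

# Each track is rendered once per timbre of its group.
rendered = {name: tracks * timbres for name, (tracks, timbres, _) in table.items()}

total_tracks = sum(tracks for tracks, _, _ in table.values())
total_hours = sum(hours for _, _, hours in table.values())
total_timbres = ACOUSTIC_TIMBRES + ELECTRIC_TIMBRES
total_rendered = sum(rendered.values())
```

The totals come out to 15,211 tracks, 6,700 hours and 23 timbres, matching the summary figures, and the per-instrument products reproduce the "# Rendered Audio" column.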