Demo Renderings from SynthTab
We select several rendering excerpts from SynthTab for a quick overview of the rendering quality.
SynthTab
Leveraging Synthesized Data for Guitar Tablature Transcription
Yongyi Zang*, Yi Zhong* (Equal contribution), Frank Cwitkowitz, Zhiyao Duan
yongyi.zang@rochester.edu, yi.zhong@rutgers.edu, fcwitkow@ur.rochester.edu, zhiyao.duan@rochester.edu
Accepted at ICASSP 2024
Excerpt from ZZ Top - La Grange, Rendered using Semi-Hollow Timbre, Played with Pick
Excerpt from Skyfire - By God Forsaken, Rendered using Martin Acoustic Guitar Timbre, Played with Fingerpicking
Excerpt from Pachelbel, Johann - Canon in D Major, Rendered using Taylor Acoustic Guitar Timbre, Played with Fingerpicking
Existing guitar tablature datasets are limited in size, because human playing and annotation, even with automatic tools such as hexaphonic pickups, do not scale well.
As a result, current guitar tablature transcription models overfit strongly to the dataset they are trained on.
| Tab F1 (%) | Train: GuitarSet | Train: IDMT | Train: EGDB |
|---|---|---|---|
| Test: GuitarSet | 78.3 | 18.9 | 40.7 |
| Test: IDMT | 67.1 | 64.4 | 20.5 |
| Test: EGDB | 53.3 | 27.7 | 71.0 |
Results of the TabCNN baseline model trained on each dataset, then evaluated on all three datasets.
| Tab F1 (%) | Train: GuitarSet | Train: IDMT | Train: EGDB |
|---|---|---|---|
| Test: GuitarSet | 79.1 | 16.9 | 35.8 |
| Test: IDMT | 61.3 | 61.5 | 27.1 |
| Test: EGDB | 55.8 | 27.9 | 72.4 |
Results of the TabCNN+ model trained on each dataset, then evaluated on all three datasets. TabCNN+ is defined as TabCNN with four times more filters per convolutional layer.
When we first pre-train on SynthTab and then fine-tune on each dataset, we see substantial improvements in both same-dataset and cross-dataset settings.
To promote diversity, we use a portion of DadaGP, which contains 26,181 songs across 739 musical genres. DadaGP stores tablature in the GuitarPro format; we first convert it to JAMS, then to MIDI. This pipeline is also open-sourced.
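The MIDI step of this pipeline can be sketched as follows. This is our own simplified illustration, not the SynthTab code: the data shapes are hypothetical JAMS-style note observations, and the real pipeline works with PyGuitarPro, JAMS, and a full MIDI writer.

```python
# Minimal sketch: flatten JAMS-style note observations into a sorted
# stream of MIDI-like events. Field names here are our assumption.
def jams_notes_to_midi_events(notes):
    """Turn note dicts {"time", "duration", "pitch"} (seconds, MIDI
    pitch) into a time-sorted list of (time, kind, pitch) events."""
    events = []
    for n in notes:
        events.append((n["time"], "note_on", n["pitch"]))
        events.append((n["time"] + n["duration"], "note_off", n["pitch"]))
    events.sort(key=lambda e: e[0])
    return events

# Two overlapping notes: E4 then B3.
demo = [{"time": 0.0, "duration": 0.5, "pitch": 64},
        {"time": 0.25, "duration": 0.5, "pitch": 59}]
events = jams_notes_to_midi_events(demo)
```

A real MIDI writer would additionally carry velocity, tempo, and track metadata; the sorting step matters because overlapping notes interleave their on/off events.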
To achieve string accuracy, we categorize guitar note samples by string and strictly follow the string specifications in each tablature during synthesis. This is made possible by our MIDI rendering pipeline that allows for string-specific samples through specific MIDI channels.
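The string-to-channel routing described above can be sketched as below. This is a hedged illustration with names of our own choosing (not SynthTab's actual code), assuming standard tuning with string 1 as the high E string.

```python
# Open-string MIDI pitches in standard tuning; string 1 = high E (E4).
STANDARD_TUNING = {1: 64, 2: 59, 3: 55, 4: 50, 5: 45, 6: 40}

def tab_note_to_midi(string, fret):
    """Map a (string, fret) pair to (MIDI channel, MIDI pitch).

    Each guitar string gets its own MIDI channel, so the sampler can
    draw from string-specific sample sets during synthesis.
    """
    channel = string - 1                      # channels 0-5, one per string
    pitch = STANDARD_TUNING[string] + fret    # open-string pitch + fret offset
    return channel, pitch

print(tab_note_to_midi(6, 0))  # open low E -> (5, 40)
print(tab_note_to_midi(2, 5))  # 5th fret, B string -> (1, 64)
```

Note that pitch alone is ambiguous on guitar (the same pitch is playable on several strings), which is exactly why the channel carries the string identity.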
To improve realism, we integrate humanization effects like vibrato into the synthesis process using MIDI CC information perturbation.
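A perturbed vibrato curve of this kind might look like the sketch below. The function, its parameters, and the default rate/depth values are our own assumptions for illustration; the actual SynthTab humanization operates on MIDI CC data.

```python
import math
import random

def humanized_vibrato(duration_s, rate_hz=5.5, depth=0.3,
                      jitter=0.1, step_s=0.01, seed=None):
    """Sinusoidal vibrato curve (in semitones) with the rate and depth
    randomly perturbed by up to +/- `jitter`, sampled every `step_s`
    seconds. The values could then be quantized into MIDI pitch-bend
    or CC messages."""
    rng = random.Random(seed)
    rate = rate_hz * (1 + rng.uniform(-jitter, jitter))
    dep = depth * (1 + rng.uniform(-jitter, jitter))
    n = int(duration_s / step_s)
    return [dep * math.sin(2 * math.pi * rate * i * step_s) for i in range(n)]

# Half a second of vibrato, reproducible via the seed.
curve = humanized_vibrato(0.5, seed=0)
```

Because each rendered note draws a fresh rate/depth perturbation, no two notes share exactly the same vibrato, which is the "humanization" effect in miniature.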
The acoustic portion is synthesized with 4 guitars, 3 of which include both fingerpicking and pick playing, making up 7 timbres in total;
the electric portion is synthesized with 7 different electric guitars with varying pickup positions, generating 16 different timbres.
| Instrument Type | Inst. # | Instrument Name | # Tracks | # Rendered Audio | Total Hours |
|---|---|---|---|---|---|
| Acoustic Guitar (7 timbres) | 24 | Acoustic Nylon Guitar | 5501 (36.16%) | 38507 (26.10%) | 1510 |
| | 25 | Acoustic Steel Guitar | 5149 (33.85%) | 36043 (24.43%) | 1690 |
| Electric Guitar (16 timbres) | 26 | Electric Clean Guitar | 2989 (19.65%) | 47824 (32.42%) | 1162 |
| | 27 | Electric Jazz Guitar | 1572 (10.33%) | 25152 (17.05%) | 2338 |
SynthTab track distribution, grouped by the original MIDI instrument specified in the tablature.