Demo Renderings from SynthTab
We select several rendering excerpts from SynthTab for a quick overview of the rendering quality.
SynthTab
Leveraging Synthesized Data for Guitar Tablature Transcription
Yongyi Zang*, Yi Zhong* (Equal contribution), Frank Cwitkowitz, Zhiyao Duan
yongyi.zang@rochester.edu, yi.zhong@rutgers.edu, fcwitkow@ur.rochester.edu, zhiyao.duan@rochester.edu
Accepted at ICASSP 2024
Excerpt from ZZ Top - La Grange, Rendered using Semi-Hollow Timbre, Played with Pick
Excerpt from Skyfire - By God Forsaken, Rendered using Martin Acoustic Guitar Timbre, Played with Fingerpicking
Excerpt from Pachelbel, Johann - Canon in D Major, Rendered using Taylor Acoustic Guitar Timbre, Played with Fingerpicking
Existing guitar tablature datasets are limited in size, because human playing and annotation, even with automatic tools such as hexaphonic pickups, do not scale well.
As a result, current guitar tablature transcription models overfit strongly to the dataset they are trained on.
| Tab F1 (%) | Train: GuitarSet | Train: IDMT | Train: EGDB |
|---|---|---|---|
| Test: GuitarSet | 78.3 | 18.9 | 40.7 |
| Test: IDMT | 67.1 | 64.4 | 20.5 |
| Test: EGDB | 53.3 | 27.7 | 71.0 |
Results of the TabCNN baseline model trained on each dataset, then evaluated on all three datasets.
| Tab F1 (%) | Train: GuitarSet | Train: IDMT | Train: EGDB |
|---|---|---|---|
| Test: GuitarSet | 79.1 | 16.9 | 35.8 |
| Test: IDMT | 61.3 | 61.5 | 27.1 |
| Test: EGDB | 55.8 | 27.9 | 72.4 |
Results of the TabCNN+ model trained on each dataset, then evaluated on all three datasets. TabCNN+ is defined as TabCNN with four times more filters per convolutional layer.
When we first pre-train on SynthTab and then fine-tune on each dataset, we see substantial improvements in both same-dataset and cross-dataset settings.
To promote diversity, we use a portion of DadaGP, which contains 26,181 songs across 739 musical genres. DadaGP stores tablature in the GuitarPro format; we first convert it to JAMS, then to MIDI. This pipeline is also open-sourced.
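The MIDI step of this pipeline can be sketched as follows. This is our own simplified illustration, not the SynthTab code: the data shapes are hypothetical JAMS-style note observations, and the real pipeline works with PyGuitarPro, JAMS, and a full MIDI writer.

```python
# Minimal sketch: flatten JAMS-style note observations into a sorted
# stream of MIDI-like events. Field names here are our assumption.
def jams_notes_to_midi_events(notes):
    """Turn note dicts {"time", "duration", "pitch"} (seconds, MIDI
    pitch) into a time-sorted list of (time, kind, pitch) events."""
    events = []
    for n in notes:
        events.append((n["time"], "note_on", n["pitch"]))
        events.append((n["time"] + n["duration"], "note_off", n["pitch"]))
    events.sort(key=lambda e: e[0])
    return events

# Two overlapping notes: E4 then B3.
demo = [{"time": 0.0, "duration": 0.5, "pitch": 64},
        {"time": 0.25, "duration": 0.5, "pitch": 59}]
events = jams_notes_to_midi_events(demo)
```

A real MIDI writer would additionally carry velocity, tempo, and track metadata; the sorting step matters because overlapping notes interleave their on/off events.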
To achieve string accuracy, we categorize guitar note samples by string and strictly follow the string specifications in each tablature during synthesis. This is made possible by our MIDI rendering pipeline that allows for string-specific samples through specific MIDI channels.
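The string-to-channel routing described above can be sketched as below. This is a hedged illustration with names of our own choosing (not SynthTab's actual code), assuming standard tuning with string 1 as the high E string.

```python
# Open-string MIDI pitches in standard tuning; string 1 = high E (E4).
STANDARD_TUNING = {1: 64, 2: 59, 3: 55, 4: 50, 5: 45, 6: 40}

def tab_note_to_midi(string, fret):
    """Map a (string, fret) pair to (MIDI channel, MIDI pitch).

    Each guitar string gets its own MIDI channel, so the sampler can
    draw from string-specific sample sets during synthesis.
    """
    channel = string - 1                      # channels 0-5, one per string
    pitch = STANDARD_TUNING[string] + fret    # open-string pitch + fret offset
    return channel, pitch

print(tab_note_to_midi(6, 0))  # open low E -> (5, 40)
print(tab_note_to_midi(2, 5))  # 5th fret, B string -> (1, 64)
```

Note that pitch alone is ambiguous on guitar (the same pitch is playable on several strings), which is exactly why the channel carries the string identity.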
To improve realism, we integrate humanization effects like vibrato into the synthesis process using MIDI CC information perturbation.
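A perturbed vibrato curve of this kind might look like the sketch below. The function, its parameters, and the default rate/depth values are our own assumptions for illustration; the actual SynthTab humanization operates on MIDI CC data.

```python
import math
import random

def humanized_vibrato(duration_s, rate_hz=5.5, depth=0.3,
                      jitter=0.1, step_s=0.01, seed=None):
    """Sinusoidal vibrato curve (in semitones) with the rate and depth
    randomly perturbed by up to +/- `jitter`, sampled every `step_s`
    seconds. The values could then be quantized into MIDI pitch-bend
    or CC messages."""
    rng = random.Random(seed)
    rate = rate_hz * (1 + rng.uniform(-jitter, jitter))
    dep = depth * (1 + rng.uniform(-jitter, jitter))
    n = int(duration_s / step_s)
    return [dep * math.sin(2 * math.pi * rate * i * step_s) for i in range(n)]

# Half a second of vibrato, reproducible via the seed.
curve = humanized_vibrato(0.5, seed=0)
```

Because each rendered note draws a fresh rate/depth perturbation, no two notes share exactly the same vibrato, which is the "humanization" effect in miniature.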
The acoustic portion is synthesized with 4 guitars, 3 of which include both fingerpicking and pick playing, making up 7 timbres in total;
the electric portion is synthesized with 7 different electric guitars with varying pickup positions, generating 16 different timbres.
| Instrument Type | Inst. # | Instrument Name | # Tracks | # Rendered Audio | Total Hours |
|---|---|---|---|---|---|
| Acoustic Guitar (7 timbres) | 24 | Acoustic Nylon Guitar | 5501 (36.16%) | 38507 (26.10%) | 1510 |
| | 25 | Acoustic Steel Guitar | 5149 (33.85%) | 36043 (24.43%) | 1690 |
| Electric Guitar (16 timbres) | 26 | Electric Clean Guitar | 2989 (19.65%) | 47824 (32.42%) | 1162 |
| | 27 | Electric Jazz Guitar | 1572 (10.33%) | 25152 (17.05%) | 2338 |
SynthTab track distribution, grouped by the original MIDI instrument specified in the tablature.