Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval
https://uec.repo.nii.ac.jp/records/8866
Name / File | License | Action |
---|---|---|
TOMM.PDF (1.5 MB) | | https://uec.repo.nii.ac.jp/record/8866/files/TOMM.PDF |
Item type | Journal Article |
---|---|
Date published | 2019-01-15 |
Title (en) | Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval |
Language | eng |
Keywords (en) | Convolutional neural networks; Deep cross-modal models; Correlation learning between audio and lyrics; Cross-modal music retrieval; Music knowledge discovery |
Resource type | journal article (http://purl.org/coar/resource_type/c_6501) |
Authors | Yu, Yi; Tang, Suhua; Raposo, Francisco; Chen, Lei |
Description (Other) | Deep cross-modal learning has successfully demonstrated excellent performance in cross-modal multimedia retrieval, with the aim of learning joint representations between different data modalities. Unfortunately, little research focuses on cross-modal correlation learning where temporal structures of different data modalities such as audio and lyrics should be taken into account. Stemming from the characteristic of temporal structures of music in nature, we are motivated to learn the deep sequential correlation between audio and lyrics. In this work, we propose a deep cross-modal correlation learning architecture involving two-branch deep neural networks for audio modality and text modality (lyrics). Data in different modalities are converted to the same canonical space where inter-modal canonical correlation analysis is utilized as an objective function to calculate the similarity of temporal structures. This is the first study that uses deep architectures for learning the temporal correlation between audio and lyrics. A pre-trained Doc2Vec model followed by fully-connected layers is used to represent lyrics. Two significant contributions are made in the audio branch, as follows: i) We propose an end-to-end network to learn cross-modal correlation between audio and lyrics, where feature extraction and correlation learning are simultaneously performed and joint representation is learned by considering temporal structures. ii) As for feature extraction, we further represent an audio signal by a short sequence of local summaries (VGG16 features) and apply a recurrent neural network to compute a compact feature that better learns temporal structures of music audio. Experimental results, using audio to retrieve lyrics or using lyrics to retrieve audio, verify the effectiveness of the proposed deep correlation learning architectures in cross-modal music retrieval. |
Bibliographic information | ACM Transactions on Multimedia Computing Communications and Applications (en), Vol. 15, No. 1, p. 20, issued 2019-02 |
Publisher | ACM |
ISSN | 1551-6857 |
Rights | © ACM, 2019. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. |
Related site | DOI: https://doi.org/10.1145/3281746 |
Version type | AM (Accepted Manuscript), http://purl.org/coar/version/c_ab4af688f83e57aa |
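The abstract describes projecting audio and lyrics into a shared canonical space and using inter-modal canonical correlation analysis as the training objective. As a minimal illustration of that objective (not the authors' implementation; the function name, data shapes, and regularization constant are assumptions for this sketch), the NumPy code below computes the total canonical correlation between two batches of embeddings, i.e., the sum of singular values of the whitened cross-covariance, which deep CCA-style models maximize:

```python
import numpy as np

def total_cca_correlation(X, Y, reg=1e-4):
    """Sum of canonical correlations between two views X (n x d1) and
    Y (n x d2): singular values of the whitened cross-covariance matrix.
    Deep CCA-style models maximize this quantity during training."""
    n = X.shape[0]
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # regularized within-view covariances and the cross-covariance
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # inverse matrix square root of a symmetric positive-definite matrix
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return float(np.linalg.svd(T, compute_uv=False).sum())

# Two views driven by the same 4-dimensional latent signal correlate
# strongly (total correlation near 4); an unrelated view does not.
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 4))            # shared latent factors
audio = Z @ rng.normal(size=(4, 8))      # stand-in audio embeddings
lyrics = Z @ rng.normal(size=(4, 6))     # stand-in lyrics embeddings
noise = rng.normal(size=(500, 6))        # unrelated view
print(total_cca_correlation(audio, lyrics))
print(total_cca_correlation(audio, noise))
```

In the paper's setting, `X` and `Y` would be the outputs of the audio branch (VGG16 summaries fed through a recurrent network) and the lyrics branch (Doc2Vec followed by fully-connected layers); here they are synthetic stand-ins.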