Compression by Substring Enumeration Using Sorted Contingency Tables
https://uec.repo.nii.ac.jp/records/9679
Name / File | License | Action |
---|---|---|
e103-a_6_829 (806.8 kB) | | |
Item type | Journal Article(1) |
---|---|
Publication date | 2020-11-19 |
Title | Compression by Substring Enumeration Using Sorted Contingency Tables |
Title language | en |
Language | eng |
Keyword (en) | CSE |
Keyword (en) | sorting |
Keyword (en) | contingency table |
Keyword (en) | lossless data compression |
Resource type | journal article |
Resource type identifier | http://purl.org/coar/resource_type/c_6501 |
Authors | OTA, Takahiro; MORITA, Hiroyoshi; MANADA, Akiko |
Description type | Abstract |
Abstract | This paper proposes two variants of improved Compression by Substring Enumeration (CSE) with a finite alphabet. In previous studies on CSE, an encoder utilizes inequalities which evaluate the number of occurrences of a substring or a minimal forbidden word (MFW) to be encoded. The inequalities are derived from a contingency table including the number of occurrences of a substring or an MFW. Moreover, the codeword length of a substring and an MFW grows with the difference between the upper and lower bounds deduced from the inequalities; however, the lower bound is not tight. Therefore, we derive a new tight lower bound based on the contingency table and consequently propose a new CSE algorithm using the new inequality. We also propose a new encoding order of substrings and MFWs based on a sorted contingency table in which both the row and column marginal totals are sorted in descending order, instead of the lexicographical order used in previous studies. We then propose a new CSE algorithm which is the first proposed CSE algorithm using the new encoding order. Experimental results show that, for all files of the Calgary corpus, the compression ratios of the proposed algorithms are better than those of a previous study on CSE with a finite alphabet. Moreover, the compression ratios of the second proposed CSE algorithm are better than or equal to those of a well-known compressor for 11 of the 14 files in the corpus. |
Bibliographic information | IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences (en), vol. E103.A, no. 6, pp. 829-835, issued 2020-06-01 |
Publisher | The Institute of Electronics, Information and Communication Engineers |
Source identifier type | ISSN |
Source identifier | 09168508 |
Relation type | isIdenticalTo |
Relation identifier type | DOI |
Relation identifier | 10.1587/transfun.2019EAP1063 |
Rights | Copyright © 2020 IEICE |
Related site identifier type | URI |
Related site identifier | https://search.ieice.org/index.html |
Author version flag (version type) | VoR |
Version type resource | http://purl.org/coar/version/c_970fb48d4fbd8a85 |