Compression by Substring Enumeration Using Sorted Contingency Tables

OTA, Takahiro; MORITA, Hiroyoshi; MANADA, Akiko

doi:10.1587/transfun.2019EAP1063

Copyright © 2020 IEICE

https://uec.repo.nii.ac.jp/records/9679

Files

e103-a_6_829.pdf (PDF, 806.8 kB): https://uec.repo.nii.ac.jp/record/9679/files/e103-a_6_829.pdf

Item type

Journal Article

Release date

2020-11-19

Language

English

Keywords

CSE
sorting
contingency table
lossless data compression

Resource type

journal article (http://purl.org/coar/resource_type/c_6501)


Abstract

This paper proposes two variants of improved Compression by Substring Enumeration (CSE) for a finite alphabet. In previous studies on CSE, the encoder uses inequalities that bound the number of occurrences of a substring or a minimal forbidden word (MFW) to be encoded. These inequalities are derived from a contingency table containing the numbers of occurrences of substrings or MFWs. The codeword length of a substring or an MFW grows with the difference between the upper and lower bounds given by the inequalities; however, the lower bound used so far is not tight. We therefore derive a new tight lower bound based on the contingency table and propose a new CSE algorithm that uses the resulting inequality. We also propose a new encoding order for substrings and MFWs based on a sorted contingency table, in which both the row and column marginal totals are sorted in descending order, instead of the lexicographical order used in previous studies. We then propose a second new CSE algorithm, the first CSE algorithm to use this encoding order.

Experimental results show that, on every file of the Calgary corpus, both proposed algorithms achieve better compression ratios than a previous CSE algorithm for a finite alphabet. Moreover, the second proposed algorithm achieves compression ratios better than or equal to those of a well-known compressor on 11 of the 14 files in the corpus.
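The mechanics the abstract describes (cells of a contingency table coded against bounds derived from known marginal totals, a cost that grows with the gap between those bounds, and an encoding order that sorts rows and columns by descending marginal totals) can be illustrated with a small Python sketch. It rests on simplifying assumptions of our own, not on the paper. The input is read as a circular string. Cell (a, b) of the table for a substring w counts occurrences of a·w·b, so the row and column totals are the counts of a·w and w·b. The decoder is assumed to know those totals already. Each cell is charged log2(hi - lo + 1) bits as if coded uniformly. The lower bound is the elementary max(0, R + C - N) over the still-unknown row, column and table mass, not the new tight bound derived in the paper. MFWs are not modelled, and the names `contingency_table`, `marginals`, `sorted_order` and `encoding_cost` are hypothetical.

```python
# Hypothetical sketch; not the authors' implementation or their tight bound.
from collections import Counter
from math import log2


def contingency_table(text, w, alphabet):
    """Cell [a][b] = number of occurrences of a+w+b, reading `text` circularly.

    Circular reading is an assumption of this sketch.
    """
    n, k = len(text), len(w)
    wrapped = text * (2 + (k + 2) // n)  # long enough to see every rotation
    counts = Counter()
    for i in range(n):
        if wrapped[i + 1 : i + 1 + k] == w:
            counts[(wrapped[i], wrapped[i + 1 + k])] += 1
    return [[counts[(a, b)] for b in alphabet] for a in alphabet]


def marginals(table):
    """Row totals (counts of a+w) and column totals (counts of w+b)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols


def sorted_order(table):
    """Rows and columns in descending order of marginal totals (stable on ties)."""
    rows, cols = marginals(table)
    row_order = sorted(range(len(rows)), key=lambda i: -rows[i])
    col_order = sorted(range(len(cols)), key=lambda j: -cols[j])
    return row_order, col_order


def encoding_cost(table, row_order, col_order):
    """Bits to send every cell when the decoder already knows the marginals.

    Cells are visited row by row in the given orders. Each cell x satisfies
    lo <= x <= hi, computed from the still-unknown part of the table, and is
    charged log2(hi - lo + 1) bits (a uniform code; an assumption).
    """
    rows, cols = marginals(table)
    rem_rows, rem_cols = rows[:], cols[:]  # unknown mass per row / column
    rem_total = sum(rows)                  # unknown mass in the whole table
    last_row, last_col = row_order[-1], col_order[-1]
    bits = 0.0
    for i in row_order:
        for j in col_order:
            R, C, N = rem_rows[i], rem_cols[j], rem_total
            if j == last_col:      # rest of row i is known, so x is forced
                lo = hi = R
            elif i == last_row:    # rest of column j is known, so x is forced
                lo = hi = C
            else:
                # unknown cells outside row i and column j hold N - R - C + x >= 0
                lo, hi = max(0, R + C - N), min(R, C)
            x = table[i][j]
            assert lo <= x <= hi
            bits += log2(hi - lo + 1)
            rem_rows[i] -= x
            rem_cols[j] -= x
            rem_total -= x
    return bits


if __name__ == "__main__":
    alphabet = "abcdr"
    table = contingency_table("abracadabra", "a", alphabet)
    lexicographic = list(range(len(alphabet)))
    print("lexicographic:", encoding_cost(table, lexicographic, lexicographic))
    print("sorted:       ", encoding_cost(table, *sorted_order(table)))
```

On the toy input `abracadabra` with w = `a`, the sketch reports 8.0 bits for the lexicographic order and about 6.58 bits for the sorted order. A single toy table is not evidence that the sorted order helps in general; the paper's claims rest on its Calgary corpus experiments.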

Bibliographic information

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

Vol. E103.A, No. 6, pp. 829-835, issued 2020-06-01

Publisher

The Institute of Electronics, Information and Communication Engineers

ISSN

0916-8508

DOI

10.1587/transfun.2019EAP1063

Versions

Ver.1

2023-05-15 09:32:54.438382
