Computation-Communication Overlap of Linpack on a GPU-Accelerated PC Cluster

https://uec.repo.nii.ac.jp/records/288
File 9000000554.pdf (652.4 kB)
Item type Journal Article
Publication date 2011-12-01
Title
Language en
Title Computation-Communication Overlap of Linpack on a GPU-Accelerated PC Cluster
Language
Language eng
Keywords
Language en
Subject scheme Other
Subject parallel processing, multi-core processor, GPU, computation-communication overlap
Resource type
Resource type identifier http://purl.org/coar/resource_type/c_6501
Resource type journal article
Authors
Junichi, Ohmura (WEKO 6413)
Takefumi, Miyoshi (WEKO 6414)
Hidetsugu, Irie (WEKO 6415)
Tsutomu, Yoshinaga (WEKO 6416)
Author ID
Description type Other
Description 1000050422407
Author ID
Description type Other
Description 1000060210738
Description
Description type Other
Description In this paper, we propose an approach to obtaining enhanced performance of the Linpack benchmark on a GPU-accelerated PC cluster connected via relatively slow inter-node connections. For one node with a quad-core Intel Xeon W3520 processor and an NVIDIA Tesla C1060 GPU card, we implement a CPU-GPU parallel double-precision general matrix-matrix multiplication (dgemm) operation, and achieve a performance improvement of 34% compared with the GPU-only case and 64% compared with the CPU-only case. For an entire 16-node cluster, each node of which is the same as the above and is connected with two gigabit Ethernet links, we use a computation-communication overlap scheme with GPU acceleration for the Linpack benchmark, and achieve a performance improvement of 28% compared with the GPU-accelerated high-performance Linpack benchmark (HPL) without overlapping. Our overlap GPU acceleration solution uses overlaps in which the main inter-node communication and data transfer to the GPU device memory are overlapped with the main computation task on the CPU cores. These overlaps use multi-core processors, which almost all of today's high-performance computers use. In particular, as well as using a CPU core for communication tasks, we also simultaneously use other CPU cores and the GPU for computation tasks. In order to enable overlap between inter-node communication and computation tasks, we eliminate their close dependence by breaking the main computation task into smaller tasks and rescheduling. Based on a scheme in which part of the CPU computation power is simultaneously used for tasks other than computation tasks, we experimentally find the optimal computation ratio for CPUs; this ratio differs from the case of parallel dgemm operation of one node.
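The overlap idea in the description above (breaking one large computation into smaller tasks so that communication for the next task can proceed while the current one is computed) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `matmul`, `split_rows`, and `overlapped_matmul` are hypothetical, a helper thread stands in for inter-node communication or host-to-device transfer, and a naive pure-Python product stands in for dgemm.

```python
import queue
import threading


def matmul(a, b):
    """Naive dense matrix product on lists of rows (stand-in for dgemm)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def split_rows(a, chunk_rows):
    """Break one large task into smaller row-block tasks."""
    return [a[i:i + chunk_rows] for i in range(0, len(a), chunk_rows)]


def overlapped_matmul(a, b, chunk_rows=2):
    """Compute a @ b block by block while a helper thread delivers the next block."""
    inbox = queue.Queue(maxsize=1)  # at most one block waiting: simple double buffering

    def communicate():
        # Hypothetical stand-in for inter-node communication or a host-to-device copy.
        for block in split_rows(a, chunk_rows):
            inbox.put(block)
        inbox.put(None)  # sentinel: no more blocks

    sender = threading.Thread(target=communicate)
    sender.start()

    result = []
    while True:
        block = inbox.get()
        if block is None:
            break
        # The sender can queue the next block while this one is being multiplied.
        result.extend(matmul(block, b))
    sender.join()
    return result


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6]]
    b = [[7, 8], [9, 10]]
    print(overlapped_matmul(a, b, chunk_rows=2))
```

The bounded queue is the design choice that matters here: it lets the delivery of one block run ahead of the computation without letting the sender race arbitrarily far in front. The block size (`chunk_rows`) is an illustrative parameter only; the description reports that the best split was found experimentally.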
Bibliographic information IEICE Transactions on Information and Systems

Vol. E94-D, No. 12, pp. 2319-2327, issued 2011-12-01
Publisher
Publisher The Institute of Electronics, Information and Communication Engineers
ISSN
Source identifier type ISSN
Source identifier 09168532
Related site
Identifier type URI
Related identifier http://www.ieice.org/jpn/index.html
Related name http://www.ieice.org/jpn/index.html
Author version flag
Publication type VoR
Publication type resource http://purl.org/coar/version/c_970fb48d4fbd8a85
Free-text license
Copyright © 2011 The Institute of Electronics, Information and Communication Engineers