Computation-Communication Overlap of Linpack on a GPU-Accelerated PC Cluster

https://uec.repo.nii.ac.jp/records/288
File: 9000000554.pdf (652.4 kB)
Item type: Journal Article
Publication date: 2011-12-01
Title: Computation-Communication Overlap of Linpack on a GPU-Accelerated PC Cluster (language: en)
Language: eng
Keywords (en): parallel processing, multi-core processor, GPU, computation-communication overlap
Resource type: journal article (http://purl.org/coar/resource_type/c_6501)
Authors: Junichi Ohmura, Takefumi Miyoshi, Hidetsugu Irie, Tsutomu Yoshinaga
Author IDs: 1000050422407, 1000060210738
Description:
In this paper, we propose an approach to obtaining enhanced performance of the Linpack benchmark on a GPU-accelerated PC cluster connected via relatively slow inter-node connections. For one node with a quad-core Intel Xeon W3520 processor and an NVIDIA Tesla C1060 GPU card, we implement a CPU-GPU parallel double-precision general matrix-matrix multiplication (dgemm) operation, and achieve a performance improvement of 34% compared with the GPU-only case and 64% compared with the CPU-only case. For an entire 16-node cluster, each node of which is the same as the above and is connected with two gigabit Ethernet links, we use a computation-communication overlap scheme with GPU acceleration for the Linpack benchmark, and achieve a performance improvement of 28% compared with the GPU-accelerated high-performance Linpack benchmark (HPL) without overlapping. Our overlap GPU acceleration solution uses overlaps in which the main inter-node communication and data transfer to the GPU device memory are overlapped with the main computation task on the CPU cores. These overlaps use multi-core processors, which almost all of today's high-performance computers use. In particular, as well as using a CPU core for communication tasks, we also simultaneously use other CPU cores and the GPU for computation tasks. In order to enable overlap between inter-node communication and computation tasks, we eliminate their close dependence by breaking the main computation task into smaller tasks and rescheduling. Based on a scheme in which part of the CPU computation power is simultaneously used for tasks other than computation tasks, we experimentally find the optimal computation ratio for CPUs; this ratio differs from the case of parallel dgemm operation of one node.
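The CPU-GPU parallel dgemm described in the abstract can be pictured as splitting one matrix product into two column blocks that are computed at the same time and then joined. The following is only a minimal sketch of that idea, not the authors' implementation: the function names, the column-wise split, and the `cpu_ratio` parameter are assumptions made for illustration, and both blocks run on ordinary Python threads, with the second thread standing in for a GPU dgemm call.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Plain triple-loop product of lists of rows (a stand-in for dgemm)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def split_columns(b, cpu_ratio):
    """Split B column-wise: the first block is for the CPU, the rest for the GPU."""
    cpu_cols = round(len(b[0]) * cpu_ratio)
    return [row[:cpu_cols] for row in b], [row[cpu_cols:] for row in b]


def hybrid_matmul(a, b, cpu_ratio):
    """Compute A*B as two column blocks run concurrently, then join the blocks."""
    b_cpu, b_gpu = split_columns(b, cpu_ratio)
    with ThreadPoolExecutor(max_workers=2) as pool:
        cpu_future = pool.submit(matmul, a, b_cpu)
        # Hypothetical: in a real system this task would be a GPU dgemm call.
        gpu_future = pool.submit(matmul, a, b_gpu)
        c_cpu, c_gpu = cpu_future.result(), gpu_future.result()
    # Each output row is the CPU block followed by the GPU block.
    return [left + right for left, right in zip(c_cpu, c_gpu)]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[1, 0, 2, 1], [0, 1, 1, 3]]
    print(hybrid_matmul(a, b, cpu_ratio=0.25))
```

In this sketch `cpu_ratio` is the fraction of output columns assigned to the CPU side. The abstract reports that the best ratio was found experimentally and that it changes once part of the CPU is also handling communication, so the value is left as a tunable parameter here rather than derived.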
Bibliographic information: IEICE Transactions on Information and Systems, Vol. E94-D, No. 12, pp. 2319-2327, issued 2011-12-01
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN: 09168532
Related site (URI): http://www.ieice.org/jpn/index.html
Publication version: VoR (http://purl.org/coar/version/c_970fb48d4fbd8a85)
License (free text): Copyright © 2011 The Institute of Electronics, Information and Communication Engineers