
Commit b155b15

JieRen98 and XuehaiPan authored
perf(acc_op): further acceleration with CUDA unroll (#112)
Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
1 parent 8ef1bea commit b155b15


3 files changed (+305, -169 lines)

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
+- Add unroll pragma for CUDA OPs by [@JieRen98](https://github.com/JieRen98) and [@XuehaiPan](https://github.com/XuehaiPan) in [#112](https://github.com/metaopt/torchopt/pull/112).
 - Add Python implementation of accelerated OP and pure-Python wheels by [@XuehaiPan](https://github.com/XuehaiPan) in [#67](https://github.com/metaopt/torchopt/pull/67).
 - Add `nan_to_num` hook and gradient transformation by [@XuehaiPan](https://github.com/XuehaiPan) in [#119](https://github.com/metaopt/torchopt/pull/119).
 - Add matrix inversion linear solver with neumann series approximation by [@Benjamin-eecs](https://github.com/Benjamin-eecs) and [@XuehaiPan](https://github.com/XuehaiPan) in [#98](https://github.com/metaopt/torchopt/pull/98).
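The changelog entry refers to CUDA's `#pragma unroll`, which asks the compiler to replicate a loop body so that fewer loop-condition checks and index increments execute. As a hedged CPU-side illustration only, the C++ sketch below unrolls the Adam second-moment update by hand with a factor of 4. The function name `nu_update_unrolled` and the unroll factor are assumptions for illustration, not torchopt code.

```cpp
#include <cstddef>

// Hypothetical helper, not torchopt code: computes the Adam second moment
//   nu_out[i] = b2 * nu[i] + (1 - b2) * updates[i] * updates[i]
// with the loop body manually unrolled by a factor of 4, the same kind of
// transformation that `#pragma unroll` requests from nvcc in CUDA kernels.
template <typename scalar_t>
void nu_update_unrolled(const scalar_t *updates, const scalar_t *nu,
                        scalar_t b2, scalar_t *nu_out, std::size_t n) {
  constexpr std::size_t kUnroll = 4;  // assumed unroll factor
  const scalar_t one_minus_b2 = 1 - b2;
  std::size_t i = 0;
  // Main loop: four independent element updates per iteration, so the loop
  // condition and index increment run a quarter as often.
  for (; i + kUnroll <= n; i += kUnroll) {
    nu_out[i] = b2 * nu[i] + one_minus_b2 * updates[i] * updates[i];
    nu_out[i + 1] = b2 * nu[i + 1] + one_minus_b2 * updates[i + 1] * updates[i + 1];
    nu_out[i + 2] = b2 * nu[i + 2] + one_minus_b2 * updates[i + 2] * updates[i + 2];
    nu_out[i + 3] = b2 * nu[i + 3] + one_minus_b2 * updates[i + 3] * updates[i + 3];
  }
  // Remainder loop: handles the last n % kUnroll elements.
  for (; i < n; ++i) {
    nu_out[i] = b2 * nu[i] + one_minus_b2 * updates[i] * updates[i];
  }
}
```

In CUDA code one would usually keep the plain loop and place `#pragma unroll` immediately before it, leaving the replication and remainder handling to the compiler.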

src/adam_op/adam_op_impl_cpu.cpp

Lines changed: 1 addition & 1 deletion
@@ -135,7 +135,7 @@ void adamForwardNuCPUKernel(const scalar_t *__restrict__ updates_ptr,
 const scalar_t updates = updates_ptr[tid];
 const scalar_t nu = nu_ptr[tid];
 
-const scalar_t nu_out = b2 * nu + (1 - b2) * pow(updates, 2);
+const scalar_t nu_out = b2 * nu + (1 - b2) * updates * updates;
 nu_out_ptr[tid] = nu_out;
 }
 }
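This hunk changes how the kernel squares the gradient in the second-moment update nu_out = b2 * nu + (1 - b2) * updates^2: a call to `pow(updates, 2)` becomes a single multiplication. Whether a compiler rewrites `pow(x, 2)` into `x * x` on its own depends on the compiler, the argument types, and the flags, so writing the multiplication explicitly removes that dependence. The sketch below places the two forms side by side; the function names `nu_update_pow` and `nu_update_mul` are hypothetical.

```cpp
#include <cmath>

// Hypothetical illustration, not torchopt code: the per-element
// second-moment update from adamForwardNuCPUKernel, written both ways.
template <typename scalar_t>
scalar_t nu_update_pow(scalar_t update, scalar_t nu, scalar_t b2) {
  // Before the change: generic power function. Note that for float inputs
  // std::pow(float, int) returns double, so this path computes in double.
  return b2 * nu + (1 - b2) * std::pow(update, 2);
}

template <typename scalar_t>
scalar_t nu_update_mul(scalar_t update, scalar_t nu, scalar_t b2) {
  // After the change: squaring by one multiplication, staying in scalar_t.
  return b2 * nu + (1 - b2) * update * update;
}
```

Because `(1 - b2) * update * update` is evaluated as `((1 - b2) * update) * update`, the two forms can round differently in the last bit; for inputs that are exactly representable they agree.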
