Description
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
No
Source
source
TensorFlow version
tf 2.14.0
Custom code
Yes
OS platform and distribution
Linux Ubuntu 22.04
Mobile device
No response
Python version
3.11
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
CUDA V11.8.89, cuDNN version 8600
GPU model and memory
NVIDIA GeForce GTX 1080 Ti
Current behavior?
I noticed a linear increase in CPU memory usage in my setups when applying a convolution to raw waveforms (i.e., sequences that are long in time and have a feature dimension of 1). I was able to isolate the issue, and it seems to be related to the number of different sequence lengths that occur: if the sequence length is fixed to 100k, the memory consumption is constant. If it is randomly sampled from a given range, the memory consumption grows asymptotically towards a limit, and that limit is larger for larger ranges. This can be observed in the plot below. Also note that the memory consumption is not influenced by the absolute sequence length, only by the size of the range.
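To illustrate why a dependence on the number of distinct sequence lengths would produce this kind of saturating curve, here is a small simulation. It assumes, without having verified it, that some state is retained once per new sequence length. It only counts how many distinct lengths have been drawn so far for a given range and does not touch TensorFlow. `distinct_lengths_seen` is a hypothetical helper written for this illustration.

```python
import random


def distinct_lengths_seen(n_time_min, n_time_max, n_steps, seed=0):
    """Return, for each step, how many distinct lengths have been drawn so far."""
    rng = random.Random(seed)
    seen = set()
    counts = []
    for _ in range(n_steps):
        # randrange excludes the upper bound, like np.random.randint
        seen.add(rng.randrange(n_time_min, n_time_max))
        counts.append(len(seen))
    return counts


# Same range and step count as in the stand-alone code
counts = distinct_lengths_seen(10000, 30000, 100000)
print(counts[999], counts[9999], counts[-1])
```

The count can never exceed `n_time_max - n_time_min`, so under this assumption the curve would flatten at a level set by the size of the range, not by the absolute lengths.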
I measured the memory consumption using watch_memory() from here. The different runs in the plot correspond to different n_time_min and n_time_max in the stand-alone code.
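For anyone who wants to track the memory without the linked helper, a minimal sketch of a per-step measurement could look like the following. This is not the `watch_memory()` function referenced above. `rss_mib` and `log_memory` are hypothetical names, and the sketch assumes Linux, since it reads the resident set size from `/proc/self/statm`.

```python
import os


def rss_mib():
    """Current resident set size of this process in MiB (Linux only)."""
    with open("/proc/self/statm") as f:
        # Second field of statm is the number of resident pages
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / 2**20


def log_memory(step, every=1000):
    """Print the RSS every `every` steps; intended to be called inside the loop."""
    if step % every == 0:
        print(f"step {step}: rss={rss_mib():.1f} MiB")


log_memory(0)
```

Calling `log_memory(step)` at the end of each loop iteration in the stand-alone code gives one data point per 1000 steps, which is enough to see whether the curve keeps growing or flattens.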
I reproduced the issue with an apptainer image built on top of the tensorflow 2.14 image from dockerhub. The image definition file looks as follows:
Bootstrap: docker
From: tensorflow/tensorflow:2.14.0-gpu
Stage: build
%post
apt update -y
# all the fundamental basics, zsh is needed because calling the cache manager might launch the user shell
DEBIAN_FRONTEND=noninteractive apt install -y wget git unzip gzip libssl-dev lsb-release zsh \
bison libxml2-dev libopenblas-dev libsndfile1-dev libcrypto++-dev libcppunit-dev \
parallel xmlstarlet python3-lxml htop strace gdb sox python3-pip cmake ffmpeg vim
cd /usr/local
git clone https://github.com/rwth-i6/cache-manager.git
cd bin
ln -s ../cache-manager/cf cf
echo /usr/local/lib/python3.11/dist-packages/tensorflow > /etc/ld.so.conf.d/tensorflow.conf
ldconfig
apt install -y python3 python3-pip
pip3 install -U pip setuptools wheel
pip3 install ipdb
pip3 install h5py six soundfile librosa==0.10 better-exchook dm-tree psutil
pip3 install --ignore-installed psutil flask ipython
pip3 install git+https://github.com/rwth-i6/sisyphus
pip3 install black==22.3.0 matplotlib typing-extensions typeguard # sequitur-g2p==1.0.1668.23
pip3 install memray objgraph Pympler
Standalone code to reproduce the issue
import numpy as np
import tensorflow as tf
n_feat = 1
n_out = 30
filter_size = 5
n_steps = 100000
n_time_min = 10000
n_time_max = 30000
batch_size_max = 400000
filters = tf.Variable(tf.random.normal((filter_size, n_feat, n_out), stddev=0.01))
for step in range(n_steps):
    n_time = np.random.randint(n_time_min, n_time_max)
    n_batch = batch_size_max // n_time
    x = tf.random.normal((n_batch, n_time, n_feat))
    y = tf.nn.convolution(
        x,
        filters=filters,
        padding="VALID",
    )
Relevant log output
No response