Incorrect nanosecond timestamps being written to influxdb #649
Description
Writing points with nanosecond (ns) time precision results in incorrect nanosecond timestamps being written to InfluxDB.
The effect can be seen when writing points with both DataFrameClient.write_points
and InfluxDBClient.write.
This has been noted in issues #527, #489, #346, #344 and #340,
and a pull request has been proposed in #346. I believe that pull request does not go far enough.
#489 describes the same "floor divide" approach that forms part of the solution proposed here.
The solution I am proposing in a pull request uses floor division together with pandas.Timestamp, which supports ns resolution.
Implication
When the library is used to update a record, it writes a new record with the erroneous timestamp instead.
Affected code regions
_dataframe_client
_convert_dataframe_to_json
_convert_dataframe_to_lines
_datetime_to_epoch
line_protocol.py
_convert_timestamp
Explanation:
Dividing a nanosecond timepoint (np.int64 or int) by 1 (ns precision) with the / operator produces errors. It is necessary to use the floor divide operator //, which yields the correct np.int64 or int value.
If an np.int64 is divided (operator /) by an int, the result is a float64. A float64 has a 53-bit significand, so integers above 2**53 are not all exactly representable. When the initial np.int64 value is large enough, the conversion back to np.int64 therefore loses nanosecond precision. The same error occurs with standard Python int types.
Example (formulated in an IPython notebook)
The dataframe comes from a query (an InfluxDB measurement with particular field value errors) that returned 2 timepoints.
dataframe.index ---> DatetimeIndex(['2018-05-29 00:53:12.889962156+00:00', '2018-10-03 07:06:36.975643599+00:00'], dtype='datetime64[ns, UTC]', freq=None)
dataframe.index[1] ---> Timestamp('2018-10-03 07:06:36.975643599+0000', tz='UTC')
dataframe.index[1].value ---> 1538550396975643599
###### alternative method for timepoint fabrication #########
timepoint = pd.Timestamp('2018-10-03 07:06:36.975643599+0000', tz='UTC')
timepoint.value ---> 1538550396975643599
type(timepoint.value) ---> int
timepoint.value / 1 ----> 1.5385503969756436e+18
type(timepoint.value / 1) ----> float
type(timepoint.value // 1) ----> int
np.int64(timepoint.value // 1) ---> 1538550396975643599
np.int64(timepoint.value / 1) ---> 1538550396975643648
Error(ns) is
np.int64(timepoint.value // 1) - np.int64(timepoint.value / 1) ---> -49
######## Check using unit test timepoint ################
EPOCH = pd.Timestamp('1970-01-01 00:00+00:00')
nowplus1h = EPOCH + pd.Timedelta('1 hour')
nowplus1h.value ---> 3600000000000
nowplus1h.value / 1 ---> 3600000000000.0
np.int64(nowplus1h.value / 1) ---> 3600000000000
No error in this case, because the value is below 2**53 and converts to float64 exactly.
######### Suggested unit test timepoint ################
futuretimepoint = EPOCH + pd.Timedelta('20000 day +23:10:55.123456789')
futuretimepoint.value ---> 1728083455123456789
futuretimepoint.value / 1 ---> 1.7280834551234568e+18
np.int64(futuretimepoint.value / 1) ---> 1728083455123456768
Error(ns) is
futuretimepoint.value - np.int64(futuretimepoint.value / 1) ---> 21
All locations where timepoints are calculated need modification to yield the expected result in nanosecond precision.
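As a rough illustration of the kind of change meant here, the sketch below converts an integer nanosecond epoch value to a coarser precision using floor division only. The helper name `ns_to_precision` and the divisor table are hypothetical and are not taken from the library's code; the precision codes are assumed to follow InfluxDB's `n`, `u`, `ms`, `s`, `m`, `h` convention.

```python
# Hypothetical helper for illustration only; not the library's implementation.
# Nanoseconds per unit, keyed by the assumed InfluxDB precision codes.
_NS_PER_UNIT = {
    "n": 1,
    "u": 10**3,
    "ms": 10**6,
    "s": 10**9,
    "m": 60 * 10**9,
    "h": 3600 * 10**9,
}


def ns_to_precision(epoch_ns, precision="n"):
    """Convert integer nanoseconds since the epoch to the given precision."""
    # Floor division stays in integer arithmetic, so no float64 rounding occurs.
    return int(epoch_ns) // _NS_PER_UNIT[precision]


ns = 1538550396975643599  # timepoint.value from the example above

print(ns_to_precision(ns, "n"))   # 1538550396975643599 (unchanged)
print(ns_to_precision(ns, "ms"))  # 1538550396975
print(int(ns / 1))                # 1538550396975643648 (true division, off by 49 ns)
```

Keeping every conversion in integer arithmetic means the result is exact for any precision, whereas a single `/` anywhere in the chain is enough to round the value to the nearest representable float64.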
The unit tests do not expose this because the 2 test timepoints are small enough not to show the loss of precision.
The test points are EPOCH ('1970-01-01 00:00+00:00') and EPOCH + 1 hour. These 2 test points have only microsecond resolution, with no nanosecond component.
I propose that test case 2 become something like EPOCH + 20000 days and 23h 10m 55.123456789s. It is also necessary to change all calculations based on datetime (which has only microsecond resolution) to pandas Timedelta and Timestamp (which have nanosecond resolution).
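To show why the proposed test point exercises both problems, here is a small standard-library sketch. It is only an illustration, not the planned unit test: it builds the same offset with `datetime.timedelta`, which stops at microseconds, and with plain integer arithmetic, then compares the two against the value reported above. The variable names are made up for this example.

```python
from datetime import timedelta

# timedelta only stores microseconds, so the trailing 789 ns cannot be kept.
delta = timedelta(days=20000, hours=23, minutes=10, seconds=55,
                  microseconds=123456)
us_only_ns = ((delta.days * 86400 + delta.seconds) * 10**6
              + delta.microseconds) * 1000

# The same offset in integer nanoseconds, including the 789 ns tail.
expected_ns = (20000 * 86400 + 23 * 3600 + 10 * 60 + 55) * 10**9 + 123456789

print(expected_ns)                         # 1728083455123456789
print(expected_ns - us_only_ns)            # 789 ns lost by microsecond resolution
print(expected_ns - int(expected_ns / 1))  # 21 ns lost by true division
```

A test point like this fails if either a datetime-based calculation or a `/` division is left anywhere in the conversion path, which the existing EPOCH and EPOCH + 1 hour points cannot detect.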
I am preparing a pull request that attempts to address all timestamp calculations and fixes the unit tests.
Product versions where the issue was discovered and where fixes are being tested
pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 70 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 36.4.0
Cython: None
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
............................................
...........
..........................
np.version.full_version