This repository was archived by the owner on Oct 29, 2024. It is now read-only.

Incorrect nanosecond timestamps being written to influxdb #649

@auphofBSF


Writing points with time precision in ns results in incorrect nanosecond timestamps being written to influxdb.
The effect can be noticed when writing points from both DataFrameClient.write_points and InfluxDBClient.write.
This has been noted in issues #527, #489, #346, #344 and #340.

A pull request, #346, has been proposed, but I believe it does not go far enough.

#489 describes the same "floor divide" solution as part of the solution being proposed here.

The solution I am proposing in a pull request uses floor division together with pandas.Timestamp, which supports ns resolution.

Implication

When the library is used to update a record, it instead writes a new record with the erroneous timestamp.

Affected code regions

_dataframe_client.py
    _convert_dataframe_to_json
    _convert_dataframe_to_lines
    _datetime_to_epoch
line_protocol.py
    _convert_timestamp

Explanation:

Dividing an (np.int64 or int) nanosecond timepoint by 1 (ns precision) produces errors. It is necessary to use the floor divide operator //, which yields the correct (np.int64 or int) value.

If an np.int64 is divided (operator /) by an int, it yields a float64. When the initial np.int64 value is large enough, the conversion back to an np.int64 loses nanosecond precision. The same error occurs with standard Python int types.
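
A minimal sketch of the difference, using only plain Python ints so it runs without numpy or pandas. The nanosecond value is the one from the example in this issue; the variable names are illustrative and not taken from the library.

```python
# Nanoseconds since the epoch for 2018-10-03 07:06:36.975643599 UTC
# (value copied from the example in this issue).
ns = 1538550396975643599

true_div = ns / 1    # true division always returns a float in Python 3
floor_div = ns // 1  # floor division of two ints stays an exact int

# The float cannot hold all 19 digits, so converting back drifts.
roundtrip = int(true_div)
error_ns = floor_div - roundtrip

print(type(true_div).__name__, type(floor_div).__name__)  # float int
print(roundtrip)  # 1538550396975643648
print(error_ns)   # -49
```

At this magnitude adjacent float64 values are 256 ns apart, so the timestamp is rounded to the nearest multiple of 256.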

Example (formulated in an IPython notebook)

dataframe is from a query (an influxdb measurement having particular field value errors) that returned 2 timepoints
dataframe.index  --->   DatetimeIndex(['2018-05-29 00:53:12.889962156+00:00', '2018-10-03 07:06:36.975643599+00:00'], dtype='datetime64[ns, UTC]', freq=None)
dataframe.index[1]  ---> Timestamp('2018-10-03 07:06:36.975643599+0000', tz='UTC')
dataframe.index[1].value  ---> 1538550396975643599

######    alternative method for timepoint fabrication   #########
timepoint = pd.Timestamp('2018-10-03 07:06:36.975643599+0000', tz='UTC')
timepoint.value   ---> 1538550396975643599

type(timepoint.value)   ---> int
timepoint.value / 1  ----> 1.5385503969756436e+18
type(timepoint.value / 1) ----> float
type(timepoint.value // 1) ----> int
np.int64(timepoint.value // 1)   ---> 1538550396975643599
np.int64(timepoint.value / 1)  ---> 1538550396975643648

Error(ns) is

np.int64(timepoint.value // 1) - np.int64(timepoint.value / 1)  ---> -49

######## Check using unit test timepoint ################

EPOCH = pd.Timestamp('1970-01-01 00:00+00:00')
nowplus1h = EPOCH + pd.Timedelta('1 hour')
nowplus1h.value ---> 3600000000000
nowplus1h.value / 1 ---> 3600000000000.0
np.int64(nowplus1h.value / 1) ---> 3600000000000

No Error in this
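
The one-hour timepoint survives because a float64 has a 53-bit significand, so every integer up to 2**53 is represented exactly. The sketch below checks this with plain Python ints; `survives_float_roundtrip` is a hypothetical helper written for this illustration.

```python
FLOAT64_EXACT_LIMIT = 2 ** 53  # integers up to this value are exact in float64


def survives_float_roundtrip(ns):
    """Return True if int -> float -> int leaves the value unchanged."""
    return int(ns / 1) == ns


one_hour_ns = 3_600_000_000_000      # EPOCH + 1 hour, the existing test timepoint
recent_ns = 1538550396975643599      # 2018 timepoint from the example above

print(one_hour_ns < FLOAT64_EXACT_LIMIT, survives_float_roundtrip(one_hour_ns))
print(recent_ns < FLOAT64_EXACT_LIMIT, survives_float_roundtrip(recent_ns))
```

Any present-day nanosecond timestamp is far above 2**53 (about 9.0e15 ns, roughly 104 days after the epoch), so it is exposed to the rounding.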

######### Suggested unit test timepoint ################

futuretimepoint = EPOCH + pd.Timedelta('20000 day  +23:10:55.123456789')
futuretimepoint.value ---> 1728083455123456789
futuretimepoint.value / 1 ---> 1.7280834551234568e+18
np.int64(futuretimepoint.value / 1) ---> 1728083455123456768

Error(ns) is

futuretimepoint.value - np.int64(futuretimepoint.value / 1) ---> 21

All locations where timepoints are calculated need modification to yield the expected result at nanosecond precision.
The unit tests do not expose the problem because the 2 test timepoints are small enough not to show the loss of precision.
The points are EPOCH ('1970-01-01 00:00+00:00') and EPOCH + 1 hour. These 2 test points have only microsecond resolution, with no nanosecond component.
I propose that test case 2 be something like EPOCH + 20000 days and 23h 10m 55.123456789s. It is also necessary to change all calculations based on datetime (which has only microsecond resolution) to pandas Timedelta and Timestamp (which have nanosecond resolution).
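
A rough sketch of what such a test could look like, written with plain integers so it does not depend on pandas. `ns_to_precision` and `NS_PER_UNIT` are hypothetical names invented for this illustration, not functions from the library, and the precision codes are assumed to follow InfluxDB's n/u/ms/s/m/h convention.

```python
# Assumed precision codes and their size in nanoseconds.
NS_PER_UNIT = {
    "n": 1,
    "u": 10 ** 3,
    "ms": 10 ** 6,
    "s": 10 ** 9,
    "m": 60 * 10 ** 9,
    "h": 3600 * 10 ** 9,
}


def ns_to_precision(ns, precision="n"):
    """Hypothetical helper: scale integer nanoseconds with floor division only."""
    return ns // NS_PER_UNIT[precision]


# EPOCH + 20000 days + 23:10:55.123456789, built from plain integers.
FUTURE_NS = (20000 * 86400 + 23 * 3600 + 10 * 60 + 55) * 10 ** 9 + 123456789


def test_nanosecond_precision_is_preserved():
    assert FUTURE_NS == 1728083455123456789
    assert ns_to_precision(FUTURE_NS, "n") == FUTURE_NS
    assert ns_to_precision(FUTURE_NS, "u") == 1728083455123456
    assert ns_to_precision(FUTURE_NS, "s") == 1728083455
    # True division would have lost the trailing nanoseconds.
    assert int(FUTURE_NS / 1) != FUTURE_NS


test_nanosecond_precision_is_preserved()
```

Because the timepoint is large and has a non-zero nanosecond component, a regression back to true division would fail the first precision check.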

I am preparing a pull request that attempts to address all timestamp calculations and fixes the unit tests.

Product version where issue discovered and where fixes are being tested

pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 70 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 36.4.0
Cython: None
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
............................................
...........
..........................

np.version.full_version

