
Fix: pandas NaT in datetime/timedelta columns stored as garbage values #33

Open
mdbenito wants to merge 4 commits into LadybugDB:main from mdbenito:fix/not-a-time


@mdbenito

Contributor

Currently, `NaT`/`None` values in `datetime64` and `timedelta64` DataFrame columns are stored as the raw sentinel value (`INT64_MIN` nanoseconds, which decodes to about 1677-09-21) instead of as `NULL`.

This PR makes such values get stored as `NULL`, so they read back as `NaT`.
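For context on why the sentinel leaks through: in NumPy (and therefore pandas), `NaT` is not a separate null flag but a reserved bit pattern, `INT64_MIN`, in the int64 storage of `datetime64[ns]` and `timedelta64[ns]` arrays. The short NumPy-only sketch below (independent of Ladybug) shows that any code reading the raw integers sees `INT64_MIN` unless it checks for `NaT` explicitly.

```python
import numpy as np

INT64_MIN = np.iinfo(np.int64).min

# Nanosecond-resolution columns containing one real value and one NaT.
dt = np.array([np.datetime64("2024-01-15"), np.datetime64("NaT")], dtype="datetime64[ns]")
td = np.array([np.timedelta64(3600, "s"), np.timedelta64("NaT")], dtype="timedelta64[ns]")

# Reinterpret the underlying 64-bit storage without converting anything.
dt_raw = dt.view(np.int64)
td_raw = td.view(np.int64)

# Both NaT flavours are the same bit pattern: INT64_MIN.
print(dt_raw[1] == INT64_MIN, td_raw[1] == INT64_MIN)  # True True

# The raw integer alone looks like a valid (very old) value,
# so the NaT check has to be done explicitly.
print(np.isnat(dt).tolist(), np.isnat(td).tolist())  # [False, True] [False, True]
```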

To reproduce the problem without this PR:

```python
import numpy as np
import pandas as pd
import ladybug

db = ladybug.Database()
conn = ladybug.Connection(db)

# datetime NaT
conn.execute("CREATE NODE TABLE t_dt (id INT64, ts TIMESTAMP, PRIMARY KEY (id))")

nat = np.datetime64("NaT", "ns")
df_dt = pd.DataFrame(
    {
        "id": [1, 2],
        "ts": np.array([np.datetime64("2024-01-15"), nat], dtype="datetime64[ns]"),
    }
)
conn.execute(
    "COPY t_dt FROM (LOAD FROM $df RETURN CAST(id AS INT64) AS id, CAST(ts AS TIMESTAMP) AS ts)",
    {"df": df_dt},
)

for row in (
    conn.execute("MATCH (t:t_dt) RETURN t.id, t.ts ORDER BY t.id")
    .get_as_df()
    .itertuples(index=False)
):
    print(f"id={row[0]:<2}  ts={row[1]!r}")

# timedelta NaT
conn.execute("CREATE NODE TABLE t_td (id INT64, dur INTERVAL, PRIMARY KEY (id))")

nat_td = np.timedelta64("NaT", "ns")
df_td = pd.DataFrame(
    {
        "id": [1, 2],
        "td": np.array(
            [np.timedelta64(3600000000000, "ns"), nat_td], dtype="timedelta64[ns]"
        ),
    }
)
conn.execute(
    "COPY t_td FROM (LOAD FROM $df RETURN CAST(id AS INT64) AS id, CAST(td AS INTERVAL) AS dur)",
    {"df": df_td},
)

for row in (
    conn.execute("MATCH (t:t_td) RETURN t.id, t.dur ORDER BY t.id")
    .get_as_df()
    .itertuples(index=False)
):
    print(f"id={row[0]:<2}  dur={row[1]!r}")
```

Without this PR, running the script prints:

```
id=1   ts=Timestamp('2024-01-15 00:00:00')
id=2   ts=Timestamp('1677-09-21 00:12:43.145225')
id=1   dur=Timedelta('0 days 01:00:00')
id=2   dur=Timedelta('-106752 days +00:12:43.145225')
```
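The two garbage values are exactly what `INT64_MIN` nanoseconds turns into at microsecond resolution. The sketch below reproduces them in plain Python, assuming (an inference from the printed values, not something this PR states) that `TIMESTAMP` and `INTERVAL` are stored in microseconds and that the nanosecond-to-microsecond conversion truncates toward zero as C/C++ integer division does. The helper `ns_to_us_truncating` is hypothetical.

```python
from datetime import datetime, timedelta

# NaT as stored in datetime64[ns] / timedelta64[ns] columns.
INT64_MIN = -(2**63)


def ns_to_us_truncating(ns: int) -> int:
    """Hypothetical helper: ns -> us, rounding toward zero like C/C++ integer division."""
    q = abs(ns) // 1000
    return q if ns >= 0 else -q


us = ns_to_us_truncating(INT64_MIN)  # -9223372036854775

# Interpret the microsecond count as a timestamp (since the Unix epoch) and as an interval.
ts = datetime(1970, 1, 1) + timedelta(microseconds=us)
dur = timedelta(microseconds=us)

print(ts)   # 1677-09-21 00:12:43.145225
print(dur)  # -106752 days, 0:12:43.145225
```

Both results match the values printed for `id=2` above, which supports the reading that the `NaT` sentinel was being converted as if it were an ordinary number.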

With this PR applied, the missing values come back as `NaT`:

```
id=1   ts=Timestamp('2024-01-15 00:00:00')
id=2   ts=NaT
id=1   dur=Timedelta('0 days 01:00:00')
id=2   dur=NaT
```
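The diff itself is not shown on this page, so the following is not the PR's actual change. It is a hypothetical Python illustration of the general idea: test each element for `NaT` before converting the raw integer, and emit `None` (i.e. `NULL`) for missing values instead of a microsecond count derived from `INT64_MIN`. The function name `to_nullable_micros` is made up for this sketch, and truncation toward zero is assumed as above.

```python
from typing import Optional

import numpy as np
import pandas as pd


def to_nullable_micros(col: pd.Series) -> list[Optional[int]]:
    """Hypothetical: datetime64[ns]/timedelta64[ns] Series -> microsecond ints, NaT -> None."""
    values = col.to_numpy()
    raw = values.view(np.int64)   # raw nanosecond counts (NaT == INT64_MIN)
    null_mask = np.isnat(values)  # decide nullness *before* touching the integers
    out: list[Optional[int]] = []
    for is_null, ns in zip(null_mask, raw):
        if is_null:
            out.append(None)  # becomes NULL rather than a 1677-era value
        else:
            ns = int(ns)
            q = abs(ns) // 1000  # ns -> us, truncating toward zero
            out.append(q if ns >= 0 else -q)
    return out


df = pd.DataFrame(
    {
        "ts": np.array([np.datetime64("2024-01-15"), np.datetime64("NaT")], dtype="datetime64[ns]"),
        "td": np.array([np.timedelta64(1, "h"), np.timedelta64("NaT")], dtype="timedelta64[ns]"),
    }
)
print(to_nullable_micros(df["ts"]))  # [1705276800000000, None]
print(to_nullable_micros(df["td"]))  # [3600000000, None]
```

Checking the mask first matters because `INT64_MIN` is also a syntactically valid int64; once it has been divided down to microseconds, it can no longer be recognised as `NaT`.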
