Error serializing dataframes with timestamps #599

@ektar

I discovered an "error serializing datetime" failure (it also affects timestamps and other types), described below, when serializing a complex class with several datetime and other parameters plus several dataframes. The issue seems to be in how pandas DataFrames are (de)serialized: the "to_dict" function is called (see here) instead of pandas' own JSON functions. In a DataFrame with timestamps, those values are simply dumped into the resulting dict as Timestamp objects, and serializing that dict then fails because the json module cannot serialize Timestamps without special handling.
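To illustrate the mechanism without param involved, here is a minimal sketch. It assumes the 'records' orient for to_dict (the exact orient param uses is an assumption on my part) and only shows that the plain dict keeps Timestamp objects, which the stdlib json module rejects, while pandas' own to_json handles them.

```python
import datetime as dt
import json

import pandas as pd

df = pd.DataFrame({'a': [dt.datetime(2000, 1, 1)], 'b': [1]})

# to_dict leaves datetime cells as pandas Timestamp objects.
records = df.to_dict('records')
cell_type = type(records[0]['a']).__name__

# The stdlib encoder has no handler for Timestamp, so this raises TypeError.
try:
    json.dumps(records)
    stdlib_error = None
except TypeError as exc:
    stdlib_error = str(exc)

# pandas' own serializer converts the timestamps itself.
pandas_json = df.to_json(orient='records')

print(cell_type)
print(stdlib_error)
print(pandas_json)
```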

I found that by monkey patching the functions to use pandas' own to_json/read_json functions, I get the desired behavior. Also, using the "table" orient option includes the schema in the output, allowing a better serde round trip.
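As a rough sketch of why the "table" orient helps, the snippet below (pandas only, no param) inspects the schema embedded in the output and reads it back. The variable names are illustrative, and io.StringIO is used because newer pandas versions deprecate passing a literal JSON string to read_json.

```python
import datetime as dt
import io
import json

import pandas as pd

df = pd.DataFrame({'a': [dt.datetime(2000, 1, 1)] * 3,
                   'b': [1, 2, 3],
                   'c': [1.0, 2.0, 3.0]})

# orient='table' emits a 'schema' section alongside the 'data' section.
payload = json.loads(df.to_json(orient='table'))
field_types = {f['name']: f['type'] for f in payload['schema']['fields']}

# Reading with the same orient uses that schema to restore column types.
restored = pd.read_json(io.StringIO(json.dumps(payload)), orient='table')
dtype_kinds = [dtype.kind for dtype in restored.dtypes]

print(field_types)
print(restored.dtypes)
```

The datetime column comes back as a datetime column rather than as integers or strings, which is the part of the round trip that a plain dict of records does not carry.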

If the code below looks OK, I'll be happy to submit a PR to add it.

Versions:
pandas: 1.4.0
json: 2.0.9
param: 1.12.0
python 3.10
osx

Description of expected behavior and the observed behavior:

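Expected: pandas dataframes with timestamps can be serialized and deserialized. Observed: serialization fails with "TypeError: Object of type Timestamp is not JSON serializable".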

Complete, minimal, self-contained example code that reproduces the issue

import datetime as dt
import json
import pandas as pd
import param

class TestClass(param.Parameterized):
    df_param = param.DataFrame()

df = pd.DataFrame({'a': [dt.datetime(2000, 1, 1), dt.datetime(2000, 1, 1), dt.datetime(2000, 1, 1)],
                   'b': [1, 2, 3],
                   'c': [1.0, 2.0, 3.0]})

test = TestClass(df_param=df)

test_serde_dict = TestClass.param.deserialize_parameters(test.param.serialize_parameters())

Stack traceback and/or browser JavaScript console output

Traceback (most recent call last):
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3251, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-212f05d3f170>", line 15, in <module>
    test_serde_dict = TestClass.param.deserialize_parameters(test.param.serialize_parameters())
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/site-packages/param/parameterized.py", line 2087, in serialize_parameters
    return serializer.serialize_parameters(self_or_cls, subset=subset)
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/site-packages/param/serializer.py", line 104, in serialize_parameters
    return cls.dumps(components)
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/site-packages/param/serializer.py", line 81, in dumps
    return json.dumps(obj)
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Timestamp is not JSON serializable

Example fix:

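import io

def df_serialize(cls, value):
    # Let pandas encode the frame (timestamps included), then return a
    # plain dict so param's JSON serializer can embed it.
    return json.loads(value.to_json(orient='table'))

def df_deserialize(cls, value):
    import pandas as pd
    # StringIO: newer pandas versions deprecate passing a literal JSON string.
    return pd.read_json(io.StringIO(json.dumps(value)), orient='table')

# Monkey patch param's DataFrame parameter with the pandas-based versions.
param.DataFrame.serialize = df_serialize
param.DataFrame.deserialize = df_deserialize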

test_serde_dict = TestClass.param.deserialize_parameters(test.param.serialize_parameters())
test_serde = TestClass(**test_serde_dict)

print(repr(test_serde))
print(repr(test))

output:

TestClass(df_param=           a  b    c
0 2000-01-01  1  1.0
1 2000-01-01  2  2.0
2 2000-01-01  3  3.0, name='TestClass00002')
TestClass(df_param=           a  b    c
0 2000-01-01  1  1.0
1 2000-01-01  2  2.0
2 2000-01-01  3  3.0, name='TestClass00002')
