Description
I ran into "error serializing datetime" (the same happens with timestamps and other types), described below, when serializing a complex class with several datetime and other parameters, plus several dataframes. The issue seems to be in how pandas DataFrames are (de)serialized: the "to_dict" function is called (see here) instead of pandas' own JSON functions. In a DataFrame with timestamps, the Timestamp objects are dumped as-is into the resulting dict, and serializing that dict then fails because Timestamps aren't JSON serializable without extra handling.
I found that by monkey patching the functions to use pandas' own to_json/read_json functions I get the desired behavior. Also, using the "table" orient option puts the schema in the output, which allows a better serde roundtrip.
If the code below looks OK I'll be happy to submit a PR to add it.
Versions:
pandas: 1.4.0
json: 2.0.9
param: 1.12.0
python 3.10
osx
Description of expected behavior and the observed behavior:
Expected: pandas dataframes with timestamps serialize and deserialize cleanly. Observed: serialization raises a TypeError (traceback below).
Complete, minimal, self-contained example code that reproduces the issue
import datetime as dt
import json
import pandas as pd
import param
class TestClass(param.Parameterized):
    df_param = param.DataFrame()

df = pd.DataFrame({'a': [dt.datetime(2000, 1, 1), dt.datetime(2000, 1, 1), dt.datetime(2000, 1, 1)],
                   'b': [1, 2, 3],
                   'c': [1.0, 2.0, 3.0]})
test = TestClass(df_param=df)
test_serde_dict = TestClass.param.deserialize_parameters(test.param.serialize_parameters())
Stack traceback and/or browser JavaScript console output
Traceback (most recent call last):
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3251, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-212f05d3f170>", line 15, in <module>
    test_serde_dict = TestClass.param.deserialize_parameters(test.param.serialize_parameters())
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/site-packages/param/parameterized.py", line 2087, in serialize_parameters
    return serializer.serialize_parameters(self_or_cls, subset=subset)
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/site-packages/param/serializer.py", line 104, in serialize_parameters
    return cls.dumps(components)
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/site-packages/param/serializer.py", line 81, in dumps
    return json.dumps(obj)
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Timestamp is not JSON serializable
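For reference, the same TypeError can be reproduced without param at all. The snippet below is a minimal sketch that assumes the serializer ends up passing the result of `to_dict` (here with the `records` orient, which is my assumption about the exact call) straight to `json.dumps`; the variable names are only illustrative.

```python
import datetime as dt
import json

import pandas as pd

# One datetime column is enough to trigger the problem.
df = pd.DataFrame({"a": [dt.datetime(2000, 1, 1)], "b": [1]})

# to_dict leaves datetime cells as pandas Timestamp objects
# (assumed orient: "records").
records = df.to_dict("records")

try:
    json.dumps(records)
    error_message = None
except TypeError as exc:
    # The standard json encoder has no handler for Timestamp.
    error_message = str(exc)

print(error_message)
```

So the failure comes from the plain dict handed to `json.dumps`, not from anything specific to the parameterized class.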
Example fix:
def df_serialize(cls, value):
    return json.loads(value.to_json(orient='table'))

def df_deserialize(cls, value):
    import pandas as pd
    return pd.read_json(json.dumps(value), orient='table')

param.DataFrame.serialize = df_serialize
param.DataFrame.deserialize = df_deserialize
test_serde_dict = TestClass.param.deserialize_parameters(test.param.serialize_parameters())
test_serde = TestClass(**test_serde_dict)
print(test_serde.__repr__())
print(test.__repr__())
output:
TestClass(df_param=           a  b    c
0 2000-01-01  1  1.0
1 2000-01-01  2  2.0
2 2000-01-01  3  3.0, name='TestClass00002')
TestClass(df_param=           a  b    c
0 2000-01-01  1  1.0
1 2000-01-01  2  2.0
2 2000-01-01  3  3.0, name='TestClass00002')
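The roundtrip benefit of the "table" orient can also be checked in pandas alone. This is a rough sketch rather than the proposed patch: it wraps the JSON text in `io.StringIO` because newer pandas releases deprecate passing a literal JSON string to `read_json` (the monkey patch above passes the string directly, which works on pandas 1.4.0). The names `payload` and `restored` are only illustrative.

```python
import datetime as dt
import io

import pandas as pd

df = pd.DataFrame({
    "a": [dt.datetime(2000, 1, 1)] * 3,
    "b": [1, 2, 3],
    "c": [1.0, 2.0, 3.0],
})

# orient="table" writes a schema next to the data, so column types
# are recorded in the JSON itself.
payload = df.to_json(orient="table")

# Read it back using the embedded schema.
restored = pd.read_json(io.StringIO(payload), orient="table")

# The datetime column should come back as a datetime dtype,
# not as strings or integers.
print(restored.dtypes)
print(pd.api.types.is_datetime64_any_dtype(restored["a"]))
```

If the PR goes ahead, the `StringIO` wrapping might be worth carrying into `df_deserialize` so it keeps working on later pandas versions.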