13 changes: 13 additions & 0 deletions NaN-ja.md
@@ -0,0 +1,13 @@
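## Handling NaN values

Problem: The `_extract_dtype()` function takes the Access DB field type `long` (a 32-bit long integer) and
maps it directly to NumPy's integer type `int_` (32 or 64 bits depending on the machine), i.e. Python's built-in `int`,
which is then passed to pandas as the dtype for that column.
In real Access databases, however, long integer fields are allowed to hold NULL values, and this often causes errors.

## Workaround: Use 'Int64'

One way to solve this problem is to use a float type (`np.float64`) instead of `np.int_`.
That is inconvenient, however, when the values should be handled as integers. Instead, to accommodate missing values,
pandas's `Int64` type can be used. `Int64` is an integer type that supports missing values (represented as `pd.NA`).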

14 changes: 14 additions & 0 deletions NaN.md
@@ -0,0 +1,14 @@
## Handling NaN values
Problem:

The `_extract_dtype()` function maps Microsoft Access's `long` field type (a 32-bit long integer) directly to `np.int_`,
NumPy's default integer type (32 or 64 bits depending on the machine), the counterpart of Python's built-in `int`,
and that type is then passed to pandas as the dtype for the column.
However, in real Microsoft Access databases, long integer fields can hold NULL values, which a plain NumPy integer column cannot represent, so reading such tables often fails.

## Solution: Use 'Int64'

One approach is to use a float type (`np.float64`) instead of `np.int_`, since a float column can hold NaN. However,
this is inconvenient when the values should stay integers. Alternatively, pandas's nullable `Int64` type can be used
to accommodate missing values. `Int64` is an integer type that supports missing values (represented as `pd.NA`).
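
A minimal sketch of the failure, assuming the table reaches pandas as CSV text parsed by `pandas.read_csv` with an explicit dtype. The CSV content and the `read_ids` helper are made up for illustration and are not part of pandas_access:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical export of a table whose long-integer column "id" has one NULL.
CSV_TEXT = "id,name\n1,a\n,b\n3,c\n"


def read_ids(dtype):
    """Parse CSV_TEXT, forcing the "id" column to the given dtype."""
    return pd.read_csv(io.StringIO(CSV_TEXT), dtype={"id": dtype})


try:
    read_ids(np.int64)
except ValueError as exc:
    # A plain NumPy integer column has no way to store the empty field.
    print(f"reading failed: {exc}")
```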
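
The difference between the two options can be sketched as follows. This assumes the same kind of CSV input as the reader would see; the sample data and variable names are illustrative only:

```python
import io

import pandas as pd

# Hypothetical table export: the second row has a NULL in the "id" column.
CSV_TEXT = "id,name\n1,a\n,b\n3,c\n"

# Option 1: float column. The NULL becomes NaN, but 1 and 3 become 1.0 and 3.0.
as_float = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"id": "float64"})

# Option 2: nullable integer column. The NULL becomes pd.NA, the rest stay integers.
as_nullable = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"id": "Int64"})

print(as_float["id"].dtype, as_float["id"].tolist())
print(as_nullable["id"].dtype, as_nullable["id"].tolist())
```

With `Int64`, checks such as `as_nullable["id"].isna()` still locate the missing rows, so downstream code does not need to special-case float-encoded integers.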

4 changes: 2 additions & 2 deletions pandas_access/__init__.py
@@ -32,9 +32,9 @@ def _extract_dtype(data_type):
     # open an issue.
     data_type = data_type.lower()
     if data_type.startswith('double'):
-        return np.float_
+        return np.float64
     elif data_type.startswith('long'):
-        return np.int_
+        return 'Int64'
     else:
         return None
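
For reference, a rough sketch of how the patched mapping might feed a `dtype` argument. The `extract_dtype` function below is a standalone copy written for illustration, and the schema dictionary, type names, and CSV text are hypothetical; how pandas_access actually builds its dtype mapping is not shown in this diff:

```python
import io

import numpy as np
import pandas as pd


def extract_dtype(data_type):
    """Standalone copy of the patched mapping, for illustration only."""
    data_type = data_type.lower()
    if data_type.startswith("double"):
        return np.float64
    elif data_type.startswith("long"):
        return "Int64"
    return None


# Hypothetical schema: column name -> Access type name.
schema = {"id": "Long Integer", "price": "Double", "name": "Text"}

# Keep only the columns for which a dtype is known; pandas infers the rest.
dtypes = {}
for column, type_name in schema.items():
    mapped = extract_dtype(type_name)
    if mapped is not None:
        dtypes[column] = mapped

# The second row has a NULL "id", which the nullable integer type accepts.
frame = pd.read_csv(io.StringIO("id,price,name\n1,2.5,a\n,3.0,b\n"), dtype=dtypes)
print(frame.dtypes)
```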

17 changes: 17 additions & 0 deletions tests/test__extract_dtype.py
@@ -0,0 +1,17 @@
import unittest
import pandas as pd
import numpy as np
# target
from pandas_access import _extract_dtype

class TestPandasAccess(unittest.TestCase):
    """Checks the Access-to-pandas dtype mapping in _extract_dtype()."""
    def test_extract_dtype_double(self):
        self.assertEqual(_extract_dtype('double'), np.float64)
    def test_extract_dtype_long(self):
        self.assertEqual(_extract_dtype('long'), 'Int64')
    def test_extract_dtype_unknown(self):
        self.assertIsNone(_extract_dtype('unknown'))

if __name__ == '__main__':
    unittest.main()