Skip to content

Conversation

@berenddeboer
Copy link

Problem

Target postgres does not like the value "1e-38" I'm seeing. For example I get this output for the type of a column:

{
  "multipleOf": 1e-38,
  "type": [
    "null",
    "number"
  ]
}

This crashes postgres with:

...
  File "/meltano/.meltano/loaders/target-postgres/venv/lib/python3.9/site-packages/jsonschema/_validators.py", line 300, in properties_draft4
    for error in validator.descend(
  File "/meltano/.meltano/loaders/target-postgres/venv/lib/python3.9/site-packages/jsonschema/validators.py", line 121, in descend
    for error in self.iter_errors(instance, schema):
  File "/meltano/.meltano/loaders/target-postgres/venv/lib/python3.9/site-packages/jsonschema/validators.py", line 105, in iter_errors
    for error in errors:
  File "/meltano/.meltano/loaders/target-postgres/venv/lib/python3.9/site-packages/jsonschema/_validators.py", line 127, in multipleOf
    failed = instance % dB
decimal.InvalidOperation: [<class 'decimal.DivisionImpossible'>]

Proposed changes

Don't understand it's purpose, and I got values like 1e-38 which the postgres target json schema validator didn't like. So simply drop this code. Have left it commented out. Left the other instances in the code, but they could possibly go to.

Types of changes

What types of changes does your code introduce to PipelineWise?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

  • Description above provides context of the change
  • I have added tests that prove my fix is effective or that my feature works
  • Unit tests for changes (not needed for documentation changes)
  • CI checks pass with my changes
  • Bumping version in setup.py is an individual PR and not mixed with feature or bugfix PRs
  • Commit message/PR title starts with [AP-NNNN] (if applicable. AP-NNNN = JIRA ID)
  • Branch name starts with AP-NNN (if applicable. AP-NNN = JIRA ID)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions

berenddeboer and others added 16 commits December 1, 2022 11:13
For views key_properties was null, and my postgres target crashed when
that happened.
fix: make sure key_properties is not null for views
…r exception

Log the original exception so we know the original cause, and don't
make it with another exception.
Seems unsupported since Oracle 12c.
Possibly something going wrong, but I do not understand the purpose of
the test. No longer working tests are commented out.
Unclear why this test fails. Possibly because of log miner? Skip for now.
Certain targets such as postgres type check input against the schema.
…e-as-number

fix: emit number as number instead of string
Don't understand it's purpose, and I got values like 1e-38 which the
postgres target json schema validator didn't like.
@mjsqu
Copy link
Owner

mjsqu commented Dec 7, 2022

This has been included to conform with the JSONSchema keyword multipleOf as referenced here:

https://cswr.github.io/JsonSchema/spec/basic_types/

I think target-snowflake is less opinionated about the values of multipleOf as it doesn't help Snowflake to assign data types to new columns.

@mjsqu
Copy link
Owner

mjsqu commented Dec 7, 2022

The error you're encountering is mentioned specifically here:

datamill-co/target-postgres#207

However if you are using the transferwise variant, then this line would stop the target if multipleOf were too long:

https://github.com/transferwise/pipelinewise-target-postgres/blob/e060d44c376095a9c8ae1a731f548ab9aa16c2f3/target_postgres/__init__.py#L126

@mjsqu
Copy link
Owner

mjsqu commented Dec 8, 2022

Reproducing the error, basic test against JSON schema:

from jsonschema import Draft4Validator, FormatChecker
import jsonschema
import json
import decimal

messages = [
{
  "type": "SCHEMA",
  "stream": "foobar",
  "schema": {
    "type": "object",
    "properties": {
      "sample_numeric": {
        "type": ["null", "number"],
        "exclusiveMaximum": True,
        "maximum": 100000000000000000000000000000000000000000000000000000000000000,
        "multipleOf": decimal.Decimal(10**-38),
        "exclusiveMinimum": True,
        "minimum": -100000000000000000000000000000000000000000000000000000000000000
      }
    }
  }
}
,
{
  "type": "RECORD",
  "stream": "foobar",
  "record": {
    "sample_numeric": decimal.Decimal(0.000913808181253534)
  }
}
]


validator = Draft4Validator(messages[0]['schema'],format_checker=FormatChecker())

validator.validate(messages[1]['record'])

The error occurs if the SCHEMA multipleOf value and the RECORD value are both of type Decimal.decimal()

I have no access to test this against Oracle, but I suspect a NUMBER from Oracle is being handled by this OutputTypeHandler (tap_oracle/sync_strategies/common.py) and values end up in the RECORD message as Decimal.decimal() as a result.

def OutputTypeHandler(cursor, name, defaultType, size, precision, scale):
   if defaultType == cx_Oracle.NUMBER:
      return cursor.var(decimal.Decimal, arraysize = cursor.arraysize)
   if defaultType == cx_Oracle.CLOB:
      return cursor.var(cx_Oracle.LONG_STRING, arraysize=cursor.arraysize)
   if defaultType == cx_Oracle.NCLOB:
      return cursor.var(cx_Oracle.LONG_STRING, arraysize=cursor.arraysize)
   if defaultType == cx_Oracle.BLOB:
      return cursor.var(cx_Oracle.LONG_BINARY, arraysize=cursor.arraysize)

I don't have access to an Oracle source until Monday, but it would be good to see the contents of the SCHEMA and RECORD messages that cause the DivisionImpossible error.

@mjsqu
Copy link
Owner

mjsqu commented Mar 22, 2023

@berenddeboer - I have just updated my version of singer-python (that this repo depends on) to output decimal.Decimal values as numbers as opposed to strings. This may fix your issue if you're still concerned about it. Let me know if you have a chance to retest it and update me with the results. Cheers!

@mjsqu mjsqu force-pushed the master branch 2 times, most recently from 2681957 to 3c27f01 Compare June 6, 2023 04:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants