New TimeseriesGenerator #7
Conversation
```python
raise ValueError('`targets` has to be at least as long as `data`.')

if hlength is None:
    if length % sampling_rate != 0:
```
Please add a DeprecationWarning
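A minimal sketch of how that warning might be emitted when the legacy argument is used (the `length`/`hlength` names follow this PR; the helper name and message wording are assumptions for illustration):

```python
import warnings

def resolve_length(length=None, hlength=None):
    # Hypothetical helper: accept the legacy `length` argument but
    # warn that `hlength` replaces it, as requested in this review.
    if hlength is None:
        warnings.warn('`length` is deprecated, use `hlength` instead.',
                      DeprecationWarning)
        return length
    return hlength
```

Calling it with only `length` would then trigger the `DeprecationWarning` while still returning a usable value, so existing code keeps working during the transition.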
```python
self.data = np.asarray(data)
self.targets = np.asarray(targets)

# FIXME: targets must be 2D for sequences output
```
Is this a new limitation or was it always like that?
```python
with assert_raises(ValueError) as context:
    TimeseriesGenerator(data, data, length=50, sampling_rate=0)
error = str(context.exception)
print(error)
```
avoid the printing please
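One way to honor this is to assert on the exception message instead of printing it. A sketch with a stand-in for the constructor (the validation and message text here are assumptions, not the PR's actual code):

```python
def make_generator(sampling_rate):
    # Stand-in for TimeseriesGenerator's sampling_rate validation;
    # the message wording is illustrative only.
    if sampling_rate <= 0:
        raise ValueError('`sampling_rate` must be strictly positive.')

try:
    make_generator(sampling_rate=0)
except ValueError as error:
    # Assert on the message rather than printing it, so the test
    # fails loudly if the wording regresses but stays silent otherwise.
    assert 'sampling_rate' in str(error)
```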
@srjoglekar246, could I get your input on this PR?
```diff
-    The data should be at 2D, and axis 0 is expected
-    to be the time dimension.
+    The data should be convertible into a 1D numpy array,
+    if 2D or more, axis 0 is expected to be the time dimension.
```
nit: Might make this a little easier to understand if we say "and* axis 0"
```python
assert data_gen[-1][1].tostring() == u"."
t = np.linspace(0,20*np.pi, num=1000) # time
x = np.sin(np.cos(3*t)) # input signa
```
signal*
```python
assert data_gen[-1][0].tostring() == u" is simple"
assert data_gen[-1][1].tostring() == u"."
t = np.linspace(0,20*np.pi, num=1000) # time
```
space after ","
```python
             stateful=False):

    # Sanity check

```
Remove stray newline
```python
def test_TimeseriesGenerator_exceptions():

```
Please remove newlines at the beginning of functions :-)
```python
expected_len = ceil((len(x) - g.hlength + 1.0) / g.batch_size)
print('gap: %i, hlength: %i, expected-len:%i, len: %i' %
      (g.gap, g.hlength, expected_len, g.len))
# for i in range(len(g)):
```
Remove commented code
wptmdoorn left a comment:
Is there a specific reason why this is not applied yet?
I think TimeseriesGenerator is a perfect module to use, but its downside is that it misses some basic features, such as a prediction gap (which was implemented here), so I would be very happy to see this commit actually integrated into the library.
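For readers unfamiliar with the feature being discussed: a prediction gap leaves `gap` timesteps between the end of the input window and the target being predicted. A minimal NumPy sketch of the assumed semantics (this is illustrative, not the PR's actual implementation):

```python
import numpy as np

def windows_with_gap(data, targets, length, gap):
    # For each admissible end position `i`, the input window covers
    # data[i - length:i] and the target sits `gap` steps later,
    # at targets[i + gap]. Assumed semantics for illustration only.
    x, y = [], []
    for i in range(length, len(data) - gap):
        x.append(data[i - length:i])
        y.append(targets[i + gap])
    return np.array(x), np.array(y)

data = np.arange(10.0)
x, y = windows_with_gap(data, data, length=3, gap=2)
# First window [0, 1, 2] predicts data[5] == 5.0, two steps past the window.
```

With `gap=0` this degenerates to the usual next-step setup, which is why the feature can be added without breaking existing behavior.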
Looks like this MR is more far-reaching than this. In my MR I simply added a … I'm hoping that these changes expose the same behavior I'm after (so that the first sample of a batch isn't forced to follow the last sample of the previous batch)?
This class looks like a lost cause in need of a total revision. Correct me if I'm wrong, but these lines will mix the batch dimension with a features dimension (timesteps), which is a big no-no and will obliterate training:

```python
import numpy as np

data = np.random.randn(240, 4)  # (timesteps, channels)
length = 6
batch_size = 8
stride = 1
sampling_rate = 1  # not defined in the quoted snippet; added so it runs standalone
start_index, end_index, index = 6, 40, 0

i = start_index + batch_size * stride * index
rows = np.arange(i, min(i + batch_size * stride, end_index + 1), stride)
samples = np.array([data[row - length:row:sampling_rate] for row in rows])
print(samples.shape)
# (8, 6, 4)
```

The only workaround is feeding data compatible with such a manipulation to begin with; as no measurements are ever taken in such a format, it'll require an uneasy preprocessing, maybe on-the-fly for various …
Closing outdated PR. Note that the Keras Preprocessing repository is no longer in use; instead, please use the dataset utilities provided in
Changes relative to the previous version of `TimeseriesGenerator()`:

- deprecated `length` (with a deprecation warning); added `hlength` as a replacement

Other changes relative to the keras version:

- hardened the `init()` method by adding a host of sanity checks
- added a `gap` parameter
- added a `target_seq` option for seq2seq models
- added a `dtype` parameter to force the output dtype
- added a `stateful` option to help parameter tuning in stateful mode (experimental)
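As a rough illustration of two of the listed options, a window extractor that honors a sampling rate and a forced output dtype might look like the following sketch (the function name and the exact `hlength` semantics, i.e. window length counted in sampled points, are assumptions, not the PR's code):

```python
import numpy as np

def sample_window(data, end, hlength, sampling_rate=1, dtype='float32'):
    # Take `hlength` points ending just before `end`, one every
    # `sampling_rate` steps, and force the output dtype.
    start = end - hlength * sampling_rate
    window = data[start:end:sampling_rate]
    return np.asarray(window, dtype=dtype)

data = np.arange(20)
w = sample_window(data, end=12, hlength=3, sampling_rate=2)
# w is [6., 8., 10.] with dtype float32
```

Forcing the dtype at extraction time avoids a later implicit float64 upcast when the windows are stacked into batches.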