New TimeseriesGenerator #7
Conversation
```python
raise ValueError('`targets` has to be at least as long as `data`.')

if hlength is None:
    if length % sampling_rate != 0:
```
Please add a DeprecationWarning
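A minimal sketch of how that warning might be emitted when the legacy argument is used (the `length`/`hlength` names follow this PR; the helper name and message wording are assumptions for illustration):

```python
import warnings

def resolve_length(length=None, hlength=None):
    # Hypothetical helper: accept the legacy `length` argument but
    # warn that `hlength` replaces it, as requested in this review.
    if hlength is None:
        warnings.warn('`length` is deprecated, use `hlength` instead.',
                      DeprecationWarning)
        return length
    return hlength
```

Calling it with only `length` would then trigger the `DeprecationWarning` while still returning a usable value, so existing code keeps working during the transition.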
```python
self.data = np.asarray(data)
self.targets = np.asarray(targets)

# FIXME: targets must be 2D for sequences output
```
Is this a new limitation or was it always like that?
```python
with assert_raises(ValueError) as context:
    TimeseriesGenerator(data, data, length=50, sampling_rate=0)
error = str(context.exception)
print(error)
```
avoid the printing please
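One way to honor this is to assert on the exception message instead of printing it. A sketch with a stand-in for the constructor (the validation and message text here are assumptions, not the PR's actual code):

```python
def make_generator(sampling_rate):
    # Stand-in for TimeseriesGenerator's sampling_rate validation;
    # the message wording is illustrative only.
    if sampling_rate <= 0:
        raise ValueError('`sampling_rate` must be strictly positive.')

try:
    make_generator(sampling_rate=0)
except ValueError as error:
    # Assert on the message rather than printing it, so the test
    # fails loudly if the wording regresses but stays silent otherwise.
    assert 'sampling_rate' in str(error)
```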
@srjoglekar246, could I get your input on this PR?
```diff
-    The data should be at 2D, and axis 0 is expected
-    to be the time dimension.
+    The data should be convertible into a 1D numpy array,
+    if 2D or more, axis 0 is expected to be the time dimension.
```
nit: Might make this a little easier to understand if we say "and* axis 0"
```python
assert data_gen[-1][1].tostring() == u"."
t = np.linspace(0,20*np.pi, num=1000) # time
x = np.sin(np.cos(3*t)) # input signa
```
signal*
```python
assert data_gen[-1][0].tostring() == u" is simple"
assert data_gen[-1][1].tostring() == u"."
t = np.linspace(0,20*np.pi, num=1000) # time
```
space after ","
```python
             stateful=False):

    # Sanity check

```
Remove stray newline
```python
def test_TimeseriesGenerator_exceptions():

```
Please remove newlines at the beginning of functions :-)
```python
expected_len = ceil((len(x) - g.hlength + 1.0) / g.batch_size)
print('gap: %i, hlength: %i, expected-len:%i, len: %i' %
      (g.gap, g.hlength, expected_len, g.len))
# for i in range(len(g)):
```
Remove commented code
wptmdoorn left a comment:
Is there a specific reason why this is not applied yet?
I think TimeseriesGenerator is a perfect module to use, but its downside is that it misses some basic features, such as a prediction gap (which was implemented here), so I would be very happy to see this commit actually integrated into the library.
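For readers unfamiliar with the feature being discussed: a prediction gap leaves `gap` timesteps between the end of the input window and the target being predicted. A minimal NumPy sketch of the assumed semantics (this is illustrative, not the PR's actual implementation):

```python
import numpy as np

def windows_with_gap(data, targets, length, gap):
    # For each admissible end position `i`, the input window covers
    # data[i - length:i] and the target sits `gap` steps later,
    # at targets[i + gap]. Assumed semantics for illustration only.
    x, y = [], []
    for i in range(length, len(data) - gap):
        x.append(data[i - length:i])
        y.append(targets[i + gap])
    return np.array(x), np.array(y)

data = np.arange(10.0)
x, y = windows_with_gap(data, data, length=3, gap=2)
# First window [0, 1, 2] predicts data[5] == 5.0, two steps past the window.
```

With `gap=0` this degenerates to the usual next-step setup, which is why the feature can be added without breaking existing behavior.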
Looks like this MR is more far-reaching than this. In my MR I simply added a … I'm hoping that these changes expose the same behavior I'm after (so that the first sample of a batch isn't forced to follow the last sample of the previous batch)?
This class looks like a lost cause in need of a total revision. Correct me if I'm wrong, but these lines will mix the batch dimension with a features dimension (timesteps), which is a big no-no and will obliterate training:

```python
import numpy as np

data = np.random.randn(240, 4)  # (timesteps, channels)
length = 6
batch_size = 8
stride = 1
sampling_rate = 1  # not defined in the quoted snippet; added so it runs standalone
start_index, end_index, index = 6, 40, 0

i = start_index + batch_size * stride * index
rows = np.arange(i, min(i + batch_size * stride, end_index + 1), stride)
samples = np.array([data[row - length:row:sampling_rate] for row in rows])
print(samples.shape)
# (8, 6, 4)
```

The only workaround is feeding data compatible with such a manipulation to begin with; as no measurements are ever taken in such a format, it'll require an uneasy preprocessing, maybe on-the-fly for various …
Closing outdated PR. Note that the Keras Preprocessing repository is no longer in use; instead, please use the dataset utilities provided in
Changes relative to the previous version of `TimeseriesGenerator()`:

- deprecated `length` (with a deprecation warning); added `hlength` as a replacement

Other changes relative to the keras version:

- hardened the `init()` method by adding a host of sanity checks
- added a `gap` parameter
- added a `target_seq` option for seq2seq models
- added a `dtype` parameter to force the output dtype
- added a `stateful` option to help parameter tuning in stateful mode (experimental)
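As a rough illustration of two of the listed options, a window extractor that honors a sampling rate and a forced output dtype might look like the following sketch (the function name and the exact `hlength` semantics, i.e. window length counted in sampled points, are assumptions, not the PR's code):

```python
import numpy as np

def sample_window(data, end, hlength, sampling_rate=1, dtype='float32'):
    # Take `hlength` points ending just before `end`, one every
    # `sampling_rate` steps, and force the output dtype.
    start = end - hlength * sampling_rate
    window = data[start:end:sampling_rate]
    return np.asarray(window, dtype=dtype)

data = np.arange(20)
w = sample_window(data, end=12, hlength=3, sampling_rate=2)
# w is [6., 8., 10.] with dtype float32
```

Forcing the dtype at extraction time avoids a later implicit float64 upcast when the windows are stacked into batches.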