Releases · pymc-devs/pymc
PyMC3 v3.8 (29 November, 2019)
New features
- Implemented robust U-turn check in NUTS (similar to stan-dev/stan#2800). See PR #3605.
- Add capabilities to do inference on parameters in a differential equation with `DifferentialEquation`. See #3590 and #3634.
- Distinguish between `Data` and `Deterministic` variables when graphing models with graphviz. PR #3491.
- Sequential Monte Carlo - Approximate Bayesian Computation step method is now available. The implementation is in an experimental stage and will be further improved.
- Added `Matern12` covariance function for Gaussian processes. This is the Matern kernel with nu=1/2 (see the sketch after this list).
- Progressbar reports number of divergences in real time, when available. #3547
- Sampling from variational approximation now allows for alternative trace backends. #3550
- Infix `@` operator now works with random variables and deterministics. #3619
- ArviZ is now a requirement, and handles plotting, diagnostics, and statistical checks.
- Can use `GaussianRandomWalk` in `sample_prior_predictive` and `sample_posterior_predictive`. #3682
- Now 11 years of S&P returns in data set. #3682
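To illustrate the new `Matern12` kernel, here is a minimal marginal-GP sketch; the toy data and prior choices are hypothetical and only meant to show where the kernel plugs in:

```python
import numpy as np
import pymc3 as pm

# Hypothetical toy data.
X = np.linspace(0, 10, 50)[:, None]
y = np.sin(X).ravel() + 0.2 * np.random.randn(50)

with pm.Model():
    ls = pm.Gamma("ls", alpha=2, beta=1)
    # New in 3.8: Matern kernel with nu = 1/2 (exponential covariance).
    cov = pm.gp.cov.Matern12(input_dim=1, ls=ls)
    gp = pm.gp.Marginal(cov_func=cov)
    sigma = pm.HalfNormal("sigma", sigma=1)
    gp.marginal_likelihood("y_obs", X=X, y=y, noise=sigma)
    trace = pm.sample(1000, tune=1000)
```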
Maintenance
- Moved math operations out of the `Rice`, `TruncatedNormal`, `Triangular` and `ZeroInflatedNegativeBinomial` `random` methods. Math operations on values returned by `draw_values` might not broadcast well, and all the `size` aware broadcasting is left to `generate_samples`. Fixes #3481 and #3508.
- Parallelization of population steppers (`DEMetropolis`) is now set via the `cores` argument. (#3559)
- Fixed a bug in `Categorical.logp`. In the case of multidimensional `p`'s, the indexing was done wrong, leading to incorrectly shaped tensors that consumed `O(n**2)` memory instead of `O(n)`. This fixes issue #3535.
- Fixed a defect in `OrderedLogistic.__init__` that unnecessarily increased the dimensionality of the underlying `p`. Related to issue #3535 but was not the true cause of it.
- SMC: stabilize covariance matrix. #3573
- SMC is no longer a step method of `pm.sample`; it should now be called using `pm.sample_smc` (see the sketch after this list). #3579
- SMC: improve computation of the proposal scaling factor. #3594 and #3625
- SMC: reduce number of logp evaluations. #3600
- SMC: remove `scaling` and `tune_scaling` arguments, as it is a better idea to always allow SMC to automatically compute the scaling factor. #3625
- Now uses `multiprocessing` rather than `psutil` to count CPUs, which results in reliable core counts on Chromebooks.
- `sample_posterior_predictive` now preallocates the memory required for its output to improve memory usage. Addresses problems raised in this discourse thread.
- Wrapped `DensityDist.rand` with `generate_samples` to make it aware of the distribution's shape. Added control flow attributes to still be able to behave as in earlier versions, and to control how to interpret the `size` parameter in the `random` callable signature. Fixes #3553.
- Added `theano.gof.graph.Constant` to type checks done in `_draw_value` (fixes issue #3595).
- `HalfNormal` did not work properly in `draw_values`, `sample_prior_predictive`, or `sample_posterior_predictive` (fixes issue #3686).
- Random variable transforms were inadvertently left out of the API documentation. Added them. (See PR #3690.)
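As noted above, SMC now has its own entry point. A minimal sketch (the toy model is hypothetical; only the call pattern is the point):

```python
import pymc3 as pm

# SMC is no longer passed as a step method to pm.sample;
# it is invoked directly through pm.sample_smc.
with pm.Model():
    mu = pm.Normal("mu", mu=0, sigma=10)
    pm.Normal("obs", mu=mu, sigma=1, observed=[0.1, -0.3, 0.2])
    trace = pm.sample_smc(1000)
```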
PyMC3 3.7 (May 29 2019)
New features
- Add data container class (`Data`) that wraps the theano `SharedVariable` class and lets the model be aware of its inputs and outputs.
- Add function `set_data` to update variables defined as `Data` (example after this list).
- `Mixture` now supports mixtures of multidimensional probability distributions, not just lists of 1D distributions.
- `GLM.from_formula` and `LinearComponent.from_formula` can extract variables from the calling scope. Customizable via the new `eval_env` argument. Fixing #3382.
- Added the `distributions.shape_utils` module with functions used to help broadcast samples drawn from distributions using the `size` keyword argument.
- Used `numpy.vectorize` in `distributions.distribution._compile_theano_function`. This enables `sample_prior_predictive` and `sample_posterior_predictive` to ask for tuples of samples instead of just integers. This fixes issue #3422.
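A minimal sketch of the `Data`/`set_data` workflow; the regression model and variable names are hypothetical:

```python
import numpy as np
import pymc3 as pm

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2])

with pm.Model() as model:
    x_shared = pm.Data("x_shared", x)
    y_shared = pm.Data("y_shared", y)
    beta = pm.Normal("beta", mu=0, sigma=10)
    sigma = pm.HalfNormal("sigma", sigma=1)
    pm.Normal("obs", mu=beta * x_shared, sigma=sigma, observed=y_shared)
    trace = pm.sample(1000, tune=1000)

# Swap in new inputs and draw posterior predictive samples for them.
with model:
    pm.set_data({"x_shared": np.array([4.0, 5.0]), "y_shared": np.zeros(2)})
    ppc = pm.sample_posterior_predictive(trace)
```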
Maintenance
- All occurrences of `sd` as a parameter name have been renamed to `sigma`. `sd` will continue to function for backwards compatibility.
- `HamiltonianMC` was ignoring certain arguments like `target_accept`, and not using the custom step size jitter function with expectation 1.
- Made `BrokenPipeError` for parallel sampling more verbose on Windows.
- Added the `broadcast_distribution_samples` function that helps broadcasting arrays of drawn samples, taking into account the requested `size` and the inferred distribution shape. This is sometimes needed by distributions that call several `rvs` separately within their `random` method, such as the `ZeroInflatedPoisson` (fixes issue #3310).
- The `Wald`, `Kumaraswamy`, `LogNormal`, `Pareto`, `Cauchy`, `HalfCauchy`, `Weibull` and `ExGaussian` distributions' `random` method used a hidden `_random` function that was written with scalars in mind. This could potentially lead to artificial correlations between random draws. Added shape guards and broadcasting of the distribution samples to prevent this (similar to issue #3310).
- Added a fix to allow the imputation of single missing values of observed data, which previously would fail (fixes issue #3122).
- The `draw_values` function was too permissive with what could be grabbed from inside `point`, which led to an error when sampling posterior predictives of variables that depended on shared variables that had changed their shape after `pm.sample()` had been called (fixes issue #3346).
- `draw_values` now adds the theano graph descendants of `TensorConstant` or `SharedVariable`s to the named relationship nodes stack, only if these descendants are `ObservedRV` or `MultiObservedRV` instances (fixes issue #3354).
- Fixed bug in `broadcast_distribution_samples`, which did not correctly handle cases in which some samples did not have the size tuple prepended.
- Changed `MvNormal.random`'s usage of `tensordot` for Cholesky-encoded covariances. This led to wrong axis broadcasting and seemed to be the cause of issue #3343.
- Fixed defect in `Mixture.random` when multidimensional mixtures were involved. The mixture component was not preserved across all the elements of the dimensions of the mixture. This meant that the correlations across elements within a given draw of the mixture were partly broken.
- Restructured `Mixture.random` to allow better use of vectorized calls to `comp_dists.random`.
- Added tests for mixtures of multidimensional distributions to the test suite.
- Fixed incorrect usage of `broadcast_distribution_samples` in `DiscreteWeibull`.
- `Mixture`'s default dtype is now determined by `theano.config.floatX`.
- `dist_math.random_choice` now handles nd-arrays of category probabilities, and also handles sizes that are not `None`. Also removed the unused `k` kwarg from `dist_math.random_choice`.
- Changed `Categorical.mode` to preserve all the dimensions of `p` except the last one, which encodes each category's probability.
- Changed initialization of `Categorical.p`. `p` is now normalized to sum to 1 inside `logp` and `random`, but not during initialization. This could hide negative values supplied to `p` as mentioned in #2082.
- `Categorical` now accepts elements of `p` equal to 0. `logp` will return `-inf` if there are `values` that index to the zero-probability categories.
- Add `sigma`, `tau`, and `sd` to the signature of `NormalMixture`.
- Set default lower and upper values of `-inf` and `inf` for `pm.distributions.continuous.TruncatedNormal`. This avoids errors caused by their previous values of `None` (fixes issue #3248).
- Converted all calls to `pm.distributions.bound._ContinuousBounded` and `pm.distributions.bound._DiscreteBounded` to use only and all positional arguments (fixes issue #3399).
- Restructured `distributions.distribution.generate_samples` to use the `shape_utils` module. This solves issues #3421 and #3147 by using the `size` aware broadcasting functions in `shape_utils`.
- Fixed the `Multinomial.random` and `Multinomial.random_` methods to make them compatible with the new `generate_samples` function. In the process, a bug in the `Multinomial.random_` shape handling was discovered and fixed.
- Fixed a defect found in `Bound.random` where the `point` dictionary was passed to `generate_samples` as an `arg` instead of in `not_broadcast_kwargs`.
- Fixed a defect found in `Bound.random_` where `total_size` could end up as a `float64` instead of being an integer if given `size=tuple()`.
- Fixed an issue in `model_graph` that caused construction of the graph of the model for rendering to hang: replaced a search over the powerset of the nodes with a breadth-first search over the nodes. Fix for #3458.
- Removed variable annotations from `model_graph` but left type hints (fix for #3465). This means that we support `python>=3.5.4`.
- Default `target_accept` for `HamiltonianMC` is now 0.65, as suggested in Beskos et al. 2010 and Neal 2001.
- Fixed bug in `draw_values` that led to intermittent errors in python3.5. This happened with some deterministic nodes that were drawn but not added to `givens`.
Deprecations
- `nuts_kwargs` and `step_kwargs` have been deprecated in favor of using the standard `kwargs` to pass optional step method arguments.
- `SGFS` and `CSG` have been removed (fix for #3353). They have been moved to pymc3-experimental.
- References to `live_plot` and corresponding notebooks have been removed.
- Function `approx_hessian` was removed, due to `numdifftools` becoming incompatible with current `scipy`. The function was already optional, only available to a user who installed `numdifftools` separately, and not hit on any common codepaths. #3485
- Deprecated the `vars` parameters of `sample_posterior_predictive` and `sample_prior_predictive` in favor of `var_names`. At least for the latter, this is more accurate, since the `vars` parameter actually took names.
Contributors sorted by number of commits
45 Luciano Paz
38 Thomas Wiecki
23 Colin Carroll
19 Junpeng Lao
15 Chris Fonnesbeck
13 Juan Martín Loyola
13 Ravin Kumar
8 Robert P. Goldman
5 Tim Blazina
4 chang111
4 adamboche
3 Eric Ma
3 Osvaldo Martin
3 Sanmitra Ghosh
3 Saurav Shekhar
3 chartl
3 fredcallaway
3 Demetri
2 Daisuke Kondo
2 David Brochart
2 George Ho
2 Vaibhav Sinha
1 rpgoldman
1 Adel Tomilova
1 Adriaan van der Graaf
1 Bas Nijholt
1 Benjamin Wild
1 Brigitta Sipocz
1 Daniel Emaasit
1 Hari
1 Jeroen
1 Joseph Willard
1 Juan Martin Loyola
1 Katrin Leinweber
1 Lisa Martin
1 M. Domenzain
1 Matt Pitkin
1 Peadar Coyle
1 Rupal Sharma
1 Tom Gilliss
1 changjiangeng
1 michaelosthege
1 monsta
1 579397
v3.6
This is a major new release from 3.5 with many new features and important bugfixes. The highlight is certainly our completely revamped website: https://docs.pymc.io/
Note also that this release will be the last to be compatible with Python 2. Thanks to all contributors!
New features
- Track the model log-likelihood as a sampler stat for NUTS and HMC samplers (accessible as `trace.get_sampler_stats('model_logp')`) (#3134) (example after this list)
- Add Incomplete Beta function `incomplete_beta(a, b, value)`
- Add log CDF functions to continuous distributions: `Beta`, `Cauchy`, `ExGaussian`, `Exponential`, `Flat`, `Gumbel`, `HalfCauchy`, `HalfFlat`, `HalfNormal`, `Laplace`, `Logistic`, `Lognormal`, `Normal`, `Pareto`, `StudentT`, `Triangular`, `Uniform`, `Wald`, `Weibull`.
- Behavior of `sample_posterior_predictive` is now to produce posterior predictive samples, in order, from all values of the `trace`. Previously, by default it would produce one chain's worth of samples, using a random selection from the `trace` (#3212)
- Show diagnostics for initial energy errors in HMC and NUTS.
- PR #3273 has added the `distributions.distribution._DrawValuesContext` context manager. This is used to store the values already drawn in nested `random` and `draw_values` calls, enabling `draw_values` to draw samples from the joint probability distribution of RVs and not the marginals. Custom distributions that must call `draw_values` several times in their `random` method, or that invoke many calls to other distributions' `random` methods (e.g. mixtures) must do all of these calls under the same `_DrawValuesContext` context manager instance. If they do not, the conditional relations between the distribution's parameters could be broken, and `random` could return values drawn from an incorrect distribution.
- `Rice` distribution is now defined with either the noncentrality parameter or the shape parameter (#3287).
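A quick sketch of reading the new `model_logp` sampler stat; the toy model is hypothetical:

```python
import pymc3 as pm

with pm.Model():
    mu = pm.Normal("mu", mu=0, sd=1)
    pm.Normal("obs", mu=mu, sd=1, observed=[0.2, -0.1, 0.4])
    trace = pm.sample(1000, tune=1000)

# One model log-likelihood value per draw, stored with the other sampler stats.
model_logp = trace.get_sampler_stats("model_logp")
print(model_logp.shape)
```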
Maintenance
- Big rewrite of documentation (#3275)
- Fixed Triangular distribution `c` attribute handling in `random` and updated sample codes for consistency (#3225)
- Refactor SMC and properly compute marginal likelihood (#3124)
- Removed use of deprecated `ymin` keyword in matplotlib's `Axes.set_ylim` (#3279)
- Fix for #3210. Now `distribution.draw_values(params)` will draw the `params` values from their joint probability distribution and not from combinations of their marginals (refer to PR #3273).
- Removed dependence on pandas-datareader for retrieving Yahoo Finance data in examples (#3262)
- Rewrote the `Multinomial._random` method to better handle shape broadcasting (#3271)
- Fixed the `Rice` distribution, which inconsistently mixed two parametrizations (#3286).
- `Rice` distribution now accepts multiple parameters and observations and is usable with NUTS (#3289).
- `sample_posterior_predictive` no longer calls `draw_values` to initialize the shape of the ppc trace. This call could lead to `ValueError`s when sampling the ppc from a model with `Flat` or `HalfFlat` prior distributions (fix issue #3294).
Deprecations
- Renamed `sample_ppc()` and `sample_ppc_w()` to `sample_posterior_predictive()` and `sample_posterior_predictive_w()`, respectively.
v3.5 Final
New features
- Add documentation section on survival analysis and censored data models
- Add `check_test_point` method to `pm.Model`
- Add `Ordered` transformation and `OrderedLogistic` distribution
- Add `Chain` transformation
- Improve error message `Mass matrix contains zeros on the diagonal. Some derivatives might always be zero` during tuning of `pm.sample`
- Improve error message `NaN occurred in optimization.` during ADVI
- Save and load traces without `pickle` using `pm.save_trace` and `pm.load_trace` (example after this list)
- Add `Kumaraswamy` distribution
- Add `TruncatedNormal` distribution
- Rewrite parallel sampling of multiple chains on py3. This resolves long standing issues when transferring large traces to the main process, avoids pickling issues on UNIX, and allows us to show a progress bar for all chains. If parallel sampling is interrupted, we now return partial results.
- Add `sample_prior_predictive` which allows for efficient sampling from the unconditioned model.
- SMC: remove experimental warning, allow sampling using `sample`, reduce autocorrelation from final trace.
- Add `model_to_graphviz` (which uses the optional dependency `graphviz`) to plot a directed graph of a PyMC3 model using plate notation.
- Add beta-ELBO variational inference as in beta-VAE model (Christopher P. Burgess et al. NIPS, 2017)
- Add `__dir__` to `SingleGroupApproximation` to improve autocompletion in interactive environments
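A minimal sketch of the new pickle-free trace persistence; the model, directory name, and draw counts are hypothetical:

```python
import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal("mu", mu=0, sd=1)
    pm.Normal("obs", mu=mu, sd=1, observed=[0.5, -0.2])
    trace = pm.sample(500, tune=500)
    # Write the trace to disk without pickling.
    pm.save_trace(trace, directory="my_trace")

# Later, with the same model definition in scope, reload it.
with model:
    restored = pm.load_trace("my_trace")
```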
Fixes
- Fixed grammar in divergence warning; previously `There were 1 divergences ...` could be raised.
- Fixed `KeyError` raised when only a subset of variables are specified to be recorded in the trace.
- Removed unused `repeat=None` arguments from all `random()` methods in distributions.
- Deprecated the `sigma` argument in `MarginalSparse.marginal_likelihood` in favor of `noise`.
- Fixed unexpected behavior in `random`. Now the `random` functionality is more robust and will work better for `sample_prior` when that is implemented.
- Fixed `scale_cost_to_minibatch` behaviour; previously this was not working and always `False`.
v3.4.1 Final
There was no 3.4 release due to a naming issue on PyPI.
New features
- Add `logit_p` keyword to `pm.Bernoulli`, so that users can specify the logit of the success probability. This is faster and more stable than using `p=tt.nnet.sigmoid(logit_p)` (example after this list).
- Add `random` keyword to `pm.DensityDist`, thus enabling users to pass a custom random method which in turn makes sampling from a `DensityDist` possible.
- Effective sample size computation is updated. The estimation uses Geyer's initial positive sequence, which no longer truncates the autocorrelation series inaccurately. `pm.diagnostics.effective_n` can now report N_eff > N.
- Added `KroneckerNormal` distribution and a corresponding `MarginalKron` Gaussian Process implementation for efficient inference, along with lower-level functions such as `cartesian` and `kronecker` products.
- Added `Coregion` covariance function.
- Add new 'pairplot' function, for plotting scatter or hexbin matrices of sampled parameters. Optionally it can plot divergences.
- Plots of discrete distributions in the docstrings
- Add logitnormal distribution
- Densityplot: add support for discrete variables
- Fix the Binomial likelihood in `.glm.families.Binomial`, with the flexibility of specifying the `n`.
- Add `offset` kwarg to `.glm`.
- Changed the `compare` function to accept a dictionary of model-trace pairs instead of two separate lists of models and traces.
- Add test and support for creating multivariate mixture and mixture of mixtures.
- `distribution.draw_values` is now also able to draw values from conditionally dependent RVs, such as autotransformed RVs (refer to PR #2902).
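A minimal logistic-regression-style sketch of the new `logit_p` keyword; the data and priors are hypothetical:

```python
import numpy as np
import pymc3 as pm

x = np.array([-1.0, 0.0, 1.0, 2.0])
y = np.array([0, 0, 1, 1])

with pm.Model():
    intercept = pm.Normal("intercept", mu=0, sd=5)
    slope = pm.Normal("slope", mu=0, sd=5)
    # Pass the logit of the success probability directly,
    # instead of p=tt.nnet.sigmoid(...).
    pm.Bernoulli("obs", logit_p=intercept + slope * x, observed=y)
    trace = pm.sample(1000, tune=1000)
```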
Fixes
- `VonMises` does not overflow for large values of kappa. `i0` and `i1` have been removed and we now use `log_i0` to compute the logp.
- The bandwidth for KDE plots is computed using a modified version of Scott's rule. The new version uses entropy instead of standard deviation. This works better for multimodal distributions. Functions using KDE plots have a new argument `bw` controlling the bandwidth.
- Fix: a PyMC3 variable is not replaced if provided in more_replacements (#2890)
- Fix for issue #2900. For many situations, named node-inputs do not have a `random` method, while some intermediate node may have it. This meant that if the named node-input at the leaf of the graph did not have a fixed value, `theano` would try to compile it and fail to find inputs, raising a `theano.gof.fg.MissingInputError`. This was fixed by going through the theano variable's owner inputs graph, trying to get intermediate named-node values if the leaves had failed.
- In `distribution.draw_values`, some named nodes could be `theano.tensor.TensorConstant`s or `theano.tensor.sharedvar.SharedVariable`s. Nevertheless, in `distribution._draw_value`, these would be passed to `distribution._compile_theano_function` as if they were `theano.tensor.TensorVariable`s. This could lead to the following exceptions: `TypeError: ('Constants not allowed in param list', ...)` or `TypeError: Cannot use a shared variable (...)`. The fix was to not add `theano.tensor.TensorConstant` or `theano.tensor.sharedvar.SharedVariable` named nodes into the `givens` dict that could be used in `distribution._compile_theano_function`.
- Exponential support changed to include zero values.
Deprecations
- DIC and BPIC calculations have been removed
- `df_summary` has been removed; use `summary` instead
- `njobs` and `nchains` kwargs are deprecated in favor of `cores` and `chains` for `sample`
- The `lag` kwarg in `pm.stats.autocorr` and `pm.stats.autocov` is deprecated.
v3.3 Final
New features
- Improve NUTS initialization `advi+adapt_diag_grad` and add `jitter+adapt_diag_grad` (#2643)
- Added `MatrixNormal` class for representing vectors of multivariate normal variables
- Implemented `HalfStudentT` distribution
- New benchmark suite added (see http://pandas.pydata.org/speed/pymc3/)
- Generalized random seed types
- Update loo, new improved algorithm (#2730)
- New CSG (Constant Stochastic Gradient) approximate posterior sampling algorithm (#2544)
- Michael Osthege added support for population samplers and implemented differential evolution metropolis (`DEMetropolis`). For models with correlated dimensions that cannot use gradient-based samplers, the `DEMetropolis` sampler can give higher effective sampling rates (also see PR #2735; sketch after this list).
- Forestplot supports multiple traces (#2736)
- Add new plot, densityplot (#2741)
- DIC and BPIC calculations have been deprecated
- Refactor HMC and implemented new warning system (#2677, #2808)
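A rough sketch of running `DEMetropolis`; the model is hypothetical, and the keyword for the number of population chains has changed names across releases (shown here with the current `chains` spelling), so treat this as illustrative rather than version-exact:

```python
import pymc3 as pm

with pm.Model():
    x = pm.Normal("x", mu=0, sd=1)
    y = pm.Normal("y", mu=x, sd=1)
    # DEMetropolis proposes jumps using differences between chains,
    # so it needs a population of chains running together.
    step = pm.DEMetropolis()
    trace = pm.sample(draws=2000, step=step, chains=20)
```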
Fixes
- Fixed `compareplot` to use `loo` output.
- Improved `posteriorplot` to scale fonts
- `sample_ppc_w` now broadcasts
- `df_summary` function renamed to `summary`
- Add test for `model.logp_array` and `model.bijection` (#2724)
- Fixed `sample_ppc` and `sample_ppc_w` to iterate all chains (#2633, #2748)
- Add Bayesian R2 score (for GLMs) `stats.r2_score` (#2696) and test (#2729).
- SMC works with transformed variables (#2755)
- Speedup OPVI (#2759)
- Multiple minor fixes and improvements in the docs (#2775, #2786, #2787, #2789, #2790, #2794, #2799, #2809)
Deprecations
- Old (`minibatch-`)`advi` is removed (#2781)
v3.2 Final
- This version includes two major contributions from our Google Summer of Code 2017 students:
  - Maxim Kochurov extended and refactored the variational inference module. This primarily adds two important classes, representing operator variational inference (`OPVI`) objects and `Approximation` objects. These make it easier to extend existing `variational` classes, and to derive inference from `variational` optimizations, respectively. The `variational` module now also includes normalizing flows (`NFVI`).
  - Bill Engels added an extensive new Gaussian processes (`gp`) module. Standard GPs can be specified using either `Latent` or `Marginal` classes, depending on the nature of the underlying function. A Student-T process `TP` has been added. In order to accommodate larger datasets, approximate marginal Gaussian processes (`MarginalSparse`) have been added (sketch after this list).
- Documentation has been improved as the result of the project's monthly "docathons".
- An experimental stochastic gradient Fisher scoring (`SGFS`) sampling step method has been added.
- The API for `find_MAP` was enhanced.
- SMC now estimates the marginal likelihood.
- Added `Logistic` and `HalfFlat` distributions to the set of continuous distributions.
- Bayesian fraction of missing information (`bfmi`) function added to `stats`.
- Enhancements to `compareplot` added.
- QuadPotential adaptation has been implemented.
- Script added to build and deploy documentation.
- MAP estimates now available for transformed and non-transformed variables.
- The `Constant` variable class has been deprecated, and will be removed in 3.3.
- DIC and BPIC calculations have been sped up.
- Arrays are now accepted as arguments for the `Bound` class.
- `random` method was added to the `Wishart` and `LKJCorr` distributions.
- Progress bars have been added to LOO and WAIC calculations.
- All example notebooks updated to reflect changes in API since 3.1.
- Parts of the test suite have been refactored.
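A minimal sketch of a sparse marginal GP with inducing points; the data, kernel, and settings are hypothetical and only illustrate where `MarginalSparse` fits in:

```python
import numpy as np
import pymc3 as pm

X = np.linspace(0, 10, 200)[:, None]
y = np.sin(X).ravel() + 0.3 * np.random.randn(200)
Xu = np.linspace(0, 10, 20)[:, None]  # inducing point locations

with pm.Model():
    ls = pm.Gamma("ls", alpha=2, beta=1)
    cov = pm.gp.cov.ExpQuad(input_dim=1, ls=ls)
    gp = pm.gp.MarginalSparse(cov_func=cov, approx="FITC")
    sigma = pm.HalfCauchy("sigma", beta=2)
    # The noise keyword was originally named sigma and later renamed to noise
    # (see the v3.5 Fixes section above).
    gp.marginal_likelihood("y_obs", X=X, Xu=Xu, y=y, noise=sigma)
    trace = pm.sample(1000, tune=1000)
```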
Fixes
- Fixed sampler stats error in NUTS for non-RAM backends
- Matplotlib is no longer a hard dependency, making it easier to use in settings where installing Matplotlib is problematic. PyMC will only complain if plotting is attempted.
- Several bugs in the Gaussian process covariance were fixed.
- All chains are now used to calculate WAIC and LOO.
- AR(1) log-likelihood function has been fixed.
- Slice sampler fixed to sample from 1D conditionals.
- Several docstring fixes.
v3.1 Final
This is the first major update to PyMC 3 since its initial release. Highlights of this release include:
- Gaussian Process submodule
- Much improved variational inference support that includes:
- Stein Variational Gradient Descent
- Minibatch processing
- Additional optimizers, including ADAM
- Experimental operational variational inference (OPVI)
- Full-rank ADVI
- MvNormal now supports Cholesky decomposition for increased speed and numerical stability (sketch after this list).
- NUTS implementation now matches current Stan implementation.
- Higher-order integrators for HMC
- Elliptical slice sampler is now available
- Added `Approximation` class and the ability to convert a sampled trace into an approximation via its `Empirical` subclass.
- Add `MvGaussianRandomWalk` and `MvStudentTRandomWalk` distributions.
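A minimal sketch of the Cholesky-parametrized `MvNormal`; the covariance values are hypothetical:

```python
import numpy as np
import pymc3 as pm

mu = np.zeros(2)
cov = np.array([[1.0, 0.5],
                [0.5, 2.0]])
chol = np.linalg.cholesky(cov)

with pm.Model():
    # Passing chol avoids repeated decomposition of the covariance matrix.
    x = pm.MvNormal("x", mu=mu, chol=chol, shape=2)
    trace = pm.sample(1000)
```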
v3.0 Final
This is the first major release of PyMC3. A number of major changes since splitting from the PyMC2 project include:
- Added gradient-based MCMC samplers: Hamiltonian MC (`HMC`) and No-U-Turn Sampler (`NUTS`)
- Automatic gradient calculations using Theano
- Convenient generalized linear model specification using Patsy formulae
- Parallel sampling via `multiprocessing`
- New model specification using context managers (see the sketch after this list)
- New Automatic Differentiation Variational Inference (`ADVI`) allowing faster sampling than `HMC` for some problems.
- Mini-batch ADVI
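A minimal sketch of the context-manager model specification with NUTS; the toy data and priors are hypothetical:

```python
import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal("mu", mu=0, sd=10)
    pm.Normal("obs", mu=mu, sd=1, observed=[0.3, -0.1, 0.5])
    # Gradient-based No-U-Turn sampling via Theano's automatic differentiation.
    trace = pm.sample(1000, step=pm.NUTS())
```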
v3.0 Release Candidate 6
Sixth release candidate of PyMC3 3.0.