Commit 85eca8a

new syntax
1 parent 1b9eb60 commit 85eca8a

9 files changed: +138 -191 lines changed

.travis.yml

Lines changed: 4 additions & 1 deletion
@@ -2,11 +2,14 @@ language: julia
 julia:
   - 1.1
   - nightly
+script:
+  - if [[ -a .git/shallow ]]; then git fetch --unshallow; fi
+  - julia --project -e 'using Pkg; Pkg.add(PackageSpec(name ="FixedEffects", rev="master")) ; Pkg.add(PackageSpec(name ="FixedEffectModels", rev="master")) ; Pkg.build(); Pkg.test(; coverage=true)';
 matrix:
   allow_failures:
     - julia: nightly
 after_success:
   - julia -e 'using Pkg; Pkg.add("Coverage"); using Coverage; Coveralls.submit(Coveralls.process_folder())'
 notifications:
   on_success: never
-  on_failure: change
+  on_failure: change

Project.toml

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 name = "InteractiveFixedEffectModels"
 uuid = "80307280-efb2-5c5d-af8b-a9c15821677b"
-version = "0.4.3"
+version = "0.5.0"
 
 [deps]
 Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
@@ -20,7 +20,7 @@ Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
 
 [compat]
 DataFrames = "≥ 0.19.1"
-FixedEffectModels = "≥ 0.8.0"
+FixedEffectModels = "≥ 0.9.0"
 FixedEffects = "≥ 0.3.0"
 LeastSquaresOptim = "≥ 0.7.0"
 StatsBase = "≥ 0.22.0"

README.md

Lines changed: 17 additions & 29 deletions
@@ -17,9 +17,7 @@ To estimate an interactive fixed effect model, one needs to specify a formula, a
 ```julia
 using DataFrames, RDatasets, InteractiveFixedEffectModels
 df = dataset("plm", "Cigar")
-df.pState = categorical(df.State)
-df.pYear = categorical(df.Year)
-regife(df, @model(Sales ~ Price, ife = (pState + pYear, 2), fe = pState), save = true)
+regife(df, @model(Sales ~ Price + fe(State) + ife(State, Year, 2)), save = true)
 # Linear Factor Model
 #================================================================
 #Number of obs: 1380 Degree of freedom: 199
@@ -39,45 +37,37 @@ regife(df, @model(Sales ~ Price, ife = (pState + pYear, 2), fe = pState), save =
 df = dataset("plm", "Cigar")
 ```
 
-When the only regressor is `0`, `fit` fits a factor model
+When the only regressor is `0`, `regife` fits a factor model
 ```julia
 Sales ~ 0
 ```
 
-Otherwise, `fit` fits a linear model with interactive fixed effects (Bai (2009))
+Otherwise, `regife` fits a linear model with interactive fixed effects (Bai (2009))
 ```julia
 Sales ~ Price + Year
 ```
-- Interactive fixed effects are indicated with the keyword argument `ife`. Variables must be of type `PooledDataVector`. The rank is the number of components to use. facFor instance, for a factor model with id variable `State`, time variable `Year`, and rank `r` equal to 2:
+
+Interactive fixed effects are indicated with the function `ife`. For instance, to specify a factor model with id variable `State`, time variable `Year`, and rank 2, use `ife(State, Year, 2)`.
 
-```julia
-df.pState = categorical(df.State)
-df.pYear = categorical(df.Year)
-ife = (pState + pYear, 2)
-```
+High-dimensional Fixed effects can be used, as in `fe(State)` but only for the variables specified in the factor model. See [FixedEffectModels.jl](https://github.com/matthieugomez/FixedEffectModels.jl) for more information
 
-- Fixed effects are indicated with the keyword argument `fe`. Use only the variables specified in the factor model. See [FixedEffectModels.jl](https://github.com/matthieugomez/FixedEffectModels.jl) for more information
-
-```julia
-fe = pState
-fe = pYear
-fe = pState + pYear
-```
 
 - Standard errors are indicated with the keyword argument `vcov`
 ```julia
 vcov = robust()
-vcov = cluster(StatePooled)
-vcov = cluster(StatePooled, YearPooled)
+vcov = cluster(State)
+vcov = cluster(State, Year)
 ```
 
-- weights are indicated with the keyword argument `weights`
-```julia
-weights = Pop
-```
+- The option `weights` can add weights
+
+```julia
+weights = :Pop
+```
+
 - The option `method` can be used to choose between two algorithms:
-  - `levenberg_marquardt`
-  - `dogleg`
+  - `:levenberg_marquardt`
+  - `:dogleg`
 - The option `save = true` saves a new dataframe storing residuals, factors, loadings and the eventual fixed effects. Importantly, the returned dataframe is aligned with the initial dataframe (rows not used in the estimation are simply filled with NA).
 
 
@@ -95,9 +85,7 @@ Yes. Factor models are a particular case of interactive fixed effect models. Sim
 ```julia
 using DataFrames, RDatasets, InteractiveFixedEffectModels
 df = dataset("plm", "Cigar")
-df.pState = categorical(df.State)
-df.pYear = categorical(df.Year)
-regife(df, @model(Sales ~ 0, ife = (pState + pYear, 2), fe = pState), save = true)
+regife(df, @model(Sales ~ 0 + ife(State, Year, 2) + fe(State)), save = true)
 ```
 Compared to the usual SVD method, the package estimates models with multiple (or missing) observations per id x time.
 

src/InteractiveFixedEffectModels.jl

Lines changed: 8 additions & 7 deletions
@@ -20,19 +20,22 @@ import LeastSquaresOptim
 using FillArrays
 using Distributions
 using Reexport
-import StatsModels: @formula, FormulaTerm, Term, InteractionTerm, ConstantTerm, MatrixTerm, AbstractTerm, coefnames, columntable, missing_omit, termvars, schema, apply_schema, modelmatrix, response, terms
+import StatsModels: @formula, FormulaTerm, Term, InteractionTerm, ConstantTerm, MatrixTerm, AbstractTerm, coefnames, columntable, missing_omit, termvars, schema, apply_schema, modelmatrix, response, terms, FunctionTerm
+using FixedEffects
 @reexport using FixedEffectModels
 
-
+if !isdefined(FixedEffects, :AbstractFixedEffectSolver)
+    AbstractFixedEffectSolver{T} = AbstractFixedEffectMatrix{T}
+end
 
 ##############################################################################
 ##
 ## Exported methods and types
 ##
 ##############################################################################
 
-export InteractiveFixedEffectFormula,
-    InteractiveFixedEffectResult,
+export InteractiveFixedEffectModel,
+    ife,
     regife
 ##############################################################################
 ##
@@ -41,11 +44,9 @@ regife
 ##############################################################################
 include("types.jl")
 
-include("utils/models.jl")
-
 include("methods/gauss_seidel.jl")
 include("methods/ls.jl")
 
-include("regife.jl")
+include("fit.jl")
 
 end

src/regife.jl renamed to src/fit.jl

Lines changed: 50 additions & 42 deletions
@@ -4,15 +4,18 @@
 ## Fit is the only exported function
 ##
 ##############################################################################
+function StatsBase.fit(::Type{InteractiveFixedEffectModel}, m::Model, df::AbstractDataFrame; kwargs...)
+    regife(m, df; kwargs...)
+end
 
 
 function regife(df::AbstractDataFrame, m::Model; kwargs...)
     regife(df, m.f; m.dict..., kwargs...)
 end
 
 function regife(df::AbstractDataFrame, f::FormulaTerm;
-    ife::Union{Symbol, Expr, Nothing} = nothing,
-    fe::Union{Symbol, Expr, Nothing} = nothing,
+    feformula::Union{Symbol, Expr, Nothing} = nothing,
+    ifeformula::Union{Symbol, Expr, Nothing} = nothing,
     vcov::Union{Symbol, Expr, Nothing} = :(simple()),
     weights::Union{Symbol, Expr, Nothing} = nothing,
     subset::Union{Symbol, Expr, Nothing} = nothing,
@@ -34,33 +37,36 @@ function regife(df::AbstractDataFrame, f::FormulaTerm;
     else
         vcovformula = VcovFormula(Val{vcov.args[1]}, (vcov.args[i] for i in 2:length(vcov.args))...)
     end
-    m = InteractiveFixedEffectFormula(ife)
 
     if (ConstantTerm(0) ∉ FixedEffectModels.eachterm(f.rhs)) & (ConstantTerm(1) ∉ FixedEffectModels.eachterm(f.rhs))
-        f = FormulaTerm(f.lhs, tuple(ConstantTerm(1), FixedEffectModels.eachterm(f.rhs)...))
+        formula = FormulaTerm(f.lhs, tuple(ConstantTerm(1), FixedEffectModels.eachterm(f.rhs)...))
     end
+
     formula, formula_endo, formula_iv = FixedEffectModels.decompose_iv(f)
+
+    m, formula = parse_interactivefixedeffect(df, formula)
+    if ifeformula != nothing # remove after depreciation
+        m = OldInteractiveFixedEffectFormula(ifeformula)
+    end
+
     ## parse formula
     if formula_iv != nothing
         error("partial_out does not support instrumental variables")
     end
-    has_absorb = fe != nothing
     has_weights = (weights != nothing)
 
 
     ## create a dataframe without missing values & negative weightss
     vars = allvars(formula)
-    absorb_vars = allvars(fe)
+    if feformula != nothing # remove after depreciation
+        vars = vcat(vars, allvars(feformula))
+    end
     vcov_vars = allvars(vcovformula)
     factor_vars = vcat(allvars(m.id), allvars(m.time))
-    rem = setdiff(absorb_vars, factor_vars)
-    if length(rem) > 0
-        error("The categorical variable $(rem[1]) appears in @fe but does not appear in @ife. Simply add it in @formula instead")
-    end
-    all_vars = unique(vcat(vars, absorb_vars, factor_vars, vcov_vars))
+    all_vars = unique(vcat(vars, factor_vars, vcov_vars))
     esample = completecases(df[!, all_vars])
     if has_weights
-        esample .&= isnaorneg(df[!, weights])
+        esample .&= FixedEffectModels.isnaorneg(df[!, weights])
         all_vars = unique(vcat(all_vars, weights))
     end
     if subset != nothing
@@ -77,41 +83,43 @@ function regife(df::AbstractDataFrame, f::FormulaTerm;
     vcov_method_data = VcovMethod(df[esample, unique(Symbol.(vcov_vars))], vcovformula)
 
     # Compute weights
-    sqrtw = get_weights(df, esample, weights)
-
+    sqrtw = Ones{Float64}(sum(esample))
+    if has_weights
+        sqrtw = convert(Vector{Float64}, sqrt.(view(df, esample, weights)))
+    end
+    for a in FixedEffectModels.eachterm(formula.rhs)
+        if has_fe(a)
+            isa(a, InteractionTerm) && error("Fixed effects cannot be interacted")
+            Symbol(first(a.args_parsed)) ∉ factor_vars && error("FixedEffect should correspond to id or time dimension of the factor model")
+        end
+    end
+    fes, ids, formula = FixedEffectModels.parse_fixedeffect(df, formula)
+    if feformula != nothing # remove after depreciation
+        feformula = @eval(@formula(0 ~ $(feformula)))
+        fes, ids = FixedEffectModels.oldparse_fixedeffect(df, feformula)
+    end
+    has_fes = !isempty(fes)
+    has_fes_intercept = false
     ## Compute factors, an array of AbtractFixedEffects
-    if has_absorb
-        feformula = @eval(@formula(nothing ~ $(fe)))
-        fes, ids = FixedEffectModels.parse_fixedeffect(df, feformula)
+    if has_fes
         if any([isa(fe.interaction, Ones) for fe in fes])
             formula = FormulaTerm(formula.lhs, tuple(ConstantTerm(0), (t for t in FixedEffectModels.eachterm(formula.rhs) if t!= ConstantTerm(1))...))
-            has_absorb_intercept = true
+            has_fes_intercept = true
         end
         fes = FixedEffect[FixedEffectModels._subset(fe, esample) for fe in fes]
-        feM = FixedEffectModels.AbstractFixedEffectMatrix{Float64}(fes, sqrtw, Val{:lsmr})
+        feM = FixedEffectModels.AbstractFixedEffectSolver{Float64}(fes, sqrtw, Val{:lsmr})
     end
 
+
     has_intercept = ConstantTerm(1) ∈ FixedEffectModels.eachterm(formula.rhs)
 
 
     iterations = 0
     converged = false
     # get two dimensions
 
-    if isa(m.id, Symbol)
-        # always factorize
-        id = group(df[esample, m.id])
-    else
-        factorvars, interactionvars = _split(df, allvars(m.id))
-        id = group((df[esample, v] for v in factorvars)...)
-    end
-    if isa(m.time, Symbol)
-        # always factorize
-        time = group(df[esample, m.time])
-    else
-        factorvars, interactionvars = _split(df, allvars(m.time))
-        time = group((df[esample, v] for v in factorvars)...)
-    end
+    id = FixedEffects.group(df[esample, m.id])
+    time = FixedEffects.group(df[esample, m.time])
 
     ##############################################################################
     ##
@@ -144,7 +152,7 @@ function regife(df::AbstractDataFrame, f::FormulaTerm;
 
 
 
-    if has_absorb
+    if has_fes
         FixedEffectModels.solve_residuals!(y, feM)
         FixedEffectModels.solve_residuals!(X, feM)
     end
@@ -191,7 +199,7 @@ function regife(df::AbstractDataFrame, f::FormulaTerm;
     # y ~ x + γ1 x factors + γ2 x loadings
     # if not, this means fit! ended up on a a local minimum.
     # restart with randomized coefficients, factors, loadings
-    newfeM = FixedEffectModels.AbstractFixedEffectMatrix{Float64}(getfactors(fp, fs), sqrtw, Val{:lsmr})
+    newfeM = FixedEffectModels.AbstractFixedEffectSolver{Float64}(getfactors(fp, fs), sqrtw, Val{:lsmr})
     ym .= ym ./sqrtw
     FixedEffectModels.solve_residuals!(ym, newfeM, tol = tol, maxiter = maxiter)
     ym .= ym .* sqrtw
@@ -236,7 +244,7 @@ function regife(df::AbstractDataFrame, f::FormulaTerm;
     crossxm = cholesky!(Symmetric(Xm' * Xm))
     ## compute the right degree of freedom
     df_absorb_fe = 0
-    if has_absorb
+    if has_fes
         for fe in fes
             df_absorb_fe += length(unique(fe.refs))
         end
@@ -249,13 +257,13 @@ function regife(df::AbstractDataFrame, f::FormulaTerm;
         # compute various r2
         nobs = sum(esample)
         rss = sum(abs2, residualsm)
-        tss = compute_tss(ym, has_intercept || has_absorb_intercept, sqrtw)
-        r2_within = 1 - rss / tss
+        _tss = FixedEffectModels.tss(ym, has_intercept || has_fes_intercept, sqrtw)
+        r2_within = 1 - rss / _tss
 
         rss = sum(abs2, residuals)
-        tss = compute_tss(oldy, has_intercept || has_absorb_intercept, sqrtw)
-        r2 = 1 - rss / tss
-        r2_a = 1 - rss / tss * (nobs - has_intercept) / dof_residual
+        _tss = FixedEffectModels.tss(oldy, has_intercept || has_fes_intercept, sqrtw)
+        r2 = 1 - rss / _tss
+        r2_a = 1 - rss / _tss * (nobs - has_intercept) / dof_residual
     end
 
     ##############################################################################
@@ -277,7 +285,7 @@ function regife(df::AbstractDataFrame, f::FormulaTerm;
     end
 
     # save fixed effects in a dataframe
-    if has_absorb
+    if has_fes
         # residual before demeaning
         oldresiduals = convert(Vector{Float64}, response(formula_schema, subdf))
         oldresiduals .= oldresiduals .* sqrtw
