What next for JEP-11 and beyond? #161
This is a sample use case to demonstrate JMESPath requirements. Let's start with a scenario using a database and SQL statements.

SQL

https://learnsql.com/blog/count-join-sql/
Count all employees under each manager

    SELECT
      sup.employee_id,
      sup.first_name,
      sup.last_name,
      COUNT(sub.employee_id) AS number_of_employees
    FROM employee sub
    JOIN employee sup
      ON sub.manager_id = sup.employee_id
    GROUP BY sup.employee_id, sup.first_name, sup.last_name;

Result:
JMESPath

I would like to reproduce the preceding scenario using JMESPath expressions and evaluate where features may be missing and where existing features could be improved. Here is a given JSON document:

[
{"employee_id": 4529, "first_name": "Nancy", "last_name": "Young", "manager_id": 4125},
{"employee_id": 4238, "first_name": "John", "last_name": "Simon", "manager_id": 4329},
{"employee_id": 4329, "first_name": "Martina", "last_name": "Candreva", "manager_id": 4125},
{"employee_id": 4009, "first_name": "Klaus", "last_name": "Koch", "manager_id": 4329},
{"employee_id": 4125, "first_name": "Mafalda", "last_name": "Ranieri", "manager_id": null},
{"employee_id": 4500, "first_name": "Jakub", "last_name": "Hrabal", "manager_id": 4529},
{"employee_id": 4118, "first_name": "Moira", "last_name": "Areas", "manager_id": 4952},
{"employee_id": 4012, "first_name": "Jon", "last_name": "Nilssen", "manager_id": 4952},
{"employee_id": 4952, "first_name": "Sandra", "last_name": "Rajkovic", "manager_id": 4529},
{"employee_id": 4444, "first_name": "Seamus", "last_name": "Quinn", "manager_id": 4329}
]

Count all employees under each manager

Let's pretend we want the following output:

[
{"employee_id": 4125, "first_name": "Mafalda", "last_name": "Ranieri", "number_of_employees": 2 },
{"employee_id": 4329, "first_name": "Martina", "last_name": "Candreva", "number_of_employees": 3 },
{"employee_id": 4529, "first_name": "Nancy", "last_name": "Young", "number_of_employees": 2 },
{"employee_id": 4952, "first_name": "Sandra", "last_name": "Rajkovic", "number_of_employees": 2 }
]

The following expression using the JEP-11

To be fair to the original JMESPath specification, we are using the

Using my favored "JEP-11a" proposal, where looking up an identifier is made explicit using the

Which maps nicely with James’ proposal:
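Separately from the JMESPath expressions discussed in this comment, the expected output above can be cross-checked outside JMESPath. The following is only a minimal Python sketch using the standard library; the variable names (`employees`, `reports`, `by_id`, `result`) are illustrative and are not part of any proposal. It mirrors the SQL self-join: count the rows that point at each manager, then keep the managers that exist in the document.

```python
from collections import Counter

# The JSON document from this comment, as Python data.
employees = [
    {"employee_id": 4529, "first_name": "Nancy", "last_name": "Young", "manager_id": 4125},
    {"employee_id": 4238, "first_name": "John", "last_name": "Simon", "manager_id": 4329},
    {"employee_id": 4329, "first_name": "Martina", "last_name": "Candreva", "manager_id": 4125},
    {"employee_id": 4009, "first_name": "Klaus", "last_name": "Koch", "manager_id": 4329},
    {"employee_id": 4125, "first_name": "Mafalda", "last_name": "Ranieri", "manager_id": None},
    {"employee_id": 4500, "first_name": "Jakub", "last_name": "Hrabal", "manager_id": 4529},
    {"employee_id": 4118, "first_name": "Moira", "last_name": "Areas", "manager_id": 4952},
    {"employee_id": 4012, "first_name": "Jon", "last_name": "Nilssen", "manager_id": 4952},
    {"employee_id": 4952, "first_name": "Sandra", "last_name": "Rajkovic", "manager_id": 4529},
    {"employee_id": 4444, "first_name": "Seamus", "last_name": "Quinn", "manager_id": 4329},
]

# Number of direct reports per manager_id (rows without a manager are skipped).
reports = Counter(e["manager_id"] for e in employees if e["manager_id"] is not None)

# Index employees by id so each manager's name can be joined back in.
by_id = {e["employee_id"]: e for e in employees}

# Inner-join semantics: only managers that exist in the document are kept.
result = [
    {
        "employee_id": manager_id,
        "first_name": by_id[manager_id]["first_name"],
        "last_name": by_id[manager_id]["last_name"],
        "number_of_employees": reports[manager_id],
    }
    for manager_id in sorted(reports)
    if manager_id in by_id
]
```

The point of the sketch is that the query needs two things a JMESPath expression must also express: a grouping step and a lookup back into the original document from inside a nested expression.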
Lexical Scoping (revisited)
After a few months with several implementations currently running JEP-11, some concerns are being raised.
The main concern is around the notion of scopes that act as a fallback when identifiers evaluate to `null`.

In particular, the following expression is judged problematic:

    search( let({qux: 'qux'}, &foo.qux ), {"foo": {"bar": "baz"}} ) -> "qux"

The intuitive behaviour would be for `foo.qux` to return `null`. However, as a fallback, the `qux` identifier is not found in the execution context and is then looked up in the scope set as the first argument to the `let()` function.

This concern was actually raised early on in the design of JEP-11 by James, and that may be one of the main reasons why JEP-11 was never officially accepted.
This concern is what prompted a new discussion and a potential new proposal to replace and improve on JEP-11.
This post is an attempt to summarize the concerns that we have with JEP-11 and try to list various alternatives with their pros and cons.
Glossary
First, let’s agree on common terms so that everyone discussing alternatives can be on the same page.
JEP-11 set the stage for some new terms:
- `execution context`, or `context` for short. This is the current structure being evaluated at each stage of the expression. When starting evaluation, the context is the original input JSON document.
- `scope`. This is a JSON object that is constructed by evaluating the first argument to the `let()` function. JEP-11 even includes the notion of a stack of scopes that all participate when evaluating an identifier. If an identifier cannot be resolved from the current `context`, it is looked up in the `scope` and then in each `scope`’s parent `scope` until it is found. Otherwise, the evaluation returns `null`.

This post is using those terms.
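The lookup rule described in this glossary can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any JMESPath implementation is written: `resolve_identifier` and `evaluate_path` are hypothetical helpers, an expression such as `foo.qux` is reduced to a list of identifier names, and "cannot be resolved" is taken to mean "evaluates to null in the current context".

```python
from typing import Any, Dict, List


def resolve_identifier(name: str, context: Any, scopes: List[Dict[str, Any]]) -> Any:
    """JEP-11 style lookup: current context first, then the scope stack."""
    # Assumption: a missing key and a null value are treated the same way.
    if isinstance(context, dict) and context.get(name) is not None:
        return context[name]
    # Fallback: walk the scope stack from the innermost scope outwards.
    for scope in reversed(scopes):
        if name in scope:
            return scope[name]
    return None


def evaluate_path(path: List[str], context: Any, scopes: List[Dict[str, Any]]) -> Any:
    """Evaluate a chain of identifiers such as foo.qux against a document."""
    for name in path:
        context = resolve_identifier(name, context, scopes)
    return context


document = {"foo": {"bar": "baz"}}

# let({qux: 'qux'}, &foo.qux): one scope on the stack.
with_scope = evaluate_path(["foo", "qux"], document, [{"qux": "qux"}])

# The same path without any scope.
without_scope = evaluate_path(["foo", "qux"], document, [])
```

With the scope in place, `foo.qux` picks up `'qux'` from the scope even though `foo` has no such member, which is the behaviour judged problematic above. Without a scope, the same path evaluates to null.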
Main themes
At this stage, I think the consensus is that there should not exist an implicit lookup when resolving identifiers in a scope. Instead, evaluation should be explicitly specified using some sort of reference mechanism.
Other main themes are listed here:
- The `scope` in JEP-11 is an object. Some proposals argue that it could be any valid JSON value.
- The `scope` objects in JEP-11 are organized in a stack, as nested expressions using the `let()` function are created. This allows `identifier` evaluation to look up the stack of `scope` objects. There are arguments to be made about whether chaining should occur, or whether `scopes` should be isolated.
- The `scope` stack in JEP-11 is available alongside the current `execution context`. A strict precedence is specified so that an `identifier` is first looked up in the current `context`, and then in the `scope` stack. There is an argument to be made about whether the `scope` should replace the current `execution context` entirely. For instance, this proposal mandates that the current `execution context` is swapped with another `context` using a dedicated function.
- Finally, the way to surface this feature in JMESPath expressions must be discussed. @jamesls proposes a new `let <context> in <expression>` construct, introducing keywords into the language, arguing that the semantics of the `let()` function are unique in JMESPath and should be replaced with a more integrated mechanism. Other alternatives using distinct tokens could be devised if we do not want to introduce keywords.

Let’s break down those main themes. Please feel free to include any that I may have missed in the comments.
Reference to identifiers
So a new mechanism must be implemented. Here are some alternatives:
`$<identifier>`: using the `$` sigil to reference identifiers from the `scope`.

This is a common alternative to JEP-11 implicit lookup and is included in James’ proposal. This only works, however, if the `scope`, or each level in the `scope` stack, is an object. The `scope` itself is not accessible and cannot be acted upon¹.

As far as I can tell, there is currently no proposal that promotes this behaviour. However, for the sake of completeness, this must be mentioned.
Using a function would require some level of indirection to access properties from a `scope` object. As functions do not accept `identifier` arguments, a `raw-string` must be used instead.

    get_from_scope('foo')

However, using a function would pave the way to extended scenarios, such as accessing the `scope` object itself, or using the `scope` as a temporary input document for downstream expressions.

    with_scope().bar

Scope object vs Scope value
This item is linked to the previous theme, as it boils down to a choice between accessing properties from an implicit `scope` but not the `scope` itself, and accessing the whole `scope` value, which may be any valid JSON value.

Should scopes be "chained"
As JMESPath expressions can be nested, it seems natural to allow `scope` objects to be nested. This allows identifiers from nested expressions to shadow identifiers from upper levels, as happens with local variables in many programming languages.

However, some proposals sidestep this by mandating explicit usage of a `scope` using dedicated constructs inside which a regular JMESPath expression applies.

    use_scope(&foo)

This raises the question of what `@` stands for. And what about `$`, which is commonly used to refer to the "root", i.e. the original input JSON document?

Does the scope replace / shadow the context
This proposal, for instance, specifies a way to swap out the current `context` for a `scope` which has been set up previously. [I renamed the function to make use of proper terms]

The following expression:

    with_scope({foo: 'bar'}, &…)

sets the `scope` much like the first argument to the `let()` function in JEP-11 does, except that it can be any valid JSON value rather than just an object.

The `expression-type` in the second argument can take advantage of the following expression:

    …, &use_scope(&foo)

At this point, the proposal mandates that inside the `use_scope()` function, JMESPath expressions operate on what has been set up as the `scope` as their input JSON document.

This proposal mandates that at any point in time, only a single `execution context` exists, although as an expression author, you can swap it for another at any point.

Exposing the feature to JMESPath using keywords vs tokens
James’ proposal is to introduce a new construct that I will refer to as the `let-expression` and goes like this:

    let <scope> in <expression>

The `scope` is currently being proposed only as bindings to a variable, but it explicitly uses the `$` sigil to refer to this variable in downstream expressions.

Irrespective of the actual nature of the `scope`, this `let-expression` is extremely similar in shape to the `let()` function.

For the sake of the discussion, let’s imagine there exists a proposal very similar to JEP-11. Let’s call it JEP-11a. In fact, it is JEP-11 with an added twist that references to scoped variables are explicit using the `$<identifier>` `varref` expression proposed by James.

I would argue that James’ `let-expression` is virtually identical to JEP-11a in terms of feature set. In fact, this would be my favored design at this point. James has concerns over the very use of a function to introduce this design, however.

If we were to not use functions at all, I would personally favor pseudo-lambda syntax such as having the following grammar:
Using this design is, again, virtually identical to using JEP-11a or James’ updated proposal in terms of feature set.
So while not making the actual syntax secondary, I think this theme is worth deciding upon last, once we have figured out exactly what we want to support with respect to the four themes introduced previously.
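To make the "JEP-11a" idea concrete, here is a minimal Python sketch of explicit lookup with chained scopes. Everything in it is hypothetical and for illustration only: the tiny `("field", name)` / `("var", name)` step encoding stands in for a parsed expression, `evaluate` is not part of any implementation, and returning null for an unbound variable is an assumption (a real proposal might raise an error instead).

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mini-AST: ("field", name) is a plain identifier,
# ("var", name) stands for an explicit $name reference.
Step = Tuple[str, str]


def evaluate(steps: List[Step], context: Any, scopes: List[Dict[str, Any]]) -> Any:
    """Evaluate a chain of steps with no implicit fallback to the scopes."""
    for kind, name in steps:
        if kind == "var":
            # Explicit reference: search the scope stack only, innermost first.
            context = next(
                (scope[name] for scope in reversed(scopes) if name in scope),
                None,  # assumption: an unbound variable yields null
            )
        else:
            # Plain identifier: the current context only, never the scopes.
            context = context.get(name) if isinstance(context, dict) else None
    return context


document = {"foo": {"bar": "baz"}}
scopes = [{"qux": "qux"}]

# foo.qux: no fallback, so the missing member stays null.
plain = evaluate([("field", "foo"), ("field", "qux")], document, scopes)

# $qux: explicitly read from the scope.
explicit = evaluate([("var", "qux")], document, scopes)

# A nested scope shadows the outer binding of qux.
shadowed = evaluate([("var", "qux")], document, scopes + [{"qux": "inner"}])
```

Under these assumptions, `foo.qux` evaluates to null, `$qux` evaluates to `'qux'`, and an inner scope shadows the outer one, which covers the explicit-reference and chaining themes without touching the question of surface syntax.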
Footnotes
¹ The `$` sigil is (commonly) used to refer to the original input document.