[Not complete yet]
Paper summary: Receding Horizon Planning with Rule Hierarchies for Autonomous Vehicles - 13 Dec 2023
Why?
Rule hierarchies can enhance interpretability, but introduce combinatorial complexity to planning; on the other hand, differentiable reward structures can be leveraged by modern gradient-based optimization tools, but are less interpretable and unintuitive to tune.
1. Rule hierarchies
| Pros | Cons |
|---|---|
| 1. They provide a systematic approach to plan motions that prioritize more important rules (e.g., safety) over less important ones (e.g., comfort) when all rules cannot be satisfied simultaneously <br> 2. Greater transparency in planner design <br> 3. More amenable to introspection | 1. They introduce significant combinatorial complexity to planning <br> 2. Sometimes challenging to implement |
2. Differentiable reward structures
| Pros | Cons |
|---|---|
| 1. Less computation time thanks to the reward optimization scheme <br> 2. Relatively easier to implement | 1. Not intuitive to tune and difficult to interpret the planner's performance |
In the worst case, planning with a rule hierarchy requires checking up to $2^N$ combinations of rule satisfaction for $N$ rules. This paper presents an approach to equivalently express rule hierarchies as differentiable reward structures amenable to modern gradient-based optimizers, thereby achieving the best of both worlds:
- Good interpretability from the ranking scheme: the expressive power of rule hierarchies
- Less computation time from the reward optimization scheme: the computational advantages of gradient-based optimization with modern optimization tools
- A hierarchy-preserving reward function that allows us to plan without checking all combinations of rule satisfaction
How?
A. Problem formulation
- Dynamics:
  - $x$ - ego state
  - $u$ - control inputs
  - $t$ - discrete time index
- Non-ego agents and map:
  - $S$ - space of trajectories
  - $T$ - time horizon
  - $x_{ne}$ - non-ego states
  - $x_{map}$ - world map (lane lines, stop signs, etc.)
  - $w = (x_{ne}, x_{map})$ - world scene
In practice, the planner receives a predicted world scene that is generated from maps and prediction modules. (TODO IMAGE)
Assume that this scene is provided to the planner.
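A minimal sketch of these objects (the class/field names and the unicycle-style dynamics are illustrative, not from the paper's implementation):

```python
# A minimal sketch of the problem-formulation objects above.
# All names are illustrative, not from the paper's code.
import numpy as np
from dataclasses import dataclass

@dataclass
class WorldScene:
    """w = (x_ne, x_map): predicted non-ego states and static map info."""
    x_ne: np.ndarray   # predicted non-ego trajectories, shape (num_agents, T, state_dim)
    x_map: dict        # map elements, e.g. {"lane_lines": [...], "stop_signs": [...]}

def rollout(x0: np.ndarray, controls: np.ndarray, dt: float = 0.1) -> np.ndarray:
    """Roll out ego dynamics x_{t+1} = f(x_t, u_t) over the horizon T.

    Here f is an illustrative kinematic unicycle: state = (px, py, heading, speed),
    control = (acceleration, yaw_rate).
    """
    T = controls.shape[0]
    traj = np.zeros((T + 1, 4))
    traj[0] = x0
    for t in range(T):
        px, py, th, v = traj[t]
        a, w = controls[t]
        traj[t + 1] = [px + v * np.cos(th) * dt,
                       py + v * np.sin(th) * dt,
                       th + w * dt,
                       v + a * dt]
    return traj
```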
- Rules and rule robustness
  - Rule ($\phi$) - a Boolean function that maps the ego trajectory and world scene to True or False
    - $\phi: S \times W \to \{\text{True}, \text{False}\}$
    - $\phi = 1$ if the rule is satisfied, $0$ otherwise
  - Robustness ($\rho$) - a metric that quantifies the degree of satisfaction of a rule
    - Robustness of a rule $\phi_i$: $\rho_i: S \times W \to \mathbb{R}$, a positive scalar if the rule is satisfied and negative otherwise
    - More positive values indicate greater satisfaction of the rule, while more negative values indicate greater violation
  - Rules are expressed as Signal Temporal Logic (STL) formulae using STLCG (STL Computational Graphs), which comes equipped with backpropagatable robustness metrics
  - Expressing rules as STL formulae allows us to easily encode complex spatiotemporal specifications, such as maintaining a speed limit of 40 mph from $t+a$ to $t+b$ (a minimal robustness computation for such a rule is sketched below)
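Since STLCG itself is not shown here, the following numpy sketch illustrates what the robustness of such a speed-limit rule computes (the function name and signal layout are illustrative; the paper uses STLCG's backpropagatable robustness instead):

```python
import numpy as np

def speed_limit_robustness(speed: np.ndarray, limit: float, a: int, b: int) -> float:
    """Robustness of "always (speed <= limit)" over the window [a, b].

    Standard STL quantitative semantics: the robustness of the predicate
    (speed <= limit) is its margin (limit - speed), and "always" takes the
    minimum over the time interval. Positive => satisfied, negative => violated.
    """
    margins = limit - speed[a:b + 1]
    return float(np.min(margins))

# Example: 40 mph limit, speed signal sampled over the horizon.
speed = np.array([35.0, 38.0, 41.0, 39.0])
print(speed_limit_robustness(speed, limit=40.0, a=0, b=3))  # -1.0 (violated at t=2)
```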
3. Rule hierarchy ($\varphi$) - a sequence of rules ordered by precedence
  - Rule 1: highest priority; rule $N$: lowest priority
  - For a trajectory, the robustness ($\rho$) of a rule hierarchy ($\varphi$) is an $N$-dimensional vector-valued function, where $N$ is the number of rules
4. Rank of a trajectory ($r$) - trajectories are ordered by which rules in the hierarchy they satisfy, with higher-priority rules taking precedence
  - (Figure: an illustration of trajectory ranks for three rules)
- Objective: Obtain control inputs that result in a trajectory with the highest achievable rank in accordance with the rule hierarchy (see the rank sketch below).
  - This is a single optimization that replaces checking $2^N$ combinations of rule satisfaction.
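One simple way to realize such a ranking, as a hedged sketch (the function name is illustrative; the paper defines rank through the hierarchy itself, whereas here the satisfaction pattern is read as a binary number with the highest-priority rule as the most significant bit, which gives a lexicographic ordering consistent with the hierarchy):

```python
import numpy as np

def trajectory_rank(robustness: np.ndarray) -> int:
    """Rank of a trajectory from its N-dim rule-hierarchy robustness vector.

    robustness[0] belongs to the highest-priority rule. A rule counts as
    satisfied when its robustness is >= 0. Reading the satisfaction bits as a
    binary number (MSB = highest priority) means that satisfying a
    high-priority rule always outranks satisfying any combination of
    lower-priority rules.
    """
    satisfied = robustness >= 0.0
    rank = 0
    for bit in satisfied:      # highest priority first
        rank = (rank << 1) | int(bit)
    return rank                # in {0, ..., 2**N - 1}

# Example with three rules (collision avoidance > speed limit > comfort):
print(trajectory_rank(np.array([0.8, -0.2, 0.5])))  # bits 101 -> rank 5
```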
B. Rank-preserving reward function
To optimize the above equation, the paper presents a differentiable rank-preserving reward function.
Monotonicity: the reward is rank-preserving, i.e., a trajectory with a higher rank always receives a strictly higher reward.
(You can find the proof in the appendix section of the paper.)
- The intuition behind the construction of the above equation:
  - Ensure that the reward contribution from satisfying rule $i$ exceeds the sum of the reward contributions of all lower-priority rules
  - This is achieved by multiplying the step function by a constant that grows exponentially with the priority of the rule
  - An average-robustness term breaks ties between trajectories of the same rank
- To understand the above equation more intuitively, look at the following plot.
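The plot is not reproduced here; below is a hedged sketch of a reward with the structure described above (the exponential base `a`, the sigmoid smoothing, and the form of the tie-breaking term are illustrative choices, not necessarily the paper's exact constants):

```python
import numpy as np

def hierarchy_reward(robustness: np.ndarray, a: float = 2.0, k: float = 10.0) -> float:
    """Sketch of a rank-preserving reward for an N-rule hierarchy.

    robustness[0] belongs to the highest-priority rule. Two ingredients, as
    described above:
      1. A (smoothed) step per rule, scaled by a**(N - 1 - i) so that
         satisfying rule i is worth more than satisfying all lower-priority
         rules combined (for a >= 2 with a hard step).
      2. A bounded tie-breaking term (< 1 here), monotone in the robustness,
         that orders trajectories of the same rank without changing the rank
         ordering.
    The exact constants and smoothing in the paper differ; this only mirrors
    the structure.
    """
    N = len(robustness)
    priorities = np.arange(N)                              # 0 = highest priority
    smooth_step = 1.0 / (1.0 + np.exp(-k * robustness))    # differentiable step
    rank_term = np.sum(a ** (N - 1 - priorities) * smooth_step)
    tie_break = np.mean(smooth_step)                       # bounded in (0, 1)
    return float(rank_term + tie_break)
```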
C. Receding horizon planning with rule hierarchies
Now we have a method to cast the rule hierarchy into a nonlinear, differentiable reward function. The paper proposes a two-stage algorithm to solve the resulting problem (equation (3) in the paper).
| | Stage 1: Planning with motion primitives | Stage 2: Continuous trajectory optimization |
|---|---|---|
| Objective | Generate a coarse initial trajectory for warm-starting the continuous optimizer in the second stage | Refine the trajectory obtained from the first stage by further maximizing the reward while still complying with the rule hierarchy, e.g., improving safety via the STL robustness |
| Process | 1. Choose a set $M$ of open-loop controls that coarsely cover a variety of actions the AV can take, e.g., (accelerate, turn right), (decelerate, keep straight), etc. <br> 2. Roll them out in parallel <br> 3. Select the trajectory with the largest rule-hierarchy reward from the sampled primitives | Compute the gradients of the reward analytically using STLCG and refine the controls |
Pseudo code
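The paper's pseudocode figure is not reproduced here; the following is a sketch of the two-stage loop under stated assumptions: `rollout` and `hierarchy_reward` are the earlier sketches in these notes, `rule_robustness(traj, scene)` is a hypothetical helper returning the hierarchy's robustness vector, and the gradient is taken numerically for self-containment (the paper instead backpropagates through the STL robustness with STLCG):

```python
import numpy as np

def plan(x0, scene, primitives, rule_robustness, iters: int = 50, lr: float = 0.05):
    """Two-stage planner sketch.

    Stage 1: evaluate every motion primitive (an open-loop control sequence)
             and keep the one with the largest rule-hierarchy reward.
    Stage 2: refine that control sequence by gradient ascent on the reward.
    """
    def reward(controls):
        return hierarchy_reward(rule_robustness(rollout(x0, controls), scene))

    # Stage 1: coarse search over primitives (embarrassingly parallel in practice).
    best = max(primitives, key=reward)

    # Stage 2: continuous refinement by (numerical) gradient ascent.
    u = best.copy()
    eps = 1e-3
    for _ in range(iters):
        grad = np.zeros_like(u)
        for idx in np.ndindex(u.shape):
            du = np.zeros_like(u)
            du[idx] = eps
            grad[idx] = (reward(u + du) - reward(u - du)) / (2 * eps)
        u += lr * grad
    return u
```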
Signal Temporal Logic
Source - Autonomous Systems Laboratory
It is a formal language that converts specifications made in natural language into a mathematical representation that can be used in synthesis.
| Term | Explanation |
|---|---|
| Logic | Made up of logical operations, e.g., AND, OR, NOT |
| Temporal | These logical operations can be specified over time intervals |
| Signal | We reason about these logical and temporal operations over continuous real-valued time series, such as the trajectory of a robot as it moves in time and space |
Boolean semantics
How to read: the signal at time $t$ satisfies the predicate $\mu$ if and only if the predicate function of the state $x$ at time $t$ is less than the constant $c$.
Example: Maintain a speed of 0.5 m/s between $t+a$ and $t+b$; the Boolean semantics of this "always" specification are sketched below.
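In symbols (standard STL Boolean semantics for the "always" operator $\square$; the predicate $\mu$ stands for the speed condition in the example):

```latex
% Standard STL Boolean semantics for an "always" specification over [t+a, t+b]:
% the signal x satisfies it at time t iff the predicate mu holds at every
% instant of the window. Instantiating mu with the speed predicate gives the
% 0.5 m/s example above.
(x, t) \models \square_{[a,b]}\,\mu
\;\iff\;
\forall\, t' \in [t+a,\, t+b] :\ (x, t') \models \mu
```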
Robustness
A measure of how much a signal satisfies or violates an STL formula
Quantitative semantics
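The quantitative-semantics figure is not reproduced here; the standard recursive definitions, which STLCG implements (along with smooth approximations) so that gradients can be backpropagated, are:

```latex
% Standard STL quantitative (robustness) semantics.
\begin{aligned}
\rho\big(f(x) < c,\, t\big)              &= c - f\big(x(t)\big) \\
\rho\big(\lnot \phi,\, t\big)            &= -\,\rho(\phi, t) \\
\rho\big(\phi \land \psi,\, t\big)       &= \min\big(\rho(\phi, t),\ \rho(\psi, t)\big) \\
\rho\big(\phi \lor \psi,\, t\big)        &= \max\big(\rho(\phi, t),\ \rho(\psi, t)\big) \\
\rho\big(\square_{[a,b]}\,\phi,\, t\big) &= \min_{t' \in [t+a,\, t+b]} \rho(\phi, t') \\
\rho\big(\lozenge_{[a,b]}\,\phi,\, t\big)&= \max_{t' \in [t+a,\, t+b]} \rho(\phi, t')
\end{aligned}
```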
STLCG
Signal Temporal Logic Computational Graphs - a toolbox for computing robustness values and their gradients from STL formulae. Repository here.