Rule Hierarchies


Paper summary: Receding Horizon Planning with Rule Hierarchies for Autonomous Vehicles - 13 Dec 2023

Why?

Rule hierarchies can enhance interpretability, but introduce combinatorial complexity to planning; on the other hand, differentiable reward structures can be leveraged by modern gradient-based optimization tools, but are less interpretable and unintuitive to tune.

1. Rule hierarchies

| Pros | Cons |
| --- | --- |
| 1. They provide a systematic approach to plan motions that prioritize more important rules (e.g., safety) over less important ones (e.g., comfort) when all rules cannot be satisfied simultaneously<br>2. Greater transparency in planner design<br>3. More amenable to introspection | 1. They introduce significant combinatorial complexity to planning<br>2. Sometimes challenging to implement |

2. Differentiable reward structures

| Pros | Cons |
| --- | --- |
| 1. Less computation time, thanks to the reward-optimization scheme<br>2. Relatively easier to implement | 1. Not intuitive to tune, and the planner's performance is difficult to interpret |

In the worst case, planning with a rule hierarchy of $N$ rules requires checking $2^N$ combinations of rule satisfaction (e.g., $N = 10$ rules already yield 1024 combinations). This paper presents an approach to equivalently express rule hierarchies as differentiable reward structures amenable to modern gradient-based optimizers, thereby achieving the best of both worlds:

  • Expressive power of hierarchies: good interpretability from the ranking scheme
  • Computational advantages of gradient-based optimization through modern optimization tools
  • A hierarchy-preserving reward function that allows planning without checking all combinations of rule satisfaction

How?

A. Problem formulation

  • Dynamics: $x_{t+1} = f(x_t, u_t)$
    $x$ - ego state
    $u$ - control inputs
    $t$ - discrete time index

  • Non-ego agents and map
    S - space of trajectories
    T - time horizon
    $x_{ne}$ - non-ego states
    $x_{map}$ - world map (lane lines, stop signs, etc)
    $w = (x_{ne}, x_{map})$ - world scene

In practice, the planner receives a predicted world scene that is generated from maps and prediction modules; we assume this scene is provided to the planner.

  • Rules and rule robustness
    1. Rule ($\phi$) - a boolean function that maps ego state and world scene to True or False
      • $\phi$: S × W → {True, False}
      • $\phi$ evaluates to True (1) if the rule is satisfied and False (0) otherwise
    2. Robustness ($\rho$) - a metric that provides the degree of satisfaction of the rule
      • Robustness of a rule ($\phi_i$), $\rho_i$: S × W → R
      • Positive scalar if the rule is satisfied and negative otherwise.
      • More positive values indicate greater satisfaction of the rule, while more negative values indicate greater violation (a numeric sketch appears at the end of this subsection)
    • Rules are expressed as Signal Temporal Logic (STL) formulae using STLCG (STL Computational Graphs), which comes equipped with backpropagatable robustness metrics
      • Expressing rules as STL formulae allows us to easily encode complex spatiotemporal specifications, such as maintaining a speed limit of 40 mph from t+a to t+b
    3. Rule hierarchy ($\varphi$) - a sequence of rules ordered by precedence
      • $\varphi = (\phi_1, \phi_2, \ldots, \phi_N)$
      • $i = 1$: highest priority
      • $i = N$: lowest priority
      • For a trajectory, the robustness of a rule hierarchy ($\varphi$) is an N-dimensional vector-valued function, $\rho(\cdot, \cdot) = [\rho_1(\cdot, \cdot), \ldots, \rho_N(\cdot, \cdot)]^T$, where N is the number of rules
    4. Rank of a trajectory ($r$) - an integer that orders trajectories by their pattern of rule satisfaction, where satisfying a higher-priority rule dominates satisfying any combination of lower-priority rules
      (figure: an illustration of trajectory ranks for three rules)
  • Objective: obtain control inputs that result in a trajectory with the highest achievable rank in accordance with the rule hierarchy
    • This single optimization replaces checking $2^N$ combinations of rule satisfaction
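
To make the robustness computation concrete, here is a minimal numeric sketch (the signals, thresholds, and rules are made-up assumptions, not the paper's setup) that evaluates the robustness vector of a two-rule hierarchy, keeping clearance from the nearest agent and obeying a speed limit, using the standard STL "always" semantics:

```python
import numpy as np

# Hypothetical trajectory data over a horizon of T steps (made-up numbers).
T = 20
dist_to_nearest_agent = np.linspace(8.0, 3.0, T)   # meters
speed = np.linspace(12.0, 17.0, T)                 # m/s

D_SAFE = 2.0    # assumed minimum clearance (m)
V_LIMIT = 17.9  # assumed speed limit (m/s), roughly 40 mph

# STL "always" semantics: the robustness of "the condition holds at every step"
# is the worst-case margin across the horizon.
rho_clearance = np.min(dist_to_nearest_agent - D_SAFE)  # > 0 iff satisfied at every step
rho_speed = np.min(V_LIMIT - speed)

# Robustness vector of the hierarchy (highest-priority rule first).
rho = np.array([rho_clearance, rho_speed])
print(rho, rho > 0)  # approximately [1.0, 0.9] -> both rules satisfied
```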

B. Rank-preserving reward function

To optimize the above equation, the paper presents a differentiable rank-preserving reward function.
Rank preservation (monotonicity): if a trajectory achieves a higher rank than another under the rule hierarchy, the reward function assigns it a strictly higher reward.
(The proof can be found in the appendix of the paper.)

  • The intuition behind the construction (a sketch follows at the end of this subsection)
    • Ensure that the reward contribution from satisfying rule $i$ exceeds the sum of the reward contributions of all rules with lower priority
    • This is achieved by multiplying the step function by a constant that grows exponentially with the priority of the rule
    • An average-robustness term breaks ties between same-ranked trajectories

image

  • To understand the above equation more intuitively, look at the following plot.
    image
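
The following is a minimal Python sketch of a reward with the properties described above; the base-2 weights, the tie-break term, and the hard (non-differentiable) step are illustrative assumptions, not the paper's exact construction (the paper uses a differentiable formulation whose gradients STLCG can backpropagate):

```python
import numpy as np

def rank_preserving_reward(rho, a=2.0, tie_break=1e-3):
    """Sketch of a rank-preserving reward (illustrative, not the paper's exact formula).

    rho: robustness of each rule for one trajectory, highest priority first.
         Assumes robustness values are normalized to a bounded range so the
         tie-break term cannot flip the hierarchy ordering.
    """
    rho = np.asarray(rho, dtype=float)
    N = len(rho)
    satisfied = (rho > 0).astype(float)       # step function per rule
    # With a >= 2, the weight a^(N-i) of rule i exceeds the sum of the weights
    # of all lower-priority rules, so satisfying rule i always dominates.
    weights = a ** np.arange(N - 1, -1, -1)   # [a^(N-1), ..., a^0]
    hierarchy_term = weights @ satisfied
    # A small average-robustness term breaks ties between same-ranked trajectories.
    return hierarchy_term + tie_break * rho.mean()

# Satisfying the top-priority rule outweighs satisfying all lower ones combined:
print(rank_preserving_reward([0.5, -0.1, -0.2]))  # rule 1 only   -> ~4.0
print(rank_preserving_reward([-0.1, 0.9, 0.9]))   # rules 2 and 3 -> ~3.0
```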

C. Receding horizon planning with rule hierarchies

We now have a method to cast a rule hierarchy into a nonlinear, differentiable reward function.

The paper proposes a two-stage algorithm to solve (3).

| | Stage 1: Planning with motion primitives | Stage 2: Continuous trajectory optimization |
| --- | --- | --- |
| Objective | Generate a coarse initial trajectory for warm-starting the continuous optimizer in the second stage | Refine the trajectory obtained from the first stage by solving the continuous optimization (equation image). The reward can be increased further while still complying with the rule hierarchy, e.g., improving safety via the STL robustness |
| Process | Choose a set M of open-loop controls that coarsely cover a variety of actions the AV can take, e.g., (accelerate, turn right), (decelerate, keep straight). Roll them out in a parallelized process (image) and select the trajectory with the largest rule-hierarchy reward among the sampled primitives | Compute the gradients analytically using STLCG and refine the controls (image) |

Pseudo code

image
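
For intuition, here is a hedged Python sketch of the two-stage loop. The dynamics model, the reward, and the primitive library are placeholder assumptions, not the paper's code; the paper computes gradients analytically via STLCG, while PyTorch autograd stands in here:

```python
import torch

def rollout(u, x0):
    # Placeholder single-integrator dynamics x_{t+1} = x_t + u_t (assumed, not the paper's model).
    return x0 + torch.cumsum(u, dim=0)

def reward(traj):
    # Placeholder for the differentiable rank-preserving reward of a trajectory.
    return -(traj ** 2).sum()  # stand-in objective: stay near the origin

def plan(x0, primitives, steps=50, lr=0.05):
    # Stage 1: roll out every motion primitive (parallelizable) and keep the best one.
    best_u = max(primitives, key=lambda u: reward(rollout(u, x0)).item())
    # Stage 2: warm-start continuous optimization from the best primitive and
    # refine the controls by gradient ascent on the reward.
    u = best_u.clone().requires_grad_(True)
    opt = torch.optim.Adam([u], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (-reward(rollout(u, x0))).backward()  # minimize the negative reward
        opt.step()
    return u.detach()

x0 = torch.tensor([2.0])
primitives = [torch.full((10, 1), a) for a in (-0.5, 0.0, 0.5)]  # crude open-loop action library
u_star = plan(x0, primitives)
```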

Signal Temporal Logic

Source - Autonomous Systems Laboratory
It is a formal language for converting specifications expressed in natural language into a mathematical representation that can be used for synthesis.
image

| Term | Explanation |
| --- | --- |
| Logic | Made up of logical operations, e.g., AND, OR, NOT |
| Temporal | These logical operations can be specified over time intervals |
| Signal | We reason about these logical and temporal operations over continuous, real-valued time series, such as the trajectory of a robot as it moves through time and space |

Boolean semantics

How to read? A signal at time $t$ satisfies the predicate $\mu$ if and only if the predicate function of the state $x$ at time $t$ is less than the constant $c$, i.e., $(x, t) \models \mu \iff f(x_t) < c$.

Example: Maintain speed of 0.5 m/s between t+a and t+b
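
One plausible STL encoding of this example (assuming "maintain" means staying within a small tolerance $\epsilon$ of the target speed; $\epsilon$ is an assumption, not from the source):

$$\varphi = \square_{[t+a,\, t+b]} \big( |v - 0.5| \le \epsilon \big)$$

where $\square$ is the "always" temporal operator: the condition must hold at every time in $[t+a, t+b]$.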

Robustness

A measure of how much a signal satisfies or violates an STL formula

Quantitative semantics
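
The standard quantitative semantics assign each operator a robustness rule, for example:

$$\rho(f(x) < c,\, t) = c - f(x_t), \qquad \rho(\lnot\phi,\, t) = -\rho(\phi,\, t),$$

$$\rho(\phi \land \psi,\, t) = \min\big(\rho(\phi, t),\, \rho(\psi, t)\big), \qquad \rho\big(\square_{[a,b]}\phi,\, t\big) = \min_{t' \in [t+a,\, t+b]} \rho(\phi, t')$$

A signal satisfies a formula if and only if its robustness is positive, which is exactly what the numeric sketch in Section A computes for the "always" operator.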

STLCG

Signal Temporal Logic Computational Graphs - It is a toolbox to compute the robustness of STL formulae and its gradients. Repository here.