Precise parking requires an end-to-end system where perception adaptively provides policy-relevant details — especially in critical areas where fine control decisions are essential. End-to-end learning offers a unified framework by directly mapping sensor inputs to control actions, but existing approaches lack effective synergy between perception and control.
We propose CAA-Policy, an end-to-end imitation learning system in which control signals guide the learning of visual attention via a novel Control-Aided Attention (CAA) mechanism. The attention module is trained in a self-supervised manner, using gradients backpropagated from the control outputs rather than from the training loss. This strategy encourages attention to focus on visual features that induce high variance in the action outputs, instead of merely minimizing the training loss; we show this shift yields a more robust and generalizable policy.
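As a rough illustration of this training signal, the PyTorch sketch below supervises an attention map using the gradient magnitudes of the action outputs with respect to the visual features. All module names, shapes, and the loss choice are hypothetical simplifications for exposition, not the actual CAA-Policy implementation:

```python
import torch
import torch.nn as nn

# Hypothetical minimal setup: visual features, an attention head, and a policy.
feat = torch.randn(1, 64, 8, 8, requires_grad=True)             # visual features
attn_head = nn.Conv2d(64, 1, kernel_size=1)                     # predicts an attention map
policy = nn.Sequential(nn.Flatten(), nn.Linear(64 * 8 * 8, 2))  # e.g. steering, throttle

# 1) Backpropagate the control outputs (not the training loss) to the features.
actions = policy(feat)
grads = torch.autograd.grad(actions.abs().sum(), feat)[0]

# 2) Build a self-supervised target: features whose perturbation changes the
#    actions most (large gradient magnitude) should receive high attention.
target = grads.abs().sum(dim=1, keepdim=True)
target = target / (target.amax() + 1e-8)

# 3) Train the attention head toward the gradient-derived target.
attn = torch.sigmoid(attn_head(feat.detach()))
attn_loss = nn.functional.mse_loss(attn, target.detach())
attn_loss.backward()  # updates only the attention head's parameters
```

The key point is that the attention target comes from the sensitivity of the control outputs, so regions that barely affect the action receive low attention even if they help reduce the imitation loss.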
CAA-Policy further incorporates:
- Short-horizon waypoint prediction as an auxiliary task for improved temporal consistency
- Learnable motion prediction to robustly track target slots over time
- Modified target tokenization for more effective feature fusion
Extensive experiments in the CARLA simulator show that CAA-Policy consistently surpasses both the end-to-end learning baseline and the modular BEV segmentation + hybrid A* pipeline, achieving superior accuracy, robustness, and interpretability.
📦 Code and collected training datasets will be released.
| Date | Update |
|---|---|
| 2025/11/11 | Initial code release |
```bash
git clone https://github.com/ai4ce/CAAPolicy.git
cd CAAPolicy
conda env create -f environment.yml
conda activate E2EParking
```

CUDA compatibility: CUDA 11.7 is used by default. CUDA 10.2 and 11.3 have also been validated.
```bash
chmod +x setup_carla.sh
./setup_carla.sh   # Downloads and sets up CARLA in a local ./carla/ folder
```

Launch the CARLA server:

```bash
./carla/CarlaUE4.sh -opengl -carla-port=4000
```

Run inference with a pre-trained checkpoint:
```bash
python carla_parking_eva.py \
    --model_path ../all_trained_ckpt/All_best/last.ckpt \
    --model_path_dynamic ../all_trained_ckpt/foundation/foundation.ckpt \
    --model_seq_path_dynamic ../all_trained_ckpt/foundation/foundation_seq.ckpt \
    -p 4000
```

| Argument | Description | Default |
|---|---|---|
| `--model_path` | Path to the main model checkpoint | Required |
| `--model_path_dynamic` | Path to the dynamic motion model checkpoint | Required |
| `--model_seq_path_dynamic` | Path to the sequential dynamic model checkpoint | Required |
| `--eva_epochs` | Number of evaluation epochs | 4 |
| `--eva_task_nums` | Number of evaluation tasks | 16 |
| `--eva_parking_nums` | Number of parking attempts per slot | 6 |
| `--eva_result_path` | Path to save evaluation results (CSV) | Required |
| `--shuffle_veh` | Shuffle static vehicles between tasks | True |
| `--shuffle_weather` | Shuffle weather between tasks | False |
| `--random_seed` | Random seed for environment initialization | 0 |
Evaluation metrics are saved as CSV files at the path specified by `--eva_result_path`.
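For quick aggregation, a helper like the one below can summarize the saved results. Note that the `success` column name is an assumption about the CSV schema, not something documented here; adapt it to the actual file:

```python
import csv

def success_rate(csv_path: str) -> float:
    """Fraction of successful parking attempts in an evaluation CSV.

    Assumes one row per attempt with a 'success' column holding '1'/'0'
    (the column name is a guess; adjust to the real schema).
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return 0.0
    return sum(int(r["success"]) for r in rows) / len(rows)
```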
For dataset generation, please refer to the instructions below and the `new_data_generate` branch, or see the E2EParking paper for more details.
Key checkpoints are included in the ckpt/ folder. Additional pre-trained checkpoints are available for download:
| Checkpoint | Description | Link |
|---|---|---|
| All_best | Best overall model | Google Drive |
Note: Checkpoint files are tracked via Git LFS. Run `git lfs pull` after cloning if the files appear as pointers.
With the CARLA server running, generate training data in a separate terminal:
```bash
git checkout new_data_generate
python3 carla_data_gen.py
```

| Argument | Description | Default |
|---|---|---|
| `--save_path` | Path to save sensor data | `./e2e_parking/` |
| `--task_num` | Number of parking tasks | 16 |
| `--shuffle_veh` | Shuffle static vehicles between tasks | True |
| `--shuffle_weather` | Shuffle weather between tasks | False |
| `--random_seed` | Random seed (0 = use current timestamp) | 0 |
| Key | Action |
|---|---|
| `W` | Throttle |
| `A` / `D` | Steer left / right |
| `S` | Hand brake |
| `Space` | Brake |
| `Q` | Reverse gear |
| `Backspace` | Reset current task |
| `TAB` | Switch camera view |
A parking attempt is considered successful when all three conditions are maintained for 60 consecutive frames:
- Position error (vehicle center → slot center) < 0.5 m
- Orientation error < 0.5°
- No collision
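The three criteria above can be expressed as a per-frame predicate plus a consecutive-frame counter. This is an illustrative sketch, not the repository's actual success checker; all names are made up:

```python
SUCCESS_FRAMES = 60  # all criteria must hold for 60 consecutive frames

def frame_ok(pos_err_m: float, yaw_err_deg: float, collided: bool) -> bool:
    """Per-frame check mirroring the thresholds above."""
    return pos_err_m < 0.5 and abs(yaw_err_deg) < 0.5 and not collided

class SuccessTracker:
    """Counts consecutive successful frames; any failing frame resets the streak."""
    def __init__(self) -> None:
        self.streak = 0

    def update(self, pos_err_m: float, yaw_err_deg: float, collided: bool) -> bool:
        self.streak = self.streak + 1 if frame_ok(pos_err_m, yaw_err_deg, collided) else 0
        return self.streak >= SUCCESS_FRAMES
```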
The target parking slot is marked with a red T. Tasks switch automatically upon success; collisions reset the current task.
```bash
python pl_train.py
```

Configure training parameters (data path, epochs, checkpoint path, etc.) in `training.yaml`.
In `pl_train.py`, update the following:

```python
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3,4,5,6,7'
num_gpus = 8
```

If you find this work useful, please consider citing:
```bibtex
@article{caapolicy2025,
  title={End-to-End Visual Autonomous Parking via Control-Aided Attention},
  author={},
  journal={arXiv preprint arXiv:2509.11090},
  year={2025}
}
```

We thank the authors of the following related works: