# CRAFT-train
Many users of the official CRAFT GitHub repository want to train their own CRAFT models.
However, the official repository does not publish the training code.
Other reimplementations exist, but their performance falls short of the results reported in the original paper. (https://arxiv.org/pdf/1904.01941.pdf)
Models trained with this code reach a level of performance similar to that of the original paper.
```bash
├── config
│ ├── syn_train.yaml
│ └── custom_data_train.yaml
├── data
│ ├── pseudo_label
│ │ ├── make_charbox.py
│ │ └── watershed.py
│ ├── boxEnlarge.py
│ ├── dataset.py
│ ├── gaussian.py
│ ├── imgaug.py
│ └── imgproc.py
├── loss
│ └── mseloss.py
├── metrics
│ └── eval_det_iou.py
├── model
│ ├── craft.py
│ └── vgg16_bn.py
├── utils
│ ├── craft_utils.py
│ ├── inference_boxes.py
│ └── utils.py
├── trainSynth.py
├── train.py
├── train_distributed.py
├── eval.py
├── data_root_dir (place dataset folder here)
└── exp (model and experiment result files will be saved here)
```
### Installation
Install using `pip`
``` bash
pip install -r requirements.txt
```
### Training
1. Put your training and test data in the following layout
```
└── data_root_dir (you can change root dir in yaml file)
    ├── ch4_training_images
    │   ├── img_1.jpg
    │   └── img_2.jpg
    ├── ch4_training_localization_transcription_gt
    │   ├── gt_img_1.txt
    │   └── gt_img_2.txt
    ├── ch4_test_images
    │   ├── img_1.jpg
    │   └── img_2.jpg
    └── ch4_test_localization_transcription_gt
        ├── gt_img_1.txt
        └── gt_img_2.txt
```
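* Format of the `localization_transcription_gt` files (one text box per line: eight comma-separated corner coordinates followed by the transcription):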
```
377,117,463,117,465,130,378,130,Genaxis Theatre
493,115,519,115,519,131,493,131,[06]
374,155,409,155,409,170,374,170,###
```
2. Write a configuration file in YAML format (example config files are provided in the `config` folder).
* To speed up training with multiple GPUs, set `num_worker` > 0.
3. Put the YAML file in the `config` folder.
4. Run the training script as shown in the steps below (if you have multiple GPUs, run `train_distributed.py`).
5. Experiment results are saved to ```./exp/[yaml]``` by default.
* Step 1 : Train CRAFT on the SynthText dataset from scratch
* Note : This step is not necessary if you use <a href="https://drive.google.com/file/d/1enVIsgNvBf3YiRsVkxodspOn55PIK-LJ/view?usp=sharing">this pretrained model</a> as the checkpoint when starting step 2. Download it, place it at `exp/CRAFT_clr_amp_29500.pth`, and set `ckpt_path` in the config file to match your local setup.
```
CUDA_VISIBLE_DEVICES=0 python3 trainSynth.py --yaml=syn_train
```
* Step 2 : Train CRAFT on [SynthText + IC15] or a custom dataset
```
CUDA_VISIBLE_DEVICES=0 python3 train.py --yaml=custom_data_train ## single GPU
CUDA_VISIBLE_DEVICES=0,1 python3 train_distributed.py --yaml=custom_data_train ## multiple GPUs
```
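The ground-truth line format shown in step 1 of the Training section can be read with a small parser. The following is a minimal sketch, not the loader used by `data/dataset.py`: the names `GtBox` and `parse_gt_line` are hypothetical, and treating `###` as an "ignore" marker is an assumption based on the ICDAR 2015 convention. The split is limited to eight commas so that transcriptions containing commas stay intact.

```python
from typing import List, NamedTuple, Tuple


class GtBox(NamedTuple):
    """One annotated text box (hypothetical helper type)."""
    points: List[Tuple[int, int]]  # four (x, y) corners in file order
    text: str                      # transcription
    ignore: bool                   # True for "###" (assumed ICDAR 2015 convention)


def parse_gt_line(line: str) -> GtBox:
    # Drop a leading UTF-8 BOM, in case the file was saved with one,
    # then surrounding whitespace and the trailing newline.
    cleaned = line.lstrip("\ufeff").strip()
    # Split on the first 8 commas only: the 9th field is the transcription
    # and may itself contain commas.
    parts = cleaned.split(",", 8)
    if len(parts) < 9:
        raise ValueError(f"expected 8 coordinates and a transcription: {line!r}")
    coords = [int(value) for value in parts[:8]]
    points = list(zip(coords[0::2], coords[1::2]))
    text = parts[8]
    return GtBox(points=points, text=text, ignore=(text == "###"))


if __name__ == "__main__":
    sample = [
        "377,117,463,117,465,130,378,130,Genaxis Theatre",
        "493,115,519,115,519,131,493,131,[06]",
        "374,155,409,155,409,170,374,170,###",
    ]
    for raw in sample:
        print(parse_gt_line(raw))
```

To read a whole file, apply `parse_gt_line` to each non-empty line of a `gt_img_N.txt` file.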
### Arguments
* ```--yaml``` : name of the configuration file in the `config` folder, without the `.yaml` extension (e.g. `--yaml=syn_train`)
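As an illustration of how the `--yaml` value relates to the paths mentioned in this README, here is a minimal sketch. It assumes the name maps to `config/<name>.yaml` and to the output folder `exp/<name>`, which matches the Training steps but is not a copy of the actual scripts; `resolve_paths` is a hypothetical helper.

```python
import argparse
from pathlib import Path


def resolve_paths(argv):
    """Map a --yaml name to a config file and an experiment folder (assumed layout)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--yaml", required=True,
                        help="configuration file name, without the .yaml extension")
    args = parser.parse_args(argv)
    config_path = Path("config") / f"{args.yaml}.yaml"  # assumed: config/<name>.yaml
    exp_dir = Path("exp") / args.yaml                    # assumed: ./exp/<name>
    return config_path, exp_dir


config_path, exp_dir = resolve_paths(["--yaml=custom_data_train"])
print(config_path.as_posix())
print(exp_dir.as_posix())
```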
### Evaluation
* In the official repository's issues, the author mentioned that the F1-score for the first-row setting in the table below is around 0.75.
* The official paper reports an F1-score of 0.87 for the second-row setting.
* If you lower the post-processing parameter `text_threshold` from 0.85 to 0.75, the F1-score reaches 0.856.
* Training 25k iterations with weak supervision took 14 hours on 8 RTX 3090 Ti GPUs.
* Half of the GPUs were assigned to training and the other half to the supervision setting.
| Training Dataset | Evaluation Dataset | Precision | Recall | F1-score | Pretrained Model |
| ------------- |-----|:-----:|:-----:|:-----:|-----:|
| SynthText | ICDAR2013 | 0.801 | 0.748 | 0.773 | <a href="https://drive.google.com/file/d/1enVIsgNvBf3YiRsVkxodspOn55PIK-LJ/view?usp=sharing">download link</a> |
| SynthText + ICDAR2015 | ICDAR2015 | 0.909 | 0.794 | 0.848 | <a href="https://drive.google.com/file/d/1qUeZIDSFCOuGS9yo8o0fi-zYHLEW6lBP/view">download link</a> |
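
The F1-score column is the harmonic mean of precision and recall. The short sketch below recomputes it from the precision and recall values in the table as a sanity check. It is not part of `eval.py`, and `f1_score` is a hypothetical helper name.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
rows = {
    "SynthText -> ICDAR2013": (0.801, 0.748),
    "SynthText + ICDAR2015 -> ICDAR2015": (0.909, 0.794),
}

for name, (precision, recall) in rows.items():
    print(f"{name}: F1 = {f1_score(precision, recall):.4f}")
```

The recomputed values agree with the reported 0.773 and 0.848 to within rounding in the third decimal place.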