testing dataset

2025-07-10 19:42:57 +08:00
commit 185959cf2a
316 changed files with 19605393 additions and 0 deletions

27
unit_test/README.md Normal file

@@ -0,0 +1,27 @@
# Unit Test
## Description
This module contains unit tests for EasyOCR.
## Usage
This module can be used as a typical Python module. One Python wrapper script and one IPython notebook are provided.
### Python script (*recommended*)
The script can be called as follows (assuming it is run from `EasyOCR/`):
```
python ./unit_test/run_unit_test.py --easyocr ./easyocr --verbose 2 --test_data ./unit_test/data/EasyOcrUnitTestPackage.pickle --image_data_dir ./examples
```
#### Script arguments
* easyocr: [Required] EasyOCR package to test. This should point to the directory where the `__init__.py` of EasyOCR is located.
* verbose (-v): [Optional] Verbosity level for reporting test results (the default is 0).
    * 0: Report only the final result.
    * 1: Same as 0, plus the result of each tested module.
    * 2: Same as 1, plus the result of each test of each module.
    * 3: Same as 2, plus the calculated and the expected outputs of each test.
    * 4 or higher: Same as 3, plus the inputs of each test. (This produces a lot of text on the console.)
* test_data (-t): [Optional] Path to the test package to use (the default is `./data/EasyOcrUnitTestPackage.pickle`, resolved relative to the current working directory).
* image_data_dir (-d): [Optional] Path to the EasyOCR example images directory (the default is `../examples`, resolved relative to the current working directory).
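
As a rough illustration of how these options are parsed, the sketch below mirrors the `argparse` setup of the wrapper script shown later in this commit. Marking `--easyocr` as `required=True` is an assumption made here to match the "[Required]" label; the helper name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options of the wrapper script (sketch)."""
    parser = argparse.ArgumentParser(
        description="Run the EasyOCR unit tests.",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # Assumption: the option is enforced as required, as the README labels it.
    parser.add_argument("--easyocr", required=True,
                        help="Directory of EasyOCR to test.")
    parser.add_argument("-t", "--test_data",
                        default="./data/EasyOcrUnitTestPackage.pickle",
                        help="Path to test data.")
    parser.add_argument("-d", "--image_data_dir", default="../examples",
                        help="Path to the EasyOCR example images directory.")
    parser.add_argument("-v", "--verbose", default=0, type=int,
                        help="Verbosity level of the report.")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--easyocr", "./easyocr", "-v", "2"])
print(args.easyocr, args.verbose, args.test_data, args.image_data_dir)
```

Options that are omitted fall back to their defaults, which are relative paths and therefore depend on the directory the script is started from.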
### IPython notebook
Please see `demo.ipynb` for documentation.
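
Both the script and the notebook read the test package `EasyOcrUnitTestPackage.pickle`. Judging from the packing and loading code in this commit, it is an LZMA-compressed pickle holding a dictionary with the keys `inputs` and `tests`. The sketch below round-trips a small, made-up package with that layout to show how such a file could be inspected; the helper names and the toy content are hypothetical, and the real package is much larger.

```python
import lzma
import os
import pickle
import tempfile


def save_package(solution_book, path):
    """Write a solution book as an LZMA-compressed pickle."""
    with lzma.open(path, "wb") as fid:
        pickle.dump(solution_book, fid)


def load_package(path):
    """Read a solution book back. Only unpickle files from a trusted source."""
    with lzma.open(path, "rb") as fid:
        return pickle.load(fid)


def summarize(solution_book):
    """Map each tested module name to its number of tests."""
    return {name: len(tests) for name, tests in solution_book["tests"].items()}


# Toy package with the same top-level layout; all values are illustrative only.
toy = {
    "inputs": {
        "images": {"english.png": {"tiny": [540, 420, 690, 470]}},
        "easyocr_config": {"main_language": "en"},
    },
    "tests": {
        "readtext method": {
            "test01": {
                "description": "Reading English text.",
                "method": "unit_test.easyocr_read_as",
                "input": ["unit_test.inputs.images.english.tiny", "en"],
                "output": ["text", 0.9],
                "severity": "Error",
            }
        }
    },
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_package.pickle")
    save_package(toy, path)
    loaded = load_package(path)

print(summarize(loaded))
```

Calling `summarize(load_package(...))` on the real package should list the nine modules reported in the notebook output, assuming the file has the layout described above.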

Binary file not shown.

226
unit_test/demo.ipynb Normal file

@@ -0,0 +1,226 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "8083da92",
"metadata": {
"ExecuteTime": {
"end_time": "2022-08-09T11:44:32.340662Z",
"start_time": "2022-08-09T11:44:31.757862Z"
}
},
"outputs": [],
"source": [
"import os\n",
"from unit_test import UnitTest"
]
},
{
"cell_type": "markdown",
"id": "1f664ba2",
"metadata": {},
"source": [
"### Set up paths "
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "ed49737e",
"metadata": {
"ExecuteTime": {
"end_time": "2022-08-09T11:44:33.073519Z",
"start_time": "2022-08-09T11:44:33.071997Z"
}
},
"outputs": [],
"source": [
"easyocr_module = \"../easyocr\"\n",
"verbose = 2\n",
"test_data = \"./data/EasyOcrUnitTestPackage.pickle\"\n",
"image_data_dir = \"../examples\""
]
},
{
"cell_type": "markdown",
"id": "99863c23",
"metadata": {},
"source": [
"### Initialize UnitTest"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "487955be",
"metadata": {
"ExecuteTime": {
"end_time": "2022-08-09T11:44:38.768726Z",
"start_time": "2022-08-09T11:44:34.017508Z"
},
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Unit test is set for EasyOCR at /home/rakpong/team/EasyOCR_private/easyocr\n"
]
}
],
"source": [
"unit_test = UnitTest(easyocr_module, \n",
" test_data,\n",
" image_data_dir\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "8a1da79b",
"metadata": {},
"source": [
"### Run the test with verbosity level 2"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "37da586b",
"metadata": {
"ExecuteTime": {
"end_time": "2022-08-09T11:44:47.011434Z",
"start_time": "2022-08-09T11:44:40.669523Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Testing EasyOCR: 9 modules will be tested.\n",
"\n",
"##Testing module model initialization: 4 tests will be performed.\n",
"#### test01: Counting parameters of detector module.\n",
"#### Passed. [1/4]\n",
"#### test02: Calculating total norm of parameters in detector module.\n",
"#### Passed. [2/4]\n",
"#### test03: Counting parameters of recognition module.\n",
"#### Passed. [3/4]\n",
"#### test04: Calculating total norm of parameters in recognition module.\n",
"#### Passed. [4/4]\n",
"##Module model initialization: Passed.\n",
"\n",
"##Testing module get_textbox function: 3 tests will be performed.\n",
"#### test01: Testing with default input.\n",
"#### Passed. [1/3]\n",
"#### test02: Testing with custom input.\n",
"#### Passed. [2/3]\n",
"#### test03: Testing with custom input.\n",
"#### Passed. [3/3]\n",
"##Module get_textbox function: Passed.\n",
"\n",
"##Testing module group_text_box function: 3 tests will be performed.\n",
"#### test01: Testing with default input.\n",
"#### Passed. [1/3]\n",
"#### test02: Testing with custom input.\n",
"#### Passed. [2/3]\n",
"#### test03: Testing with custom input.\n",
"#### Passed. [3/3]\n",
"##Module group_text_box function: Passed.\n",
"\n",
"##Testing module detect method: 3 tests will be performed.\n",
"#### test01: Testing with default input.\n",
"#### Passed. [1/3]\n",
"#### test02: Testing with custom input.\n",
"#### Passed. [2/3]\n",
"#### test03: Testing with custom input.\n",
"#### Passed. [3/3]\n",
"##Module detect method: Passed.\n",
"\n",
"##Testing module get_image_list function: 2 tests will be performed.\n",
"#### test01: Testing with default input.\n",
"#### Passed. [1/2]\n",
"#### test02: Testing with custom input.\n",
"#### Passed. [2/2]\n",
"##Module get_image_list function: Passed.\n",
"\n",
"##Testing module get_text_test function: 3 tests will be performed.\n",
"#### test01: Testing with default input.\n",
"#### Passed. [1/3]\n",
"#### test02: Testing with custom input.\n",
"#### Passed. [2/3]\n",
"#### test03: Testing with custom input.\n",
"#### Passed. [3/3]\n",
"##Module get_text_test function: Passed.\n",
"\n",
"##Testing module get_paragraph_test function: 3 tests will be performed.\n",
"#### test01: Testing with default input.\n",
"#### Passed. [1/3]\n",
"#### test02: Testing with custom input.\n",
"#### Passed. [2/3]\n",
"#### test03: Testing with custom input.\n",
"#### Passed. [3/3]\n",
"##Module get_paragraph_test function: Passed.\n",
"\n",
"##Testing module recognize method: 2 tests will be performed.\n",
"#### test01: Testing with default input.\n",
"#### Passed. [1/2]\n",
"#### test02: Testing with custom input.\n",
"#### Passed. [2/2]\n",
"##Module recognize method: Passed.\n",
"\n",
"##Testing module readtext method: 4 tests will be performed.\n",
"#### test01: Reading English text.\n",
"#### Passed. [1/4]\n",
"#### test02: Reading French text.\n",
"#### Passed. [2/4]\n",
"#### test03: Reading Chinese (simplified) text.\n",
"#### Passed. [3/4]\n",
"#### test04: Reading Korean text.\n",
"#### Passed. [4/4]\n",
"##Module readtext method: Passed.\n",
"\n",
"##################################################\n",
"Testing completed:\n",
" Final result: Passed.\n"
]
}
],
"source": [
"unit_test.do_test(verbose = 2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b62ccbd9",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

16
unit_test/demo.py Normal file

@@ -0,0 +1,16 @@
import os
from unit_test import UnitTest
# %% Set up paths
easyocr_module = "../easyocr"
verbose = 2
test_data = "./data/EasyOcrUnitTestPackage.pickle"
image_data_dir = "../examples"
# %% Initialize UnitTest
unit_test = UnitTest(easyocr_module,
                     test_data,
                     image_data_dir
                     )
# %% Run UnitTest at verbosity level 2
unit_test.do_test(verbose = 2)


@@ -0,0 +1,647 @@
import os
import argparse
import lzma
import pickle
from datetime import datetime
import numpy as np
import PIL.Image
import torch
import easyocr
# %%
def count_parameters(model):
    # Total number of scalar parameters in the model.
    return sum([param.numel() for param in model.parameters()])

def get_weight_norm(model):
    # Sum of the norms of all parameter tensors, returned as a Python float.
    with torch.no_grad():
        return sum([param.norm() for param in model.parameters()]).cpu().item()

def replace(list_in, indices, values):
    # Return a copy of list_in in which list_in[index] is replaced by the matching value.
    if not isinstance(indices, list):
        indices = [indices]
    if not isinstance(values, list):
        values = [values]
    assert len(indices) == len(values)
    list_out = list_in.copy()
    for index, value in zip(indices, values):
        list_out[index] = value
    return list_out

def get_easyocr(language):
    # Create an EasyOCR reader for a single language code or a list of codes.
    if not isinstance(language, list):
        language = [language]
    return easyocr.Reader(language)
# %%
def main(args):
    if args.output is None:
        args.output = "EasyOcrUnitTestPackage_{}.pickle".format(datetime.now().strftime("%Y%m%dT%H%M"))
    if args.data_dir is None:
        data_dir = "./examples"
    else:
        data_dir = args.data_dir
image_preprocess = {
'english.png':{
"tiny": [540, 420, 690, 470],
"mini": [260, 90, 605, 160],
"small": [243, 234, 636, 360]
},
'french.jpg':{
"tiny": [184, 615, 425, 732]
},
'chinese.jpg':{
"tiny": [181, 78, 469, 157]
},
'korean.png':{
"tiny": [130, 84, 285, 180]
}
}
    # Fail early if any of the example images is missing from data_dir.
    missing = [file for file in image_preprocess.keys() if file not in os.listdir(data_dir)]
    if missing:
        raise FileNotFoundError("Cannot find {} in {}.".format(', '.join(missing), data_dir))
easyocr_config = {"main_language": 'en'}
ocr = get_easyocr(easyocr_config["main_language"])
images = {os.path.splitext(file)[0]: {
key: np.asarray(PIL.Image.open(os.path.join(data_dir, file)).crop(crop_box))[:,:,::-1] for (key,crop_box) in page.items()
} for (file,page) in image_preprocess.items()}
english_mini_bgr, english_mini_gray = easyocr.utils.reformat_input(images['english']['mini'])
english_small_bgr, english_small_gray = easyocr.utils.reformat_input(images['english']['small'])
model_init_test = {'test01': {
'description': "Counting parameters of detector module.",
"method": "unit_test.count_parameters",
'input': ["unit_test.easyocr.ocr.detector"],
'output': count_parameters(ocr.detector),
'severity': "Error"
},
'test02': {
'description': "Calculating total norm of parameters in detector module.",
"method": "unit_test.get_weight_norm",
'input': ["unit_test.easyocr.ocr.detector"],
'output': get_weight_norm(ocr.detector),
'severity': "Warning"
},
'test03': {
'description': "Counting parameters of recognition module.",
"method": "unit_test.count_parameters",
'input': ["unit_test.easyocr.ocr.recognizer"],
'output': count_parameters(ocr.recognizer),
'severity': "Error"
},
'test04': {
'description': "Calculating total norm of parameters in recognition module.",
"method": "unit_test.get_weight_norm",
'input': ["unit_test.easyocr.ocr.recognizer"],
'output': get_weight_norm(ocr.recognizer),
'severity': "Warning"
},
}
get_textbox_test = {}
input0 = [ocr.detector,#detector
english_mini_bgr,#image
2560,#canvas_size
1.0,#mag_ratio
0.7,#text_threshold
0.4,#link_threshold
0.4,#low_text
False, #poly #Fixed
'cuda', #device #fixed ?
]
get_textbox_test.update({'test01': {
'description': "Testing with default input.",
"method": "unit_test.easyocr.detection.get_textbox",
'input': replace(input0,
[0, 1],
["unit_test.easyocr.ocr.detector",
"unit_test.inputs.images.english.mini_bgr"
]),
'output': easyocr.detection.get_textbox(*input0),
'severity': "Error"
}})
input0 = [ocr.detector,#detector
english_mini_bgr,#image
1280,#canvas_size
1.2,#mag_ratio
0.6,#text_threshold
0.3,#link_threshold
0.3,#low_text
False, #poly #Fixed
'cuda', #device #fixed ?
]
get_textbox_test.update({'test02': {
'description': "Testing with custom input.",
"method": "unit_test.easyocr.detection.get_textbox",
'input': replace(input0,
[0, 1],
["unit_test.easyocr.ocr.detector",
"unit_test.inputs.images.english.mini_bgr"
]),
'output': easyocr.detection.get_textbox(*input0),
'severity': "Error"
}})
input0 = [ocr.detector,#detector
english_mini_bgr,#image
640,#canvas_size
0.8,#mag_ratio
0.8,#text_threshold
0.5,#link_threshold
0.5,#low_text
False, #poly #Fixed
'cuda', #device #fixed ?
]
get_textbox_test.update({'test03': {
'description': "Testing with custom input.",
"method": "unit_test.easyocr.detection.get_textbox",
'input': replace(input0,
[0, 1],
["unit_test.easyocr.ocr.detector",
"unit_test.inputs.images.english.mini_bgr"
]),
'output': easyocr.detection.get_textbox(*input0),
'severity': "Error"
}})
input0 = [ocr.detector,#detector
english_mini_bgr,#image
2560,#canvas_size
1.0,#mag_ratio
0.7,#text_threshold
0.4,#link_threshold
0.4,#low_text
False, #poly #Fixed
'cuda', #device #fixed ?
]
output0 = easyocr.detection.get_textbox(*input0)
polys = output0[0]
group_text_box_test = {}
input_ = [polys,
0.1,# slope_ths
0.5,#ycenter_ths
0.5,#height_ths
1.0,#width_ths
0.05,#add_margin
True#sort_output
]
group_text_box_test.update({'test01': {
'description': "Testing with default input.",
"method": "unit_test.easyocr.utils.group_text_box",
'input': input_,
'output': easyocr.utils.group_text_box(*input_),
'severity': "Error"
}
})
input_ = [polys,
0.05,# slope_ths
0.3,#ycenter_ths
0.3,#height_ths
0.8,#width_ths
0.03,#add_margin
True#sort_output
]
group_text_box_test.update({'test02': {
'description': "Testing with custom input.",
"method": "unit_test.easyocr.utils.group_text_box",
'input': input_,
'output': easyocr.utils.group_text_box(*input_),
'severity': "Error"
}
})
input_ = [polys,
0.12,# slope_ths
0.7,#ycenter_ths
0.7,#height_ths
1.2,#width_ths
0.1,#add_margin
True#sort_output
]
group_text_box_test.update({'test03': {
'description': "Testing with custom input.",
"method": "unit_test.easyocr.utils.group_text_box",
'input': input_,
'output': easyocr.utils.group_text_box(*input_),
'severity': "Error"
}
})
input0 = [None,
20, #min_size
0.7, #text_threshold - fixed
0.4, #low_text - fixed
0.4, # link_threshold - fixed
2560, #canvas_size -fixed
1., #mag_ratio - fixed
0.1, #slope_ths - fixed
0.5, #ycenter_ths - fixed
0.5, #height_ths - fixed
0.5, #width_ths - fixed
0.1, #add_margin - fixed
True, #reformat - fixed
None #optimal_num_chars - fixed
]
detect_test = {}
input_ = replace(input0, [0,1], [english_mini_bgr, 20])
detect_test.update({'test01': {
'description': "Testing with default input.",
"method": "unit_test.easyocr.ocr.detect",
'input': replace(input_, 0, "unit_test.inputs.images.english.mini_bgr"),
'output': ocr.detect(*input_),
'severity': "Error"
},
})
input_ = replace(input0, [0,1], [english_small_bgr, 20])
detect_test.update({'test02': {
'description': "Testing with custom input.",
"method": "unit_test.easyocr.ocr.detect",
'input': replace(input_, 0, "unit_test.inputs.images.english.small_bgr"),
'output': ocr.detect(*input_),
'severity': "Error"
},
})
input_ = replace(input0, [0,1], [english_small_bgr, 100])
detect_test.update({'test03': {
'description': "Testing with custom input.",
"method": "unit_test.easyocr.ocr.detect",
'input': replace(input_, 0, "unit_test.inputs.images.english.small_bgr"),
'output': ocr.detect(*input_),
'severity': "Error"
},
})
get_image_list_test = {}
output0 = ocr.detect(english_small_bgr)
input0 = [output0[0][0],
output0[1][0],
english_small_gray,
64, #model_height
True# sort_output
]
input_ = replace(input0, 2, "unit_test.inputs.images.english.small_gray")
get_image_list_test.update({'test01': {
'description': "Testing with default input.",
"method": "unit_test.easyocr.utils.get_image_list",
'input': input_,
'output': easyocr.utils.get_image_list(*input0),
'severity': "Error"
},
})
output0 = ocr.detect(english_mini_bgr)
input0 = [output0[0][0],
output0[1][0],
english_mini_gray,
64, #model_height
True# sort_output
]
input_ = replace(input0, 2, "unit_test.inputs.images.english.mini_gray")
get_image_list_test.update({'test02': {
'description': "Testing with custom input.",
"method": "unit_test.easyocr.utils.get_image_list",
'input': input_,
'output': easyocr.utils.get_image_list(*input0),
'severity': "Error"
},
})
output0 = ocr.detect(english_mini_bgr)
input0 = [output0[0][0],
output0[1][0],
english_mini_gray,
64, #model_height
True# sort_output
]
image_list, max_width = easyocr.utils.get_image_list(*input0)
input0 = [ocr.character,
64, #imgH - fixed
int(max_width),
ocr.recognizer,
ocr.converter,
image_list[:2],
'', #ignore_char,
'greedy', #decoder,
5, #beamWidth,
1, #batch_size,
0.1, #contrast_ths,
0.5, #adjust_contrast,
0.003, #filter_ths,
1, #workers,
"cuda" #device
]
get_text_test = {}
output_ = easyocr.recognition.get_text(*input0)
input_ = replace(input0,
[0, 3, 4],
["unit_test.easyocr.ocr.character",
"unit_test.easyocr.ocr.recognizer",
"unit_test.easyocr.ocr.converter"]
)
get_text_test.update({'test01': {
'description': "Testing with default input.",
"method": "unit_test.easyocr.recognition.get_text",
'input': input_,
'output': output_,
'severity': "Error"
},
})
input0 = [ocr.character,
64, #imgH - fixed
int(max_width),
ocr.recognizer,
ocr.converter,
image_list[:2],
'', #ignore_char,
'greedy', #decoder,
4, #beamWidth,
1, #batch_size,
0.05, #contrast_ths,
0.3, #adjust_contrast,
0.001, #filter_ths,
1, #workers,
"cuda" #device
]
output_ = easyocr.recognition.get_text(*input0)
input_ = replace(input0,
[0, 3, 4],
["unit_test.easyocr.ocr.character",
"unit_test.easyocr.ocr.recognizer",
"unit_test.easyocr.ocr.converter"]
)
get_text_test.update({'test02': {
'description': "Testing with custom input.",
"method": "unit_test.easyocr.recognition.get_text",
'input': input_,
'output': output_,
'severity': "Error"
}})
input0 = [ocr.character,
64, #imgH - fixed
int(max_width),
ocr.recognizer,
ocr.converter,
image_list[:2],
'', #ignore_char,
'greedy', #decoder,
6, #beamWidth,
4, #batch_size,
0.2, #contrast_ths,
0.6, #adjust_contrast,
0.005, #filter_ths,
1, #workers,
"cuda" #device
]
output_ = easyocr.recognition.get_text(*input0)
input_ = replace(input0,
[0, 3, 4],
["unit_test.easyocr.ocr.character",
"unit_test.easyocr.ocr.recognizer",
"unit_test.easyocr.ocr.converter"]
)
get_text_test.update({'test03': {
'description': "Testing with custom input.",
"method": "unit_test.easyocr.recognition.get_text",
'input': input_,
'output': output_,
'severity': "Error"
}})
get_paragraph_test = {}
output0 = ocr.detect(english_mini_bgr)
input0 = [output0[0][0],
output0[1][0],
english_mini_gray,
64, #model_height
True# sort_output
]
image_list, max_width = easyocr.utils.get_image_list(*input0)
input0 = [ocr.character,
64, #imgH - fixed
int(max_width),
ocr.recognizer,
ocr.converter,
image_list[:2],
'', #ignore_char,
'greedy', #decoder,
5, #beamWidth,
1, #batch_size,
0.1, #contrast_ths,
0.5, #adjust_contrast,
0.003, #filter_ths,
1, #workers,
"cuda" #device
]
output0 = easyocr.recognition.get_text(*input0)
input_ = [output0,
1, #x_ths
0.5, #y_ths
'ltr' #mode
]
get_paragraph_test.update({'test01': {
'description': "Testing with default input.",
"method": "unit_test.easyocr.utils.get_paragraph",
'input': input_,
'output': easyocr.utils.get_paragraph(*input_),
'severity': "Error"
}})
input_ = [output0,
0.5, #x_ths
0.3, #y_ths
'ltr' #mode
]
get_paragraph_test.update({'test02': {
'description': "Testing with custom input.",
"method": "unit_test.easyocr.utils.get_paragraph",
'input': input_,
'output': easyocr.utils.get_paragraph(*input_),
'severity': "Error"
}})
input_ = [output0,
1.5, #x_ths
1, #y_ths
'ltr' #mode
]
get_paragraph_test.update({'test03': {
'description': "Testing with custom input.",
"method": "unit_test.easyocr.utils.get_paragraph",
'input': input_,
'output': easyocr.utils.get_paragraph(*input_),
'severity': "Error"
}})
input_recog = [None,
None, #horizontal_list
None, #free_list
'greedy', #decoder
5, #beamWidth
1,#batch_size
0, #workers
None, #allowlist
None, #blocklist
1, #detail
None, #rotation_info
False,#paragraph
0.1,#contrast_ths
0.5, #adjust_contrast
0.003, #filter_ths
0.5, #y_ths
1.0, #x_ths
True, #reformat
'standard'#output_format
]
recognize_test = {}
h_list, f_list = ocr.detect(english_mini_bgr)
input_ = replace(input_recog,
[0, 1, 2],
[english_mini_gray, h_list[0], f_list[0]])
recognize_test.update({'test01': {
'description': "Testing with default input.",
"method": "unit_test.easyocr.ocr.recognize",
'input': replace(input_, 0, "unit_test.inputs.images.english.mini_gray"),
'output': ocr.recognize(*input_),
'severity': "Error"
}})
h_list, f_list = ocr.detect(english_small_bgr)
input_ = replace(input_recog,
[0, 1, 2],
[english_small_gray, h_list[0], f_list[0]])
recognize_test.update({'test02': {
'description': "Testing with custom input.",
"method": "unit_test.easyocr.ocr.recognize",
'input': replace(input_, 0, "unit_test.inputs.images.english.small_gray"),
'output': ocr.recognize(*input_),
'severity': "Error"
}})
readtext_test = {}
#english_tiny_bgr, _ = easyocr.utils.reformat_input(images['english']['tiny'])
input_ = ["unit_test.inputs.images.english.tiny", 'en']
ocr = get_easyocr('en')
_, pred, confidence = ocr.readtext(images['english']['tiny'])[0]
output_ = [pred, confidence]
readtext_test.update({'test01': {
'description': "Reading English text.",
"method": "unit_test.easyocr_read_as",
'input': input_,
'output': output_,
'severity': "Error"
}})
#french_tiny_bgr, _ = easyocr.utils.reformat_input(images['french']['tiny'])
input_ = ["unit_test.inputs.images.french.tiny", 'fr']
ocr = get_easyocr('fr')
_, pred, confidence = ocr.readtext(images['french']['tiny'])[0]
output_ = [pred, confidence]
readtext_test.update({'test02': {
'description': "Reading French text.",
"method": "unit_test.easyocr_read_as",
'input': input_,
'output': output_,
'severity': "Error"
}})
#chinese_tiny_bgr, _ = easyocr.utils.reformat_input(images['chinese']['tiny'])
input_ = ["unit_test.inputs.images.chinese.tiny", 'ch_sim']
ocr = get_easyocr('ch_sim')
_, pred, confidence = ocr.readtext(images['chinese']['tiny'])[0]
output_ = [pred, confidence]
readtext_test.update({'test03': {
'description': "Reading Chinese (simplified) text.",
"method": "unit_test.easyocr_read_as",
'input': input_,
'output': output_,
'severity': "Error"
}})
#korean_tiny_bgr, _ = easyocr.utils.reformat_input(images['korean']['tiny'])
input_ = ["unit_test.inputs.images.korean.tiny", 'ko']
ocr = get_easyocr('ko')
_, pred, confidence = ocr.readtext(images['korean']['tiny'])[0]
output_ = [pred, confidence]
readtext_test.update({'test04': {
'description': "Reading Korean text.",
"method": "unit_test.easyocr_read_as",
'input': input_,
'output': output_,
'severity': "Error"
}})
solution_book = {
'inputs':{'images': image_preprocess,
'easyocr_config': easyocr_config
},
'tests':{
"model initialization": model_init_test,
"get_textbox function": get_textbox_test,
"group_text_box function": group_text_box_test,
"detect method": detect_test,
"get_image_list function": get_image_list_test,
"get_text_test function": get_text_test,
"get_paragraph_test function": get_paragraph_test,
"recognize method": recognize_test,
"readtext method": readtext_test,
}
}
with lzma.open(args.output, 'wb') as fid:
pickle.dump(solution_book, fid)
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Script to pack the EasyOCR unit test package.",
                                     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument("-o", "--output", default=None, help="output path.")
    parser.add_argument("-d", "--data_dir", default=None, help="data directory")
    args = parser.parse_args()
    main(args)


@@ -0,0 +1,19 @@
import argparse
from unit_test import UnitTest
# %%
def main(args):
    unit_test = UnitTest(args.easyocr, args.test_data, args.image_data_dir, args.verbose)
    unit_test.do_test(args.verbose)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Script to run EasyOCR unit test.",
                                     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    # The README documents this option as required, so enforce it here.
    parser.add_argument("--easyocr", required=True, help="Directory of EasyOCR to test.")
    parser.add_argument("-t", "--test_data", default="./data/EasyOcrUnitTestPackage.pickle", help="Path to test data.")
    parser.add_argument("-d", "--image_data_dir", default="../examples", help="Path to directory that contains EasyOCR example images.")
    parser.add_argument("-v", "--verbose", default=0, type = int, help="Verbosity level of report.")
    args = parser.parse_args()
    main(args)

262
unit_test/unit_test.py Normal file

@@ -0,0 +1,262 @@
import os
import sys
import importlib.util
import pickle
import lzma
import PIL.Image
import numpy as np
import torch
# %%
class Attributes:
    pass

class UnitTest:
    def __init__(self,
                 easyocr_module,
                 test_data = "./data/EasyOcrUnitTestPackage.pickle",
                 image_data_dir = "../examples",
                 verbose = 0,
                 numeric_acceptance_error = 0.1):
        self.verbose = verbose
        easy_ocr_init = os.path.join(easyocr_module, "__init__.py")
        if not os.path.isfile(easy_ocr_init):
            raise FileNotFoundError("Invalid easyocr_module. The directory should contain __init__.py.")
        spec = importlib.util.spec_from_file_location("easyocr", easy_ocr_init)
        easyocr = importlib.util.module_from_spec(spec)
        sys.modules["easyocr"] = easyocr
        spec.loader.exec_module(easyocr)
        self.easyocr = easyocr
        if not hasattr(self.easyocr, 'utils'):
            setattr(self.easyocr, 'utils', importlib.import_module('easyocr.utils'))
        if not hasattr(self.easyocr, 'detection'):
            setattr(self.easyocr, 'detection', importlib.import_module('easyocr.detection'))
        if not hasattr(self.easyocr, 'recognition'):
            setattr(self.easyocr, 'recognition', importlib.import_module('easyocr.recognition'))
        self.easyocr_dir = os.path.dirname(easyocr.__file__)
        print("Unit test is set for EasyOCR at {}".format(os.path.abspath(self.easyocr_dir)))
        self.image_data_dir = image_data_dir
        self.set_data(test_data)
        self.set_easyocr()
        self.numeric_acceptance_error = numeric_acceptance_error
    def set_data(self, test_data):
        self.inputs = Attributes()
        with lzma.open(test_data, 'rb') as fid:
            solution_book = pickle.load(fid)
        self.test_book = solution_book['tests']
        # Fail early if any image referenced by the test package is missing.
        available = os.listdir(self.image_data_dir)
        missing = [file for file in solution_book['inputs']['images'].keys() if file not in available]
        if missing:
            raise FileNotFoundError("Cannot find {} in {}.".format(', '.join(missing), self.image_data_dir))
        images = {os.path.splitext(file)[0]: {
                      key: np.asarray(PIL.Image.open(os.path.join(self.image_data_dir, file)).crop(crop_box))[:,:,::-1]
                      for (key,crop_box) in page.items()
                  } for (file,page) in solution_book['inputs']['images'].items()}
        english_mini_bgr, english_mini_gray = self.easyocr.utils.reformat_input(images['english']['mini'])
        english_small_bgr, english_small_gray = self.easyocr.utils.reformat_input(images['english']['small'])
        images['english'].update({'mini_bgr': english_mini_bgr,
                                  'mini_gray': english_mini_gray,
                                  'small_bgr': english_small_bgr,
                                  'small_gray': english_small_gray,
                                  })
        setattr(self.inputs, 'images', self.dict2attr(images))
        setattr(self.inputs, 'easyocr_config', self.dict2attr(solution_book['inputs']['easyocr_config']))
    def dict2attr(self, dict_):
        # Recursively convert a (nested) dict into an Attributes object.
        attr = Attributes()
        for (key, value) in dict_.items():
            setattr(attr, key, self.dict2attr(value) if isinstance(value, dict) else value)
        return attr
    def count_parameters(self, model):
        return sum([param.numel() for param in model.parameters()])

    def get_weight_norm(self, model):
        with torch.no_grad():
            return sum([param.norm() for param in model.parameters()]).cpu().item()

    def get_nested_attr(self, parent, attr):
        if len(attr.split(".")) == 1:
            return getattr(parent, attr)
        else:
            attrs = attr.split(".")
            parent = getattr(parent, attrs[0])
            attr = ".".join(attrs[1:])
            attr = self.get_nested_attr(parent, attr)
            return attr
    def easyocr_read_as(self, image, language):
        if not isinstance(language, list):
            language = [language]
        reader = self.easyocr.Reader(language)
        _, pred, confidence = reader.readtext(image)[0]
        reader = None
        torch.cuda.empty_cache()
        return pred, confidence

    def set_easyocr(self):
        ocr = self.easyocr.Reader([self.inputs.easyocr_config.main_language])
        setattr(self.easyocr, 'ocr', ocr)
    def validate(self, test, solution, dtype):
        # Strings must match exactly; numeric values are compared by relative error.
        if dtype == str:
            return test == solution
        elif np.issubdtype(dtype, np.integer):
            return abs(1-test/solution) < self.numeric_acceptance_error
        elif np.issubdtype(dtype, np.inexact):
            return abs(1-test/solution) < self.numeric_acceptance_error
        elif dtype == dict:
            return self.are_dicts_equal(test, solution)
        elif dtype == list or dtype == tuple:
            return self.are_lists_equal(test, solution)
        elif dtype == np.ndarray:
            return (abs(1-test/solution) < self.numeric_acceptance_error).all()
        elif dtype == torch.Tensor:
            return (abs(1-test/solution) < self.numeric_acceptance_error).all()
        else:
            raise TypeError("Unsupported data type ({}) to validate. Supported types are str, int, float, dict, list, tuple, np.ndarray, and torch.Tensor.".format(dtype))
    def are_dicts_equal(self, test, solution):
        if test.keys() == solution.keys():
            return all([self.validate(test[key], solution[key], type(solution[key])) for key in solution.keys()])
        else:
            return False

    def are_lists_equal(self, test, solution):
        if len(test) == len(solution):
            return all([self.validate(tt, ss, type(ss)) for (tt,ss) in zip(test, solution)])
        else:
            return False

    def is_list_or_tuple(self, test):
        return isinstance(test, list) or isinstance(test, tuple)
    #Should check length of results/solutions/dtypes
    def validate_all(self, results, solutions, dtypes):
        if not isinstance(results, list):
            results = [results]
        if not isinstance(solutions, list):
            solutions = [solutions]
        if not isinstance(dtypes, list):
            dtypes = [dtypes]
        validation = []
        for (result, solution, dtype) in zip(results, solutions, dtypes):
            if (not self.is_list_or_tuple(result)
                and not self.is_list_or_tuple(solution)
                ):
                validation.append(self.validate(result, solution, type(solution)))
            elif (self.is_list_or_tuple(result)
                  and self.is_list_or_tuple(solution)
                  ):
                # Compare nested sequences element by element; recursing on
                # (results, solutions) here would never terminate.
                validation.append(self.are_lists_equal(result, solution))
            else:
                raise TypeError("Result and solution must both be sequences or both be non-sequences.")
        return all(validation)
    def do_test(self, verbose = None):
        if verbose is not None:
            self.verbose = verbose
        num_module_to_test = len(self.test_book)
        num_module_pass = 0
        print("Testing EasyOCR: {:d} modules will be tested.\n".format(num_module_to_test))
        for name,tests in self.test_book.items():
            num_test = len(tests)
            num_passed = 0
            if self.verbose > 0:
                print("##Testing module {}: {:d} tests will be performed.".format(name, num_test))
            for test_id, test in tests.items():
                if self.verbose > 1:
                    print("#### {}: {}".format(test_id, test['description']))
                # Resolve "unit_test.<path>" strings into attributes of this object.
                if test['method'].startswith('unit_test.'):
                    test['method'] = '.'.join(test['method'].split('.')[1:])
                test_method = self.get_nested_attr(self, test['method'])
                test['input'] = [(self.get_nested_attr(self, '.'.join(input_.split('.')[1:]))
                                  if input_.startswith('unit_test.') else input_) if isinstance(input_, str) else input_
                                 for input_ in test['input']]
                # Use self.verbose here: the verbose argument may be None.
                if self.verbose > 3:
                    print("###### Input: {}".format(test['input']))
                results = test_method(*test['input'])
                if self.verbose > 2:
                    print("###### Expected output: {}".format(test['output']))
                    print("###### Received output: {}".format(results))
                test_result = self.validate(results, test['output'], type(test['output']))
                if test_result:
                    num_passed += 1
                    if self.verbose > 1:
                        print("#### Passed. [{:d}/{:d}]".format(num_passed, num_test))
                else:
                    if test['severity'] == "Warning":
                        num_passed += 1
                        if self.verbose > 1:
                            print("#### Passed. [{:d}/{:d}]".format(num_passed, num_test))
                        if self.verbose > 2:
                            print("##### Warning: While the result is considered as passed, the test yields results ({}) "
                                  "that are different from the expected values ({}). It is strongly recommended to make sure "
                                  "that this is expected.".format(results, test['output']))
                    else:
                        if self.verbose > 1:
                            print("#### Failed")
                        if self.verbose > 2:
                            print("##### The test yields results ({}) which are different from the expected values ({}).".format(results, test['output']))
            # Tests with severity "Warning" always count as passed, so a module passes
            # only if every test was counted, i.e. no test with severity "Error" failed.
            if num_passed >= num_test:
                num_module_pass += 1
                if self.verbose > 0:
                    print("##Module {}: Passed.\n".format(name))
            else:
                print("##Module {}: Failed.\n".format(name))
        print("#"*50)
        if num_module_pass >= num_module_to_test:
            print("Testing completed:\n Final result: Passed.")
        else:
            print("Testing completed:\n Final result: Failed.")