testing dataset

2025-07-10 19:42:57 +08:00
commit 185959cf2a
316 changed files with 19605393 additions and 0 deletions
--- a/unit_test/README.md
+++ b/unit_test/README.md
@@ -0,0 +1,27 @@
+# Unit Test
+
+## Description
+This module contains unit tests for EasyOCR.
+
+## Usage
+This module can be used as a typical Python module. One Python wrapper script and one IPython notebook are provided.
+
+### Python script (*recommended*)
+The script can be called as follows (assuming it is called from `EasyOCR/`):
+```
+python ./unit_test/run_unit_test.py --easyocr ./easyocr --verbose 2 --test ./unit_test/data/EasyOcrUnitTestPackage.pickle --data_dir ./examples
+```
+
+#### Script arguments
+ * easyocr: [Required] EasyOCR package to test. This should point to a directory where `__init__.py` of EasyOCR is located.
+ * verbose (-v): [Optional] Verbosity level to report test results (The default is 0)
+    * 0: Report only the final result
+    * 1: Same as 0 and also results of each tested module.
+    * 2: Same as 1 and also results of each test of each module.
+    * 3: Same as 2 and also the calculated and the expected outputs of each test.
+    * 4 or higher: Same as 3 and also the inputs of each test. (This will produce a lot of text on console).
+ * test_data (-t): [Optional] Path to test package to use (The default is `./unit_test/data/EasyOcrUnitTestPackage.pickle`).
+ * data_dir (-d): [Optional] Path to EasyOCR example images directory (The default is `./examples/`).
+ 
+### IPython notebook
+Please see `demo.ipynb` for documentation.
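
Reviewer note: `run_unit_test.py` is not part of the hunks shown here, so the following is only a minimal sketch of how the argument list in the README above could map onto `argparse`. The function name `build_parser` and the help strings are assumptions made for illustration. The option names and defaults are taken from the README text.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the README's "Script arguments" list.
    # The real run_unit_test.py may define these options differently.
    parser = argparse.ArgumentParser(description="Run the EasyOCR unit tests.")
    parser.add_argument(
        "--easyocr", required=True,
        help="Directory containing EasyOCR's __init__.py.")
    parser.add_argument(
        "-v", "--verbose", type=int, default=0,
        help="Verbosity level: 0 reports only the final result; "
             "higher levels add progressively more detail.")
    parser.add_argument(
        "-t", "--test_data",
        default="./unit_test/data/EasyOcrUnitTestPackage.pickle",
        help="Path to the test package.")
    parser.add_argument(
        "-d", "--data_dir", default="./examples/",
        help="Path to the EasyOCR example images directory.")
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
example_args = build_parser().parse_args(
    ["--easyocr", "./easyocr", "--verbose", "2", "--test", "custom.pickle"])
print(example_args.verbose, example_args.test_data, example_args.data_dir)
```

Under this assumed layout, `--test` is accepted because `argparse` allows unambiguous prefixes of long options by default, which would explain why the README command uses `--test` while the argument list documents `test_data`.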
--- a/unit_test/data/EasyOcrUnitTestPackage.pickle
+++ b/unit_test/data/EasyOcrUnitTestPackage.pickle
--- a/unit_test/demo.ipynb
+++ b/unit_test/demo.ipynb
@@ -0,0 +1,226 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "8083da92",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2022-08-09T11:44:32.340662Z",
+     "start_time": "2022-08-09T11:44:31.757862Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "from unit_test import UnitTest"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1f664ba2",
+   "metadata": {},
+   "source": [
+    "### Set up paths "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "ed49737e",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2022-08-09T11:44:33.073519Z",
+     "start_time": "2022-08-09T11:44:33.071997Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "easyocr_module = \"../easyocr\"\n",
+    "verbose = 2\n",
+    "test_data = \"./data/EasyOcrUnitTestPackage.pickle\"\n",
+    "image_data_dir = \"../examples\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "99863c23",
+   "metadata": {},
+   "source": [
+    "### Initialize UnitTest"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "487955be",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2022-08-09T11:44:38.768726Z",
+     "start_time": "2022-08-09T11:44:34.017508Z"
+    },
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Unit test is set for EasyOCR at /home/rakpong/team/EasyOCR_private/easyocr\n"
+     ]
+    }
+   ],
+   "source": [
+    "unit_test = UnitTest(easyocr_module, \n",
+    "                     test_data,\n",
+    "                     image_data_dir\n",
+    "                     )"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8a1da79b",
+   "metadata": {},
+   "source": [
+    "### Run the test with verbosity level 2"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "37da586b",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2022-08-09T11:44:47.011434Z",
+     "start_time": "2022-08-09T11:44:40.669523Z"
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Testing EasyOCR: 9 modules will be tested.\n",
+      "\n",
+      "##Testing module model initialization: 4 tests will be performed.\n",
+      "#### test01: Counting parameters of detector module.\n",
+      "#### Passed. [1/4]\n",
+      "#### test02: Calculating total norm of parameters in detector module.\n",
+      "#### Passed. [2/4]\n",
+      "#### test03: Counting parameters of recognition module.\n",
+      "#### Passed. [3/4]\n",
+      "#### test04: Calculating total norm of parameters in recognition module.\n",
+      "#### Passed. [4/4]\n",
+      "##Module model initialization: Passed.\n",
+      "\n",
+      "##Testing module get_textbox function: 3 tests will be performed.\n",
+      "#### test01: Testing with default input.\n",
+      "#### Passed. [1/3]\n",
+      "#### test02: Testing with custom input.\n",
+      "#### Passed. [2/3]\n",
+      "#### test03: Testing with custom input.\n",
+      "#### Passed. [3/3]\n",
+      "##Module get_textbox function: Passed.\n",
+      "\n",
+      "##Testing module group_text_box function: 3 tests will be performed.\n",
+      "#### test01: Testing with default input.\n",
+      "#### Passed. [1/3]\n",
+      "#### test02: Testing with custom input.\n",
+      "#### Passed. [2/3]\n",
+      "#### test03: Testing with custom input.\n",
+      "#### Passed. [3/3]\n",
+      "##Module group_text_box function: Passed.\n",
+      "\n",
+      "##Testing module detect method: 3 tests will be performed.\n",
+      "#### test01: Testing with default input.\n",
+      "#### Passed. [1/3]\n",
+      "#### test02: Testing with custom input.\n",
+      "#### Passed. [2/3]\n",
+      "#### test03: Testing with custom input.\n",
+      "#### Passed. [3/3]\n",
+      "##Module detect method: Passed.\n",
+      "\n",
+      "##Testing module get_image_list function: 2 tests will be performed.\n",
+      "#### test01: Testing with default input.\n",
+      "#### Passed. [1/2]\n",
+      "#### test02: Testing with custom input.\n",
+      "#### Passed. [2/2]\n",
+      "##Module get_image_list function: Passed.\n",
+      "\n",
+      "##Testing module get_text_test function: 3 tests will be performed.\n",
+      "#### test01: Testing with default input.\n",
+      "#### Passed. [1/3]\n",
+      "#### test02: Testing with custom input.\n",
+      "#### Passed. [2/3]\n",
+      "#### test03: Testing with custom input.\n",
+      "#### Passed. [3/3]\n",
+      "##Module get_text_test function: Passed.\n",
+      "\n",
+      "##Testing module get_paragraph_test function: 3 tests will be performed.\n",
+      "#### test01: Testing with default input.\n",
+      "#### Passed. [1/3]\n",
+      "#### test02: Testing with custom input.\n",
+      "#### Passed. [2/3]\n",
+      "#### test03: Testing with custom input.\n",
+      "#### Passed. [3/3]\n",
+      "##Module get_paragraph_test function: Passed.\n",
+      "\n",
+      "##Testing module recognize method: 2 tests will be performed.\n",
+      "#### test01: Testing with default input.\n",
+      "#### Passed. [1/2]\n",
+      "#### test02: Testing with custom input.\n",
+      "#### Passed. [2/2]\n",
+      "##Module recognize method: Passed.\n",
+      "\n",
+      "##Testing module readtext method: 4 tests will be performed.\n",
+      "#### test01: Reading English text.\n",
+      "#### Passed. [1/4]\n",
+      "#### test02: Reading French text.\n",
+      "#### Passed. [2/4]\n",
+      "#### test03: Reading Chinese (simplified) text.\n",
+      "#### Passed. [3/4]\n",
+      "#### test04: Reading Korean text.\n",
+      "#### Passed. [4/4]\n",
+      "##Module readtext method: Passed.\n",
+      "\n",
+      "##################################################\n",
+      "Testing completed:\n",
+      " Final result: Passed.\n"
+     ]
+    }
+   ],
+   "source": [
+    "unit_test.do_test(verbose = 2)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b62ccbd9",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
--- a/unit_test/demo.py
+++ b/unit_test/demo.py
@@ -0,0 +1,16 @@
+import os
+from unit_test import UnitTest
+
+# %% Set up paths 
+easyocr_module = "../easyocr"
+verbose = 2
+test_data = "./data/EasyOcrUnitTestPackage.pickle"
+image_data_dir = "../examples"
+
+# %% Initialize UnitTest
+unit_test = UnitTest(easyocr_module, 
+                     test_data,
+                     image_data_dir
+                     )
+# %% Run UnitTest at verbosity level 2
+unit_test.do_test(verbose = 2)
--- a/unit_test/make_test_solution.py
+++ b/unit_test/make_test_solution.py
@@ -0,0 +1,647 @@
+import os
+import argparse
+import lzma
+import pickle
+from datetime import datetime
+import numpy as np
+import PIL.Image
+
+import torch
+
+import easyocr
+
+# %%
+def count_parameters(model):
+    return sum([param.numel() for param in model.parameters()])
+
+def get_weight_norm(model):
+    with torch.no_grad():
+        return sum([param.norm() for param in model.parameters()]).cpu().item()
+    
+def replace(list_in, indices, values):
+    if not isinstance(indices, list):
+        indices = [indices]
+    if not isinstance(values, list):
+        values = [values]
+    assert len(indices) == len(values)
+    
+    list_out = list_in.copy()
+    for index, value in zip(indices, values):
+        list_out[index] = value
+
+    return list_out
+
+def get_easyocr(language):
+    if not isinstance(language, list):
+        language = [language]
+    return easyocr.Reader(language)
+
+# %%
+def main(args):
+    
+    if args.output is None:
+        args.output = "EasyOcrUnitTestPackage_{}.pickle".format(datetime.now().strftime("%Y%m%dT%H%M"))
+    
+    if args.data_dir is None:
+        data_dir = "./examples"
+    else:
+        data_dir = args.data_dir
+    
+    image_preprocess = {
+        'english.png':{
+            "tiny": [540, 420, 690, 470],
+            "mini": [260, 90, 605, 160],
+            "small": [243, 234, 636, 360]
+            }, 
+        'french.jpg':{
+            "tiny": [184, 615, 425, 732]
+            }, 
+        'chinese.jpg':{
+            "tiny": [181, 78, 469, 157]
+            }, 
+        'korean.png':{
+            "tiny": [130, 84, 285, 180]
+            }
+        }
+
+    
+    if any([file not in os.listdir(data_dir) for file in image_preprocess.keys()]):
+        raise FileNotFoundError("Cannot find {} in {}.".format(', '.join([file for file in image_preprocess.keys() if file not in os.listdir(data_dir)]), data_dir))
+    
+    easyocr_config = {"main_language": 'en'}
+    
+    ocr = get_easyocr(easyocr_config["main_language"])
+    
+    images = {os.path.splitext(file)[0]: {
+                    key: np.asarray(PIL.Image.open(os.path.join(data_dir, file)).crop(crop_box))[:,:,::-1] for (key,crop_box) in page.items() 
+                    } for (file,page) in image_preprocess.items()}
+
+    
+    english_mini_bgr, english_mini_gray = easyocr.utils.reformat_input(images['english']['mini'])
+    english_small_bgr, english_small_gray = easyocr.utils.reformat_input(images['english']['small'])
+    
+    
+    model_init_test = {'test01': {
+                            'description': "Counting parameters of detector module.",
+                            "method": "unit_test.count_parameters",
+                            'input': ["unit_test.easyocr.ocr.detector"],
+                            'output': count_parameters(ocr.detector),    
+                            'severity': "Error"
+                            },
+                      'test02': {
+                            'description': "Calculating total norm of parameters in detector module.",
+                            "method": "unit_test.get_weight_norm",
+                            'input': ["unit_test.easyocr.ocr.detector"],
+                            'output': get_weight_norm(ocr.detector),    
+                            'severity': "Warning"
+                            },
+                      'test03': {
+                            'description': "Counting parameters of recognition module.",
+                            "method": "unit_test.count_parameters",
+                            'input': ["unit_test.easyocr.ocr.recognizer"],
+                            'output': count_parameters(ocr.recognizer),    
+                            'severity': "Error"
+                            },
+                      'test04': {
+                            'description': "Calculating total norm of parameters in recognition module.",
+                            "method": "unit_test.get_weight_norm",
+                            'input': ["unit_test.easyocr.ocr.recognizer"],
+                            'output': get_weight_norm(ocr.recognizer),    
+                            'severity': "Warning"
+                            },
+                      }
+    
+    
+    get_textbox_test = {}
+    
+    input0 = [ocr.detector,#detector
+              english_mini_bgr,#image 
+              2560,#canvas_size
+              1.0,#mag_ratio
+              0.7,#text_threshold 
+              0.4,#link_threshold 
+              0.4,#low_text
+              False, #poly #Fixed 
+              'cuda', #device #fixed ? 
+              ]
+    get_textbox_test.update({'test01': {
+                                'description': "Testing with default input.",
+                                "method": "unit_test.easyocr.detection.get_textbox",
+                                'input': replace(input0, 
+                                                 [0, 1], 
+                                                 ["unit_test.easyocr.ocr.detector",
+                                                  "unit_test.inputs.images.english.mini_bgr"
+                                                   ]),
+                                'output': easyocr.detection.get_textbox(*input0),    
+                                'severity': "Error"
+                                }})
+    
+    input0 = [ocr.detector,#detector
+              english_mini_bgr,#image 
+              1280,#canvas_size
+              1.2,#mag_ratio
+              0.6,#text_threshold 
+              0.3,#link_threshold 
+              0.3,#low_text
+              False, #poly #Fixed 
+              'cuda', #device #fixed ? 
+              ]
+    
+    get_textbox_test.update({'test02': {
+                            'description': "Testing with custom input.",
+                            "method": "unit_test.easyocr.detection.get_textbox",
+                            'input': replace(input0, 
+                                             [0, 1], 
+                                             ["unit_test.easyocr.ocr.detector",
+                                              "unit_test.inputs.images.english.mini_bgr"
+                                               ]),
+                            'output': easyocr.detection.get_textbox(*input0),
+                            'severity': "Error"
+                            }})
+    
+    input0 = [ocr.detector,#detector
+              english_mini_bgr,#image 
+              640,#canvas_size
+              0.8,#mag_ratio
+              0.8,#text_threshold 
+              0.5,#link_threshold 
+              0.5,#low_text
+              False, #poly #Fixed 
+              'cuda', #device #fixed ? 
+              ]
+    
+    get_textbox_test.update({'test03': {
+                            'description': "Testing with custom input.",
+                            "method": "unit_test.easyocr.detection.get_textbox",
+                            'input': replace(input0, 
+                                             [0, 1], 
+                                             ["unit_test.easyocr.ocr.detector",
+                                              "unit_test.inputs.images.english.mini_bgr"
+                                               ]),
+                            'output': easyocr.detection.get_textbox(*input0),
+                            'severity': "Error"
+                            }})
+    
+
+    input0 = [ocr.detector,#detector
+              english_mini_bgr,#image 
+              2560,#canvas_size
+              1.0,#mag_ratio
+              0.7,#text_threshold 
+              0.4,#link_threshold 
+              0.4,#low_text
+              False, #poly #Fixed 
+              'cuda', #device #fixed ? 
+              ]
+    output0 = easyocr.detection.get_textbox(*input0)
+    polys = output0[0]
+    group_text_box_test = {}
+    
+    input_ = [polys, 
+              0.1,# slope_ths 
+              0.5,#ycenter_ths
+              0.5,#height_ths
+              1.0,#width_ths 
+              0.05,#add_margin 
+              True#sort_output
+              ]
+    group_text_box_test.update({'test01': {
+                                'description': "Testing with default input.",
+                                "method": "unit_test.easyocr.utils.group_text_box",
+                                'input': input_,
+                                'output': easyocr.utils.group_text_box(*input_),    
+                                'severity': "Error"
+                                }
+                            })
+    input_ = [polys, 
+              0.05,# slope_ths 
+              0.3,#ycenter_ths
+              0.3,#height_ths
+              0.8,#width_ths 
+              0.03,#add_margin 
+              True#sort_output
+              ]
+    group_text_box_test.update({'test02': {
+                                'description': "Testing with custom input.",
+                                "method": "unit_test.easyocr.utils.group_text_box",
+                                'input': input_,
+                                'output': easyocr.utils.group_text_box(*input_),    
+                                'severity': "Error"
+                                }
+                            })
+    input_ = [polys, 
+              0.12,# slope_ths 
+              0.7,#ycenter_ths
+              0.7,#height_ths
+              1.2,#width_ths 
+              0.1,#add_margin 
+              True#sort_output
+              ]
+    group_text_box_test.update({'test03': {
+                                'description': "Testing with custom input.",
+                                "method": "unit_test.easyocr.utils.group_text_box",
+                                'input': input_,
+                                'output': easyocr.utils.group_text_box(*input_),    
+                                'severity': "Error"
+                                }
+                            })
+    
+    input0 = [None, 
+              20, #min_size
+              0.7, #text_threshold - fixed
+              0.4, #low_text - fixed
+              0.4, # link_threshold - fixed
+              2560, #canvas_size -fixed
+              1., #mag_ratio - fixed
+              0.1, #slope_ths - fixed
+              0.5, #ycenter_ths - fixed
+              0.5, #height_ths - fixed
+              0.5, #width_ths - fixed
+              0.1, #add_margin - fixed
+              True, #reformat - fixed
+              None #optimal_num_chars  - fixed
+              ]
+    
+    detect_test = {}
+
+    input_ = replace(input0, [0,1], [english_mini_bgr, 20])
+    detect_test.update({'test01': {
+                        'description': "Testing with default input.",
+                        "method": "unit_test.easyocr.ocr.detect",
+                        'input': replace(input_, 0, "unit_test.inputs.images.english.mini_bgr"),
+                        'output': ocr.detect(*input_),    
+                        'severity': "Error"
+                        },
+                    })
+    input_ = replace(input0, [0,1], [english_small_bgr, 20])
+    detect_test.update({'test02': {
+                        'description': "Testing with custom input.",
+                        "method": "unit_test.easyocr.ocr.detect",
+                        'input': replace(input_, 0, "unit_test.inputs.images.english.small_bgr"),
+                        'output': ocr.detect(*input_),    
+                        'severity': "Error"
+                        },
+                    })
+    input_ = replace(input0, [0,1], [english_small_bgr, 100])
+    detect_test.update({'test03': {
+                        'description': "Testing with custom input.",
+                        "method": "unit_test.easyocr.ocr.detect",
+                        'input': replace(input_, 0, "unit_test.inputs.images.english.small_bgr"),
+                        'output': ocr.detect(*input_),    
+                        'severity': "Error"
+                        },
+                    })
+    
+    get_image_list_test = {}
+    output0 = ocr.detect(english_small_bgr)
+    input0 = [output0[0][0], 
+              output0[1][0], 
+              english_small_gray, 
+              64, #model_height 
+              True# sort_output
+              ]
+    input_ = replace(input0, 2, "unit_test.inputs.images.english.small_gray")
+    get_image_list_test.update({'test01': {
+                        'description': "Testing with default input.",
+                        "method": "unit_test.easyocr.utils.get_image_list",
+                        'input': input_,
+                        'output': easyocr.utils.get_image_list(*input0),    
+                        'severity': "Error"
+                        },
+                    })
+    
+    output0 = ocr.detect(english_mini_bgr)
+    input0 = [output0[0][0], 
+              output0[1][0], 
+              english_mini_gray, 
+              64, #model_height 
+              True# sort_output
+              ]
+    input_ = replace(input0, 2, "unit_test.inputs.images.english.mini_gray")
+    get_image_list_test.update({'test02': {
+                        'description': "Testing with custom input.",
+                        "method": "unit_test.easyocr.utils.get_image_list",
+                        'input': input_,
+                        'output': easyocr.utils.get_image_list(*input0),    
+                        'severity': "Error"
+                        },
+                    })
+    
+    output0 = ocr.detect(english_mini_bgr)
+    input0 = [output0[0][0], 
+              output0[1][0], 
+              english_mini_gray, 
+              64, #model_height 
+              True# sort_output
+              ]
+    image_list, max_width = easyocr.utils.get_image_list(*input0)
+    
+    input0 = [ocr.character, 
+              64, #imgH - fixed 
+              int(max_width), 
+              ocr.recognizer, 
+              ocr.converter, 
+              image_list[:2],
+              '', #ignore_char, 
+              'greedy', #decoder, 
+              5, #beamWidth, 
+              1, #batch_size, 
+              0.1, #contrast_ths, 
+              0.5, #adjust_contrast, 
+              0.003, #filter_ths,
+              1, #workers, 
+              "cuda" #device
+              ]
+    
+    get_text_test = {}
+        
+    output_ = easyocr.recognition.get_text(*input0)   
+    input_ = replace(input0, 
+                     [0, 3, 4], 
+                     ["unit_test.easyocr.ocr.character", 
+                      "unit_test.easyocr.ocr.recognizer", 
+                      "unit_test.easyocr.ocr.converter"]
+                     )
+
+    get_text_test.update({'test01': {
+                        'description': "Testing with default input.",
+                        "method": "unit_test.easyocr.recognition.get_text",
+                        'input': input_,
+                        'output': output_,    
+                        'severity': "Error"
+                        },
+                    })
+    
+    input0 = [ocr.character, 
+              64, #imgH - fixed 
+              int(max_width), 
+              ocr.recognizer, 
+              ocr.converter, 
+              image_list[:2],
+              '', #ignore_char, 
+              'greedy', #decoder, 
+              4, #beamWidth, 
+              1, #batch_size, 
+              0.05, #contrast_ths, 
+              0.3, #adjust_contrast, 
+              0.001, #filter_ths,
+              1, #workers, 
+              "cuda" #device
+              ]
+    
+    output_ = easyocr.recognition.get_text(*input0)   
+    input_ = replace(input0, 
+                     [0, 3, 4], 
+                     ["unit_test.easyocr.ocr.character", 
+                      "unit_test.easyocr.ocr.recognizer", 
+                      "unit_test.easyocr.ocr.converter"]
+                     )
+    get_text_test.update({'test02': {
+                        'description': "Testing with custom input.",
+                        "method": "unit_test.easyocr.recognition.get_text",
+                        'input': input_,
+                        'output': output_,    
+                        'severity': "Error"
+                        }})
+    
+    input0 = [ocr.character, 
+              64, #imgH - fixed 
+              int(max_width), 
+              ocr.recognizer, 
+              ocr.converter, 
+              image_list[:2],
+              '', #ignore_char, 
+              'greedy', #decoder, 
+              6, #beamWidth, 
+              4, #batch_size, 
+              0.2, #contrast_ths, 
+              0.6, #adjust_contrast, 
+              0.005, #filter_ths,
+              1, #workers, 
+              "cuda" #device
+              ]
+    
+    output_ = easyocr.recognition.get_text(*input0)   
+    input_ = replace(input0, 
+                     [0, 3, 4], 
+                     ["unit_test.easyocr.ocr.character", 
+                      "unit_test.easyocr.ocr.recognizer", 
+                      "unit_test.easyocr.ocr.converter"]
+                     )
+    get_text_test.update({'test03': {
+                        'description': "Testing with custom input.",
+                        "method": "unit_test.easyocr.recognition.get_text",
+                        'input': input_,
+                        'output': output_,    
+                        'severity': "Error"
+                        }})
+    
+    
+    get_paragraph_test = {}
+    output0 = ocr.detect(english_mini_bgr)
+    input0 = [output0[0][0], 
+              output0[1][0], 
+              english_mini_gray, 
+              64, #model_height 
+              True# sort_output
+              ]
+    image_list, max_width = easyocr.utils.get_image_list(*input0)
+    
+    input0 = [ocr.character, 
+              64, #imgH - fixed 
+              int(max_width), 
+              ocr.recognizer, 
+              ocr.converter, 
+              image_list[:2],
+              '', #ignore_char, 
+              'greedy', #decoder, 
+              5, #beamWidth, 
+              1, #batch_size, 
+              0.1, #contrast_ths, 
+              0.5, #adjust_contrast, 
+              0.003, #filter_ths,
+              1, #workers, 
+              "cuda" #device
+              ]
+    
+    output0 = easyocr.recognition.get_text(*input0)   
+    input_ = [output0, 
+              1, #x_ths
+              0.5, #y_ths 
+              'ltr' #mode
+              ]
+    get_paragraph_test.update({'test01': {
+                        'description': "Testing with default input.",
+                        "method": "unit_test.easyocr.utils.get_paragraph",
+                        'input': input_,
+                        'output': easyocr.utils.get_paragraph(*input_),    
+                        'severity': "Error"
+                        }})
+    input_ = [output0, 
+              0.5, #x_ths
+              0.3, #y_ths 
+              'ltr' #mode
+              ]
+    get_paragraph_test.update({'test02': {
+                        'description': "Testing with custom input.",
+                        "method": "unit_test.easyocr.utils.get_paragraph",
+                        'input': input_,
+                        'output': easyocr.utils.get_paragraph(*input_),    
+                        'severity': "Error"
+                        }})
+    input_ = [output0, 
+              1.5, #x_ths
+              1, #y_ths 
+              'ltr' #mode
+              ]
+    get_paragraph_test.update({'test03': {
+                        'description': "Testing with custom input.",
+                        "method": "unit_test.easyocr.utils.get_paragraph",
+                        'input': input_,
+                        'output': easyocr.utils.get_paragraph(*input_),    
+                        'severity': "Error"
+                        }})
+    
+    
+    input_recog = [None, 
+              None, #horizontal_list
+              None, #free_list
+              'greedy', #decoder
+              5, #beamWidth
+              1,#batch_size
+              0, #workers
+              None, #allowlist
+              None, #blocklist
+              1, #detail
+              None, #rotation_info
+              False,#paragraph
+              0.1,#contrast_ths
+              0.5, #adjust_contrast
+              0.003, #filter_ths
+              0.5, #y_ths
+              1.0, #x_ths
+              True, #reformat
+              'standard'#output_format
+              ]
+    
+    recognize_test = {}
+    
+    h_list, f_list = ocr.detect(english_mini_bgr)
+    input_ = replace(input_recog, 
+                     [0, 1, 2], 
+                     [english_mini_gray, h_list[0], f_list[0]])
+    recognize_test.update({'test01': {
+                        'description': "Testing with default input.",
+                        "method": "unit_test.easyocr.ocr.recognize",
+                        'input': replace(input_, 0, "unit_test.inputs.images.english.mini_gray"),
+                        'output': ocr.recognize(*input_),    
+                        'severity': "Error"
+                        }})
+    
+    h_list, f_list = ocr.detect(english_small_bgr)
+    input_ = replace(input_recog, 
+                     [0, 1, 2], 
+                     [english_small_gray, h_list[0], f_list[0]])
+    recognize_test.update({'test02': {
+                        'description': "Testing with custom input.",
+                        "method": "unit_test.easyocr.ocr.recognize",
+                        'input': replace(input_, 0, "unit_test.inputs.images.english.small_gray"),
+                        'output': ocr.recognize(*input_),    
+                        'severity': "Error"
+                        }})
+
+    readtext_test = {}
+    #english_tiny_bgr, _ = easyocr.utils.reformat_input(images['english']['tiny'])
+    input_ = ["unit_test.inputs.images.english.tiny", 'en']
+    ocr = get_easyocr('en')
+    _, pred, confidence = ocr.readtext(images['english']['tiny'])[0]
+    output_ = [pred, confidence]
+    readtext_test.update({'test01': {
+                        'description': "Reading English text.",
+                        "method": "unit_test.easyocr_read_as",
+                        'input': input_,
+                        'output': output_,    
+                        'severity': "Error"
+                        }})
+    #french_tiny_bgr, _ = easyocr.utils.reformat_input(images['french']['tiny'])
+    input_ = ["unit_test.inputs.images.french.tiny", 'fr']
+    ocr = get_easyocr('fr')
+    _, pred, confidence = ocr.readtext(images['french']['tiny'])[0]
+    output_ = [pred, confidence]
+    readtext_test.update({'test02': {
+                        'description': "Reading French text.",
+                        "method": "unit_test.easyocr_read_as",
+                        'input': input_,
+                        'output': output_,    
+                        'severity': "Error"
+                        }})
+    #chinese_tiny_bgr, _ = easyocr.utils.reformat_input(images['chinese']['tiny'])
+    input_ = ["unit_test.inputs.images.chinese.tiny", 'ch_sim']
+    ocr = get_easyocr('ch_sim')
+    _, pred, confidence = ocr.readtext(images['chinese']['tiny'])[0]
+    output_ = [pred, confidence]
+    readtext_test.update({'test03': {
+                        'description': "Reading Chinese (simplified) text.",
+                        "method": "unit_test.easyocr_read_as",
+                        'input': input_,
+                        'output': output_,    
+                        'severity': "Error"
+                        }})
+    #korean_tiny_bgr, _ = easyocr.utils.reformat_input(images['korean']['tiny'])
+    input_ = ["unit_test.inputs.images.korean.tiny", 'ko']
+    ocr = get_easyocr('ko')
+    _, pred, confidence = ocr.readtext(images['korean']['tiny'])[0]
+    output_ = [pred, confidence]
+    readtext_test.update({'test04': {
+                        'description': "Reading Korean text.",
+                        "method": "unit_test.easyocr_read_as",
+                        'input': input_,
+                        'output': output_,    
+                        'severity': "Error"
+                        }})
+    
+    
+    
+    solution_book = {
+            'inputs':{'images': image_preprocess,
+                      'easyocr_config': easyocr_config
+                      },
+            'tests':{
+                 "model initialization": model_init_test,
+                 "get_textbox function": get_textbox_test,
+                 "group_text_box function": group_text_box_test,
+                 "detect method": detect_test,
+                 "get_image_list function": get_image_list_test,
+                 "get_text_test function": get_text_test,
+                 "get_paragraph_test function": get_paragraph_test,
+                 "recognize method": recognize_test,
+                 "readtext method": readtext_test,
+                 }
+            }
+            
+    
+    
+    with lzma.open(args.output, 'wb') as fid:
+        pickle.dump(solution_book, fid)
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Script to pack the EasyOCR unit test package.",
+                                     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+    parser.add_argument("-o", "--output", default=None, help="output path.")
+    parser.add_argument("-d", "--data_dir", default=None, help="data directory")
+    args = parser.parse_args()
+    main(args)
+
+
+
+
+
+
+
+
+
+
+
+
+
+
--- a/unit_test/run_unit_test.py
+++ b/unit_test/run_unit_test.py
@@ -0,0 +1,19 @@
+
+import argparse
+from unit_test import UnitTest 
+
+# %%
+def main(args):
+
+    unit_test = UnitTest(args.easyocr, args.test_data, args.image_data_dir, args.verbose)
+    unit_test.do_test(args.verbose)
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Script to run EasyOCR unit test.",
+                                     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+    parser.add_argument("--easyocr", required=True, help="Directory of EasyOCR to test.")
+    parser.add_argument("-t", "--test_data", default="./data/EasyOcrUnitTestPackage.pickle", help="Path to test data.")
+    parser.add_argument("-d", "--image_data_dir", default="../examples", help="Path to directory that contains EasyOCR example images.")
+    parser.add_argument("-v", "--verbose", default=0, type = int, help="Verbosity level of report.")
+    args = parser.parse_args()
+    main(args)
--- a/unit_test/unit_test.py
+++ b/unit_test/unit_test.py
@@ -0,0 +1,262 @@
+import os
+import sys
+import importlib.util
+import pickle
+import lzma
+import PIL.Image
+import numpy as np
+
+import torch
+
+# %%
+class Attributes:
+    pass
+
+class UnitTest:
+    def __init__(self, 
+                 easyocr_module, 
+                 test_data = "./data/EasyOcrUnitTestPackage.pickle",
+                 image_data_dir = "../examples", 
+                 verbose = 0, 
+                 numeric_acceptance_error = 0.1):
+        
+        self.verbose = verbose
+      
+        easy_ocr_init = os.path.join(easyocr_module, "__init__.py")
+        if not os.path.isfile(easy_ocr_init):
+            raise FileNotFoundError("Invalid easyocr_module. The directory should contain __init__.py.")
+        
+        spec = importlib.util.spec_from_file_location("easyocr", easy_ocr_init)
+        easyocr = importlib.util.module_from_spec(spec)
+        sys.modules["easyocr"] = easyocr
+        spec.loader.exec_module(easyocr)
+        
+        self.easyocr = easyocr
+        if not hasattr(self.easyocr, 'utils'):
+            setattr(self.easyocr, 'utils', importlib.import_module('easyocr.utils'))
+        if not hasattr(self.easyocr, 'detection'):
+            setattr(self.easyocr, 'detection', importlib.import_module('easyocr.detection'))
+        if not hasattr(self.easyocr, 'recognition'):
+            setattr(self.easyocr, 'recognition', importlib.import_module('easyocr.recognition'))
+        
+        self.easyocr_dir = os.path.dirname(easyocr.__file__)
+        
+        print("Unit test is set for EasyOCR at {}".format(os.path.abspath(self.easyocr_dir)))
+        
+        self.image_data_dir = image_data_dir
+        
+        self.set_data(test_data)
+        self.set_easyocr()
+        self.numeric_acceptance_error = numeric_acceptance_error
+    
+    def set_data(self, test_data):
+        
+        self.inputs = Attributes()
+        
+        with lzma.open(test_data, 'rb') as fid:
+            solution_book = pickle.load(fid)
+        self.test_book = solution_book['tests']
+
+        missing_files = [file for file in solution_book['inputs']['images'].keys() if file not in os.listdir(self.image_data_dir)]
+        if missing_files:
+            raise FileNotFoundError("Cannot find {} in {}.".format(', '.join(missing_files), self.image_data_dir))
+        images = {os.path.splitext(file)[0]: {
+                        key: np.asarray(PIL.Image.open(os.path.join(self.image_data_dir, file)).crop(crop_box))[:,:,::-1] for (key,crop_box) in page.items() 
+                        } for (file,page) in solution_book['inputs']['images'].items()}
+
+        english_mini_bgr, english_mini_gray = self.easyocr.utils.reformat_input(images['english']['mini'])
+        english_small_bgr, english_small_gray = self.easyocr.utils.reformat_input(images['english']['small'])
+        images['english'].update({'mini_bgr': english_mini_bgr,
+                                  'mini_gray': english_mini_gray,
+                                  'small_bgr': english_small_bgr,
+                                  'small_gray': english_small_gray,
+                                  })
+
+        setattr(self.inputs, 'images', self.dict2attr(images))
+        setattr(self.inputs, 'easyocr_config', self.dict2attr(solution_book['inputs']['easyocr_config']))
+    
+    def dict2attr(self, dict_):
+        attr = Attributes()
+        [setattr(attr, key, self.dict2attr(value)) if isinstance(value, dict) else setattr(attr, key, value) for (key,value) in dict_.items()]        
+        return attr
+
+    def count_parameters(self, model):
+        return sum([param.numel() for param in model.parameters()])
+    
+    def get_weight_norm(self, model):
+        with torch.no_grad():
+            return sum([param.norm() for param in model.parameters()]).cpu().item()
+
+    def get_nested_attr(self, parent, attr):
+        if len(attr.split(".")) == 1:
+            return getattr(parent, attr)
+        else:
+            attrs = attr.split(".")
+            parent = getattr(parent, attrs[0])
+            attr = ".".join(attrs[1:])
+            attr = self.get_nested_attr(parent, attr)
+            return attr
+    
+    def easyocr_read_as(self, image, language):
+        if not isinstance(language, list):
+            language = [language]
+        reader =  self.easyocr.Reader(language)
+        _, pred, confidence = reader.readtext(image)[0]
+        reader = None
+        torch.cuda.empty_cache()
+        return pred, confidence
+    
+    def set_easyocr(self):
+        ocr = self.easyocr.Reader([self.inputs.easyocr_config.main_language])
+        setattr(self.easyocr, 'ocr', ocr)
+   
+    
+    def validate(self, test, solution, dtype):
+        if dtype == str:
+            return test == solution
+        elif np.issubdtype(dtype, np.integer):
+            return abs(1-test/solution) < self.numeric_acceptance_error
+        elif np.issubdtype(dtype, np.inexact):
+            return abs(1-test/solution) < self.numeric_acceptance_error
+        elif dtype == dict:
+            return self.are_dicts_equal(test, solution)
+        elif dtype == list or dtype == tuple:
+            return self.are_lists_equal(test, solution)
+        elif dtype == np.ndarray:
+            return (abs(1-test/solution) < self.numeric_acceptance_error).all()
+        elif dtype == torch.Tensor:
+            return (abs(1-test/solution) < self.numeric_acceptance_error).all()
+        else:
+            raise TypeError("Unsupported data type ({}) to validate. Supported types are str, int, float, dict, list, tuple, np.ndarray, or torch.Tensor".format(dtype))
+    
+    def are_dicts_equal(self, test, solution):
+        if test.keys() == solution.keys():
+            return all([self.validate(test[key], solution[key], type(solution[key])) for key in solution.keys()])
+        else:
+            return False
+    
+    def are_lists_equal(self, test, solution):
+        if len(test) == len(solution):
+            return all([self.validate(tt, ss, type(ss)) for (tt,ss) in zip(test, solution)])
+        else:
+            return False
+
+    def is_list_or_tuple(self, test):
+        return isinstance(test, list) or isinstance(test, tuple)
+
+    #Should check length of results/solutions/dtypes 
+    def validate_all(self, results, solutions, dtypes):
+        if not isinstance(results, list):
+            results = [results]
+        if not isinstance(solutions, list):
+            solutions = [solutions]
+        if not isinstance(dtypes, list):
+            dtypes = [dtypes]
+        
+        
+        validation = []
+        for (result, solution, dtype) in zip(results, solutions, dtypes):
+            if (not self.is_list_or_tuple(result)
+                and not self.is_list_or_tuple(solution)
+                and not self.is_list_or_tuple(dtype)
+                ):
+                validation.append(self.validate(result, solution, type(solution)))
+            elif(self.is_list_or_tuple(result)
+                and self.is_list_or_tuple(solution)
+                and self.is_list_or_tuple(dtype)
+                ):
+                validation.append(self.validate_all(list(result), list(solution), list(dtype)))
+            else:
+                raise TypeError("results, solutions, and dtypes must have matching nesting structure.")
+        return all(validation)
+
+    def do_test(self, verbose = None):
+        if verbose is not None:
+            self.verbose = verbose
+        
+        num_module_to_test = len(self.test_book)
+        num_module_pass = 0
+        print("Testing EasyOCR: {:d} modules will be tested.\n".format(num_module_to_test))
+        for name,tests in self.test_book.items():
+            num_test = len(tests)
+            num_passed = 0
+            min_pass = sum([test['severity'] == 'Error' for test in tests.values()])
+            if self.verbose > 0:
+                print("##Testing module {}: {:d} tests will be performed.".format(name, num_test))
+            for test_id, test in tests.items():
+                if self.verbose > 1:
+                    print("#### {}: {}".format(test_id, test['description']))
+                
+                if test['method'].startswith('unit_test.'):
+                    test['method'] = '.'.join(test['method'].split('.')[1:])
+                test_method = self.get_nested_attr(self, test['method'])
+                
+                test['input'] = [(self.get_nested_attr(self, '.'.join(input_.split('.')[1:])) 
+                                 if input_.startswith('unit_test.') else input_) if isinstance(input_, str) else input_ for input_ in test['input']]
+                if self.verbose > 3:
+                    print("###### Input: {}".format(test['input']))
+                results = test_method(*test['input'])
+                if self.verbose > 2:
+                    print("###### Expected output: {}".format(test['output']))
+                    print("###### Received output: {}".format(results))
+                test_result = self.validate(results, test['output'], type(test['output']))
+                if test_result:
+                    num_passed += 1
+                    if self.verbose > 1:
+                        print("#### Passed. [{:d}/{:d}]".format(num_passed, num_test))
+                else:
+                    if test['severity'] == "Warning": 
+                        num_passed += 1
+                        if self.verbose > 1:
+                            print("#### Passed. [{:d}/{:d}]".format(num_passed, num_test))
+                        if self.verbose > 2:
+                            print("##### Warning: While the result is considered as passed, the test yields results ({}) "
+                                  "that are different from the expected values ({}). It is strongly recommended to make sure "
+                                  "that this is expected.".format(results, test['output']))
+                    else:
+                        if self.verbose > 1:
+                            print("#### Failed")
+                        if self.verbose > 2:
+                            print("##### The test yields results ({}) which are different from the expected values ({}).".format(results, test['output']))
+        
+            if num_passed >= min_pass:
+                num_module_pass += 1
+                if self.verbose > 0: 
+                    print("##Module {}: Passed.\n".format(name))
+            else:
+                print("##Module {}: Failed.\n".format(name))
+        
+        print("#"*50)
+        if num_module_pass >= num_module_to_test:
+            print("Testing completed:\n Final result: Passed.")
+        else:
+            print("Testing completed:\n Final result: Failed.")
+        
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+