Stage 5. Module training

This is an archived article, no longer relevant.

Use the CapMonster Cloud service to create your own modules. Detailed instructions are available here: Creating a custom module

Once all the previous steps are complete, you can start training your recognition module.

General training concept

To begin with, we recommend choosing a simpler core and running a quick training pass with a large share (one third) of the characters set aside for the test. Run a couple of dozen training iterations and see what comes of it. Correct the parameters and train again. If all is well, you can then train a network with a more powerful structure and at better quality.

Three types of character recognition errors

  1. Misrecognition - the character really exists but is recognized incorrectly. For example, we show the character "a" to the module, and it finds the character "c" there.

  2. Skipped symbol - a symbol is present, but the module does not see anything at all. For example, we show it the symbol "a", and the module says that there is nothing there.

  3. False positive - there is no symbol (for example, in the gap between two symbols), but the module finds something there.
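
The three error types can be told apart by comparing what is really at a position with what the module reports. The sketch below is only an illustration: the function name and the use of None for "nothing here" are assumptions made for this example, not part of the program.

```python
from typing import Optional


def classify_outcome(expected: Optional[str], predicted: Optional[str]) -> str:
    """Label one recognition outcome.

    expected  - the character really present at this point, or None if there is none
    predicted - what the module reported, or None if it reported nothing
    """
    if expected is None:
        # Nothing is really here, so any reported character is a false positive.
        return "correct" if predicted is None else "false_positive"
    if predicted is None:
        # A character is here, but the module saw nothing.
        return "skipped"
    return "correct" if predicted == expected else "misrecognition"


print(classify_outcome("a", "c"))   # misrecognition
print(classify_outcome("a", None))  # skipped
print(classify_outcome(None, "x"))  # false_positive
```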

Module training

Core settings

Core power - the higher the core power, the better the quality of character recognition, but the slower the core and, as a result, the more CPU time is spent on captcha recognition. You shouldn't make a very powerful core right away; start with the weakest one, since its recognition quality may well be enough for you. When you change this parameter, an estimate of the core's complexity is shown next to it. The complexity reflects the amount of processor time required to run the core at one point. It has no units of measurement; it is just a number that helps you compare two cores. If one core has twice the complexity of another, it will take twice as long to run, all other things being equal. Besides power, the complexity of the core is also strongly affected by the size of the character recognition window. By controlling these two parameters, you can create a core that suits you in both recognition quality and speed.
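
Since the complexity estimate is a unitless number meant only for comparison, the useful quantity is the ratio between two cores. A minimal sketch of that arithmetic follows; the function name and the sample values are made up for illustration.

```python
def relative_run_time(complexity_a: float, complexity_b: float) -> float:
    """How many times longer core A runs than core B, all other things being equal."""
    if complexity_a <= 0 or complexity_b <= 0:
        raise ValueError("complexity estimates must be positive")
    return complexity_a / complexity_b


# A core with complexity 400 takes twice as long as one with complexity 200.
print(relative_run_time(400, 200))  # 2.0
```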

Training settings

Learning speed - the slower the learning, the better the result. But do not start with long, thorough training right away. Train quickly first: some errors will quite possibly appear, you will correct them, and then you can train the module more thoroughly. Or the quick result may already be sufficient.

Starting training parameter - sets the starting training rate. Change it only if you run into one of the learning problems discussed below.

Scatter of symbol centers - during training, the symbols you have collected are presented to the core with their center at the point you specified. You can slightly increase the spread of this point. This can make recognition better or worse. The Unit Testing section describes when this parameter can be useful.
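
One way to picture the scatter is as a small random shift applied to the marked center each time a symbol is shown to the core. The sketch below assumes a uniform shift of up to `spread` pixels on each axis; how the program actually distributes the shift is not documented here, so treat the function and its parameters as hypothetical.

```python
import random
from typing import Tuple


def jitter_center(x: int, y: int, spread: int, rng: random.Random) -> Tuple[int, int]:
    """Shift a marked symbol center by up to `spread` pixels on each axis."""
    dx = rng.randint(-spread, spread)  # randint includes both endpoints
    dy = rng.randint(-spread, spread)
    return x + dx, y + dy


rng = random.Random(0)  # fixed seed so that runs are repeatable
samples = [jitter_center(50, 20, spread=2, rng=rng) for _ in range(5)]
print(samples)
```

With a spread of 0 the center never moves, which corresponds to presenting every symbol exactly where it was marked.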

Intensity of training on false data - it is recommended to increase the value if false positives occur much more often than other errors during recognition or training. It is recommended to start training with the default parameter: 10.

Part of training symbols - not all of the symbols you collect take part in training the core. Some of them are set aside for the core test during training (first graph). The test is also very important: it helps you understand how the training is proceeding and which parameters can be corrected in order to retrain the core better. The test should have at least 30-50 characters for this. It is also not worth giving most of the symbols to the test, because the more symbols are involved in training the core, the better. We recommend the following: train with a large number of test symbols the first time, and then, when all the parameters are selected, retrain the core on almost all of the collected symbols, leaving the minimum possible for the test.
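
The recommended two-pass approach can be sketched as a simple shuffle-and-split. The function below is an illustration only: its name, the `train_part` fraction and the `min_test` floor of 30 are assumptions chosen to match the advice above, not settings taken from the program.

```python
import random


def split_symbols(symbols, train_part, min_test=30, seed=0):
    """Shuffle the collected symbols and split them into (train, test) lists.

    train_part - fraction of symbols used for training, between 0 and 1
    min_test   - smallest test set to keep (assumed floor, see the text above)
    """
    if not 0 <= train_part <= 1:
        raise ValueError("train_part must be between 0 and 1")
    shuffled = list(symbols)
    random.Random(seed).shuffle(shuffled)
    n_total = len(shuffled)
    n_train = round(n_total * train_part)
    # Never let the test set drop below min_test; with fewer than min_test
    # symbols in total, everything goes to the test and nothing is trained.
    n_train = max(0, min(n_train, n_total - min_test))
    return shuffled[:n_train], shuffled[n_train:]


collected = list(range(300))  # stand-in for 300 collected symbols

# First pass: a third of the symbols goes to the test.
train, test = split_symbols(collected, train_part=2 / 3)
print(len(train), len(test))  # 200 100

# Final pass: train on almost everything, keep the minimum for the test.
train, test = split_symbols(collected, train_part=0.99)
print(len(train), len(test))  # 270 30
```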

Recognition settings

Recognition threshold - when the core is shown an area of the captcha (or a symbol during training), it outputs, for each symbol it knows, a number in the range from 0 (it sees no signs of this symbol) to 1 (it is absolutely sure that this symbol is located at this point). But we need not a number but a definite answer: is there a symbol here or not. This is what the recognition threshold is for. Every response in which the core outputs a number greater than the recognition threshold for some character is treated as "yes, this character is here." If the core outputs a number below the threshold, we do not accept the character. Below, in the analysis of learning errors, you will read how to adjust this parameter in order to solve some learning problems.
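
The thresholding step described above can be sketched in a few lines. The dictionary of per-symbol scores and the function name are hypothetical; only the rule "accept when the score is greater than the threshold" comes from the text.

```python
def accept_symbols(scores, threshold):
    """Return (symbol, score) pairs whose score is greater than the threshold,
    strongest first."""
    accepted = [(sym, score) for sym, score in scores.items() if score > threshold]
    return sorted(accepted, key=lambda pair: pair[1], reverse=True)


# Hypothetical core output for one point on the captcha.
scores = {"a": 0.91, "c": 0.42, "e": 0.07}

print(accept_symbols(scores, 0.5))  # [('a', 0.91)]
print(accept_symbols(scores, 0.4))  # [('a', 0.91), ('c', 0.42)]
```

Lowering the threshold lets weaker responses through, which reduces skipped symbols but raises the risk of false positives; raising it does the opposite.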

Minimum distance between characters in a captcha - a very useful parameter that prevents other characters from being found in the area of an already found character, which greatly reduces character recognition errors. Open the captcha in Paint, scale it according to your filters, and measure the minimum distance (in pixels) between two characters. Roughly estimate the smallest such distance across all captchas. If you get it wrong, the recognition module test will show you how to correct this parameter. Retraining is not required in that case.
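
The effect of this parameter resembles a greedy suppression of nearby detections: the strongest response is kept, and anything closer than the minimum distance to it is discarded. The sketch below illustrates that idea along the horizontal axis only. It is not the program's actual algorithm, and the tuple layout (x, symbol, score) is an assumption made for the example.

```python
def suppress_close_detections(detections, min_distance):
    """Keep the strongest detections that are at least min_distance pixels apart.

    detections - list of (x, symbol, score) tuples, x being the character center
    """
    kept = []
    # Visit detections from the most to the least confident.
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        if all(abs(det[0] - other[0]) >= min_distance for other in kept):
            kept.append(det)
    # Return the survivors in reading order, left to right.
    return sorted(kept, key=lambda d: d[0])


found = [(10, "a", 0.9), (13, "c", 0.6), (30, "b", 0.8)]
print(suppress_close_detections(found, min_distance=8))
# [(10, 'a', 0.9), (30, 'b', 0.8)]
```

Here the weaker "c" at x=13 sits only 3 pixels from the stronger "a" and is dropped. Because the filter works on finished detections, changing the distance does not require retraining the core.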

Number of captchas for testing - during training, a captcha recognition test is run regularly to find out the module's current recognition percentage. Many captchas make for a long test but an accurate percentage; few captchas make for a quick test but a less reliable assessment of the module. After a couple of training runs you will understand how to set this parameter; until then you can leave it alone.

Testing frequency - to save time, the percentage test is run not on every training cycle but once every several cycles. This parameter sets how many training cycles pass between tests.
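
The trade-off between test length and accuracy can be made concrete with the standard error of a proportion. This is a general statistical estimate, not something the program reports, and it assumes the test captchas are independent of each other; the function name is invented for the example.

```python
import math


def recognition_rate_std_error(rate, n_captchas):
    """Standard error of a recognition rate measured on n_captchas captchas.

    rate - measured fraction of correctly recognized captchas, between 0 and 1
    """
    if n_captchas <= 0:
        raise ValueError("n_captchas must be positive")
    return math.sqrt(rate * (1 - rate) / n_captchas)


for n in (50, 500):
    se = recognition_rate_std_error(0.8, n)
    print(f"{n} captchas: 80% +/- {100 * se:.1f} percentage points")
# 50 captchas: 80% +/- 5.7 percentage points
# 500 captchas: 80% +/- 1.8 percentage points
```

Under these assumptions, ten times more captchas narrows the uncertainty only by a factor of about three, which is why a moderate test size is usually a reasonable compromise.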

Fast recognition - for very complex captchas (characters written together in one line without spaces, heavy distortion, a lot of noise, many types of characters), you have to choose a large scale, build a very complex core, and choose a large spread of centers. All this makes the recognition module run for a very long time. This parameter enables some optimization mechanisms and helps speed up the recognition of such captchas by 5-10 times, though the recognition percentage may suffer a little. You can enable it before training, which speeds up the captcha recognition tests, but the recognition percentage they show may then be not quite accurate. You can also try this optimization later, at the module testing stage. To restore the recognition percentage, you may have to slightly change the settings for finding the center of mass. In general, we advise against enabling this optimization at the training stage: without it, you will find out the maximum recognition percentage that can be obtained on the captcha.

Training progress

After setting all the parameters, you can start the training itself. A window with graphs and progress bars will open, reflecting the current training progress. There are three graphs.

  1. The first one displays the results of testing the module on your collected symbols.
    The green line shows how many symbols were recognized correctly.
    Yellow - for how many characters the core found nothing (second character recognition error).
    Red - how many characters were recognized incorrectly (first character recognition error).

  2. The second graph shows the results of the false positive test:
    The green line shows the number of correct answers (in these cases, the core did not find anything and this is correct).
    The yellow line shows how many times the core has shown suspicious activity.
    The red line shows how many times the core has made a mistake (the third character recognition error).

  3. The third graph shows an approximate preliminary percentage of module recognition.

Stop learning

Training can be interrupted at any time once you realize that the result will not get any better.

After interrupting learning or stopping it naturally (after 300 cycles), you need to choose which core you are going to use in the module:

  1. With the best recognition percentage - in this case, not the last core will be taken, but the one with the maximum recognition percentage.

  2. The last one - if you used few captchas (fewer than 50) in the recognition test, and the recognition percentage of the last cores is not much lower than that of the core with the maximum percentage, it is better to take the last core.

  3. The core from the previous training - keep it if the previous settings gave you a higher recognition percentage.
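
The choice between the first two options can be written down as a small rule. In the sketch below, the `tolerance` of 2 percentage points is an assumption standing in for "not much lower", which the text does not quantify, and the function name is hypothetical. The third option (keeping the core from a previous training) is a comparison across runs and is left out.

```python
def choose_core(percent_by_test, n_test_captchas, small_test=50, tolerance=2.0):
    """Return the index of the tested core to keep from one training run.

    percent_by_test - recognition percentage measured at each test, in order
    n_test_captchas - how many captchas each recognition test used
    """
    if not percent_by_test:
        raise ValueError("no test results to choose from")
    last = len(percent_by_test) - 1
    best = max(range(len(percent_by_test)), key=lambda i: percent_by_test[i])
    gap = percent_by_test[best] - percent_by_test[last]
    # With a small test the maximum may be a lucky fluctuation, so prefer the
    # last (longest trained) core when it is almost as good.
    if n_test_captchas < small_test and gap <= tolerance:
        return last
    return best


print(choose_core([40.0, 71.0, 70.0], n_test_captchas=30))   # 2 (the last core)
print(choose_core([40.0, 71.0, 70.0], n_test_captchas=200))  # 1 (the best core)
```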

How it should be

With good training, the graphs should look like this:

On the first chart (Character Recognition), the green line rises to the maximum, while the yellow and red lines fall to 0.

On the second chart (False positives), the green line is always at the top, the red line is just above 0, and the yellow line is 5-10 times higher than the red one. In rare cases, the yellow line may exceed the green one, the main thing is that the red line is close to 0.

In the third graph, the recognition percentage is gradually increasing. At first it will grow quickly, then it will grow more and more slowly.

The screenshot was taken on a simple captcha with correctly selected parameters. Keep in mind that your charts may differ: the initial percentage may be lower, and on the first chart the green line may rise and the yellow and red lines may fall more slowly. It all depends on the training settings and the complexity of the captcha. The main thing is the trend.

Learning problems and fixes

The very first rule - if something does not work out for a long time and the tips written below do not really help, just ask on the forum in the program section. Don't waste your time!

There are several problems you may encounter when teaching. They can be diagnosed using graphs reflecting the current course of training:

1. Several dozen cycles have passed since training started, and the green and red lines on the first chart do not rise.

The core is not learning. The starting training parameter is too small. Start training again with this parameter increased 10 times. Repeat until the lines on the graphs start to grow.

2. During training, the red line rose above the green one and dominates the first chart.

  • Perhaps you have set too large a starting parameter. Decrease it 10 times and start training again; repeat several times until the situation improves or the previous problem (1) appears. If the previous problem appears, then the starting parameter is not the only cause, or is not the cause at all.

  • There may be too few symbols to train, or the symbol filters are not configured correctly and need to be reconfigured.

  • Perhaps you mixed something up when cutting out the symbols and labeled them with the wrong names. Check the collection in the filters section (the Show text button lets you check whether each drawn symbol matches the text written under it).

3. The red line in the first chart is quite far from 0.

Too many character recognition errors of the first type (misrecognition).

There can be many reasons:

  • Few symbols collected.

  • A very complex captcha.

  • The network is too simple.

  • Learning mode is too fast.

  • Center spread too wide during training.

  • There are many types of symbols.

4. The very first check before the first training cycle shows too many Bad results (red line).

There is nothing wrong with that; it simply means the recognition threshold is set to 0.5 or lower. The situation should correct itself after the first training cycle.

5. The number of cycles has exceeded 100, and the yellow line on the first chart still does not fall to zero.

The core fails to see too many characters - character recognition error (2).

Possible reasons:

  • The learning threshold is too high. The normal learning threshold is 0.5-0.6.

  • The starting parameter of training is too small. Try to increase it 3-5 times.

  • Too few training symbols for how heavily distorted they are.

  • The network is too complex.

  • Learning mode is too fast.

6. In the second chart, the red line has moved up too high.

The core too often sees characters where there are none - the third character recognition error.

Increase the intensity of training on false data.

7. On the third graph, the recognition percentage is 0 and does not grow, and all the other graphs are in order.

The training is going well, but for some reason the captcha is not recognized. You need to look at what the recognition module produces during the test.

Train the module so that graphs 1 and 2 are aligned at approximately the same values.

Stop training, go to the unit testing tab and start the test. See what recognition errors occur and read how to fix them in the unit testing help section.

Module test settings

Threads - the number of threads in which captchas are tested.

Space - the minimum distance between characters in the captcha.

Type of comparison - if several words match in the test, the test is considered successful.

Minimum distance between words - the minimum distance between words in a captcha.

Multi-line captcha - a parameter required for popular captchas such as SolveMedia. When it is enabled, the captcha is processed as a multi-line captcha.

Minimum distance between words is the distance between the lower left and upper right corners of characters on different lines.

Limit for the answer length - depending on the length of the answer, some answers will be discarded. This limitation is set only in the Unit Test. After each change of this parameter, it is necessary to re-select the parameters.

Video instruction on YouTube (in Russian).