Stage 6. Testing and improvement

This is an archived article, no longer relevant.

Use the CapMonster Cloud service to create your own modules. Detailed instructions are available in the article "Creating a custom module".

After training the module, it is necessary to test it.

The module's test window shows not only its recognition percentage and speed, but also the result of its work, that is, which string it outputs for each captcha.

This information is very important: it tells you how to improve recognition quality and reduce the module's operating time.

Testing settings

You can change the settings and run the tests several times; changing the parameters can change the recognition percentage. The core does not need to be retrained in this case, you can simply save the module.

  1. The number of threads to test.

  2. Threshold filter values.

  3. The minimum distance between symbols. This is a very important parameter! Try increasing or decreasing it slightly: increase it if your answers contain many extra letters, decrease it if many letters are missing.

  4. Comparison type: full match means the answer must match the real captcha answer exactly; substring match (partial) should be used if the site accepts a partially correct answer rather than requiring a full match. In that case you can make the module count answers that are only partially correct as correct.

  5. Range value, used when the comparison type is substring: the number of correctly guessed symbols in a row that counts as a successful answer to the captcha.

  6. You can also try enabling or disabling quick recognition; this can affect the recognition percentage as well.
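As a rough illustration of how the comparison type and the range value change the reported recognition percentage, here is a minimal Python sketch. It is not CapMonster code: the function names, the test data, and the interpretation of "substring" as "the longest run of consecutive matching symbols is at least the range value" are assumptions made for this example.

```python
def longest_common_run(answer: str, truth: str) -> int:
    """Length of the longest run of consecutive symbols shared by both strings."""
    best = 0
    # prev[j] holds the length of the common run ending at the previous
    # answer symbol and at truth[j - 1].
    prev = [0] * (len(truth) + 1)
    for a in answer:
        cur = [0] * (len(truth) + 1)
        for j, t in enumerate(truth, start=1):
            if a == t:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def is_correct(answer: str, truth: str, comparison: str = "full", range_value: int = 1) -> bool:
    """Hypothetical scoring rule; the real module may score answers differently."""
    if comparison == "full":
        return answer == truth
    if comparison == "substring":
        return longest_common_run(answer, truth) >= range_value
    raise ValueError(f"unknown comparison type: {comparison}")


def recognition_percentage(results, comparison: str = "full", range_value: int = 1) -> float:
    """Share of (answer, truth) pairs counted as correct, in percent."""
    hits = sum(is_correct(a, t, comparison, range_value) for a, t in results)
    return 100.0 * hits / len(results)


# Made-up test results: (module answer, real answer).
results = [
    ("captcha", "captcha"),
    ("cagtcha", "captcha"),
    ("cptcha", "captcha"),
    ("xyz", "captcha"),
]

print(recognition_percentage(results))                  # 25.0
print(recognition_percentage(results, "substring", 4))  # 75.0
```

The same set of answers scores 25% under full match and 75% under substring match with a range value of 4, which is why substring comparison only makes sense when the target site really accepts partial answers.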

Types of character recognition errors

Before moving on to captcha recognition errors, let's recall the types of character recognition errors:

  1. Misrecognition: the symbol really exists, but it is recognized incorrectly. For example, we show the module the symbol "a", and it finds the symbol "c" there.

  2. Skipped symbol: the symbol is there, but the module does not see any symbol at all, i.e. we show it the symbol "a", and the module says that there is nothing there.

  3. False positive: there is no symbol, for example in the gap between two symbols, but the module finds something there.

Improving recognition

In fact, all the work of improving your module's recognition percentage comes down to balancing these three types of character recognition errors. That is, ideally, when your module does make a mistake, it should sometimes:

  1. Replace the correct character with the wrong one. For example, instead of "captcha" print "cagtcha".

  2. Miss a character that is present. For example, instead of "captcha" print "cptcha".

  3. Output an extra character in the captcha text. For example, instead of "captcha" print "camptcha".

And each of these errors should occur approximately the same number of times.
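To check this balance on your own test results, you can count each error type by aligning the module's answer with the real answer. The sketch below does this with Python's standard `difflib.SequenceMatcher`; the function names and the sample data are made up for illustration, and the alignment is only an approximation of what actually happened on the image.

```python
from difflib import SequenceMatcher


def count_errors(answer: str, truth: str) -> dict:
    """Estimate the three error types by aligning the answer with the real text."""
    counts = {"misrecognized": 0, "skipped": 0, "false_positive": 0}
    matcher = SequenceMatcher(None, truth, answer, autojunk=False)
    for tag, t1, t2, a1, a2 in matcher.get_opcodes():
        truth_len, answer_len = t2 - t1, a2 - a1
        if tag == "replace":
            # Pair up replaced symbols; any surplus on one side is a skip
            # (truth is longer) or a false positive (answer is longer).
            paired = min(truth_len, answer_len)
            counts["misrecognized"] += paired
            counts["skipped"] += truth_len - paired
            counts["false_positive"] += answer_len - paired
        elif tag == "delete":
            counts["skipped"] += truth_len
        elif tag == "insert":
            counts["false_positive"] += answer_len
    return counts


def total_errors(results) -> dict:
    """Sum the error counts over a list of (answer, truth) pairs."""
    totals = {"misrecognized": 0, "skipped": 0, "false_positive": 0}
    for answer, truth in results:
        for kind, n in count_errors(answer, truth).items():
            totals[kind] += n
    return totals


# The three examples from the list above.
results = [
    ("cagtcha", "captcha"),   # wrong character
    ("cptcha", "captcha"),    # missed character
    ("camptcha", "captcha"),  # extra character
]
print(total_errors(results))
```

If one of the three totals is much larger than the others on a real test set, that is the error type to address first.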

Basic errors and how to fix them

The module outputs very few letters, or none at all:

  1. The training itself went well, i.e. the green graph stayed near its maximum, while the red and yellow ones dropped to zero.

  • You may have set the Minimum character spacing parameter too large. In this case, the few characters that are recognized will be correct. That is, instead of the text "amcaptchatext" the module will return something like "aathet".

  • You may have made a mistake in the filters or with the centers of mass. The module is then well trained on the symbols, but on the captcha itself it either does not encounter these symbols, or encounters them in a form it was not trained on. The simplest case: you changed the captcha size in the filters but forgot to apply these filters to the symbols. As a result, the module is trained on small characters, while the captcha shows it large ones.

  • The centers of mass may not be detected where you marked the centers of the symbols when collecting them. You can check this in the center of mass test by clicking on the captcha at the detected centers of mass: the module will give no response there, but it will respond somewhere slightly above or below them. You can try to fix this without retraining by increasing the spread of the center of mass and the number of additional points, so that they start to land where the module recognizes the symbol. Alternatively, retrain the module with an increased spread of the symbol center, instead of (or together with) increasing the spread of the center of mass.

  2. During training, the green graph was close to the yellow one, and the red one was almost zero. Read about how to fix this situation in the tutorial section.
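The effect of the minimum distance parameter can be shown with a toy model. The sketch below is not how CapMonster works internally: it assumes, purely for illustration, that the module produces candidate detections as (x position, symbol, score) tuples and keeps the strongest ones that are at least the minimum distance apart. All names and numbers are hypothetical.

```python
def filter_detections(detections, min_distance: float) -> str:
    """Keep the strongest detections that are at least min_distance apart.

    detections: list of (x, symbol, score) tuples (a made-up representation).
    Returns the kept symbols read from left to right.
    """
    kept = []
    # Visit candidates from the highest score to the lowest.
    for x, symbol, score in sorted(detections, key=lambda d: d[2], reverse=True):
        if all(abs(x - kept_x) >= min_distance for kept_x, _, _ in kept):
            kept.append((x, symbol, score))
    kept.sort(key=lambda d: d[0])
    return "".join(symbol for _, symbol, _ in kept)


# "captcha" with symbols about 20 px apart, plus a weaker second
# response to the same "a" at x=23.
detections = [
    (0, "c", 0.90), (20, "a", 0.95), (23, "a", 0.60), (40, "p", 0.90),
    (60, "t", 0.85), (80, "c", 0.90), (100, "h", 0.80), (120, "a", 0.90),
]

print(filter_detections(detections, 10))  # captcha
print(filter_detections(detections, 2))   # caaptcha
print(filter_detections(detections, 30))  # aca
```

In this model a distance that is too small lets the duplicate "a" through, while a distance that is too large leaves only a few letters, each of which is still correct.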

The number of letters is fine, but all (or most) of them are incorrect:

If the training went well, this may be character recognition error type (3), a false positive. In that case, in the center of mass test, clicking on the centers of the symbols gives the module's correct answer, but clicking on nearby centers of mass that are not at a symbol's center makes the module output many other symbols that are not there.

This is corrected by retraining with a higher value of the parameter responsible for preventing false positives.

There are a lot of letters and they are all wrong. Some letters can be repeated several times in a row:

  1. Same as the previous error, but the Minimum distance between characters parameter is also too small.

  2. Too many character recognition errors of type (3), false positives. The tutorial section describes how to reduce the number of these errors.

  3. The symbol acceptance threshold is too low.

  4. A combination of 1, 2, and 3.