Stage 1. Creating a project and collecting captchas

This is an archived article, no longer relevant.

Use the CapMonster Cloud service to create your own modules. Detailed instructions are available in the article "Creating a custom module".

Open the program, create a new project, and save it under a name you will recognize later.

Collecting captchas

The first step is to gather a set of captchas together with their answers; the new module will be trained and tested on this set. There are several ways to do this.

  1. You can collect the images without answers in any way convenient for you, and then have them recognized from within the program itself.

To do this, enter the login and password of a manual captcha recognition service (for example, RuCaptcha or AntiGate) in the program settings. Immediately after loading the captchas, select the appropriate recognition option. If you rely on manual recognition services, it is better to recognize the captchas in separate groups: captchas for collecting symbols can be recognized in the usual way, while captchas for training and testing are better recognized with 100% probability, a mode in which each captcha is sent to several people at the same time. Both AntiGate and RuCaptcha offer this setting.

  2. You can create a simple template in ZennoPoster that collects captchas and has them recognized.

Whichever way you choose, the goal at this stage is a separate folder on the hard disk that holds the collected captchas and their answers in pairs: an image with the captcha plus a *.txt file containing its answer. The two files in a pair must have the same name and differ only in their extensions.

Alternatively, the name of each captcha file can be the text shown on it. That is, if “qwe” is written on the captcha, the file should be named something like “qwe.jpg”. The program accepts this option as well.
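
Before importing a folder, it can help to check that every image actually resolves to an answer under one of the two naming schemes described above. The following is a minimal Python sketch, not part of the program itself: the function name load_captcha_pairs, the list of image extensions, and the UTF-8 encoding of the answer files are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image types; adjust to whatever your captchas use.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def load_captcha_pairs(folder):
    """Return a list of (image_name, answer) tuples for one folder.

    Hypothetical helper: the answer is read from a same-named .txt file
    if one exists; otherwise the file name (without extension) is
    treated as the answer.
    """
    pairs = []
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip the .txt files and anything else
        answer_file = image.with_suffix(".txt")
        if answer_file.is_file():
            # Scheme 1: image + .txt pair with identical names.
            answer = answer_file.read_text(encoding="utf-8").strip()
        else:
            # Scheme 2: the file name itself is the captcha text.
            answer = image.stem
        pairs.append((image.name, answer))
    return pairs
```

Calling load_captcha_pairs on the collection folder and scanning the result for empty or obviously wrong answers is a cheap check before the captchas are loaded into the project.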

How many captchas do you need

For simple captchas with little or no character distortion, 300 captchas are needed; for complex captchas, 1000. All of them must then be recognized through manual recognition services, which will cost from a few tens of cents to a couple of dollars.

Captchas are needed for several purposes:

  • For collecting symbols. Each character needs from 3 to 150 samples, depending on the complexity of the captcha. Consider how many characters one captcha contains and how many distinct characters exist in total, and keep in mind that some characters may appear only rarely. The number of samples should be roughly the same for every character.

  • To prevent false positives (about 10 times fewer captchas than for collecting symbols).

  • To test the recognition module (about 100 captchas).
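
Whether each character has enough samples, and whether the samples are roughly balanced, can be estimated from the answers alone. The sketch below is an illustration in Python under stated assumptions: the function names symbol_counts and rare_symbols are hypothetical, and the default threshold of 3 is simply the lower bound quoted above, so a complex captcha would call for a higher value.

```python
from collections import Counter


def symbol_counts(answers):
    """Count how many times each character appears across all answers."""
    counts = Counter()
    for answer in answers:
        counts.update(answer)  # a string is counted character by character
    return counts


def rare_symbols(answers, minimum=3):
    """Return the characters that occur fewer than `minimum` times, sorted."""
    counts = symbol_counts(answers)
    return sorted(ch for ch, n in counts.items() if n < minimum)
```

If rare_symbols returns a non-empty list, more captchas containing those characters are probably needed before the symbol collection is useful.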

Splitting captchas

After the captchas are added, they are divided into the groups above automatically. You can also set the split manually. Re-splitting is not possible later, so if you are unsure, it is better to leave the settings unchanged.
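
If you do decide to set the split manually, the proportions listed above can be turned into rough group sizes. The following Python sketch is only arithmetic on those proportions (about 100 captchas for testing, and about ten times fewer false-positive captchas than symbol captchas); it does not describe how the program performs its automatic split, and the function name plan_groups is hypothetical.

```python
def plan_groups(total, test_size=100):
    """Split `total` captchas into rough group sizes.

    Assumption: the test group is fixed at `test_size`, and the rest is
    divided so that the symbol group is about 10 times larger than the
    false-positive group (that is, 10 parts to 1 part out of 11).
    """
    if total <= test_size:
        raise ValueError("not enough captchas to leave any for training")
    remaining = total - test_size
    false_positive = remaining // 11       # 1 part out of 11
    symbols = remaining - false_positive   # the other 10 parts
    return {
        "symbols": symbols,
        "false_positive": false_positive,
        "test": test_size,
    }
```

For a collection of 300 captchas this gives 182 for collecting symbols, 18 for false positives, and 100 for testing.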

A video walkthrough is available on YouTube (in Russian).