Info

This page has been translated automatically

We want to provide you with the latest help content in your language as soon as possible. This page has been translated automatically and may contain grammatical errors or inaccuracies. We want this content to be useful to you. Please let us know at the bottom of this page if this information was helpful.

View the original article in Russian: Обработка текста

Description

This block performs various text manipulations that are often needed in practice. Processing parsed text, cleaning it of garbage, translating it into other languages: all this, and much more, can be done with the Text Processing "cube".

...

How to add an action to a project?

Via the context menu: Add Action -> Data -> Text Processing

...

Or use smart search.

Where is text processing applied?

...

How to use the action?

The properties window consists mainly of three areas:

...

Info

Place the cursor in the input field, press Ctrl + Space, and select the constants and project variables you need from the drop-down list. For example, this way you can quickly insert the project proxy {-Project.Proxy-} or the URL of the active tab {-Page.Url-}

...

(you can find other available environment variables in the article Variables window)

...

All possible operations with this "cube":

...

Before Application: {"animal": "cat"}
After: \ {"animal": \ "cat"}

...

Regex

...

Processing text with regular expressions.
Regular expressions are very convenient for parsing strings to find a required substring by a given pattern. This action lets you extract not only the first match but also whole groups, and save the values to variables or a table. Optionally, if nothing is found, the action ends with an error and exits via the red branch. In total, there are six options for saving the results after processing with a regular expression:

  • only the first found value is saved;

  • all found matches are saved to the list;

  • one value is saved: either the last match or a random one;

  • one or more values are saved to the list by specific index (the ordinal position in the list of matches). Indexes can be listed with commas (4,5,9), given as a hyphenated range (4-9), or combined (4,5,9-11);

  • the same as the previous item, but without a list: the value at each index can be put into its own variable;

  • matches are saved to the table.
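
The index notation from the list above (commas, hyphenated ranges, or both) can be sketched in Python. This is only an illustration, not ZennoPoster's own implementation. The function names are hypothetical, and the sketch assumes that ranges include both ends, that numbering starts from zero, and that out-of-range positions are simply skipped.

```python
def parse_index_spec(spec: str) -> list[int]:
    """Expand a spec such as "4,5, 9-11" into a flat list of indexes."""
    indexes: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            # Assumption: "4-9" means every index from 4 to 9 inclusive.
            start, end = (int(p) for p in part.split("-", 1))
            indexes.extend(range(start, end + 1))
        else:
            indexes.append(int(part))
    return indexes


def pick_matches(matches: list[str], spec: str) -> list[str]:
    """Return the matches at the requested zero-based positions.

    Assumption: positions outside the list of matches are skipped.
    """
    return [matches[i] for i in parse_index_spec(spec) if 0 <= i < len(matches)]


print(parse_index_spec("4,5, 9-11"))
print(pick_matches(["a", "b", "c", "d", "e", "f"], "0,2-3"))
```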

To create regular expression patterns, ZennoPoster provides a very convenient tool: the Regular Expression Constructor.

Input field “Regex”

In this field, enter the regular expression that will be used to search the text. Example: (?<=<title>).*(?=</title>)

The Regular Expression Tester can help you write regular expressions.

Error with an empty answer

If this setting is checked and the regular expression does not find anything in the text, then the action will fail (exit via the red branch).

Note

Please note that if the regular expression matches an empty string, the action exits via the green branch even when the "Error with an empty answer" setting is enabled. For example, if the site has nothing in the title tag (<title></title>), the regular expression (?<=<title>).*(?=</title>) matches but returns an empty string, so the action succeeds.
But if the text contains no <title></title> at all, the expression finds nothing and the action exits via the red branch.
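
The difference between an empty match and no match at all can be reproduced with the same pattern in Python. This is a minimal sketch that uses Python's re module as a stand-in for the action; the helper name first_title is hypothetical.

```python
import re

# The pattern from the "Regex" input field example.
TITLE_PATTERN = re.compile(r"(?<=<title>).*(?=</title>)")


def first_title(html: str):
    """Return the first match, or None when the pattern finds nothing."""
    match = TITLE_PATTERN.search(html)
    return None if match is None else match.group(0)


filled = first_title("<title>Home</title>")   # a normal, non-empty match
empty = first_title("<title></title>")        # a match exists, but it is an empty string
missing = first_title("<h1>No title</h1>")    # no match at all

print(repr(filled), repr(empty), repr(missing))
```

Only the last case corresponds to the red branch: the first two produce a match, even though one of them is an empty string.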

What to take

The first

The first match found will be saved to the variable.

All

Save all search results to a list.

One match

Keep only one match.
In the field that appears, you can enter the sequence number of the match (numbering starts from zero!) or select Last or Random.

...

Match numbers

Save only the matches with the specified numbers to the list (numbering starts from zero; separate the numbers with commas).

Into variables

This function is used with regular expressions that contain groups. An example:

Let's imagine that there is the following text:

21.01.2003, 11:34:00.9299
11.12.2013, 01:22:55.3021
04.01.2007, 08:00:06.0032

The task is to split it into its components. To do this, we will use the following regular expression: (\d{2}).(\d{2}).(\d{4}), (\d{2}):(\d{2}):(\d{2}).(\d{4})

This is how the output looks in the Regular Expression Tester:

[Screenshot: output in the Regular Expression Tester]

Let's imagine that we need to save the day, month and year from the second line into variables. Here's how you can do it:

[Screenshot: action settings]

The match number in our case is the line number. Because numbering starts from zero, we specify 1 to take the second line.

Next, specify the group number and the variable where the result will be saved. Group numbering also starts from zero, but group 0 contains the entire match (11.12.2013, 01:22:55.3021). Therefore, we specify group 1 for the day, 2 for the month, and 3 for the year.
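
The same selection can be sketched in Python, with the re module standing in for the action: take match number 1, then read groups 1 to 3. The names TEXT, PATTERN and the result variables are chosen for this illustration only.

```python
import re

TEXT = """21.01.2003, 11:34:00.9299
11.12.2013, 01:22:55.3021
04.01.2007, 08:00:06.0032"""

# The expression from the example, unchanged. Its unescaped dots match any
# character; writing \. instead would match only a literal dot.
PATTERN = re.compile(r"(\d{2}).(\d{2}).(\d{4}), (\d{2}):(\d{2}):(\d{2}).(\d{4})")

matches = list(PATTERN.finditer(TEXT))

second = matches[1]      # match numbers start from zero, so 1 is the second line
whole = second.group(0)  # group 0 is the entire match
day, month, year = second.group(1), second.group(2), second.group(3)

print(whole)
print(day, month, year)
```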

In the table

This is very similar to the previous function (Into variables), with the difference that it saves not one result but all of them, in a table. You can exclude some of the found groups from the final result.

We use the same text:

21.01.2003, 11:34:00.9299
11.12.2013, 01:22:55.3021
04.01.2007, 08:00:06.0032

Our task is to parse it and save the result to a table. To do this, we will use the following regular expression: (\d{2}).(\d{2}).(\d{4}), (\d{2}):(\d{2}):(\d{2}).(\d{4})

This is how the output looks in the Regular Expression Tester:

[Screenshot: output in the Regular Expression Tester]

Let's also imagine that we don't need seconds and milliseconds in the final table. This is how it might look:

[Screenshot: action settings]

The group at index 0 contains the entire match (in our case, the whole line), so we exclude it. Groups 6 and 7 contain the seconds and milliseconds, respectively.
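
As a rough Python equivalent, each match becomes a table row and the excluded groups are dropped. This is a sketch only; the list of lists stands in for a ZennoPoster table, and the names EXCLUDED_GROUPS and rows are hypothetical.

```python
import re

TEXT = """21.01.2003, 11:34:00.9299
11.12.2013, 01:22:55.3021
04.01.2007, 08:00:06.0032"""

PATTERN = re.compile(r"(\d{2}).(\d{2}).(\d{4}), (\d{2}):(\d{2}):(\d{2}).(\d{4})")

# Group 0 is the whole match; groups 6 and 7 are seconds and milliseconds.
EXCLUDED_GROUPS = {0, 6, 7}

# One row per match, one column per group that is not excluded.
rows = [
    [m.group(i) for i in range(PATTERN.groups + 1) if i not in EXCLUDED_GROUPS]
    for m in PATTERN.finditer(TEXT)
]

for row in rows:
    print(row)
```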

Usage example

Let's look at a specific example: parsing links with regular expressions composed using the constructor.

Suppose our task is to get links to the profiles of active users of the ZennoLab forum. Let's get started:

...

  1. Using the “Taking the value” cube, we get the HTML code of the element that contains the links to the users who are online on the forum.

  2. Add the “Regex” action. To compose the pattern used in the properties of the “Regex” action, use the Regular Expression Constructor.

  3. Add the “html” variable as the input in the action properties, and save the result to the “urls” list.

  4. After launching the cube, we get unique ids in the list, which can be used to form the URLs of user profiles.

...

Spintax

Randomization or uniquification of text. Spintax makes it convenient to produce synonymized variants of a text. Spintax is a construction of curly braces and vertical bars that lets you randomly substitute substrings into a string. In its simplest form, spintax looks like this: {variant1 | variant2 | variant3}. When this action runs, one of the three options is chosen at random and placed in the resulting variable.

But spintax constructions can be more complex and have multi-level nesting, which is why you can get thousands of different variants from one text.
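
How nested curly-brace constructions resolve can be sketched in Python: the innermost groups are replaced first, and the process repeats until no braces remain. This is a minimal illustration of the idea, not ZennoPoster's implementation. It handles only the {a|b|c} form, keeps whitespace inside the braces exactly as written, and the function name spin is hypothetical.

```python
import random
import re

# A group with no braces inside it, i.e. an innermost {a|b|c} construction.
INNERMOST = re.compile(r"\{([^{}]*)\}")


def spin(text: str, rng: random.Random) -> str:
    """Replace every {a|b|c} group with one randomly chosen option."""
    while True:
        text, replaced = INNERMOST.subn(
            lambda m: rng.choice(m.group(1).split("|")), text
        )
        if replaced == 0:
            return text  # no braces left


rng = random.Random()
print(spin("{Hello|Hi} {world|{dear|kind} friend}", rng))
```

Even this short template has six possible results, which is why deeper nesting quickly yields thousands of variants.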

...

...

Extended spintax syntax

  • {Red|White|Blue} - one of the values is included in the resulting text, for example: "White"

  • [ Red| White| Blue] - the resulting text contains a permutation of values, for example: "White Blue Red"

  • [+_+Red|White|Blue] - the resulting text contains a permutation of values between which a separator is inserted, for example: "White_Red_Blue"

Nesting of templates is unlimited (for example: [+{_|-}+Red|White|Blue {1|2}] = "White-Blue 2-Red"). Special characters can be escaped: [+\++Red|\[White\]|Blue] gives "[White]+Red+Blue".

...

Split

Splitting text by any separator character (delimiter). This operation turns a string into an array of strings. In effect, it is a simpler alternative to regular expressions for splitting a string by characters.

Let's look at how Split works using a very common task: getting a login and password from a string. Account credentials are usually stored as line-by-line lists in the format login:password. Here the delimiter is the colon character (:).

...

We insert our string, or a variable containing it, into the input field. In the properties, we specify the separator (:), and below we assign a separate variable to each element of the resulting array of substrings. After processing the line, one variable holds the login and the other holds the password.
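
In Python terms, the same step is a split followed by assigning each part to its own variable. A minimal sketch; the function name and the sample credentials are made up for the illustration.

```python
def split_credentials(line: str, separator: str = ":") -> tuple[str, str]:
    """Split a "login:password" line into its two parts.

    Raises ValueError if the line does not contain exactly two parts.
    """
    login, password = line.strip().split(separator)
    return login, password


login, password = split_credentials("user01:s3cret")
print(login, password)
```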

...

Separators

Here you need to specify the character(s) by which the data will be split.

...

Allow empty values

Let's look at this point with an example.

Suppose we have a string in the format name;surname;gender;year of birth. The action might look like this:

...

But if one of the components is missing, for example the gender (Andrew;Paul;;1988), the year of birth will be written to the gender variable. The Allow empty values setting was created for exactly such cases: if you enable it, an empty string is written to the gender variable, and the year is saved to the correct variable.
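
The effect of the setting can be sketched in Python. This is an illustration under the assumption that, with the setting off, empty parts are simply dropped before the values are assigned to variables; the function and parameter names are hypothetical.

```python
def split_fields(line: str, separator: str = ";", allow_empty: bool = False) -> list[str]:
    """Split a line, optionally keeping empty parts in their positions."""
    parts = line.split(separator)
    if not allow_empty:
        # Assumption: empty parts are discarded, so later values shift left.
        parts = [part for part in parts if part != ""]
    return parts


shifted = split_fields("Andrew;Paul;;1988")                    # year lands in the third slot
aligned = split_fields("Andrew;Paul;;1988", allow_empty=True)  # third slot stays empty

print(shifted)
print(aligned)
```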

Usage example

Let's look at how Split works using a very common task: splitting a proxy string into its parts. Purchased proxies very often have the following format: login:pass@host:port

There are two separators at once: the colon (:) and @. This is what the action settings might look like:

...

Both characters are specified here as separators.
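
Splitting on two separators at once can be sketched in Python with re.split and a character class. The function name and the sample proxy are illustrative only, and the sketch assumes that neither the login nor the password contains : or @.

```python
import re


def split_proxy(proxy: str) -> dict[str, str]:
    """Split "login:pass@host:port" on both ':' and '@'."""
    login, password, host, port = re.split(r"[:@]", proxy)
    return {"login": login, "password": password, "host": host, "port": port}


parts = split_proxy("user:pass@127.0.0.1:8080")
print(parts)
```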

...

ToChar

Converts an integer value to a Unicode character.
Each Unicode character has its own numeric code, and this operation lets you convert a numeric value to the corresponding character. For example, the symbol ♛ has the numeric value 9819.
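
The same conversion in Python uses the built-in chr, with ord as its inverse. The wrapper name to_char is hypothetical and only mirrors the name of the operation.

```python
def to_char(code: int) -> str:
    """Return the Unicode character for a decimal code point."""
    return chr(code)


symbol = to_char(9819)
print(symbol, ord(symbol))
```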

...

ToLower

Changes letters to lowercase depending on the selected option: all letters, only the first letter of the string, or the first letter of each word.
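
The three options can be sketched in Python as follows. The mode names "all", "first" and "each_word" are invented for this illustration, and a word is assumed to be any run of characters separated by whitespace.

```python
import re


def to_lower(text: str, mode: str = "all") -> str:
    """Lowercase all letters, the first letter, or the first letter of each word."""
    if mode == "all":
        return text.lower()
    if mode == "first":
        return text[:1].lower() + text[1:]
    if mode == "each_word":
        # A non-space character that is not preceded by a non-space character
        # is the first character of a word.
        return re.sub(r"(?<!\S)\S", lambda m: m.group(0).lower(), text)
    raise ValueError(f"unknown mode: {mode}")


print(to_lower("HELLO WORLD"))
print(to_lower("HELLO WORLD", "first"))
print(to_lower("HELLO WORLD", "each_word"))
```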

...