...
Suppose you need to parse the <meta> tags that have a property attribute from a topic page on the ZennoLab forum. You can't reach them through the action designer, because these tags are not displayed on the page in any way. The steps are:
Go to the required page.
Open the code view window (here you can use either the DOM or the source code; this does not affect the final result) and look at the tags you need (there are several of them, but only one is shown here):
All the tags have the same structure: they always start with <meta property= and end with >. The name of the property comes in quotes immediately after property, and the content attribute holds the content.
Copy the content into the regular expression tester using the button of the same name. Based on the analysis from the previous step, write the regular expression:
(?<=<meta\ property=)"([a-z:]+)"\s+content="(.*?)"(?=>)
Using the Text processing action and its Regex operation, extract the values you need from the page code and save them to a table.
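Outside ZennoPoster, the same extraction can be sketched in Python as a quick check of the expression. This is only an illustration under stated assumptions: the sample HTML, the variable names, and the use of Python's re module are not part of the original project, and the pattern is the expression from the previous step written as a raw string.

```python
import re

# The expression from the step above, as a Python raw string.
META_RE = re.compile(r'(?<=<meta\ property=)"([a-z:]+)"\s+content="(.*?)"(?=>)')

# Hypothetical page code; the real forum page will contain different tags.
sample_html = """<head>
<meta property="og:title" content="Example topic">
<meta property="og:type" content="article">
<meta name="description" content="skipped: no property attribute">
</head>"""

# With two capture groups, findall returns one (property, content) tuple
# per match, which maps naturally onto two table columns.
rows = META_RE.findall(sample_html)

for prop, content in rows:
    print(prop, "=>", content)
```

Each tuple corresponds to one table row: the first element is the property name and the second is the content value. The tag without a property attribute is skipped because the lookbehind does not match it.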
A few notes on the screenshot:
{-Page.Dom-} is the variable that stores the DOM of the tab. For the source code it is {-Page.Source-}, and for the text it is {-Page.Text-}. You can find other variables in the variables window.
Why was column zero excluded? The regular expression (?<=<meta\ property=)"([a-z:]+)"\s+content="(.*?)"(?=>) uses two bracketed (capture) groups: ([a-z:]+) and (.*?). If you test it in the regular expression tester and open the Groups tab, you will see that three groups were found even though only two were defined: the first group contains the full match text, and the groups you defined follow it. Since numbering starts from zero, the column to exclude is number 0, not 1.
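The group numbering can be checked with a short Python sketch. The sample tag and variable names here are hypothetical, and Python's re module stands in for the regular expression tester; like the tester, it reports the whole match as group 0.

```python
import re

pattern = re.compile(r'(?<=<meta\ property=)"([a-z:]+)"\s+content="(.*?)"(?=>)')

# Hypothetical tag used only for illustration.
tag = '<meta property="og:type" content="article">'

match = pattern.search(tag)

# pattern.groups counts only the groups defined in the expression (2);
# group 0, the full match, is always present in addition to them.
all_groups = [match.group(i) for i in range(pattern.groups + 1)]

# Dropping index 0 leaves exactly the two values wanted in the table.
table_row = all_groups[1:]

print(all_groups)  # ['"og:type" content="article"', 'og:type', 'article']
print(table_row)   # ['og:type', 'article']
```

Group 0 holds the quoted property together with the content attribute, which is why keeping it would add an unwanted column to the table.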
...