Viewing page text


Description

With this tool, you can easily view the source code (Source), DOM model, and displayed text of the page loaded in the browser.

The difference between the DOM and the page source code is explained in the Description block of the related article.

If you are working with the Chrome engine, you can use the web developer tools as an alternative.


What is it used for?

This tool is used when you need to better understand the structure of the page.


How to open a window?

The button to enable this window is located to the right of the address bar of the browser.


How to work with a window?

Clicking the icon opens a window with the following controls:

Content selection

Here you choose what to view: the DOM (default), the source code, or the visible text of the page (see the difference between Source and DOM).

Wrap by words

When this option is enabled, a line that is too long wraps onto the next line instead of being hidden beyond the window border. For example, here is the same window with this option enabled:

Copy to regex tester

Clicking this button opens the regular expression tester and automatically copies the contents of the window into it.


Usage example

Let's say you need to parse <meta> tags with a property attribute from a topic page on the ZennoLab forum. You can't reach them through the action designer, because these tags are not displayed on the page in any way. The steps are:

  • Go to the required page.

  • Open the code view window (in this case, either the DOM or the source code can be used; the choice does not affect the final result) and find the required tags (there are several of them, but only one is shown here):

    All tags have the same structure: they always start with <meta property= and end with >. Immediately after property= comes the property name in quotes, and the content attribute holds the content.

  • Copy the content into the regular expression tester using the button of the same name. Based on the analysis from the previous step, write a regular expression: (?<=<meta\ property=)"([a-z:]+)"\s+content="(.*?)"(?=>)

  • Using the Text processing action and its Regex operation, extract the required values from the page code and save them to a table:

     

Brief notes on the screenshot:

  • {-Page.Dom-} - this variable stores the DOM of the tab. For the source code, use {-Page.Source-}; for the text, use {-Page.Text-}. You can find other variables in the variables window.

  • Why was column zero excluded? The regular expression (?<=<meta\ property=)"([a-z:]+)"\s+content="(.*?)"(?=>) uses two capturing groups in parentheses: ([a-z:]+) and (.*?). If you open the Groups tab while testing in the regular expression tester, you will see three groups even though only two were defined: the very first group contains the full match text, followed by the groups you defined. Since numbering starts from zero, the column to exclude is number 0, not 1.
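
The group numbering described above can also be checked outside the program. The following is only a rough sketch in Python, not the way the Text processing action works internally; the sample page fragment and the helper name extract_meta_properties are made up for illustration. It applies the same regular expression and shows that group 0 holds the full match, while groups 1 and 2 hold the property name and the content.

```python
import re

# The regular expression from the usage example, unchanged.
PATTERN = re.compile(r'(?<=<meta\ property=)"([a-z:]+)"\s+content="(.*?)"(?=>)')

# Hypothetical page fragment for illustration; the real forum page will differ.
SAMPLE_DOM = (
    '<meta property="og:title" content="Example topic">\n'
    '<meta property="og:type" content="article">\n'
    '<meta name="viewport" content="width=device-width">\n'
)


def extract_meta_properties(page_code):
    """Return one row per match: (group 0, group 1, group 2)."""
    rows = []
    for match in PATTERN.finditer(page_code):
        # Group 0 is always the full match; groups 1 and 2 are the
        # two capturing groups defined in the pattern.
        rows.append((match.group(0), match.group(1), match.group(2)))
    return rows


rows = extract_meta_properties(SAMPLE_DOM)

# Dropping column 0 leaves only the property name and its content,
# which mirrors excluding column zero when saving to the table.
table = [row[1:] for row in rows]

for name, content in table:
    print(name, content)
```

The tag with a name attribute instead of property is skipped, because the lookbehind requires the match to be preceded by <meta property=.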


Useful links