Data (Tab operations)


Description

This action retrieves data from a page open in a browser tab.


How to add an action to a project?

Through the context menu: Add Action → Tabs → Data

Or use the smart search.


What is it used for?

  • Find and save the information you need from the page

  • Check whether specific values are present on the page

  • Parse text from the page

  • Get the page URL

How to work with an action?

1. What to take

Select the type of data to take:

  • DOM - Document Object Model;

  • Source - the source code of the page;

  • Text - the visible text of the page;

  • URL - link address from the address bar.
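To make the difference between Source and Text more concrete, here is a rough Python approximation of turning page source into visible text. It is only an illustration under stated assumptions: `visible_text` and `VisibleTextExtractor` are hypothetical names, and a real browser also applies CSS (hidden elements, line breaks), which this sketch ignores. It simply drops tags and skips the contents of `<head>`, `<script>` and `<style>`.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes outside <head>, <script>, <style>."""

    SKIP = {"head", "script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0          # > 0 while inside a skipped element
        self.parts: list[str] = []    # visible text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(source: str) -> str:
    """Rough stand-in for the Text option: source without markup and scripts."""
    parser = VisibleTextExtractor()
    parser.feed(source)
    parser.close()
    return " ".join(parser.parts)


source = ('<html><head><title>Shop</title><style>p{color:red}</style></head>'
          '<body><p>Price: <b>10</b></p><script>var x = 1;</script></body></html>')
print(visible_text(source))  # Price: 10
```

The markup, the title and the script body are all present in Source but absent from the extracted text, which is why a regex written against Source usually does not fit Text and vice versa.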

Difference between Source and DOM

Source - the source code of the page as received from the server.
DOM - a tree of objects that the browser builds in memory from the source code (Source).

To simplify things a lot, the browser works like this:

  1. You enter a URL into the address bar and press enter.

  2. The browser sends a request to the server.

  3. The server returns the response: the HTML source of the page (Source).

  4. From the source code, the browser builds the DOM (Document Object Model). While doing so, it:

    • handles errors (adds the html, head, body and other tags if they are missing);

    • closes unclosed tags;

    • adds a <tbody> tag to tables if it is missing. In the DOM, a table (<table>) always has a <tbody> element, while in HTML it can be omitted (take this into account when building XPath and regular expressions);

    • runs the scripts on the page (which can add new elements to the page, even after the page has fully loaded).

  5. Finally, the browser renders the page from the DOM and shows you its content.
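The <tbody> point is easy to trip over, so here is a small Python illustration. The `dom` string is written by hand to show how a browser would typically serialize the same table; it is not produced by the script. A pattern copied from the DOM view fails on Source, while a pattern that treats <tbody> as optional works on both.

```python
import re

# Table as it might arrive from the server (Source): no <tbody>.
source = "<table><tr><td>42</td></tr></table>"

# The same table as the DOM would show it: the browser has inserted <tbody>.
# (Hand-written for illustration.)
dom = "<table><tbody><tr><td>42</td></tr></tbody></table>"

# A pattern written while looking at the DOM depends on <tbody>...
dom_pattern = r"(?<=<tbody><tr><td>)\d+"
# ...while a pattern that makes <tbody> optional works on both.
tolerant_pattern = r"<table>(?:<tbody>)?<tr><td>(\d+)"

print(re.findall(dom_pattern, dom))          # ['42']
print(re.findall(dom_pattern, source))       # []
print(re.findall(tolerant_pattern, dom))     # ['42']
print(re.findall(tolerant_pattern, source))  # ['42']
```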

The DOM can contain information and elements that are not in the source code (Source), because it also includes content added by JavaScript.

When working with requests (GET, POST and other types of requests), you will always deal with Source.

There are two tools for viewing Source and DOM in ProjectMaker:

  • Viewing the DOM, source code and page text (available for all engines)

  • Web developer tools (Chrome engine only)

2. Which tab

Select the tab from which to take data:

  • Active - the current active tab;

  • First - the first tab, if several are open;

  • By name - specify the name of the tab;

  • By number - specify the number of the tab, if several are open.

3. Process only the specified tags

If you need to process only one or a few specific HTML tags, enable the checkbox and select the tags you need.
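The idea of restricting processing to certain tags can be sketched in Python as below. This is an assumption-laden illustration, not how ProjectMaker implements the option: the article does not say whether the filtered result contains inner text or full tag markup, so this hypothetical `only_tags` helper simply returns the text inside each selected tag, in document order.

```python
from html.parser import HTMLParser


class TagFilter(HTMLParser):
    """Hypothetical helper: keeps the text inside the selected tags only."""

    def __init__(self, wanted: set[str]) -> None:
        super().__init__()
        self.wanted = wanted
        self._stack: list[int] = []     # indexes into self.found for open wanted tags
        self.found: list[list[str]] = []  # [tag, collected_text] in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.found.append([tag, ""])
            self._stack.append(len(self.found) - 1)

    def handle_endtag(self, tag):
        # Close the innermost open wanted tag if the names match.
        if self._stack and self.found[self._stack[-1]][0] == tag:
            self._stack.pop()

    def handle_data(self, data):
        # Text goes to the innermost open wanted tag; everything else is ignored.
        if self._stack:
            self.found[self._stack[-1]][1] += data


def only_tags(source: str, wanted: set[str]) -> list[tuple[str, str]]:
    parser = TagFilter(wanted)
    parser.feed(source)
    parser.close()
    return [(tag, text.strip()) for tag, text in parser.found]


page = '<h1>Title</h1><p>Intro <a href="/x">link</a></p><a href="/y">second</a>'
print(only_tags(page, {"a"}))
print(only_tags(page, {"h1", "a"}))
```

Text outside the selected tags (here "Intro") never reaches the result, which is what makes a later regex over it simpler.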

4. Parse the result

If you need to parse the result, specify the desired regular expression (Regex), which matches to take and their numbers, and where to save the result: to a variable, a list or a table. You can find the right regular expression using the Regular Expression Tester.

The controls that appear when the Parse data setting is enabled are the same as for Text Processing-Regex (there you will find a more detailed description).
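As a rough model of the "which matches to take" choice, here is a Python sketch. The mode names (`all`, `first`, `last`, `numbers`) and the 0-based match numbering are assumptions for illustration only; check the Text Processing-Regex description for the exact options and numbering ProjectMaker uses. `take_matches` is a hypothetical helper.

```python
import re


def take_matches(text: str, pattern: str, mode: str = "all", numbers=None) -> list[str]:
    """Return regex matches according to a hypothetical 'which to take' mode.

    mode: "all", "first", "last" or "numbers" (assumed names).
    numbers: match indexes for mode "numbers", assumed 0-based here.
    """
    matches = [m.group(0) for m in re.finditer(pattern, text)]
    if mode == "all":
        return matches
    if mode == "first":
        return matches[:1]
    if mode == "last":
        return matches[-1:]
    if mode == "numbers":
        # Indexes outside the range of found matches are skipped.
        return [matches[i] for i in (numbers or []) if 0 <= i < len(matches)]
    raise ValueError(f"unknown mode: {mode}")


page_text = "Order 17 shipped, order 204 pending, order 3 cancelled"
variable = take_matches(page_text, r"\d+", "first")    # one value, e.g. for a variable
result_list = take_matches(page_text, r"\d+", "all")   # every match, e.g. for a list
print(variable, result_list, take_matches(page_text, r"\d+", "numbers", [0, 2]))
```

Taking the first match suits saving to a variable, while taking all matches suits saving to a list or table.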

For getting data from a page, there is also a more convenient tool: Parse data.


Usage example

Let's collect all the links on the page. Select DOM or Source, enable parsing of the result and specify the regular expression (Regex):

(?<=href=")http.*?(?=")

Take all matches and save the result to a list.

As a result, the list will contain all the links on this page.
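To see what this pattern does and does not catch, here is the same regular expression applied with Python's `re.findall` to a small hand-written sample page (the sample HTML is made up for illustration).

```python
import re

page = '''<a href="https://example.com/a">A</a>
<a href="/relative">R</a>
<link rel="stylesheet" href="http://example.com/s.css">'''

# Same pattern as above: text starting with "http" after href=", up to the next quote.
links = re.findall(r'(?<=href=")http.*?(?=")', page)
print(links)  # ['https://example.com/a', 'http://example.com/s.css']
```

Note the limits of the pattern: it only finds absolute http/https addresses in double-quoted href attributes, so relative links (/relative) and single-quoted attributes are skipped, and it matches any tag with href, including <link> stylesheets, not just <a> links.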
