etc_query :: help | etc

Description

This multipurpose Textpattern plugin allows you to retrieve data, transform it and display the result the way you like. It can query HTML/XML/JSON documents, databases or global variables, and can be useful for

importing data from external sites/feeds
querying databases
retrieving and modifying values of txp or url variables
performing all kinds of calculations
iterating over arrays/ranges
customizing the output of Textpattern tags and other plugins
creating tables of contents, navigation links, image prefetching, pagination, conditional structures and much more.

It is in many respects inspired by smd_xml and smd_query, but uses XPath to access the desired nodes.

Requirements

PHP 5.3 with DOM extension (enabled by default)
Basic knowledge of XPath. Here are some links kindly gathered by Julián:
- XPath tutorial at zvon.org
- XPath reference at zvon.org
- XPath test lab at zvon.org
- XPath tutorial at tizag.com
- XPath sandbox on this site.

Syntax

The workflow is: import data (directly or from url), query it for some nodes/values with XPath, do {replace}ments if necessary and display the result. For example,

<txp:etc_query
  url="http://textpattern.com"
  query="//div[@id='accoladesBlock']//img"
  replace="@width=128">
    {.} : {@alt?}
</txp:etc_query>

will display all img elements (scaled to width=128) found in the div#accoladesBlock on textpattern.com, each followed by its alt text.

The tag can be used either as a single tag or as a container. The container form supports a <txp:else /> part, invoked if data or query comes up empty:

<txp:etc_query functions="sqrt" query="2*2=sqrt(25)-1">
  That's right!
<txp:else />
  Something is wrong with your formula...
</txp:etc_query>

Attributes

Essential

data: The data to parse. Can be some HTML/XML code, JSON object, serialized array, database query, or merely a string. Defaults to null.
url: The address of the web page to retrieve the data from. If not empty, supersedes the data attribute. Importing external data may be a SECURITY HOLE, so use it only with sites you trust. In db mode, url is the name of a local database. Default is empty (Textpattern's own database).
markup: html, xml, db, ldap, json, ini, array, raw or data. Defaults to empty (guess from data).
query: XPath expression, e.g. the path of the reference nodes. Defaults to the data itself, wrapped in some root.
replace: separator-separated list (empty by default) of tokens like xpath=value, see below.

Auxiliary

context: JSON-encoded options array (HTTP headers and so on) to pass to the PHP stream_context_create function when retrieving data from url. Defaults to null.
encoding: the desired output encoding. Default is UTF-8.
separator: symbols to separate the replace tokens. Defaults to ; (semi-colon).
argsep: symbols to separate the arguments of function patterns, see below. Defaults to | (pipe).
globals: comma-separated list of global variables to be used with the special replacement patterns, see below.
populate: the TXP entity (article, image, file or link) whose fields will be populated by the data. Default is empty.
sanitize: comma-separated list of globals that should be sanitized with htmlspecialchars. Default is _COOKIE,_GET,_POST,_REQUEST,_SERVER.
specials: comma-separated list of attributes (like data, query, replace or content) to parse for variable patterns. Default is url,data,query.
functions: comma-separated list of PHP functions to be used in XPath expressions. Requires sufficient user privileges.
parse: whether TXP parser should parse the content before or after the replacements, or both, or not at all. If you don’t expect <txp: tags inside the container, you can set parse="", this will speed up the processing of large data. Default is auto: empty if no <txp: tags are detected, after otherwise.
name: if set, the output will be silently assigned to the corresponding <txp:variable />.
save: if set, the DOM tree will be silently saved and can be retrieved later with markup="data".

Presentational

form: the TXP Form with which to parse the output. You may use the plugin as a container instead if you prefer. Default is empty.
sort: comma-separated list of path/to array fields followed by sort direction ([<] or [>]). Default is unset.
limit, offset, wraptag, break, html_id, class, label and labeltag are the usual Textpattern attributes, all empty by default.

Generalities

Typically, etc_query iterates over a list retrieved from data as specified by query. For example,

<txp:etc_query data="<div><a href='#one'>one</a><a href='#two'>two</a></div>" query="//a/@href" />

will result in a list of two attribute nodes: href="#one", href="#two". If you put something inside the container, it will be parsed and output once per matched node, that is, twice here:

<txp:etc_query data="<div><a href='#one'>uno</a><a href='#two'>dos</a></div>" query="//a/@href">
  <span>Hello {#} {?}! Your parent is {..}.</span>
</txp:etc_query>

results in

<span>Hello href #one! Your parent is <a href='#one'>uno</a>.</span>
<span>Hello href #two! Your parent is <a href='#two'>dos</a>.</span>

Special patterns

Every token inside curly {braces} has a special meaning for etc_query. There are five kinds of patterns, which differ in the symbol following the opening brace.

XPath patterns

Patterns that do not start with ?, !, $, #, %, {, " or a space are treated as XPath expressions. They represent the nodes or expressions (relative to query) you want to display or evaluate. For example, the token {.} matches the current node retrieved from data by query, {.//a} matches all <a> tags that are its descendants, {count(.//a)} counts them, and so on.

If xpath is followed by ?, the node value will be displayed, and # results in the node name. If xpath is an expression (like count), the result is its value. Otherwise, the complete node with its children will be displayed. The tags {?} and {#} represent the context node’s value and name respectively.
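
As an illustration, here is a minimal sketch that combines the forms described above: {#} for the node name, {?} for its value, and an XPath expression evaluated relative to the current node. The data string is made up for this example, and the exact whitespace of the output is not guaranteed:

<txp:etc_query data="<ul><li><a href='#one'>one</a> and <a href='#two'>two</a></li></ul>" query="//li">
  Name: {#}. Value: {?}. Links inside: {count(.//a)}.
</txp:etc_query>

Going by the rules above, this should report the name li, the text content of the list item, and a link count of 2.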

Internal patterns {#…}

There are also a few internal patterns that can be used in the container and in replace: {#row} and {#rows} return the current row number (starting from 1) and the total number of displayed rows. Their cousins {#ind} and {#total} produce the absolute row number (ignoring the limit and offset attributes, and starting from 0) and the total number of matched nodes. Finally, {##} is the internal node number in XPath patterns.
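
To see the difference between the displayed and the absolute counters, one could try a sketch like the following. The data is invented for the example, and the combination with offset and limit is an assumption drawn from the description above rather than a tested recipe:

<txp:etc_query data="<ul><li>a</li><li>b</li><li>c</li><li>d</li></ul>" query="//li" offset="1" limit="2" break="br">
  Row {#row} of {#rows}, index {#ind}, {#total} matched: {?}
</txp:etc_query>

If the patterns behave as described, only b and c are displayed: {#row} counts 1 and 2 out of 2 displayed rows, while {#ind} and {#total} keep referring to the full list of four matched nodes.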

Literal {%…}, nesting {{…}} and JSON patterns

A pattern {%string} will be treated as a literal string. This is useful for outputting curly braces without processing them: {%{I am not {xpath}!}}. It is a little different from the nesting pattern {{I am not {xpath}!}}, where the internal {xpath} gets parsed. The latter will become {I am not parsed_xpath!}, which comes in handy for nesting tags.

A JSON object like {"hate":"love","cats":"dogs"} will be recognized and treated as such. If it comes from data, you can access its nodes with query="hate" (will retrieve love) or query="*" (retrieves love and dogs).
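
A minimal sketch of the JSON case, using the same object as above. It assumes that {#} yields the key name for JSON data just as it yields the node name for XML, which the description above suggests but does not state outright:

<txp:etc_query data='{"hate":"love","cats":"dogs"}' markup="json" query="*" wraptag="ul" break="li">
  {#}: {?}
</txp:etc_query>

With query="*" both members are matched, so the container should be rendered once for love and once for dogs. Replacing the query with query="hate" would restrict the output to the first one.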

Function patterns {$…}

A pattern of the form {$func1(arg1|arg2|...).func2(...)} in the container or replace will call the PHP functions func1(arg1, arg2, ...), and so on. The output of each function can be passed by value as $ to the next one, unless the function name is prefixed by @. In this case $ is passed by reference. If there are no args, then func($) will be returned. For example,

<txp:etc_query data="I hate cats">
  {$json_decode({"hate":"love","cats":"dogs"}).@settype($|array).strtr({?}|$).strtoupper}
</txp:etc_query>

will output “I LOVE DOGS”.

A few functions are predefined: {$date(format)} will transform the node’s value into a date formatted as format. The pattern {$?(if|then|else)} will output then if if is not empty, or else otherwise. You can also use comparison (=,<,>) and arithmetic operations +,-,*,/,%, like {$+1} to add 1 to the current node value.

The default argument separator is | (pipe), but you can change it with argsep attribute.
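
As a small sketch of the arithmetic shortcut, the following iterates over a range of integers (the [1..5] notation is the range syntax of etc_query with markup="json") and prints each value next to its successor. The list markup is chosen only for the example:

<txp:etc_query data="[1..5]" markup="json" wraptag="ul" break="li">
  {?} + 1 = {$+1}
</txp:etc_query>

Here {?} is the current value and {$+1} applies the predefined addition to it, so each list item should pair a number with the next one.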

Variable patterns {?…} and {!…}

A pattern {?var1,var2,...|default|function_pattern} will search for the field var1 in global variables from the list globals, then for var2 if var1 is empty, and so on. If nothing is found, the default (if given) will be used. Otherwise, the function_pattern (if set) will be applied to the variable before processing, see above. For example, to assign the value of some integer url GET variable u to <txp:variable name="v" /> if v is not set, use

<txp:etc_query globals="variable,_GET" name="v" data="{?v,u|0|intval}" />

If any of the globals is in the sanitize list, and no function_pattern is given, the corresponding variables will be htmlspecialchars’ed.

Default for globals is variable, pretext, thisarticle, thisimage, thisfile, thislink, but you can use any global variable, like _SESSION or even _REQUEST (at your own risk).

A pattern {!...} is identical to {?...}, except that it will be reevaluated on each call.

Ranges

Data like [1..10] with markup="json" will be treated as the range of integers 1<=n<=10. You can optionally set its step and direction: [10..1:2] is interpreted as the array [10,8,...,2], but [10..1:+2] is empty. For example, this outputs the multiplication table :-)

<txp:etc_query data="[1..9]" break="tr" wraptag="table" parse="before">
  <txp:etc_query data="[1..9]" break="td">
    {{$*{?}}}
  </txp:etc_query>
</txp:etc_query>

Note how nesting patterns and parse attribute work: the internal etc_query will produce

<td>{$*1}</td><td>{$*2}</td>...<td>{$*9}</td>

that will be parsed by the external etc_query.

Replace attribute

The purpose of replace attribute is to replace/insert/move/delete some nodes in the data DOM tree. There are several types of replace tokens.

The first type is just xpath, which removes all the nodes matched by xpath. For example, replace="//script|//@*[starts-with(name(),'on')]" will attempt to avoid javascript injection.
Next, a pair xpath=value assigns the value (possibly empty) to the nodes matched by xpath. The value will be parsed into an XML tree, unless xpath is followed by ?.
If xpath is followed by +, the value will be appended as a child to each matched node.
If xpath is followed by ^ (respectively $), the value will be inserted before (after) each matched node.
A token xpath&=value will replace the matched node by value.
The tokens xpath~=xpath and xpath/=xpath will move the left-hand node, inserting it before the right-hand one, or appending as its child respectively. For example, //a[1]/=.. will move every first link to the end of its parent node.
A token xpath@@attr1=val1@attr2=val2... will set the attributes attr1,attr2,... of matched elements to the respective values. You can use the shortcuts ., ?, +, ^, $, &, ~ and / as attributes too.
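
For instance, two of the token types above can be combined in one replace attribute, separated by the default separator ; (semi-colon). This is only a sketch: the data and the attribute values rel and class are invented for the example.

<txp:etc_query
  data="<p><a href='#one'>one</a><script>alert(1)</script></p>"
  query="//p"
  replace="//script;//a@@rel=nofollow@class=ext" />

The first token removes every script element, and the second one sets the rel and class attributes on every link, so the paragraph should be displayed without the script and with the decorated link.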

You can also do replacements inside the container and nest the special tokens. For example,

<txp:etc_query data="{?body}" query="//p">
  {+=<span class="row">{#row}</span>}
</txp:etc_query>

will number and display all the paragraphs of the article.

Examples

Prefetching and TOC

This article form will produce a table of contents in the individual article view, and preload the images found in articles’ bodies in list view:

<txp:if_individual_article>
  <txp:etc_query data="{?body}"
    query="//h2|//h3|//h4|//h5"
    label="Table of Contents" labeltag="h1" />
  <txp:body />
<txp:else />
  <txp:etc_query data="{?body}" query="//img">
    <link rel="prefetch" href="{@src?}" />
  </txp:etc_query>
  <txp:excerpt />
</txp:if_individual_article>

Customizing hard-coded txp tags

<txp:etc_query data='<txp:comment_message_input />'
  query="//*[@id='message']"
  replace="@@required=required@placeholder=Drop us a word" />

Advancing the value of some TXP variable

<txp:etc_query globals="variable" name="var" query="{?var|0}+1" />

Importing RSS feeds

<txp:etc_query
  url="http://forum.textpattern.com/extern.php?type=RSS&action=new&fid=79"
  markup="xml"
  query="//channel/item[position()<6]"
  wraptag="dl"
  label="News from Textpattern Forum" labeltag="h4">
    <dt><a href="{link?}">{title?}</a></dt>
    <dd>{description?}</dd>
</txp:etc_query>

Detecting AJAX calls

<txp:etc_query data="{?HTTP_X_REQUESTED_WITH}" globals="_SERVER">
  AJAX
<txp:else />
  Not AJAX
</txp:etc_query>

History

Version 0.1: proof of concept, not publicly released.
Version 0.2: first release.
Version 0.3: fixed encoding.
Version 0.4: added remove and separator attributes.
Version 0.41: can also be used as single tag.
Version 0.5: fixed html markup, enhanced replace attribute.
Version 0.6: removed remove (merged with replace), introduced special patterns. The name is changed to etc_query.
Version 0.7: added functions, limit and offset, enhanced replace attribute.
Version 0.8: Now can also query databases and arrays/objects.
Version 0.82: introduced specials attribute.
Version 0.9: introduced parse and sanitize attributes.
Version 0.97: more robust parser, at the price of small syntax changes.
Version 0.98: function patterns respect php scripting preferences / user levels.
Version 1.0: enhanced replace attribute, enabled by-reference passing in functions.
Version 1.1: even more enhanced replace attribute, dropped delim attribute.
Version 1.2: added XSL support, context and save attributes.
Version 1.3: added populate attribute, more customizable sanitize.
Version 1.4: added sort attribute and ini markup, among other changes.
Version 1.5: chainable markup allows for custom data formats. Dropped pre-4.6 compatibility.

File(s)

File: etc_query.txt [17.62 kB] (4952 downloads, ~29 per month)

Alessandro Pedraglio · 30 August 2023, 08:35

Good stuff. More information or examples of how to extract information from JSON or XML objects would be great. Some examples are difficult to understand, like Beethoven explaining his own sonatas.

Oleg · 30 August 2023, 13:39

I could only be compared to Beethoven via hearing loss, but thank you.

To understand how etc_query extracts XML data, you need a basic knowledge of XPath. Other formats (JSON etc) just mimic a limited XPath syntax.

I find it easier to learn and explain by example, so don’t hesitate to ask more specific questions here or on the forum.