Description
This multipurpose Textpattern plugin allows you to retrieve data, transform it and display the result the way you like. It can query HTML/XML/JSON documents, databases or global variables, and can be useful for
- importing data from external sites/feeds
- querying databases
- retrieving and modifying values of txp or url variables
- performing all kinds of calculations
- iterating over arrays/ranges
- customizing the output of Textpattern tags and other plugins
- creating tables of contents, navigation links, image prefetching, pagination, conditional structures and much more.
It is inspired in many respects by smd_xml and smd_query, but uses XPath to access the desired nodes.
Requirements
- PHP 5.3 with DOM extension (enabled by default)
- Basic knowledge of XPath. Here are some links kindly gathered by Julián:
- XPath tutorial at zvon.org
- XPath reference at zvon.org
- XPath test lab at zvon.org
- XPath tutorial at tizag.com
- XPath sandbox on this site.
Syntax
The workflow is: import data (directly or from url), query it for some nodes/values with XPath, do {replace}ments if necessary and display the result. For example,
<txp:etc_query
url="http://textpattern.com"
query="//div[@id='accoladesBlock']//img"
replace="@width=128">
{.} : {@alt?}
</txp:etc_query>
will display all img elements (scaled to width=128) with their alt text found in the div#accoladesBlock on textpattern.com.
The tag can be used either as a single tag or as a container. The container form supports a <txp:else /> part, invoked if data or query comes up empty:
<txp:etc_query functions="sqrt" query="2*2=sqrt(25)-1">
That's right!
<txp:else />
Something is wrong with your formula...
</txp:etc_query>
Attributes
Essential
- data: The data to parse. Can be some HTML/XML code, JSON object, serialized array, database query, or merely a string. Defaults to null.
- url: The address of the web page to retrieve the data from. If not empty, supersedes the data attribute. Importing external data may be a SECURITY HOLE, use it only with the sites you trust. In db mode, url is the name of a local database. Default is empty (txp own database).
- markup: html, xml, db, ldap, json, ini, array, raw or data. Defaults to empty (guess from data).
- query: XPath expression, e.g. the path of the reference nodes. Defaults to the data itself, wrapped in some root.
- replace: separator-separated list (empty by default) of tokens like xpath=value, see below.
Auxiliary
- context: JSON-encoded options array (HTTP headers and so on) to pass to the PHP stream_context_create function when retrieving data from url. Defaults to null.
- encoding: the desired output encoding. Default is UTF-8.
- separator: symbols to separate the replace tokens. Defaults to ; (semi-colon).
- argsep: symbols to separate the arguments of function patterns, see below. Defaults to | (pipe).
- globals: comma-separated list of global variables to be used with the special replacement patterns, see below.
- populate: the TXP entity (article, image, file or link) whose fields will be populated by the data. Default is empty.
- sanitize: comma-separated list of globals that should be sanitized with htmlspecialchars. Default is _COOKIE,_GET,_POST,_REQUEST,_SERVER.
- specials: comma-separated list of attributes (like data, query, replace or content) to parse for variable patterns. Default is url,data,query.
- functions: comma-separated list of PHP functions to be used in XPath expressions. Requires sufficient user privileges.
- parse: whether the TXP parser should parse the content before or after the replacements, or both, or not at all. If you don’t expect <txp: tags inside the container, you can set parse="" to speed up the processing of large data. Default is auto: empty if no <txp: tags are detected, after otherwise.
- name: if set, the output will be silently assigned to the corresponding <txp:variable />.
- save: if set, the DOM tree will be silently saved and can be retrieved later with markup="data".
Presentational
- form: the TXP Form with which to parse the output. You may use the plugin as a container instead if you prefer. Default is empty.
- sort: comma-separated list of path/to array fields followed by sort direction ([<] or [>]). Default is unset.
- limit, offset, wraptag, break, html_id, class, label and labeltag are the usual Textpattern attributes, all empty by default.
Generalities
Typically, etc_query iterates over a list retrieved from data as specified by query. For example,
<txp:etc_query data="<div><a href='#one'>one</a><a href='#two'>two</a></div>" query="//a/@href" />
will result in a list of two attribute nodes: href="#one", href="#two". If you put something inside the container, it will be parsed and output twice:
<txp:etc_query data="<div><a href='#one'>uno</a><a href='#two'>dos</a></div>" query="//a/@href">
<span>Hello {#} {?}! Your parent is {..}.</span>
</txp:etc_query>
results in
<span>Hello href #one! Your parent is <a href='#one'>uno</a>.</span>
<span>Hello href #two! Your parent is <a href='#two'>dos</a>.</span>
Special patterns
Every token inside curly {braces} will have a special meaning for etc_query. There are five kinds of patterns, which differ in the symbol following the opening brace.
XPath patterns
The patterns that do not start with ?, !, $, #, %, {, " or space are considered as XPath expressions. They represent the nodes or expressions (relative to query) you want to display/evaluate. For example, the token {.} matches the current node retrieved from data by query, {.//a} matches all <a> tags that are its descendants, {count(.//a)} counts them, and so on.
If xpath is followed by ?, the node value will be displayed, and # results in the node name. If xpath is an expression (like count), the result is its value. Otherwise, the complete node with its children will be displayed. The tags {?} and {#} represent the context node’s value and name respectively.
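A small sketch of these XPath patterns at work, using made-up inline data in the style of the Generalities example. It relies only on the patterns described above and is meant as an illustration, not a tested recipe.

```html
<!-- Sketch: one paragraph is matched; count its links and show its text value -->
<txp:etc_query data="<div><p>See <a href='#one'>one</a> and <a href='#two'>two</a>.</p></div>" query="//p">
<p>Links found: {count(.//a)}. Paragraph text: {?}</p>
</txp:etc_query>
```

Here {count(.//a)} is an expression, so its value is shown, while {?} stands for the value of the matched paragraph.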
Internal patterns {#…}
There are also a few internal patterns that can be used in the container and replace: {#row} and {#rows} will return the current row number starting from 1 and the total number of displayed rows. Their cousins {#ind} and {#total} produce the absolute row number (ignoring limit and offset attributes) starting with 0, and the total number of matched nodes. Finally, {##} is the internal node number in XPath patterns.
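For instance, the row counters could be used to label the headings of an article. This is a sketch that assumes an article context where {?body} is available.

```html
<!-- Sketch: list the h2 headings of the article body with a running counter -->
<txp:etc_query data="{?body}" query="//h2" wraptag="ol" break="li">
{?} (heading {#row} of {#rows})
</txp:etc_query>
```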
Literal {%…}, nesting {{…}} and JSON patterns
A pattern {%string} will be treated as the literal string. Useful for outputting curly braces without processing them: {%{I am not {xpath}!}}. This is a little different from the nesting pattern {{I am not {xpath}!}}, where the internal {xpath} gets parsed. The latter will become {I am not parsed_xpath!}, which comes in handy for nesting tags.
A JSON object like {"hate":"love","cats":"dogs"} will be recognized and treated as such. If it comes from data, you can access its nodes with query="hate" (will retrieve love) or query="*" (retrieves love and dogs).
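A sketch of querying such a JSON object, assuming that query="*" iterates over its members as described above and that {?} yields each member's value:

```html
<!-- Sketch: iterate over the members of a JSON object -->
<txp:etc_query data='{"hate":"love","cats":"dogs"}' markup="json" query="*" wraptag="ul" break="li">
{?}
</txp:etc_query>
```

According to the description above, the two list items would contain love and dogs.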
Function patterns {$…}
A pattern of the form {$func1(arg1|arg2|...).func2(...)} in the container or replace will call the PHP functions func1(arg1, arg2, ...), and so on. The output of each function can be passed by value as $ to the next one, unless the function name is prefixed by @. In this case $ is passed by reference. If there are no args, then func($) will be returned. For example,
<txp:etc_query data="I hate cats">
{$json_decode({"hate":"love","cats":"dogs"}).@settype($|array).strtr({?}|$).strtoupper}
</txp:etc_query>
will output “I LOVE DOGS”.
A few functions are predefined: {$date(format)} will transform the node’s value into a date formatted as format. The pattern {$?(if|then|else)} will output then if if is not empty, or else otherwise. You can also use comparison (=,<,>) and arithmetic operations +,-,*,/,%, like {$+1} to add 1 to the current node value.
The default argument separator is | (pipe), but you can change it with the argsep attribute.
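As a minimal sketch of the predefined arithmetic, the following applies {$+1} to each node of some made-up inline data. It assumes the pattern operates on the current node value, as stated above.

```html
<!-- Sketch: show each list item's value next to the value incremented by one -->
<txp:etc_query data="<ul><li>1</li><li>2</li><li>3</li></ul>" query="//li" wraptag="ul" break="li">
{?} + 1 = {$+1}
</txp:etc_query>
```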
Variable patterns {?…} and {!…}
A pattern {?var1,var2,...|default|function_pattern} will search for the field var1 in global variables from the list globals, then for var2 if var1 is empty, and so on. If nothing is found, the default (if given) will be used. Otherwise, the function_pattern (if set) will be applied to the variable before processing, see above. For example, to assign the value of some integer url GET variable u to <txp:variable name="v" /> if v is not set, use
<txp:etc_query globals="variable,_GET" name="v" data="{?v,u|0|intval}" />
If one of the globals is in the sanitize list, and no function_pattern is given, the corresponding variables will be htmlspecialchars’ed.
Default for globals is variable, pretext, thisarticle, thisimage, thisfile, thislink, but you can use any global variable, like _SESSION or even _REQUEST (at your own risk).
A pattern {!...} is identical to {?...}, except that it will be reevaluated on each call.
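A sketch of the fallback search combined with <txp:else />, assuming that empty data triggers the else part as described in the Syntax section. The GET parameter names page and pg are hypothetical.

```html
<!-- Sketch: look for ?page=... first, then ?pg=... in the URL -->
<txp:etc_query globals="_GET" data="{?page,pg}">
A page parameter was supplied.
<txp:else />
No page parameter in the URL.
</txp:etc_query>
```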
Ranges
Data like [1..10] with markup="json" will be treated as the range of integers 1<=n<=10. You can optionally set its step and direction: [10..1:2] is interpreted as the array [10,8,...,2], but [10..1:+2] is empty. For example, this outputs the multiplication table :-)
<txp:etc_query data="[1..9]" break="tr" wraptag="table" parse="before">
<txp:etc_query data="[1..9]" break="td">
{{$*{?}}}
</txp:etc_query>
</txp:etc_query>
Note how the nesting patterns and the parse attribute work: the internal etc_query will produce
<td>{$*1}</td><td>{$*2}</td>...<td>{$*9}</td>
that will be parsed by the external etc_query.
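A simpler sketch of a stepped range, relying only on the range syntax described above:

```html
<!-- Sketch: count down from 10 in steps of 2 -->
<txp:etc_query data="[10..1:2]" markup="json" wraptag="ul" break="li">
{?}
</txp:etc_query>
```

Given the interpretation of [10..1:2] stated above, the list items would be 10, 8 and so on down to 2.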
Replace attribute
The purpose of the replace attribute is to replace/insert/move/delete some nodes in the data DOM tree. There are several types of replace tokens.
- The first type is just xpath, which removes all the nodes matched by xpath. For example, replace="//script|//@*[starts-with(name(),'on')]" will attempt to avoid javascript injection.
- Next, a pair xpath=value assigns the value (possibly empty) to the nodes matched by xpath. The value will be parsed into an XML tree, unless xpath is followed by ?.
- If xpath is followed by +, the value will be appended as a child to each matched node.
- If xpath is followed by ^ (respectively $), the value will be inserted before (after) each matched node.
- A token xpath&=value will replace the matched node by value.
- The tokens xpath~=xpath and xpath/=xpath will move the left-hand node, inserting it before the right-hand one, or appending it as its child respectively. For example, //a[1]/=.. will move every first link to the end of its parent node.
- A token xpath@@attr1=val1@attr2=val2... will set the attributes attr1,attr2,... of matched elements to the respective values. You can use the shortcuts ., ?, +, ^, $, &, ~ and / as attributes too.
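Several tokens can be chained with the separator (; by default). The following sketch combines a removal token with an attribute token. It assumes an article context and that absolute paths work in every token as they do in the //script example above.

```html
<!-- Sketch: strip scripts and img width attributes, open links in a new window -->
<txp:etc_query data="{?body}"
replace="//script;//img/@width;//a@@target=_blank@rel=noopener" />
```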
You can also do replacements inside the container and nest the special tokens. For example,
<txp:etc_query data="{?body}" query="//p">
{+=<span class="row">{#row}</span>}
</txp:etc_query>
will number and display all the paragraphs of the article.
Examples
Prefetching and TOC
This article form will produce a table of contents in the individual article view, and preload the images found in articles’ bodies in list view:
<txp:if_individual_article>
<txp:etc_query data="{?body}"
query="//h2|//h3|//h4|//h5"
label="Table of Contents" labeltag="h1" />
<txp:body />
<txp:else />
<txp:etc_query data="{?body}" query="//img">
<link rel="prefetch" href="{@src?}" />
</txp:etc_query>
<txp:excerpt />
</txp:if_individual_article>
Customizing hard-coded txp tags
<txp:etc_query data='<txp:comment_message_input />'
query="//*[@id='message']"
replace="@@required=required@placeholder=Drop us a word" />
Advancing the value of some TXP variable
<txp:etc_query globals="variable" name="var" query="{?var|0}+1" />
Importing RSS feeds
<txp:etc_query
url="http://forum.textpattern.com/extern.php?type=RSS&action=new&fid=79"
markup="xml"
query="//channel/item[position()<6]"
wraptag="dl"
label="News from Textpattern Forum" labeltag="h4">
<dt><a href="{link?}">{title?}</a></dt>
<dd>{description?}</dd>
</txp:etc_query>
Detecting AJAX calls
<txp:etc_query data="{?HTTP_X_REQUESTED_WITH}" globals="_SERVER">
AJAX
<txp:else />
Not AJAX
</txp:etc_query>
History
- Version 0.1: proof of concept, not publicly released.
- Version 0.2: first release.
- Version 0.3: fixed encoding.
- Version 0.4: added remove and separator attributes.
- Version 0.41: can also be used as single tag.
- Version 0.5: fixed html markup, enhanced replace attribute.
- Version 0.6: removed remove (merged with replace), introduced special patterns. The name is changed to etc_query.
- Version 0.7: added functions, limit and offset, enhanced replace attribute.
- Version 0.8: Now can also query databases and arrays/objects.
- Version 0.82: introduced specials attribute.
- Version 0.9: introduced parse and sanitize attributes.
- Version 0.97: more robust parser, at the price of small syntax changes.
- Version 0.98: function patterns respect php scripting preferences / user levels.
- Version 1.0: enhanced replace attribute, enabled by-reference passing in functions.
- Version 1.1: even more enhanced replace attribute, dropped delim attribute.
- Version 1.2: added XSL support, context and save attribute.
- Version 1.3: added populate attribute, more customizable sanitize.
- Version 1.4: added sort attribute and ini markup, among other changes.
- Version 1.5: chainable markup allows for custom data formats. Dropped pre-4.6 compatibility.
File(s)
- File: etc_query.txt [60.72 kB] (4273 downloads, ~29 per month)