App/Xtn/Mediawiki/Tidy/JTidy
From XOWA: the free, open-source, offline wiki application
Source
jtidy_xowa.jar was built from the JTidy r938 source at https://sourceforge.net/projects/jtidy/files/JTidy/r938/.
Its source is not currently bundled with XOWA. It is available at https://sourceforge.net/projects/xowa/files/support/jtidy/
Modifications
jtidy_xowa.jar was created for the following reasons:
- JTidy is not completely in sync with tidy:
  - JTidy appears to have been built from an earlier version of tidy. tidy has since received a number of bug fixes that are not in JTidy.
- JTidy differs from tidy in how it translates the original code:
  - JTidy is a very close translation of tidy, but it deviates from tidy in a number of places.
jtidy_xowa changes
The following is only a partial list of the changes made to JTidy. Multiple changes were made for XOWA v1.6.2.1 to make JTidy behave more like tidy. More changes will probably follow to close the remaining source-code gap between tidy and JTidy.
ParseBlock should handle exiled variable during element reparenting
- purpose: a <div> between <table> and <tr> was not reparented correctly
- example: fa.wikinews.org/wiki/Main_Page -> invalid table layout
- file: /jtidy-r938/src/main/java/org/w3c/tidy/ParserImpl.java
- proc: ParseBlock.Parse
- add:
else if ((node.tag.model & Dict.CM_TABLE) != 0 || (node.tag.model & Dict.CM_ROW) != 0)
{
    // XOWA: DATE:2014-05-31
    /* http://tidy.sf.net/issue/1316307 */
    /* In exiled mode, return so table processing can
       continue. */
    if (lexer.exiled)
        return;
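The guard above can be pictured in isolation with a minimal sketch. Everything in it is an assumption made for illustration: the class name, the flag values, and the `Outcome` enum are hypothetical and do not come from JTidy. It only shows that a table-level or row-level tag seen while `exiled` is set hands control back to the caller, and it does not model what the rest of the branch does.

```java
/** Sketch of the exiled-mode guard; all names and flag values are hypothetical. */
final class ExiledGuardSketch {
    // Illustrative content-model bits, not JTidy's actual Dict constants.
    static final int CM_BLOCK = 1;
    static final int CM_TABLE = 2;
    static final int CM_ROW = 4;

    enum Outcome { RETURN_TO_TABLE_PARSER, HANDLE_IN_BRANCH, OTHER_BRANCH }

    /** Decides what the block parser does with a tag, given the exiled flag. */
    static Outcome onTag(int tagModel, boolean exiled) {
        if ((tagModel & CM_TABLE) != 0 || (tagModel & CM_ROW) != 0) {
            // The added check: while exiled, return so table processing can continue.
            if (exiled) {
                return Outcome.RETURN_TO_TABLE_PARSER;
            }
            // The remainder of the branch is not shown in the patch excerpt.
            return Outcome.HANDLE_IN_BRANCH;
        }
        return Outcome.OTHER_BRANCH;
    }

    public static void main(String[] args) {
        System.out.println(onTag(CM_ROW, true));
        System.out.println(onTag(CM_ROW, false));
        System.out.println(onTag(CM_BLOCK, true));
    }
}
```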
Do not trim empty block element if it has attributes
- purpose: empty block elements should not be trimmed if they have attributes
- example: ko.wikisource.org/wiki/Main_Page -> invalid table layout
- file: /jtidy-r938/src/main/java/org/w3c/tidy/Lexer.java
- proc: canPrune
- add:
// XOWA: added to match tidy; DATE:2014-05-31
if ( ((element.tag.model & Dict.CM_BLOCK) != 0) && element.attributes != null)
    return false;
if (element.tag == this.configuration.tt.tagA && element.attributes != null)
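As a rough illustration of the added rule, the sketch below models only the new check. The `Element` class, the flag values, and the class name are hypothetical stand-ins rather than JTidy types, and every other pruning rule in `canPrune` is deliberately left out.

```java
import java.util.Collections;
import java.util.Map;

/** Sketch of the "keep block elements that have attributes" rule; names are hypothetical. */
final class PruneSketch {
    // Illustrative content-model bits, not JTidy's actual Dict constants.
    static final int CM_INLINE = 1;
    static final int CM_BLOCK = 2;

    /** Minimal stand-in for a parsed element. */
    static final class Element {
        final String name;
        final int model;
        final Map<String, String> attributes; // null means "no attributes", as in the patch

        Element(String name, int model, Map<String, String> attributes) {
            this.name = name;
            this.model = model;
            this.attributes = attributes;
        }
    }

    /** Returns false when an empty element must be kept. */
    static boolean canPrune(Element element) {
        // Mirrors the added check: a block element with attributes is never pruned.
        if ((element.model & CM_BLOCK) != 0 && element.attributes != null) {
            return false;
        }
        // All other rules are omitted from this sketch.
        return true;
    }

    public static void main(String[] args) {
        Element styledDiv = new Element("div", CM_BLOCK, Collections.singletonMap("style", "clear:both"));
        Element bareDiv = new Element("div", CM_BLOCK, null);
        System.out.println(canPrune(styledDiv));
        System.out.println(canPrune(bareDiv));
    }
}
```

An empty `<div style="clear:both">` is the kind of element this rule is meant to preserve, since removing it can change the layout even though it has no content.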
Do not convert empty <p> to <br>
- purpose: commented out the code that converts an empty <p> to <br>, because tidy has no such conversion
- example: none
- file: /jtidy-r938/src/main/java/org/w3c/tidy/Node.java
- proc: trimEmptyElement
- code:
// XOWA: DELETED: not in tidy, and don't really agree with intent; DATE:2014-05-31
// else if (element.tag == tt.tagP && element.content == null)
// {
// // replace <p></p> by <br><br> to preserve formatting
// Node node = lexer.inferredTag("br");
// Node.coerceNode(lexer, element, tt.tagBr);
// element.insertNodeAfterElement(node);
// }
Do not add \n after <span> in <pre>
- purpose: JTidy was incorrectly adding \n after all block elements inside <pre>
- example: none
- file: /jtidy-r938/src/main/java/org/w3c/tidy/PPrint.java
- proc: printTag
- code:
if (indent + linelen < this.configuration.wraplen)
{
    // wrap after start tag if is <br/> or if it's not inline
    // fix for [514348]
    if (!TidyUtils.toBoolean(mode & NOWRAP)
        && (!TidyUtils.toBoolean(node.tag.model & Dict.CM_INLINE) || (node.tag == tt.tagBr))
        && afterSpace(node))
    {
        wraphere = linelen;
    }
}
// XOWA: DATE:2014-06-01
/* flush the current buffer only if it is known to be safe,
   i.e. it will not introduce some spurious white spaces.
   See bug #996484 */
else if ( TidyUtils.toBoolean(mode & NOWRAP)
    || node.tag == tt.tagBr
    || afterSpace(node)
    )
{
    condFlushLine(fout, indent);
}
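The two branches above can be restated as a small pure function, which makes the conditions easier to check by hand. This is a sketch under stated assumptions: the class name, the `Action` enum, the `NOWRAP` value, and the boolean parameters are hypothetical simplifications of the state that `printTag` reads, not part of JTidy.

```java
/** Sketch of the wrap/flush decision after a start tag; names and values are hypothetical. */
final class FlushSketch {
    // Illustrative mode bit, not JTidy's actual constant.
    static final int NOWRAP = 8;

    enum Action { MARK_WRAP_POINT, FLUSH_LINE, NONE }

    static Action afterStartTag(int indent, int linelen, int wraplen, int mode,
                                boolean inline, boolean isBr, boolean afterSpace) {
        boolean noWrap = (mode & NOWRAP) != 0;
        if (indent + linelen < wraplen) {
            // Line still fits: remember a wrap point after <br/> or a non-inline tag.
            if (!noWrap && (!inline || isBr) && afterSpace) {
                return Action.MARK_WRAP_POINT;
            }
            return Action.NONE;
        }
        // Line is too long: flush only when doing so is known to be safe.
        if (noWrap || isBr || afterSpace) {
            return Action.FLUSH_LINE;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        // Long line, inline tag, not after a space, wrapping allowed.
        System.out.println(afterStartTag(0, 80, 68, 0, true, false, false));
        // Same long line, but the tag is <br/>.
        System.out.println(afterStartTag(0, 80, 68, 0, true, true, false));
    }
}
```

The case of interest is a long line followed by an inline tag that is not after a space and not `<br/>`: with the guarded `else if`, nothing is flushed, so no line break is written at that point.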