App/Category/Internals

From XOWA: the free, open-source, offline wiki application

This page documents some of the internals of the V2 Category system.

Builder commands

For reference, this is the current script to set up the V2 Category system:

app.bldr.pause_at_end_('n');
app.bldr.cmds
.add_many('simple.wikipedia.org', 'ctg.hiddencat_sql', 'ctg.hiddencat_ttl', 'ctg.link_sql', 'ctg.link_idx').owner
;
app.bldr.run;

Note that 'ctg.link_sql' and 'ctg.link_idx' are required.

Note that 'ctg.hiddencat_sql' and 'ctg.hiddencat_ttl' can be omitted. However, it is recommended that they be run (for English Wikipedia, they add less than 5 minutes to the entire process).

ctg.hiddencat_sql

  • This command looks for a file matching *page_props.sql in the wiki directory.
For example: /xowa/wiki/simple.wikipedia.org/simplewiki-latest-page_props.sql. Note that each row in this .sql file has the format (page_id, prop_name, prop_val).
  • It then parses the .sql file and looks for entries with a prop_name of "hiddencat". For example: (1, 'hiddencat', '')
  • When it's done, it generates a list of the matching page_ids, one Base85-encoded id per line.
The output directory is /xowa/wiki/simple.wikipedia.org/tmp/ctg.hiddencat_sql/make/
An example of a file would be:
!!!!#
!!!!$
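
The steps above can be sketched in Python. This is only an illustration, not XOWA's actual code: it assumes the Base85 digits are the ASCII characters '!' through 'u' with '!' as zero and a fixed width of 5 (which agrees with the sample file above), and it assumes the simplified three-column row format with no escaped quotes. The names base85_encode, hiddencat_ids and HIDDENCAT_TUPLE are hypothetical.

```python
import re

BASE85_ZERO = ord("!")  # assumption: digit 0 is '!', so digits span '!'..'u'

def base85_encode(value, width=5):
    """Encode a non-negative int as fixed-width Base85 (assumed alphabet)."""
    if value < 0 or value >= 85 ** width:
        raise ValueError("value does not fit in the requested width")
    digits = []
    for _ in range(width):
        value, rem = divmod(value, 85)
        digits.append(chr(BASE85_ZERO + rem))
    return "".join(reversed(digits))

# Matches tuples such as (1,'hiddencat','') inside an INSERT statement.
# Simplification: three columns only, no escaped quotes in the value.
HIDDENCAT_TUPLE = re.compile(r"\((\d+),'hiddencat','[^']*'\)")

def hiddencat_ids(sql_text):
    """Return the page_ids of all rows whose prop_name is 'hiddencat'."""
    return [int(m.group(1)) for m in HIDDENCAT_TUPLE.finditer(sql_text)]

sample = (
    "INSERT INTO `page_props` VALUES "
    "(2,'hiddencat',''),(3,'hiddencat',''),(7,'defaultsort','X');"
)
lines = [base85_encode(page_id) for page_id in hiddencat_ids(sample)]
print("\n".join(lines))  # prints !!!!# then !!!!$
```

With page_ids 2 and 3 as input, the sketch reproduces the two sample lines shown above.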

ctg.hiddencat_ttl

  • This command reads the output of ctg.hiddencat_sql and finds the title for each id.
This step is necessary because the category indexes are sorted by title, not by id.
  • When it's done, it generates a list of title|id rows sorted by title.
The output directory is /xowa/wiki/simple.wikipedia.org/tmp/ctg.hiddencat_ttl/make/
An example of a file would be:
A|!!!!#
B|!!!!$
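
As a rough illustration of this id-to-title step, the following Python sketch joins the encoded ids with a title lookup and sorts by title. The function hiddencat_title_rows and the title_by_id dictionary are hypothetical stand-ins; how XOWA actually resolves titles, and the exact collation it sorts by, are not described on this page.

```python
def hiddencat_title_rows(encoded_ids, title_by_id):
    """Return 'title|id' lines sorted by title.

    encoded_ids: Base85 strings as produced by ctg.hiddencat_sql.
    title_by_id: hypothetical lookup from encoded id to category title.
    """
    rows = [(title_by_id[eid], eid) for eid in encoded_ids if eid in title_by_id]
    rows.sort()  # plain code-point order; the real collation is an assumption
    return [f"{title}|{eid}" for title, eid in rows]

titles = {"!!!!#": "A", "!!!!$": "B"}
lines = hiddencat_title_rows(["!!!!$", "!!!!#"], titles)
print("\n".join(lines))  # prints A|!!!!# then B|!!!!$
```

Ids without a known title are skipped here; that is a choice made for the sketch, not documented behavior.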

ctg.link_sql

  • This command looks for a file matching *categorylinks.sql in the wiki directory.
For example: /xowa/wiki/simple.wikipedia.org/simplewiki-latest-categorylinks.sql.
  • It then parses the .sql file and extracts the following data: category_name, page_id, page_member_type, page_sortkey, page_member_add_date.
  • When it's done, it generates a sorted list of category|type|sortkey|id|date rows, with id and date Base85-encoded.
The output directory is /xowa/wiki/simple.wikipedia.org/tmp/ctg.link_sql/make/
An example of a file would be:
A|p|Page_1_sortkey|!!!!%|!!!@!|
B|p|Page_2_sortkey|!!!!^|!!!@@|
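
A minimal Python sketch of the row format follows. It assumes the same Base85 alphabet as the samples on this page ('!' as zero, width 5), that the add date has already been reduced to an integer (the conversion is not documented here), and that rows sort by category, then type, then sortkey. CategoryLink, b85 and link_rows are hypothetical names.

```python
from typing import NamedTuple

class CategoryLink(NamedTuple):
    category: str
    member_type: str  # 'p' appears in the sample; other codes are not documented here
    sortkey: str
    page_id: int
    add_date: int     # assumption: the date is already an integer

def b85(n, width=5):
    """Fixed-width Base85 with '!' as zero (assumed alphabet)."""
    out = ""
    for _ in range(width):
        n, rem = divmod(n, 85)
        out = chr(33 + rem) + out
    return out

def link_rows(links):
    """Format links as category|type|sortkey|id|date| rows in sorted order."""
    ordered = sorted(links, key=lambda link: (link.category, link.member_type, link.sortkey))
    return [
        f"{link.category}|{link.member_type}|{link.sortkey}|{b85(link.page_id)}|{b85(link.add_date)}|"
        for link in ordered
    ]

links = [
    CategoryLink("B", "p", "Page_2_sortkey", 61, 2666),
    CategoryLink("A", "p", "Page_1_sortkey", 4, 2635),
]
rows = link_rows(links)
print("\n".join(rows))
```

The integer values were chosen so that the output matches the two sample rows above.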

ctg.link_idx

  • This command generates the /category2/ hive from the output of the above commands. It uses the following:
    • Category link data as built in /xowa/wiki/simple.wikipedia.org/tmp/ctg.link_sql/make/.
    • Category hidden data as built in /xowa/wiki/simple.wikipedia.org/tmp/ctg.hiddencat_ttl/make/.
  • It then merges these two data sets and generates the /main/ and /link/ subdirectories in /category2/
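
The merge can be pictured with the Python sketch below, which builds one header row per category (name, hidden flag, then subcategory, file and page counts) from the two intermediate outputs. It is an assumption-laden illustration: the type codes 'c' and 'f' for subcategories and files are guesses (only 'p' appears on this page), the Base85 alphabet is assumed as '!' through 'u', and main_rows and b85 are hypothetical names. The hive layout on disk is not modeled.

```python
from collections import Counter, defaultdict

def b85(n, width=5):
    """Fixed-width Base85 with '!' as zero (assumed alphabet)."""
    out = ""
    for _ in range(width):
        n, rem = divmod(n, 85)
        out = chr(33 + rem) + out
    return out

def main_rows(link_lines, hidden_lines):
    """Build name|hidden|subcats|files|pages| rows from the two inputs."""
    # Titles of hidden categories, taken from the title|id rows.
    hidden = {line.split("|", 1)[0] for line in hidden_lines}
    # Per-category counts of each member type.
    counts = defaultdict(Counter)
    for line in link_lines:
        category, member_type = line.split("|")[:2]
        counts[category][member_type] += 1
    rows = []
    for category in sorted(counts):
        c = counts[category]
        flag = "y" if category in hidden else "n"
        # 'c' and 'f' are assumed codes for subcategories and files.
        rows.append(f"{category}|{flag}|{b85(c['c'])}|{b85(c['f'])}|{b85(c['p'])}|")
    return rows

link_lines = [
    "A|p|Page_1_sortkey|!!!!%|!!!@!|",
    "B|p|Page_2_sortkey|!!!!^|!!!@@|",
]
hidden_lines = ["A|!!!!#"]
rows = main_rows(link_lines, hidden_lines)
print("\n".join(rows))
```

Here category A is marked hidden because its title appears in the hidden data, and each category reports a page count of one.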

/category2/

/main/

The main files are located at /xowa/wiki/simple.wikipedia.org/site/category2/main/. They follow the same hive structure as the other directories (a main reg.csv and subdirectories in the format /00/00/00/00/0123456789.xdat).

Each file contains header information for a category. Presently, this includes the following:

  • Category name
  • Hidden: "y" means hidden; "n" means not hidden
  • Number of subcategories (Base85 encoded)
  • Number of files (Base85 encoded)
  • Number of pages (Base85 encoded)
EX: A|y|!!!!!|!!!!!|!!!!!|

/link/

The link files are located at /xowa/wiki/simple.wikipedia.org/site/category2/link/. They also follow the same hive structure as the other directories.

Each file contains members of a category. Presently, this includes the following:

  • Category name
  • Length of subcategories data
  • Length of files data
  • Length of pages data
  • A series of entries listing category members
    • Note that these entries are broken into subgroups (subcategories / files / pages) according to the preceding lengths.
    • Each entry is in a semicolon-delimited format
      • page_id (Base85 encoded)
      • page_member_add_date (Base85 encoded)
      • page_sortkey
EX (for entry): |!!!!%;!!!@!;Page_1_sortkey|
EX (for all): A|!!!!!|!!!!!|!!!!X|!!!!%;!!!@!;Page_1_sortkey|!!!!^;!!!@@;Page_2_sortkey|
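
To make the entry format concrete, here is a small Python sketch that decodes a single member entry. It assumes the same Base85 alphabet as the samples ('!' as zero) and leaves the add date as a raw integer, since its meaning is not specified here. It does not attempt to split a full row into subcategory, file and page groups, because the exact unit of the length fields is not stated on this page. Member, b85_decode and parse_entry are hypothetical names.

```python
from typing import NamedTuple

class Member(NamedTuple):
    page_id: int
    add_date: int  # raw decoded integer; its interpretation is not documented here
    sortkey: str

def b85_decode(text):
    """Decode a Base85 string whose digit 0 is '!' (assumed alphabet)."""
    value = 0
    for ch in text:
        digit = ord(ch) - 33
        if not 0 <= digit < 85:
            raise ValueError(f"not a Base85 digit: {ch!r}")
        value = value * 85 + digit
    return value

def parse_entry(entry):
    """Parse one 'id;date;sortkey' entry, ignoring surrounding pipes."""
    # maxsplit=2 keeps any semicolons inside the sortkey intact.
    page_id, add_date, sortkey = entry.strip("|").split(";", 2)
    return Member(b85_decode(page_id), b85_decode(add_date), sortkey)

member = parse_entry("|!!!!%;!!!@!;Page_1_sortkey|")
print(member)  # Member(page_id=4, add_date=2635, sortkey='Page_1_sortkey')
```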

