Help / Information


Frequently Asked Questions (FAQ)

Here are some questions which are commonly asked about Lemon8-XML. Please also have a look at the Lemon8-XML presentation from the First International PKP Scholarly Publishing Conference.

  1. Q: is Lemon8-XML Open Source?
    A: yes, it is licensed under the GNU GPL v2 Open Source license.
  2. Q: is there a planned release date?
    A: Lemon8-XML is presently in public beta release. For updates and announcements, see the PKP Lemon8 project page.
  3. Q: what technology is Lemon8-XML based on?
    A: Have a look at the Technology Used section below.
  4. Q: can it convert documents automatically from Word/ODT to XML and if so, what sort of heuristics are used?
    A: yes, automatic, hands-free conversion is what Lemon8-XML is designed for. The approach is loosely based on looking for visual "markers" within a document: e.g. a section title which is larger than the surrounding text and bolded, a list of references at the end of the document, a caption immediately before or after an embedded figure, etc. Although the parsers are far from perfect, they have been developed over dozens of scholarly articles and are being improved constantly.
  5. Q: does it rely on authors choosing styles?
    A: not at this point, although we are discussing a collaboration with the ICE-RS research group to share their experience with developing style templates and tools to apply them. A few workflows have been proposed that could improve the quality of information used by both Lemon8-XML and ICE-RS.
  6. Q: can it interoperate with other XML languages? e.g. MathML
    A: not at the moment; only a subset of the National Library of Medicine Journal Publishing DTD is supported. However, because the semantic structure stored in Lemon8-XML is not tied to any particular schema, it is possible to write output XSL transformations to any XML schema desired. We are planning to include support for embedded MathML, SVG, etc. in the future.
  7. Q: is it only intended for medical and scientific journals?
    A: absolutely not. Both Lemon8-XML and the NLM publishing DTD are intended to be used by journals from all disciplines and varying layout complexity. While it still shows some of its STM heritage, we intend to improve support for alternative and creative forms of formatting.
  8. Q: what are the "substantial installation requirements" you describe?
    A: unlike the rest of the PKP suite, Lemon8-XML doesn't have an easy installer interface, and the Java components (notably Apache FOP) require some admin experience to set up correctly. In addition, to convert documents from MS-Word or similar to OpenOffice ODT automatically, Docvert must be installed, and it is fairly complicated to install and configure.
  9. Q: is this a collaborative project - e.g. will it be on sourceforge?
    A: this is most definitely a collaborative project, although we will continue to host the documentation, discussion, and source code on the PKP website for the foreseeable future. Although we are carefully managing the release process, we are always interested in hearing from others willing to contribute to development or testing of Lemon8-XML.
  10. Q: is the material on your site (e.g. the presentation) open (e.g. CC)?
    A: as with most scholarly material, the documentation and information presented regarding Lemon8-XML is free to distribute and use as you wish, as long as you provide full attribution to the Public Knowledge Project. The equivalent CC license would be the Creative Commons Attribution 2.5 Canada License.
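
As a rough illustration of the visual-marker approach described in question 4, the sketch below flags paragraphs that are bold, short, and noticeably larger than the surrounding body text as likely section titles. The Paragraph type, the find_headings function, and the threshold values are assumptions made for this example; they are not taken from the Lemon8-XML parsers.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Paragraph:
    """Hypothetical stand-in for one paragraph extracted from a document."""
    text: str
    font_size: float  # in points
    bold: bool


def find_headings(paragraphs, size_ratio=1.15, max_words=15):
    """Return the paragraphs that look like section titles.

    A paragraph counts as a heading when it is bold, short, and at least
    size_ratio times larger than the typical (median) font size.
    """
    if not paragraphs:
        return []
    # Treat the median font size as the size of ordinary body text.
    body_size = median(p.font_size for p in paragraphs)
    headings = []
    for p in paragraphs:
        is_larger = p.font_size >= body_size * size_ratio
        is_short = 0 < len(p.text.split()) <= max_words
        if p.bold and is_larger and is_short:
            headings.append(p)
    return headings
```

A real parser would combine several such markers (captions near figures, a reference list at the end, and so on) rather than relying on font size alone.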

Five Steps to an XML Document

In this overview, we'll show you how to move through the steps involved in converting a scholarly article in MS-Word format to XML.

1) Pre-Editing and Format Conversion

The first thing you should do is ensure that your document is properly edited to have the format of a "typical" scholarly article: front matter such as title, author names, etc., followed by body matter with headings (distinguished by their formatting), followed by a list of references (numbered or unnumbered), appendices, etc. You can also include tables and images (figures) anywhere within the document text.

Lemon8-XML can handle documents of any word-processing format (e.g. Word .DOC, Word .XML, Rich Text .RTF, WordPerfect .WPD, DocBook, HTML, etc.); however, it works best with the OpenDocument format for a number of reasons. Articles uploaded in other formats will automatically be converted. Proprietary document material such as Microsoft WordArt, Refman/EndNote OLE references, etc. may not be converted properly (or at all) for technical and licensing reasons.

The exact layout isn't crucial, but the more effort that is put into formatting your document initially, the better Lemon8-XML will be able to do its job.

2) Editing Document Metadata

Article metadata, or "front matter" as it is often called, may exist in a myriad of formats. As a result, it is exceptionally difficult to automatically extract metadata with a high degree of accuracy. Lemon8-XML will do its best to correctly identify and extract your document's metadata, but it will likely need some correction.

It is in your best interest to ensure that the article metadata is as complete and accurate as possible, as this is the information that will be used for finding your article in bibliographic indexes and repositories.

The metadata editor allows you to edit, correct, add, and remove metadata in a structured way that is quick and easy to do. You can edit the ordered list of authors, provide a list of appropriate affiliated institutions and contact information, provide keywords, standardized article IDs, and edit full-text information such as the abstract, conflicts of interest, etc.

For articles published using Open Journal Systems, the Lemon8-XML OJS plugin will automatically pre-populate the article metadata with the information contained within OJS during the article submission.

3) Ordering and Renaming Sections

Lemon8-XML will automatically detect the sections, tables, and figures within your document, and provide an overview of their order and hierarchy (in the case of sub-headings, etc.). The section editor allows you to re-order and re-organize your document as you desire. For example, if figures are embedded as appendices, you may move them into sections, or even upload new images to replace or supplement previous ones.

Although the functionality is limited to a very general preview for now, we are aiming to provide comprehensive section and table editing features in the future. For now, however, the word count and overview should give you an idea of how your article is structured, and allow you to ensure that all of the pertinent sections are in the right places.

4) Editing and Correcting Citations

As any editor or author knows, editing citations (references) to ensure that they're complete and correct is extremely tedious and time-consuming work. Lemon8-XML makes that a thing of the past! When your document is uploaded, the references are automatically detected, parsed, and compared with several online services to ensure their completeness and correctness.

For each citation, you will be shown a visual indication of its completeness: a red 'X' indicates that it will have to be edited manually; a blue '!' indicates that manual editing may be required; and a green check-mark indicates that it was correctly parsed or corrected ("looked up"). At a glance, you will be able to determine which citations require editing, thus minimizing your work.

Each citation is passed through 4 parsing mechanisms, matching over 400 citation styles, and assigned an accuracy score. The Lemon8-XML administrator can set the threshold for "correctness" to adjust the sensitivity of the parsers. Likewise, each citation is matched against 2 online databases (at present; more are planned shortly) to verify completeness. If a match is found, a correction score is assigned that shows how close the original citation was to the online index. Again, only those that meet the threshold are actually corrected.
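
To make the idea of a correction score and threshold concrete, here is a minimal sketch. It uses Python's difflib string similarity as a stand-in; how Lemon8-XML actually computes its scores is not described here, and the function names and the default threshold of 0.8 are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def correction_score(original: str, indexed: str) -> float:
    """Similarity (0.0 to 1.0) between a citation and an index record."""
    return SequenceMatcher(None, original.lower(), indexed.lower()).ratio()


def maybe_correct(original: str, indexed: str, threshold: float = 0.8):
    """Replace the citation with the index record only if it is close enough.

    Returns a tuple of (citation text to keep, score, whether it was corrected).
    """
    score = correction_score(original, indexed)
    if score >= threshold:
        return indexed, score, True
    # Below the threshold the match is not trusted; keep the original.
    return original, score, False
```

Raising the threshold makes automatic correction more conservative, leaving more citations for manual review; lowering it corrects more citations at the risk of accepting a wrong match.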

When manually editing citations, the editor allows you to either fill in fields individually, or edit the complete citation and re-parse it as you like. If you feel the citation is correct and want to see if it can be enhanced by information from an online index, click "Lookup citation" to attempt an automatic correction.

You should fill out as many fields as possible, as accurately and completely as possible, to ensure both the highest quality of export from your document and the best likelihood that your citation can be automatically looked up. Although this may take a little time, it will ensure the best quality for your article. We are planning a more dynamic, AJAX-based citation editor for the near future.

5) Previewing and Exporting the XML

The final step in editing your document is previewing it in HTML and PDF formats and ensuring that it renders as you would expect it to appear in an online journal or institutional archive.

This is a good opportunity to double-check that the edits you've made to the article metadata, section ordering, and citations are all reflected correctly in the final preview. All of the previews are generated dynamically, so any changes you make will be reflected immediately.

Once you are content that your article is complete, you can export it in fully-structured XML format. At the moment, only the NLM Journal Publishing DTD is supported, although we are planning a number of alternate export formats (including LaTeX, DocBook, and ODT) in the near future. Lemon8-XML has been developed in close relation with PubMed Central and strives to meet their stringent standards for archival-quality XML.
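
For readers unfamiliar with the NLM Journal Publishing DTD, the sketch below builds a small fragment of front matter using element names from that DTD (article, front, article-meta, title-group, contrib-group, abstract). It is only an illustration of the general shape of the export: the build_article function is hypothetical, the fragment omits required elements such as journal metadata, and it is neither a valid instance of the DTD nor the output that Lemon8-XML itself produces.

```python
import xml.etree.ElementTree as ET


def build_article(title, authors, abstract):
    """Build a partial NLM-style <article> element.

    authors is a list of (surname, given_names) tuples, in author order.
    """
    article = ET.Element("article")
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")

    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = title

    contrib_group = ET.SubElement(meta, "contrib-group")
    for surname, given_names in authors:
        contrib = ET.SubElement(
            contrib_group, "contrib", {"contrib-type": "author"}
        )
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given_names

    abstract_el = ET.SubElement(meta, "abstract")
    ET.SubElement(abstract_el, "p").text = abstract
    return article


if __name__ == "__main__":
    example = build_article(
        "An Example Article", [("Doe", "Jane")], "A short abstract."
    )
    print(ET.tostring(example, encoding="unicode"))
```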


Technology Used

Lemon8-XML has been developed using 100% free and Open Source software and technology.

Lemon8-XML is built on the flexible CakePHP framework (1.2.0.6311 beta).
Many of the advanced language features and libraries come from PHP5 (5.2.4).
The webserver and database are provided by Apache (2.2.6) and MySQL (5.0.48).
Document conversion and creation is handled using Docvert (3.3).
Actual document format conversion is done using OpenOffice.org (2.3).
The citation parser incorporates the ParaTools (1.10) parsing algorithms.
PDF preview rendering is done with XSL-FO via Apache FOP (0.94).
Book citation correction uses dynamic searching in the ISBNdb database.
Journal citation correction uses the Entrez eUtils for dynamically searching PubMed.
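
As an illustration of what a PubMed lookup through the Entrez eUtils can look like, the sketch below builds an ESearch request URL from a few citation fields. The pubmed_search_url function and its choice of search fields are assumptions for this example, and the sketch only constructs the URL without sending a request; it does not show how Lemon8-XML itself queries PubMed.

```python
from urllib.parse import urlencode

# Base address of the Entrez ESearch utility.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(title, first_author=None, year=None, retmax=5):
    """Build an ESearch URL that searches PubMed for a journal citation."""
    # PubMed field tags restrict each term to a specific record field.
    terms = [f"{title}[Title]"]
    if first_author:
        terms.append(f"{first_author}[Author]")
    if year:
        terms.append(f"{year}[pdat]")
    params = {
        "db": "pubmed",
        "term": " AND ".join(terms),
        "retmax": retmax,  # maximum number of matching IDs to return
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

The response to such a request is an XML list of matching PubMed IDs, which can then be fetched and compared with the original citation.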