
Integration of different data sources
Data does not need to be prepared for the publishing process: DocScape takes care of this job
automatically.
DocScape Publisher, the main component of the DocScape system, receives a structured XML dataset
file as input (spread modularly over several interconnected XML files if required) and generates a
PDF file from it. The DocScape Data Extractor component integrates a wide range of data sources
into this dataset file. It is configured through an XML formula for the dataset file in which,
alongside the data structure, the data sources and additional aggregation and structuring rules are
defined through special DocScape annotations. Because the various data sources are linked through
unique key criteria, the document represents an integrated view of a wide range of available data
sources.
A number of scenarios:
- Data generation with an ERP system as the leading system. Products are grouped into product
groups, product families and catalog chapters through predefined grouping keys. Product description
texts (RTF) and figures (image format) are integrated as files, linked via the product code.
- Data generation with a PIM system as the leading system. Editorial contents relating to product
families (introduction, project illustrations as eye-catchers) are stored in the CMS and linked
through the family key.
- Data generation with a CMS as the leading system. Contents up to product group level (incl. text
descriptions and product illustrations) are administered and structured, and the sequence of
contents is defined for the catalog. Product data, technical features and price information are
taken directly from the inventory management system, linked through the product code.
- Linking to the output interface of an ERP system (e.g. for offers). Output files from the ERP
system (e.g. RDI) are converted into XML, linked through the product code, combined with data from
the PIM system and published, e.g. as illustrated offers, applying the full set of catalog rules.
Access to all data sources, conversion into XML (e.g. RDI, RTF, XLS) if required, set-up and
standardization of XML structures and the consolidation of all contents in one joint dataset
structure are performed fully automatically by DocScape Data Extractor.
For the realization of its DocScape Data Extractor, QuinScape relies on standard technologies
in Java and XSLT, which support long-term portability and maintainability.
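The linking of data sources through a shared key, as described above, can be illustrated with a small sketch. It is written in Python purely for illustration and is not DocScape's implementation or configuration syntax. The record layout, the element names (`catalog`, `productGroup`, `product`) and the function `build_dataset` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical records standing in for two data sources (assumed layout).
erp_rows = [
    {"code": "A-100", "group": "Drills", "price": "129.00"},
    {"code": "A-200", "group": "Drills", "price": "159.00"},
    {"code": "B-300", "group": "Saws", "price": "89.50"},
]
pim_texts = {
    "A-100": "Compact drill for light work.",
    "B-300": "Hand saw with hardened teeth.",
}


def build_dataset(erp_rows, pim_texts):
    """Consolidate both sources into one XML tree, joined on the product code."""
    root = ET.Element("catalog")
    groups = {}
    for row in erp_rows:
        # Create each product group once, keyed by the grouping key.
        group = groups.get(row["group"])
        if group is None:
            group = ET.SubElement(root, "productGroup", name=row["group"])
            groups[row["group"]] = group
        product = ET.SubElement(group, "product", code=row["code"])
        ET.SubElement(product, "price").text = row["price"]
        # Attach the description only if the second source has this key.
        text = pim_texts.get(row["code"])
        if text is not None:
            ET.SubElement(product, "description").text = text
    return root


dataset = build_dataset(erp_rows, pim_texts)
print(ET.tostring(dataset, encoding="unicode"))
```

Products without a matching entry in the second source simply carry no description element. A real configuration would have to decide how such gaps are handled.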
Automatic data compression and aggregation
In addition to the definition of data sources and structuring criteria for data extraction
from a variety of sources, the XML formula allows the definition of compression and aggregation
rules for the dataset file, which enable data to be grouped and summarized according to a
selection of criteria.
Possible applications:
- Grouping of consecutive products with identical product photo into product groups.
- Collating, linking and summarizing of accessory products.
- Generation of symbol lists.
- Compilation of detailed images for generated diagrams (such as exploded-view drawings).
- Inclusion/exclusion rule: should a product feature be displayed generally for the whole
product group and only exceptions be listed, or should it be featured at product level?
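The first application in the list above, grouping consecutive products that share a product photo, can be sketched as follows. This is a minimal illustration in Python, not DocScape's rule syntax. The field names `code` and `photo` and the function `group_by_consecutive_photo` are assumptions made for this example.

```python
from itertools import groupby

# Hypothetical product list in catalog order (assumed field names).
products = [
    {"code": "A-100", "photo": "drill.png"},
    {"code": "A-110", "photo": "drill.png"},
    {"code": "B-300", "photo": "saw.png"},
    {"code": "A-120", "photo": "drill.png"},
]


def group_by_consecutive_photo(products):
    """Merge runs of adjacent products with the same photo into one group."""
    groups = []
    # groupby only merges adjacent items, so the catalog order is preserved
    # and a photo that reappears later starts a new group.
    for photo, run in groupby(products, key=lambda p: p["photo"]):
        groups.append({"photo": photo, "codes": [p["code"] for p in run]})
    return groups


groups = group_by_consecutive_photo(products)
for group in groups:
    print(group["photo"], group["codes"])
```

Because only adjacent products are merged, the last product forms its own group even though its photo already appeared earlier in the sequence.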
By selecting the relevant structuring, compression and aggregation rules, very different
document structures may be generated from the same dataset, such as
- Main catalogs, summarized catalogs, price lists;
- Specialized category catalogs (featuring a carefully selected range of information, such as
detailed images, exploded-view drawings, generated diagrams, feature tables);
- Value-added offers for premium customers;
- Personalized catalogs/brochures: emphasized representations of products which - based on the
customer profile - may be of interest to a specific customer.
Thanks to formula-based data modeling, DocScape Data Extractor can readily translate these
aggregation and compression tasks into rules.
Other compression rules are subject to the available space or other layout-dependent criteria
(symmetry of double spreads, chapter structuring, spread optimization) and cannot be applied before
the layout is generated:
- Display of product features by individual products or compilation in a table at product group
level.
- Selection of product images from the total volume of available images (with different
sizes/shapes).
- Omission of less significant product features to save space.
- Selection of space-saving or more complex layouts for premium products (in cases of multi-level
premium classification) to optimize space utilization.
- Aggregation of product texts.
- Table structuring.
If a compression or aggregation rule may depend on the layout rules, it should not be mapped in
the DocScape Data Extractor rules, but in the DocScape Publisher rules. Realizing compression and
aggregation rules in DocScape Data Extractor is much more efficient, but does not allow
interaction with the layout engine.
Inclusion of external documents
Not all contents of a document have to be generated entirely from data. Manually created
contents may be integrated in several ways:
- Generation of manually designed pages with a DTP program, storage as a PDF file, integration
through DocScape.
Pagination, column titles etc. may be added by DocScape if required, as well as the accurate
positioning on left/right pages. Multi-page documents are integrated into a generated document as a
sequence of pages. If different languages or other versions are included in the DocScape-generated
document as PDF layers, external documents featuring several layers may also be integrated
accurately into the layers of the generated document.
- Generation of page sections (such as advertisements, eye-catchers or other contents featuring
components which are not available fully structured in the database) with a DTP program, storage as
cropped PDF, integration through DocScape.
At every level of the document structure, manually designed content may be added, or may replace
content that would otherwise be generated from data. DocScape’s rule-based approach takes care of
its positioning on the page. An external PDF document whose contents do not fill a full page may
be separated into several “PDF pages”, which are positioned individually and distributed across the
actual document pages to optimize the layout.
Integration of structured text contents
If no content management system is integrated, the recording of text contents for print
publishing must be planned carefully. On the one hand, certain formatting, such as emphasis,
headlines, lists and, if required, tables, must be supported. On the other hand,
media-independent recording is desirable so that a text content can be reused in different font
sizes, text widths and layout designs. Not all file input interfaces allow structured texts to be
recorded in text input fields, but most CMS systems provide this option.
Several options should be considered for their application with DocScape:
- Integration as HTML
Editors which support formatted recording of text contents in HTML are available for
integration into web-based data administration interfaces. One advantage is that almost any
content from other applications may be brought in via the clipboard. In terms of media
independence, there are a number of HTML attributes (switching of font or color, defined width of
table columns) which this kind of text should not contain. DocScape’s filter components filter out
such formatting, replacing it with media-independent alternatives. HTML is converted into XML
during the integration with DocScape.
- Integration as RTF
RTF is a standardized text format for the recording of text contents via commercially
available text processing programs (and their integration as individual files). If integrated with
DocScape, RTF is converted into XML. From the point of view of media-independence, there is a wide
range of RTF features (font or color switching, table features) which must be filtered out and
replaced by media-independent alternatives. The DocScape component which converts RTF into XML
includes a configurable filter feature which fulfils the following tasks:
- Conversion from RTF into XML without referring to Office software.
- Filtering out of undesired formats (font type and color changes, paragraph formats).
- Conversion of visual structures (such as tables with x columns, or changing to larger, bolder
fonts) into logical structures (such as load tables and headings).
- Analysis of meta-information (change tracking).
- Integration as DocScape XML
For structured texts, DocScape defines its own XML dialect, which serves as conversion target
for all other text formats and may be recorded by DocScape’s DocEdit component, if required.
DocEdit is a browser-based text editor which is configured via an XML formula and provides the
following functions:
- WYSIWYG editor for structured XML texts.
- Integration into any web-based data administration form.
- Look and feel are familiar from Office products.
- Individually adaptable.
- Specification of admissible text structures through XML formula: at any point of the text, only
those structural elements are provided which are admissible at that point.
- Templates, text/table components.
- Transferring contents from Office software via the clipboard is possible, while all structures
which are not admissible at that specific point will be filtered out and converted.
- Data-based generation of content parameters.
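The filtering of media-dependent formatting described for HTML and RTF input can be sketched as follows. This is only an illustration in Python of the general idea (keep structural elements, discard presentational tags and attributes) and not DocScape's filter component. The set `ALLOWED_TAGS`, the class `MediaIndependentFilter` and the function `filter_html` are assumptions made for this example. The sketch drops unwanted formatting rather than replacing it with media-independent alternatives.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Assumed policy: structural elements that survive filtering.
ALLOWED_TAGS = {"p", "b", "i", "ul", "ol", "li", "table", "tr", "td", "th", "h1", "h2"}


class MediaIndependentFilter(HTMLParser):
    """Keep structural elements; drop presentational tags and all attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Attributes such as style, color or width are discarded entirely.
        if tag in ALLOWED_TAGS:
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text inside dropped tags (e.g. <font>) is kept; only the tag goes.
        self.parts.append(escape(data))


def filter_html(source):
    parser = MediaIndependentFilter()
    parser.feed(source)
    parser.close()
    return "".join(parser.parts)


html_in = '<p style="color:red"><font face="Arial">Max. load: <b>5 kg</b></font></p>'
print(filter_html(html_in))
```

The sketch assumes reasonably well-formed input. Producing guaranteed well-formed XML from arbitrary clipboard HTML would need additional repair steps.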
Image processing
- Processing of complete file trees.
- Conversion into PDF.
- Extraction of cropping and other paths.
- Generation of drop and outline shadows.
- Make functionality.