[PLIP-Advisories] Re: [Plone] #9328: content im-/export

plip-advisories at lists.plone.org plip-advisories at lists.plone.org
Tue Jun 30 12:16:38 UTC 2009


#9328: content im-/export
---------------------+------------------------------------------------------
 Reporter:  csenger  |        Owner:  csenger
     Type:  PLIP     |       Status:  new    
 Priority:  minor    |    Milestone:  4.0    
Component:  Unknown  |   Resolution:         
 Keywords:           |  
---------------------+------------------------------------------------------

New description:

 == Motivation ==

 Content ex-/import is needed for several tasks, e.g. editing content on a
 dedicated site during development and transferring it to a production
 site without risking migration issues. It also matters when an in-site
 migration is not possible, which might be the case for a Plone release
 after 4.x. Solutions currently in use include:

  * portal_setup's content step - discouraged in most discussions; at
    the very least it lacks support for binary data and references.
  * gsxml [http://pypi.python.org/pypi/collective.plone.gsxml pypi
    page] - generic product for Archetypes-based content types. Uses
    atxml and is buggy, incomplete, under-documented and difficult to
    set up, but generally does the job.
  * hand-written scripts like
    [http://www.zopyx.de/blog/updated-when-the-plone-migration-fails-doing-content-migration-only
    those from Andreas Jung (svn down atm)].
  * collective.transmogrifier
    [http://dev.plone.org/collective/browser/collective.transmogrifier
    trac] - generic, extensible solution with a number of add-ons
    (e.g.
    [http://dev.plone.org/collective/browser/collective.transmogrifier
    plone.app.transmogrifier],
    [http://svn.quintagroup.com/products/quintagroup.transmogrifier
    quintagroup.transmogrifier]) that together already provide a
    working im-/export. Downsides: Transmogrifier has no final release,
    no end-user interface or documentation, and is complex.

 Each of these solutions has problems: they are incomplete,
 under-documented, difficult to set up or not flexible enough.


 == Definitions ==

 transmogrifier vocabulary

  pipeline::
    A sequence of sections that is processed.
  section::
    A blueprint together with optional configuration variables.
  blueprint::
    A class that provides ISectionBlueprint and implements ISection. In
    fact it is just a callable that implements __iter__, for use with
    Python's iteration protocol.
  source::
    A blueprint that reads in data to be used by other blueprints in
    the pipeline. There can be more than one source, in which case the
    second source injects new items into the pipeline.
  constructor::
    A blueprint that reads the data and constructs an object.
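
 The vocabulary above can be illustrated with plain Python iterators. This
 is only a minimal sketch: it does not import collective.transmogrifier or
 declare any zope interfaces, the class names and the run_pipeline helper
 are hypothetical, and the constructor arguments (transmogrifier, name,
 options, previous) are an assumption about how sections are wired together.

```python
class CountingSource:
    """Hypothetical source blueprint: emits `size` new items."""

    def __init__(self, transmogrifier, name, options, previous):
        self.previous = previous
        self.size = int(options["size"])

    def __iter__(self):
        # Pass through whatever earlier sections produced ...
        for item in self.previous:
            yield item
        # ... then inject new items into the pipeline.
        for number in range(self.size):
            yield {"id": "item-%02d" % number}


class TitleTransform:
    """Hypothetical transformer blueprint: derives a title from the id."""

    def __init__(self, transmogrifier, name, options, previous):
        self.previous = previous

    def __iter__(self):
        for item in self.previous:
            item["title"] = item["id"].replace("-", " ").title()
            yield item


def run_pipeline(sections):
    """Chain (blueprint, options) pairs and return every resulting item."""
    pipeline = iter(())  # the first section starts with an empty iterator
    for number, (blueprint, options) in enumerate(sections):
        pipeline = iter(blueprint(None, "section %d" % number, options, pipeline))
    return list(pipeline)


items = run_pipeline([(CountingSource, {"size": "3"}), (TitleTransform, {})])
print(items)
```

 Each section only sees the iterator of the section before it, which is
 why sections can be reordered or added without changing the others.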


 == Proposal ==

 This PLIP aims to provide a solution for Plone that

   * can be used to export the out-of-the-box content types and most
     add-on content types
   * is extensible, so add-ons can contribute ex-/import data that a
     generic solution cannot cover
   * is ready for an administrator to use out of the box
   * is integrated into the control panel
   * can be used by developers to write a custom import for external data

 Why a proposal for Plone 4?

  * It should be the canonical ex-/import mechanism, which add-on
    developers extend if the generic part does not cover enough data.
  * With Dexterity and plone.app.content, there are ways other than
    Archetypes to construct content. It seems impossible to support them
    while maintaining the code outside of Plone core.
  * It is requested regularly, and import is one of the problems people
    face when migrating external content to Plone.

 It could just as well be added in a later Plone 4.x release, since it
 needs no changes to Plone core and introduces no backward
 incompatibility, but I am submitting it for Plone 4.0 to begin with.

 == Assumptions ==

 The im-/export system covered by this PLIP handles only Archetypes
 content and a few special cases like comments. Generic blueprints for
 handling Zope 3 schemata are not part of this PLIP.

 == Implementation ==

 The export will be implemented with collective.transmogrifier. The main
 reasons are that it is extensible and fast, and that most of the
 necessary blueprints are already implemented in
 plone.app.transmogrifier, quintagroup.transmogrifier and
 collective.blueprint.translationlinker. These cover Archetypes and ATCT
 content, topics and their criteria (a port of gsxml, I think),
 references, comments, translation links, browser defaults and workflow
 state.

 quintagroup.transmogrifier already implements a working ex-/import into a
 tarball. It uses the atxml handler from Products.Marshall to export an
 Archetypes object to XML
 ([http://dev.plone.org/archetypes/browser/Products.Marshall/trunk/Products/Marshall/tests/input/atxml/Document.xml
 example output]).
 To write and read the data, it uses GenericSetup's TarballExportContext
 and TarballImportContext (with two small monkey patches). The structure of
 the tarball is similar to that of the GenericSetup content im-/export
 step and contains folders and XML files:


  * '''structure/'''
    * '''.objects.xml'''[[BR]]
      <?xml version="1.0" ?>[[BR]]
      <manifest> ... <record type="Document">front-page</record> ...
    * '''.properties.xml'''[[BR]]
      XML produced with GenericSetup's propertymanager support; it
      contains properties like default_page.
    * '''front-page/'''
      * '''.marshall.xml'''[[BR]]
        See atxml's example output.
    * '''news/'''
      * ...
      * '''aggregator/'''
        * '''.properties.xml'''
        * '''.objects.xml'''
        * '''.marshall.xml'''
        * '''crit__effective_ATSortCriterion/'''
          * '''.marshall.xml'''
        * ...
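
 To make the layout above concrete, the following sketch builds and lists
 a small archive with the same paths using only Python's standard tarfile
 module. It deliberately does not use GenericSetup's TarballExportContext,
 and the file contents and helper names are placeholders chosen for
 illustration.

```python
import io
import tarfile

# Hypothetical file contents; a real export would hold the atxml output.
FILES = {
    "structure/.objects.xml": b'<?xml version="1.0" ?>\n<manifest/>\n',
    "structure/.properties.xml": b"<properties/>\n",
    "structure/front-page/.marshall.xml": b"<metadata/>\n",
    "structure/news/aggregator/.marshall.xml": b"<metadata/>\n",
}


def build_archive(files):
    """Return the bytes of a gzipped tarball holding `files` (path -> bytes)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path, data in sorted(files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_archive(data):
    """Return the member paths stored in a tarball given as bytes."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        return archive.getnames()


names = list_archive(build_archive(FILES))
print("\n".join(names))
```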

 The work on this PLIP is split into two major steps:

  1. Get a reliable, complete, hard-wired content im-/export for
     an out-of-the-box Plone site.
  2. Make the system flexible enough to support add-on products and
     maybe TTW configuration of the export process.

 === 1. Out-of-the-box Plone im-/export ===

 This step already supports add-ons, as long as all their information is
 stored in Archetypes schemata.

  1. Review the existing blueprints.
  2. Determine what additional information we need to export and
     write the missing blueprints.
  3. Write a pipeline configuration for import and one for export that
     work within a single Plone version.
  4. Write a utility and a basic export control panel.
  5. Get all used packages into the collective or the Plone repository,
     where they can be maintained.

 === 2. Flexibility to support add-ons and configuration ===

 A transmogrifier pipeline consists of several sections, each of which
 names the blueprint to use and sets a number of configuration variables.

 {{{
 >>> exampleconfig = """\
 ... [transmogrifier]
 ... pipeline =
 ...     section 1
 ...     section 2
 ...
 ... [section 1]
 ... blueprint = collective.transmogrifier.tests.examplesource
 ... size = 5
 ...
 ... [section 2]
 ... blueprint = collective.transmogrifier.tests.exampletransform
 ... """
 }}}
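
 The configuration above is INI-style text, so its structure can be
 inspected with Python's standard configparser. The sketch below is an
 illustration only: transmogrifier has its own configuration loading, and
 the read_pipeline helper is a hypothetical name.

```python
import configparser

EXAMPLE_CONFIG = """\
[transmogrifier]
pipeline =
    section 1
    section 2

[section 1]
blueprint = collective.transmogrifier.tests.examplesource
size = 5

[section 2]
blueprint = collective.transmogrifier.tests.exampletransform
"""


def read_pipeline(text):
    """Return [(section name, options dict), ...] in pipeline order."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # The pipeline option is a multi-line value: one section name per line.
    names = [
        line.strip()
        for line in parser.get("transmogrifier", "pipeline").splitlines()
        if line.strip()
    ]
    return [(name, dict(parser.items(name))) for name in names]


pipeline = read_pipeline(EXAMPLE_CONFIG)
for name, options in pipeline:
    print(name, options)
```

 Section names are split per line rather than on whitespace because names
 such as "section 1" contain spaces.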


 We split the configuration into PloneTransmogrifierConfigProviders. Each
 provider supplies
  * information for the user interface (title, description)
  * one or more sections, together with
    * the kind of blueprint each section contains (source, transformer
      or writer for export; reader, transformer or constructor for import)
    * the priority of the section within its group (like init scripts)

 The utility that composes the pipeline can then order the sections it
 receives from different ConfigProviders without knowing anything more
 about them. If an add-on registers a ConfigProvider, it can be integrated
 into the pipeline with little risk of breaking the export.
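
 One way the composing utility could order sections is sketched below.
 Nothing here exists yet: the Section and ConfigProvider classes, the
 group names, the blueprint names and the rule "lower priority runs
 first" are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed group order for an export pipeline.
EXPORT_GROUPS = ("source", "transformer", "writer")


@dataclass(frozen=True)
class Section:
    name: str        # section name used in the pipeline configuration
    blueprint: str   # dotted name of the blueprint (hypothetical below)
    group: str       # one of EXPORT_GROUPS
    priority: int    # lower numbers run earlier within the group


@dataclass(frozen=True)
class ConfigProvider:
    title: str
    description: str
    sections: tuple


def compose_pipeline(providers, groups=EXPORT_GROUPS):
    """Order all sections by group, then priority, then name."""
    sections = [s for provider in providers for s in provider.sections]
    return sorted(
        sections, key=lambda s: (groups.index(s.group), s.priority, s.name)
    )


core = ConfigProvider(
    "Content", "Archetypes content",
    (
        Section("walk-site", "example.sitewalker", "source", 10),
        Section("marshall", "example.marshaller", "transformer", 50),
        Section("write-tarball", "example.tarballwriter", "writer", 90),
    ),
)
addon = ConfigProvider(
    "Ratings", "Data from a hypothetical rating add-on",
    (Section("ratings", "example.ratings", "transformer", 60),),
)

ordered = [section.name for section in compose_pipeline([addon, core])]
print(ordered)
```

 The add-on's section lands between the core transformer and the writer
 regardless of the order in which the providers were registered.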


 ==== Why not have one config provider per available blueprint? ====

  1. Two or more sections (blueprints) belong together if they do one job
     at different points in the pipeline. An example is one blueprint that
     reads which object is the canonical version of a translation, and a
     second blueprint that links the objects together after another
     blueprint has constructed them.

  2. One blueprint can also be used several times, e.g. one that
     transforms parts of an item based on a regular expression, so more
     than one PloneTransmogrifierConfigProvider can use the same blueprint.
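
 The second point can be sketched as follows: one regular-expression
 blueprint is instantiated twice with different options, as two providers
 might do. The RegexTransform class, its option names and the item keys
 are hypothetical and only serve the illustration.

```python
import re


class RegexTransform:
    """Hypothetical blueprint: regex substitution on one key of each item."""

    def __init__(self, transmogrifier, name, options, previous):
        self.previous = previous
        self.key = options["key"]
        self.pattern = re.compile(options["pattern"])
        self.replacement = options["replacement"]

    def __iter__(self):
        for item in self.previous:
            if self.key in item:
                item[self.key] = self.pattern.sub(self.replacement, item[self.key])
            yield item


items = iter([{"_path": "/old-site/news/item", "text": "See http://old.example.org/"}])
# First use of the blueprint: strip a path prefix.
fix_paths = RegexTransform(
    None, "fix-paths",
    {"key": "_path", "pattern": r"^/old-site", "replacement": ""},
    items,
)
# Second use of the same blueprint: rewrite a host name in the body text.
fix_links = RegexTransform(
    None, "fix-links",
    {"key": "text", "pattern": r"old\.example\.org", "replacement": "new.example.org"},
    fix_paths,
)
result = list(fix_links)
print(result)
```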

 ==== Configurability ====

 PloneTransmogrifierConfigProviders can also give the user the option to
 disable or configure certain tasks. Every provider could carry a zope
 schema used to display an edit form, including an option to disable the
 provider. Whether this makes sense in general has yet to be explored.

 Another option would be to write a set of configurable filter blueprints
 that let the user choose, for example, which content types are removed
 before the export archive is generated or before the imported data is
 written to the database.
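
 A filter blueprint of this kind might look like the sketch below. The
 TypeFilter class and its "exclude-types" option are hypothetical, and
 the use of a "_type" key to carry the content type of an item is an
 assumption about the item format.

```python
class TypeFilter:
    """Hypothetical filter blueprint: drops items of excluded content types."""

    def __init__(self, transmogrifier, name, options, previous):
        self.previous = previous
        # One type name per line, as in a multi-line configuration value.
        self.excluded = {
            line.strip()
            for line in options.get("exclude-types", "").splitlines()
            if line.strip()
        }

    def __iter__(self):
        for item in self.previous:
            if item.get("_type") in self.excluded:
                continue  # drop the item; later sections never see it
            yield item


items = [
    {"_path": "/front-page", "_type": "Document"},
    {"_path": "/news/aggregator", "_type": "Topic"},
    {"_path": "/logo.png", "_type": "Image"},
]
section = TypeFilter(None, "filter", {"exclude-types": "Topic\nImage"}, iter(items))
kept = [item["_path"] for item in section]
print(kept)
```

 Placed before the writer on export, or before the constructor on import,
 such a section keeps excluded types out of the archive or the database.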


 == Risks ==

 The key component for reading/writing Archetypes content is atxml from the
 Products.Marshall package. This package is kept in a working state, but is
 not well maintained, and its unit tests are not working. This seems to be
 an acceptable risk, as it has been the case for a long time and the
 package seems to be used by many people.

 The package might not be finished within the 4.0 release cycle. Besides
 the glue code, there are lots of details to be implemented and tested.
 It would be no problem, however, to introduce the package in a later
 Plone 4.x release.


 == Deliverables ==

  * Consolidated blueprint packages
  * A Plone package that contains the configuration backend and the
    control panel
  * ConfigProviders, partly in the Plone package and partly in the
    external packages that implement the blueprints
  * Unit tests
  * Developer and end user documentation

 == Participants ==

 Carsten Senger (csenger)

 == Progress and further information ==

 See PlipContentImExport

--

Comment(by csenger):

 Add link to PlipContentImExport to Description

-- 
Ticket URL: <https://dev.plone.org/old/plone/ticket/9328#comment:18>
Plone <http://plone.org>
Plone Content Management System

