[PLIP-Advisories] Re: [Plone] #7822: Make standard file content types use ZODB BLOB support

plip-advisories at lists.plone.org plip-advisories at lists.plone.org
Wed Sep 30 10:52:54 UTC 2009


#7822: Make standard file content types use ZODB BLOB support
----------------------------+-----------------------------------------------
 Reporter:  limi            |        Owner:  witsch  
     Type:  PLIP            |       Status:  assigned
 Priority:  major           |    Milestone:  4.0     
Component:  Infrastructure  |   Resolution:          
 Keywords:  focusarea       |  
----------------------------+-----------------------------------------------

Old description:

> ,,Copied in part from [http://plone.org/products/plone/roadmap/154/ PLIP
> #154] in the roadmap:,,
>
> = Large file handling =
>
> ''It is possible to configure Zope to work with very large files, but the
> out-of-the-box story is not terribly great. It should be obvious how to
> configure Plone so that it can handle large volumes of MS Office, PDF or
> media files, for example.''
>
>  Proposed by::
>    Martin Aspeli
>  Seconded by::
>    Martijn Pieters
>  Proposal type::
>    Architecture
>  Repository branch::
>    [browser:plone.app.blob], [browser:plone.app.imaging]
>

> == Motivation ==
>
> In many ways, Plone is well-suited to document management and the
> management of files in general. Tools such as `ExternalEditor` and
> `Enfold Desktop` make this even more true. However, due to the way the
> ZODB works, large files can be problematic to work with: if you're not
> careful, your ZODB could balloon because each change to a file creates a
> new revision of the whole object.
>
> There are solutions to this problem, which usually involve storing some
> content outside the ZODB. However, the out-of-the-box story in Plone
> isn't good enough. It needs to be clear how to set up a site to support
> large files, and as far as possible this should work transparently
> whether enabled or disabled.
>

> == Proposal ==
>
> ZODB has integrated BLOB ("binary large object") support starting from
> version 3.8.  BLOBs present a way to transparently store binary content
> outside of the usual file storage (i.e. the `Data.fs`), but still as part
> of the ZODB itself.  BLOB objects behave like regular files, but are
> entirely managed by the ZODB, meaning that the developer doesn't need to
> care about their file names, transaction-awareness, historic revisions
> etc.  Furthermore, accessing such objects is far more efficient as they
> don't have to be loaded into memory as part of another object anymore.
> This significantly lowers memory requirements and thereby frees up RAM as
> well as the ZODB cache for other data.
>
> This PLIP therefore proposes to integrate and use ZODB BLOB support in
> Plone. The existing "File" and "Image" content types should be replaced
> by compatible, blob-aware versions.  Any new such content created will
> use ZODB BLOBs to store the actual payload data.  Existing objects can be
> either left untouched, or else converted to the new types via the provided
> in-place migrations.
>
> The key points of such an integration are:
>
>   '''Transparency'''::
>     The provided optimisations should be as transparent as possible and
> work seamlessly with existing products and tools.
>   '''Compatibility'''::
>     The new replacement types should aim for full backward-compatibility
> with `ATContentTypes`' "File" and "Image" types.
>   '''Performance'''::
>     Dealing with large files is typically a performance problem. Loading
> a 1 GB video file into memory every now and then is not acceptable!
>   '''Ease of set-up'''::
>     It should be easy and obvious how large file optimisations are
> enabled, and what implications any configuration changes have.
>

> == Implementation ==
>
> BLOBs are merely an additional feature of the ZODB and not used in any
> way out-of-the-box.  The integrational package `plone.app.blob` uses them
> to provide drop-in replacements for `ATContentTypes`' "File" and "Image"
> content types.  `plone.app.imaging` is a supplemental package paving the
> way for using BLOBs to store image scales as well (apart from making the
> scale sizes configurable through-the-web).
>
> `plone.app.blob` provides a blob-aware rewritten version of `Archetypes`'
> `ObjectField` class as well as a base content type using that field.
> So-called "sub-typing" is then used to mimic the behaviour of the existing
> "File" and "Image" types.  Marker interfaces and
> `archetypes.schemaextender` are used for this.  At first this seems to
> complicate matters, but having only one implementation and content type
> for all binary data such as PDFs, audio or video content will make it
> much easier to provide add-on packages like the `Plone4Artists` suite or
> the various image gallery enhancements.
>
> `GenericSetup` profiles can be used to enable these types to be the
> default when creating new "File" or "Image" content.  The profiles also
> move the old types, so that they are still available and existing content
> keeps working.
>
> Separate in-place migration facilities are provided to convert existing
> objects to make use of the more efficient replacement types.  The
> `Products.contentmigration` package is used here, but not needed when no
> migration is required or after it has been completed.
>
> Both packages have extensive tests and provide ways to ensure backward
> compatibility by running tests from other packages such as
> `Products.CMFPlone` and `Products.ATContentTypes` with the blob-aware
> replacement types enabled.
>

> == Deliverables ==
>
>   * An add-on package for Plone versions 3.0 and later, providing blob
> support for existing sites
>   * Integration into Plone 4.0 so that the drop-in replacement types are
> used by default (essentially this will only hook up the aforementioned
> package and make the necessary test adjustments)
>   * Migration facilities for existing "File" and "Image" content
>   * Hooks for 3rd-party products to use the blob-aware field and base
> type in order to add additional sub-types and migrations
>   * Documentation for users, site integrators as well as developers
> including setup, migration, backup strategies as well as best practices
> of how to use the integration layer with custom types
>

> == Risks ==
>
>   * Compatibility issues may arise from using the (mostly rewritten)
> replacement types and there are already a number of known issues.
> However, those will be addressed before a final version of
> `plone.app.blob` and integration into Plone 4.0.  Also, the package has
> already proven to work in several production sites, some of which have
> been using
> BLOB support for more than a year.
>   * The standard ZODB setup would not solely use a file-storage anymore,
> i.e. the common `Data.fs`, but also include a blob-storage, which
> consists of a directory hierarchy with files for each BLOB and its
> revisions.  This might be confusing for users when it comes to backing up
> their Zope-related data.  Recommended backup strategies should be
> documented to resolve this issue.
>   * Some existing content might not be "migratable" due to unforeseen
> issues.  This shouldn't be a problem, however, as any existing content
> will remain functional, migrated or not.
>   * Using `archetypes.schemaextender` and marker interfaces introduces a
> new level of indirection when implementing content types.  It also adds
> performance issues due to the additional and generally more expensive
> schema lookups.  This can be solved by also shipping with
> `archetypes.schematuning`, which has been proposed as PLIP 9376.
> `plone.app.blob` doesn't change the schema after creation, so schema
> invalidation is not required here.
>   * ZODB's BLOB support might still be too fresh and potentially contain
> bugs.  On the other hand, the ZODB 3.8 series has seen two final releases
> and blob support has been successfully used in real-life projects as well
> as other Plone/Zope add-ons.
>

> == Progress ==
>
>   * The integration package, `plone.app.blob`, is in a usable state and
> being successfully used in several production sites.
>   * There are a number of remaining test failures regarding "Image"
> content.  All backward-compatibility tests for "File" content currently
> pass.
>   * Also, there are a number of pending issues and sensible enhancements
> (see the issue tracker at
> http://plone.org/products/plone.app.blob/issues/)
>   * Several beta releases have been made, but a final release will need
> to address the aforementioned issues.
>   * In-place migration of existing content has been heavily tested, both
> via integration tests and in "real-life" test runs.
>   * Migrating a relatively large site with a 16 GB `Data.fs` including
> about 7,000 "File" content items showed significantly lower memory
> requirements and improved performance.  The `Data.fs` went down to 2.5 GB
> and memory usage dropped from 8-9 GB to 3 GB with the same ZODB cache
> size.  The migration itself took little more than one hour.
>   * Compatibility tests with Plone 3.x are provided via a buildbot
> (http://blobot.zitc.de/).  The setup will soon be extended to also cover
> Plone 4.0.  However, the test setup is currently still broken for 4.0 due
> to changes in Zope 2.12.
>

> == Participants ==
>
>  - Andreas Zeidler (IRC nickname: <witsch>)

New description:

 ,,Copied in part from [http://plone.org/products/plone/roadmap/154/ PLIP
 #154] in the roadmap:,,

 = Large file handling =

 ''It is possible to configure Zope to work with very large files, but the
 out-of-the-box story is not terribly great. It should be obvious how to
 configure Plone so that it can handle large volumes of MS Office, PDF or
 media files, for example.''

  Proposed by::
    Martin Aspeli
  Seconded by::
    Martijn Pieters
  Proposal type::
    Architecture
  Repository branch::
    [browser:plone.app.blob], [browser:plone.app.imaging]


 == Motivation ==

 In many ways, Plone is well-suited to document management and the
 management of files in general. Tools such as `ExternalEditor` and `Enfold
 Desktop` make this even more true. However, due to the way the ZODB works,
 large files can be problematic to work with: if you're not careful, your
 ZODB could balloon because each change to a file creates a new revision of
 the whole object.

 There are solutions to this problem, which usually involve storing some
 content outside the ZODB. However, the out-of-the-box story in Plone isn't
 good enough. It needs to be clear how to set up a site to support large
 files, and as far as possible this should work transparently whether
 enabled or disabled.


 == Proposal ==

 ZODB has integrated BLOB ("binary large object") support starting from
 version 3.8.  BLOBs present a way to transparently store binary content
 outside of the usual file storage (i.e. the `Data.fs`), but still as part
 of the ZODB itself.  BLOB objects behave like regular files, but are
 entirely managed by the ZODB, meaning that the developer doesn't need to
 care about their file names, transaction-awareness, historic revisions
 etc.  Furthermore, accessing such objects is far more efficient as they
 don't have to be loaded into memory as part of another object anymore.
 This significantly lowers memory requirements and thereby frees up RAM as
 well as the ZODB cache for other data.
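
 To illustrate the memory argument only: the sketch below copies a payload
 from one file-like object to another in fixed-size chunks, so peak memory
 is bounded by the chunk size rather than by the payload size.  It uses
 only the Python standard library and is not the ZODB BLOB API; the name
 `copy_in_chunks` and the 64 KiB chunk size are assumptions made for this
 example.

```python
import io

CHUNK_SIZE = 64 * 1024  # assumed chunk size, chosen for illustration only


def copy_in_chunks(source, target, chunk_size=CHUNK_SIZE):
    """Copy source to target chunk by chunk; return the bytes copied."""
    copied = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means end of data
            break
        target.write(chunk)
        copied += len(chunk)
    return copied


# Stand-in payload; a real payload would be a file opened in binary mode.
payload = b"x" * (200 * 1024)
source = io.BytesIO(payload)
target = io.BytesIO()
copied = copy_in_chunks(source, target)
```

 Only one chunk is held by the copy loop at any time, which is the property
 that matters when the payload is a large media file.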

 This PLIP therefore proposes to integrate and use ZODB BLOB support in
 Plone. The existing "File" and "Image" content types should be replaced by
 compatible, blob-aware versions.  Any new such content created will use
 ZODB BLOBs to store the actual payload data.  Existing objects can be
 either left untouched, or else converted to the new types via the provided
 in-place migrations.

 The key points of such an integration are:

   '''Transparency'''::
     The provided optimisations should be as transparent as possible and
 work seamlessly with existing products and tools.
   '''Compatibility'''::
     The new replacement types should aim for full backward-compatibility
 with `ATContentTypes`' "File" and "Image" types.
   '''Performance'''::
     Dealing with large files is typically a performance problem. Loading a
 1 GB video file into memory every now and then is not acceptable!
   '''Ease of set-up'''::
     It should be easy and obvious how large file optimisations are
 enabled, and what implications any configuration changes have.


 == Implementation ==

 BLOBs are merely an additional feature of the ZODB and not used in any way
 out-of-the-box.  The integrational package `plone.app.blob` uses them to
 provide drop-in replacements for `ATContentTypes`' "File" and "Image"
 content types.  `plone.app.imaging` is a supplemental package paving the
 way for using BLOBs to store image scales as well (apart from making the
 scale sizes configurable through-the-web).

 `plone.app.blob` provides a blob-aware rewritten version of `Archetypes`'
 `ObjectField` class as well as a base content type using that field.
 So-called "sub-typing" is then used to mimic the behaviour of the existing
 "File" and "Image" types.  Marker interfaces and
 `archetypes.schemaextender` are used for this.  At first this seems to
 complicate matters, but having only one implementation and content type
 for all binary data such as PDFs, audio or video content will make it much
 easier to provide add-on packages like the `Plone4Artists` suite or the
 various image gallery enhancements.
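
 The idea of one base type that is specialised per instance can be pictured
 with a toy model.  The sketch below is plain Python and deliberately does
 not use `zope.interface` or `archetypes.schemaextender`; every name in it
 (`BlobItem`, `IFileMarker`, `IImageMarker`, `mark`, `schema_for` and the
 field names) is hypothetical and not taken from `plone.app.blob`.

```python
# Toy model only: NOT zope.interface and NOT archetypes.schemaextender.


class IFileMarker:
    """Hypothetical marker: this item should behave like a "File"."""


class IImageMarker:
    """Hypothetical marker: this item should behave like an "Image"."""


# Extra schema fields contributed by each marker (field names are made up).
EXTRA_FIELDS = {
    IFileMarker: ["file"],
    IImageMarker: ["image", "scales"],
}


class BlobItem:
    """Single base type shared by all binary content."""

    base_fields = ["title", "description"]

    def __init__(self, title):
        self.title = title
        self.markers = set()


def mark(item, marker):
    """Attach a marker to one instance, not to its class."""
    item.markers.add(marker)


def schema_for(item):
    """Return the base fields plus whatever each attached marker adds."""
    fields = list(item.base_fields)
    for marker, extra in EXTRA_FIELDS.items():
        if marker in item.markers:
            fields.extend(extra)
    return fields


report = BlobItem("Annual report")
mark(report, IFileMarker)
photo = BlobItem("Team photo")
mark(photo, IImageMarker)
```

 Both objects share one class, yet `schema_for` yields a different field
 list for each.  The schema is assembled on every lookup in this toy.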

 `GenericSetup` profiles can be used to enable these types to be the
 default when creating new "File" or "Image" content.  The profiles also
 move the old types, so that they are still available and existing content
 keeps working.

 Separate in-place migration facilities are provided to convert existing
 objects to make use of the more efficient replacement types.  The
 `Products.contentmigration` package is used here, but not needed when no
 migration is required or after it has been completed.

 Both packages have extensive tests and provide ways to ensure backward
 compatibility by running tests from other packages such as
 `Products.CMFPlone` and `Products.ATContentTypes` with the blob-aware
 replacement types enabled.


 == Deliverables ==

   * An add-on package for Plone versions 3.0 and later, providing blob
 support for existing sites
   * Integration into Plone 4.0 so that the drop-in replacement types are
 used by default (essentially this will only hook up the aforementioned
 package and make the necessary test adjustments)
   * Migration facilities for existing "File" and "Image" content
   * Hooks for 3rd-party products to use the blob-aware field and base type
 in order to add additional sub-types and migrations
   * Documentation for users, site integrators as well as developers
 including setup, migration, backup strategies as well as best practices
 of how to use the integration layer with custom types


 == Risks ==

   * Compatibility issues may arise from using the (mostly rewritten)
 replacement types and there are already a number of known issues.
 However, those will be addressed before a final version of
 `plone.app.blob` and integration into Plone 4.0.  Also, the package has
 already proven to work in several production sites, some of which have
 been using BLOB support for more than a year.
   * The standard ZODB setup would not solely use a file-storage anymore,
 i.e. the common `Data.fs`, but also include a blob-storage, which consists
 of a directory hierarchy with files for each BLOB and its revisions.  This
 might be confusing for users when it comes to backing up their
 Zope-related data.  Recommended backup strategies should be documented to
 resolve this issue.
   * Some existing content might not be "migratable" due to unforeseen
 issues.  This shouldn't be a problem, however, as any existing content
 will remain functional, migrated or not.
   * Using `archetypes.schemaextender` and marker interfaces introduces a
 new level of indirection when implementing content types.  It also adds
 performance issues due to the additional and generally more expensive
 schema lookups.  This can be solved by also shipping with
 `archetypes.schematuning`, which has been proposed as PLIP 9376.
 `plone.app.blob` doesn't change the schema after creation, so schema
 invalidation is not required here.
   * ZODB's BLOB support might still be too fresh and potentially contain
 bugs.  On the other hand, the ZODB 3.8 series has seen two final releases
 and blob support has been successfully used in real-life projects as well
 as other Plone/Zope add-ons.
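
 As a rough illustration of the backup concern above, the sketch below
 copies both parts of such a setup, the `Data.fs` file and the blob
 directory, into one destination.  It is a naive cold copy and not a
 recommended strategy from this PLIP: it assumes the Zope instance is
 stopped so that neither source changes during the copy, and the function
 name `cold_backup` and all paths are hypothetical.

```python
import shutil
from pathlib import Path


def cold_backup(data_fs, blob_dir, destination):
    """Copy Data.fs and the blob directory into destination.

    Assumes the Zope instance is stopped, so neither source changes
    mid-copy. Raises FileExistsError if the blob copy already exists.
    """
    data_fs = Path(data_fs)
    blob_dir = Path(blob_dir)
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    # copy2 keeps file metadata such as modification times.
    fs_copy = Path(shutil.copy2(data_fs, destination / data_fs.name))
    # copytree recreates the whole directory hierarchy of blob files.
    blob_copy = Path(shutil.copytree(blob_dir, destination / blob_dir.name))
    return fs_copy, blob_copy
```

 The point is that the two parts are treated as one unit: restoring a
 `Data.fs` without the matching blob directory leaves file content that
 refers to payload data which is not there.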


 == Progress ==

   * The integration package, `plone.app.blob`, is in a usable state and
 being successfully used in several production sites.
   * There are a number of remaining test failures regarding "Image"
 content.  All backward-compatibility tests for "File" content currently
 pass.
   * Also, there are a number of pending issues and sensible enhancements
 (see the issue tracker at
 http://plone.org/products/plone.app.blob/issues/)
   * Several beta releases have been made, but a final release will need to
 address the aforementioned issues.
   * In-place migration of existing content has been heavily tested, both
 via integration tests and in "real-life" test runs.
   * Migrating a relatively large site with a 16 GB `Data.fs` including
 about 7,000 "File" content items showed significantly lower memory
 requirements and improved performance.  The `Data.fs` went down to 2.5 GB
 and memory usage dropped from 8-9 GB to 3 GB with the same ZODB cache
 size.  The migration itself took little more than one hour.
   * Compatibility tests with Plone 3.x and 4.0 are provided via a buildbot
 (http://blobot.zitc.de/).  The latter currently [changeset:29981 still uses
 ZODB 3.8.3] as newer versions, i.e. 3.9.x, break the necessary setup.
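
 The migration figures quoted above can be turned into relative savings
 with a few lines of arithmetic.  The helper name `reduction_percent` is
 made up for this example; the input numbers are the ones reported in the
 list.

```python
def reduction_percent(before, after):
    """Relative reduction from before to after, in percent."""
    return (before - after) / before * 100


data_fs_saving = reduction_percent(16, 2.5)   # Data.fs: 16 GB -> 2.5 GB
memory_saving_low = reduction_percent(8, 3)   # memory: 8 GB -> 3 GB
memory_saving_high = reduction_percent(9, 3)  # memory: 9 GB -> 3 GB

print(f"Data.fs shrank by {data_fs_saving:.1f}%")
print(f"Memory use fell by {memory_saving_low:.1f}% to {memory_saving_high:.1f}%")
```

 This gives roughly 84% for the `Data.fs` and between 62.5% and about 67%
 for memory.  The report does not state the size of the resulting blob
 directory, so the first figure is not a total disk saving.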


 == Participants ==

  - Andreas Zeidler (IRC nickname: <witsch>)

--

Comment(by witsch):

 updated the status wrt tests on plone 4

-- 
Ticket URL: <http://dev.plone.org/plone/ticket/7822#comment:57>
Plone <http://plone.org>
Plone Content Management System

