[PLIP-Advisories] Re: [Plone] #7822: Make standard file content types use ZODB BLOB support
plip-advisories at lists.plone.org
Wed Sep 30 10:52:54 UTC 2009
#7822: Make standard file content types use ZODB BLOB support
----------------------------+-----------------------------------------------
Reporter: limi | Owner: witsch
Type: PLIP | Status: assigned
Priority: major | Milestone: 4.0
Component: Infrastructure | Resolution:
Keywords: focusarea |
----------------------------+-----------------------------------------------
Old description:
> ,,Copied in part from [http://plone.org/products/plone/roadmap/154/ PLIP
> #154] in the roadmap:,,
>
> = Large file handling =
>
> ''It is possible to configure Zope to work with very large files, but the
> out-of-the-box story is not terribly great. It should be obvious how to
> configure Plone so that it can handle large volumes of MS Office, PDF or
> media files, for example.''
>
> Proposed by::
> Martin Aspeli
> Seconded by::
> Martijn Pieters
> Proposal type::
> Architecture
> Repository branch::
> [browser:plone.app.blob], [browser:plone.app.imaging]
>
> == Motivation ==
>
> In many ways, Plone is well-suited to document management and the
> management of files in general. Tools such as `ExternalEditor` and
> `Enfold Desktop` make this even more true. However, due to the way the
> ZODB works, large files can be problematic to work with: if you're not
> careful, your ZODB could balloon because each change to a file stores a
> new revision of the whole object.
>
> There are solutions to this problem, which usually involve storing some
> content outside the ZODB. However, the out-of-the-box story in Plone
> isn't good enough. It needs to be clear how to set up a site to support
> large files, and as far as possible this should work transparently
> whether enabled or disabled.
>
> == Proposal ==
>
> ZODB has integrated BLOB ("binary large object") support starting from
> version 3.8. BLOBs present a way to transparently store binary content
> outside of the usual file storage (i.e. the `Data.fs`), but still as part
> of the ZODB itself. BLOB objects behave like regular files, but are
> entirely managed by the ZODB, meaning that the developer doesn't need to
> care about their file names, transaction-awareness, historic revisions
> etc. Furthermore, accessing such objects is far more efficient as they
> don't have to be loaded into memory as part of another object anymore.
> This significantly lowers memory requirements and thereby frees up RAM as
> well as the ZODB cache for other data.
>
> This PLIP therefore proposes to integrate and use ZODB BLOB support in
> Plone. The existing "File" and "Image" content types should be replaced
> by compatible, blob-aware versions. Any newly created content of these
> types will use ZODB BLOBs to store the actual payload data. Existing
> objects can either be left untouched or converted to the new types via
> the provided in-place migrations.
>
> The key points of such an integration are:
>
> '''Transparency'''::
> The provided optimisations should be as transparent as possible and
> work seamlessly with existing products and tools.
> '''Compatibility'''::
> The new replacement types should aim for full backward-compatibility
> with `ATContentTypes`' "File" and "Image" types.
> '''Performance'''::
> Dealing with large files is typically a performance problem. Loading
> a 1 GB video file into memory every now and then is not acceptable!
> '''Ease of set-up'''::
> It should be easy and obvious how large file optimisations are
> enabled, and what implications any configuration changes have.
>
> == Implementation ==
>
> BLOBs are merely an additional feature of the ZODB and are not used in
> any way out-of-the-box. The integration package `plone.app.blob` uses them
> to provide drop-in replacements for `ATContentTypes`' "File" and "Image"
> content types. `plone.app.imaging` is a supplemental package paving the
> way for using BLOBs to store image scales as well (apart from making the
> scale sizes configurable through-the-web).
>
> `plone.app.blob` provides a blob-aware, rewritten version of `Archetypes`'
> `ObjectField` class as well as a base content type using that field.
> So-called "sub-typing" is then used to mimic the behaviour of the existing
> "File" and "Image" types, using marker interfaces together with
> `archetypes.schemaextender`. At first this seems to
> complicate matters, but having only one implementation and content type
> for all binary data such as PDFs, audio or video content will make it
> much easier to provide add-on packages like the `Plone4Artists` suite or
> the various image gallery enhancements.
>
> `GenericSetup` profiles can be used to enable these types to be the
> default when creating new "File" or "Image" content. The profiles also
> move the old types, so that they are still available and existing content
> keeps working.
>
> Separate in-place migration facilities are provided to convert existing
> objects to make use of the more efficient replacement types. The
> `Products.contentmigration` package is used here, but not needed when no
> migration is required or after it has been completed.
>
> Both packages have extensive tests and provide ways to ensure backward
> compatibility by running tests from other packages such as
> `Products.CMFPlone` and `Products.ATContentTypes` with the blob-aware
> replacement types enabled.
>
> == Deliverables ==
>
> * An add-on package for Plone versions 3.0 and later, providing blob
> support for existing sites
> * Integration into Plone 4.0 so that the drop-in replacement types are
> used by default (essentially this will only hook up the aforementioned
> package and make the necessary test adjustments)
> * Migration facilities for existing "File" and "Image" content
> * Hooks for 3rd-party products to use the blob-aware field and base
> type in order to add additional sub-types and migrations
> * Documentation for users, site integrators as well as developers
> including setup, migration, backup strategies as well as best practices
> of how to use the integration layer with custom types
>
> == Risks ==
>
> * Compatibility issues may arise from using the (mostly rewritten)
> replacement types and there are already a number of known issues.
> However, those will be addressed before a final version of
> `plone.app.blob` and integration into Plone 4.0. Also, the package has
> already proven to work in several production sites, some of which have
> been using BLOB support for more than a year.
> * The standard ZODB setup would not solely use a file-storage anymore,
> i.e. the common `Data.fs`, but also include a blob-storage, which
> consists of a directory hierarchy with files for each BLOB and its
> revisions. This might be confusing for users when it comes to backing up
> their Zope-related data. Recommended backup strategies should be
> documented to resolve this issue.
> * Some existing content might not be "migratable" due to unforeseen
> issues. This shouldn't be a problem, however, as any existing content
> will remain functional, migrated or not.
> * Using `archetypes.schemaextender` and marker interfaces introduces a
> new level of indirection when implementing content types. It also adds
> performance issues due to the additional and generally more expensive
> schema lookups. This can be solved by also shipping with
> `archetypes.schematuning`, which has been proposed as PLIP 9376.
> `plone.app.blob` doesn't change the schema after creation, so schema
> invalidation is not required here.
> * ZODB's BLOB support might still be too fresh and potentially contain
> bugs. On the other hand, the ZODB 3.8 series has seen two final releases
> and blob support has been successfully used in real-life projects as well
> as other Plone/Zope add-ons.
>
> == Progress ==
>
> * The integration package, `plone.app.blob`, is in a usable state and
> being successfully used in several production sites.
> * There are a number of remaining test failures regarding "Image"
> content. All backward-compatibility tests for "File" content currently
> pass.
> * Also, there are a number of pending issues and sensible enhancements
> (see the issue tracker at
> http://plone.org/products/plone.app.blob/issues/)
> * Several beta releases have been made, but a final release will need
> to address the aforementioned issues.
> * In-place migration of existing content has been heavily tested, both
> via integration tests and in "real-life" test runs.
> * Migrating a relatively large site with a 16 GB `Data.fs` including
> about 7,000 "File" content items showed significantly lower memory
> requirements and improved performance. The `Data.fs` went down to 2.5 GB
> and memory usage dropped from 8-9 GB to 3 GB with the same ZODB cache
> size. The migration itself took little more than one hour.
> * Compatibility tests with Plone 3.x are provided via a buildbot
> (http://blobot.zitc.de/). The setup will soon be extended to also cover
> Plone 4.0. However, the test setup is currently still broken for 4.0 due
> to changes in Zope 2.12.
>
> == Participants ==
>
> - Andreas Zeidler (IRC nickname: <witsch>)
New description:
,,Copied in part from [http://plone.org/products/plone/roadmap/154/ PLIP
#154] in the roadmap:,,

= Large file handling =

''It is possible to configure Zope to work with very large files, but the
out-of-the-box story is not terribly great. It should be obvious how to
configure Plone so that it can handle large volumes of MS Office, PDF or
media files, for example.''

Proposed by::
Martin Aspeli
Seconded by::
Martijn Pieters
Proposal type::
Architecture
Repository branch::
[browser:plone.app.blob], [browser:plone.app.imaging]

== Motivation ==

In many ways, Plone is well-suited to document management and the
management of files in general. Tools such as `ExternalEditor` and
`Enfold Desktop` make this even more true. However, due to the way the
ZODB works, large files can be problematic to work with: if you're not
careful, your ZODB could balloon because each change to a file stores a
new revision of the whole object.
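
As a rough illustration of why the ZODB balloons, the toy model below mimics an append-only storage in the spirit of `FileStorage`. It is hypothetical and does not use ZODB's API. Each committed change appends a complete new record of the object, so changing a single byte of an inline 1 MiB payload costs another full 1 MiB.

```python
class AppendOnlyStorage:
    """Toy append-only store: every commit appends a full copy of the object.

    Hypothetical illustration, not ZODB code. The real FileStorage writes
    pickled object records plus transaction metadata, but it likewise keeps
    old revisions around until the storage is packed.
    """

    def __init__(self):
        self.records = []  # (oid, payload) tuples, oldest first

    def commit(self, oid, payload):
        # The whole object state is written again, not just the changed bytes.
        self.records.append((oid, bytes(payload)))

    def size(self):
        return sum(len(payload) for _, payload in self.records)


MIB = 1024 * 1024
storage = AppendOnlyStorage()
document = bytearray(MIB)  # a 1 MiB "file" stored inline in the object
storage.commit("file-1", document)
for revision in range(4):
    document[0] = revision  # change a single byte...
    storage.commit("file-1", document)  # ...and store the full 1 MiB again

print(storage.size() // MIB)  # 5: five revisions of a 1 MiB file
```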

There are solutions to this problem, which usually involve storing some
content outside the ZODB. However, the out-of-the-box story in Plone isn't
good enough. It needs to be clear how to set up a site to support large
files, and as far as possible this should work transparently whether
enabled or disabled.

== Proposal ==

ZODB has had integrated BLOB ("binary large object") support since
version 3.8. BLOBs provide a way to transparently store binary content
outside of the usual file storage (i.e. the `Data.fs`), but still as part
of the ZODB itself. BLOB objects behave like regular files, but are
entirely managed by the ZODB, meaning that the developer doesn't need to
care about their file names, transaction-awareness, historical revisions,
etc. Furthermore, accessing such objects is far more efficient, as they no
longer have to be loaded into memory as part of another object. This
significantly lowers memory requirements and thereby frees up RAM as well
as the ZODB cache for other data.
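
ZODB's real interface for this is the `Blob` class in `ZODB.blob`, whose `open()` method returns a file-like object. The stdlib-only toy below sketches just the storage idea described above; `ToyBlobStore` is hypothetical. The main index holds only a short reference per revision, and each revision's payload is written to its own file in a separate directory. Looking up the object therefore never drags the payload into memory.

```python
import os
import tempfile


class ToyBlobStore:
    """Hypothetical model of the blob idea: tiny index, payloads in files.

    Mirrors the concept only (one file per blob revision inside a separate
    directory); it has no transactions and does not use ZODB's real layout.
    """

    def __init__(self, blob_dir):
        self.blob_dir = blob_dir
        self.index = {}  # oid -> list of revision file paths, oldest first

    def commit(self, oid, payload):
        revisions = self.index.setdefault(oid, [])
        oid_dir = os.path.join(self.blob_dir, oid)
        os.makedirs(oid_dir, exist_ok=True)
        path = os.path.join(oid_dir, "%04d.blob" % len(revisions))
        with open(path, "wb") as f:
            f.write(payload)
        revisions.append(path)  # the index itself only grows by a short path

    def open(self, oid, revision=-1):
        # Hand back a real file object; callers read as much as they need.
        return open(self.index[oid][revision], "rb")


with tempfile.TemporaryDirectory() as blob_dir:
    store = ToyBlobStore(blob_dir)
    store.commit("file-1", b"first version")
    store.commit("file-1", b"second version")
    with store.open("file-1") as f:
        latest = f.read()
    with store.open("file-1", revision=0) as f:
        oldest = f.read()

print(latest, oldest)  # b'second version' b'first version'
```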

This PLIP therefore proposes to integrate and use ZODB BLOB support in
Plone. The existing "File" and "Image" content types should be replaced by
compatible, blob-aware versions. Any newly created content of these types
will use ZODB BLOBs to store the actual payload data. Existing objects can
either be left untouched or converted to the new types via the provided
in-place migrations.

The key points of such an integration are:

'''Transparency'''::
The provided optimisations should be as transparent as possible and
work seamlessly with existing products and tools.
'''Compatibility'''::
The new replacement types should aim for full backward-compatibility
with `ATContentTypes`' "File" and "Image" types.
'''Performance'''::
Dealing with large files is typically a performance problem. Loading a
1 GB video file into memory every now and then is not acceptable!
'''Ease of set-up'''::
It should be easy and obvious how the large-file optimisations are
enabled, and what implications any configuration changes have.
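
The performance point mostly comes down to never holding a whole payload in RAM. The minimal stdlib sketch below assumes the payload is reachable as an ordinary file object, as blob files are. It copies the payload to the output in fixed-size chunks, so memory use is bounded by the chunk size rather than the file size. `stream_file` and the 64 KiB chunk size are illustrative choices, not Plone or Zope API.

```python
import io

CHUNK_SIZE = 64 * 1024  # 64 KiB per read; peak memory stays near this size


def stream_file(source, sink, chunk_size=CHUNK_SIZE):
    """Copy `source` to `sink` chunk by chunk and return the bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        total += len(chunk)
    return total


# Stand-ins for an opened blob file and an HTTP response body.
source = io.BytesIO(b"x" * (3 * CHUNK_SIZE + 10))
sink = io.BytesIO()
print(stream_file(source, sink))  # 196618
```

The standard library's `shutil.copyfileobj(source, sink, CHUNK_SIZE)` runs the same loop, minus the byte count.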

== Implementation ==

BLOBs are merely an additional feature of the ZODB and are not used in any
way out-of-the-box. The integration package `plone.app.blob` uses them to
provide drop-in replacements for `ATContentTypes`' "File" and "Image"
content types. `plone.app.imaging` is a supplemental package paving the
way for using BLOBs to store image scales as well (apart from making the
scale sizes configurable through-the-web).
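
Configurable scales boil down to a list of named bounding boxes. This sketch assumes each configured scale is a line of the form `name width:height`; that format, the scale names and the sizes are assumptions, not taken from `plone.app.imaging`. It parses such a list and computes the size of a scale that fits inside its box while preserving the aspect ratio.

```python
def parse_sizes(lines):
    """Parse 'name width:height' lines into {name: (width, height)}."""
    sizes = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the through-the-web setting
        name, dims = line.split()
        width, height = dims.split(":")
        sizes[name] = (int(width), int(height))
    return sizes


def fit_within(original, bounds):
    """Scale (w, h) down to fit inside bounds, keeping the aspect ratio.

    Images that already fit are left at their original size.
    """
    (w, h), (max_w, max_h) = original, bounds
    factor = min(max_w / w, max_h / h, 1.0)
    return max(1, round(w * factor)), max(1, round(h * factor))


sizes = parse_sizes(["large 768:768", "preview 400:400", "thumb 128:128"])
print(fit_within((1600, 1200), sizes["preview"]))  # (400, 300)
print(fit_within((100, 50), sizes["thumb"]))  # (100, 50)
```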

`plone.app.blob` provides a blob-aware, rewritten version of `Archetypes`'
`ObjectField` class as well as a base content type using that field.
So-called "sub-typing" is then used to mimic the behaviour of the existing
"File" and "Image" types, using marker interfaces together with
`archetypes.schemaextender`. At first this seems to complicate matters,
but having only one implementation and content type for all binary data
such as PDFs, audio or video content will make it much easier to provide
add-on packages like the `Plone4Artists` suite or the various image
gallery enhancements.

`GenericSetup` profiles can be used to make these types the default when
creating new "File" or "Image" content. The profiles also move the old
types, so that they are still available and existing content keeps
working.

Separate in-place migration facilities are provided to convert existing
objects to the more efficient replacement types. The
`Products.contentmigration` package is used for this, but it is not needed
when no migration is required or after the migration has been completed.
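
The migrations themselves rely on `Products.contentmigration`, whose API is not reproduced here. The hypothetical, stdlib-only sketch below shows the general shape of an in-place migration; `OldFile`, `BlobFile`, `migrate_folder` and `write_blob` are invented for illustration. Each old-style item is replaced by a blob-backed object under the same id. An item that fails to migrate is simply left as it was, which is why unmigratable content stays functional.

```python
from dataclasses import dataclass


@dataclass
class OldFile:
    id: str
    title: str
    data: bytes  # payload pickled inline with the object record


@dataclass
class BlobFile:
    id: str
    title: str
    blob_path: str  # payload stored outside the main object record


def migrate_folder(folder, write_blob):
    """Replace OldFile items in `folder` (a dict of id -> item) in place.

    `write_blob(item_id, data)` stores a payload and returns its location.
    Returns (migrated_ids, failed_ids); failed items are left untouched.
    """
    migrated, failed = [], []
    for item_id, item in list(folder.items()):
        if not isinstance(item, OldFile):
            continue  # already migrated, or not a file at all
        try:
            path = write_blob(item_id, item.data)
        except Exception:
            failed.append(item_id)  # the old object keeps working as before
            continue
        folder[item_id] = BlobFile(item.id, item.title, path)  # same id/URL
        migrated.append(item_id)
    return migrated, failed


stored = {}


def write_blob(item_id, data):
    if not data:
        raise ValueError("empty payload")  # simulates an unmigratable item
    stored[item_id] = data
    return "blobs/" + item_id


folder = {"a": OldFile("a", "Report", b"%PDF-1.4"), "b": OldFile("b", "Broken", b"")}
print(migrate_folder(folder, write_blob))  # (['a'], ['b'])
```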

Both packages have extensive tests and provide ways to ensure backward
compatibility by running tests from other packages such as
`Products.CMFPlone` and `Products.ATContentTypes` with the blob-aware
replacement types enabled.
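
Running another package's tests with the replacement types enabled amounts to executing one set of assertions against each implementation. The minimal `unittest` sketch below uses hypothetical `InlineFile` and `BlobBackedFile` classes as stand-ins for the classic and the blob-aware types. The shared tests live in a mixin, and each concrete test case only supplies a factory.

```python
import io
import unittest


class InlineFile:
    """Stand-in for the classic type: payload kept in memory."""

    def __init__(self, data):
        self._data = data

    def get_size(self):
        return len(self._data)

    def read(self):
        return self._data


class BlobBackedFile:
    """Stand-in for the blob-aware type: payload behind a file object."""

    def __init__(self, data):
        self._file = io.BytesIO(data)

    def get_size(self):
        return len(self._file.getvalue())

    def read(self):
        self._file.seek(0)
        return self._file.read()


class FileContractTests:
    """Behaviour both implementations must share; not collected on its own."""

    factory = None

    def test_size(self):
        self.assertEqual(self.factory(b"abc").get_size(), 3)

    def test_roundtrip(self):
        self.assertEqual(self.factory(b"payload").read(), b"payload")


class InlineFileTests(FileContractTests, unittest.TestCase):
    factory = InlineFile


class BlobBackedFileTests(FileContractTests, unittest.TestCase):
    factory = BlobBackedFile
```

Running `python -m unittest` on this module executes both tests twice, once per implementation.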

== Deliverables ==

* An add-on package for Plone versions 3.0 and later, providing blob
support for existing sites
* Integration into Plone 4.0 so that the drop-in replacement types are
used by default (essentially this will only hook up the aforementioned
package and make the necessary test adjustments)
* Migration facilities for existing "File" and "Image" content
* Hooks for 3rd-party products to use the blob-aware field and base type
in order to add additional sub-types and migrations
* Documentation for users, site integrators and developers, including
setup, migration, backup strategies and best practices for using the
integration layer with custom types

== Risks ==

* Compatibility issues may arise from using the (mostly rewritten)
replacement types, and there are already a number of known issues.
However, those will be addressed before a final version of
`plone.app.blob` and integration into Plone 4.0. Also, the package has
already proven to work in several production sites, some of which have
been using BLOB support for more than a year.
* The standard ZODB setup would no longer consist solely of a file
storage, i.e. the common `Data.fs`, but would also include a blob storage,
which consists of a directory hierarchy with files for each BLOB and its
revisions. This might confuse users when it comes to backing up their
Zope-related data. Recommended backup strategies should be documented to
resolve this issue.
* Some existing content might not be "migratable" due to unforeseen
issues. This shouldn't be a problem, however, as any existing content
will remain functional, migrated or not.
* Using `archetypes.schemaextender` and marker interfaces introduces a
new level of indirection when implementing content types. It also adds
performance overhead due to the additional and generally more expensive
schema lookups. This can be solved by also shipping
`archetypes.schematuning`, which has been proposed as PLIP 9376.
`plone.app.blob` doesn't change the schema after creation, so schema
invalidation is not required here.
* ZODB's BLOB support might still be too new and could contain bugs. On
the other hand, the ZODB 3.8 series has seen two final releases, and blob
support has been used successfully in real-life projects as well as in
other Plone/Zope add-ons.
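
The backup documentation could start from a minimal stdlib sketch like the one below, which treats `Data.fs` and the blob directory as one unit. The paths and `backup_zodb` are hypothetical. A plain file copy of a live `Data.fs` is not safe under concurrent writes. In practice the file storage would be backed up with a tool such as ZODB's `repozo` script, with the blob directory copied afterwards.

```python
import os
import shutil
import tempfile
import time


def backup_zodb(filestorage_path, blob_dir, backup_root):
    """Copy Data.fs, then the blob directory, into one timestamped folder.

    The ordering follows the commonly given advice: when Data.fs is copied
    first, the blob directory copied afterwards should contain every blob
    that copy refers to (extra, newer blob files are harmless).
    """
    target = os.path.join(backup_root, time.strftime("%Y%m%d-%H%M%S"))
    os.makedirs(target)
    shutil.copy2(filestorage_path, os.path.join(target, "Data.fs"))
    shutil.copytree(blob_dir, os.path.join(target, "blobstorage"))
    return target


with tempfile.TemporaryDirectory() as var:
    data_fs = os.path.join(var, "Data.fs")
    blob_dir = os.path.join(var, "blobstorage")
    os.makedirs(os.path.join(blob_dir, "example"))
    with open(data_fs, "wb") as f:
        f.write(b"records")
    with open(os.path.join(blob_dir, "example", "0001.blob"), "wb") as f:
        f.write(b"payload")
    target = backup_zodb(data_fs, blob_dir, os.path.join(var, "backups"))
    listing = sorted(os.listdir(target))
    with open(os.path.join(target, "blobstorage", "example", "0001.blob"), "rb") as f:
        copied_blob = f.read()

print(listing)  # ['Data.fs', 'blobstorage']
```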

== Progress ==

* The integration package, `plone.app.blob`, is in a usable state and
is being used successfully in several production sites.
* There are a number of remaining test failures regarding "Image"
content. All backward-compatibility tests for "File" content currently
pass.
* Also, there are a number of pending issues and sensible enhancements
(see the issue tracker at
http://plone.org/products/plone.app.blob/issues/).
* Several beta releases have been made, but a final release will need to
address the aforementioned issues.
* In-place migration of existing content has been heavily tested, both
via integration tests and in "real-life" test runs.
* Migrating a relatively large site with a 16 GB `Data.fs` including
about 7,000 "File" content items showed significantly lower memory
requirements and improved performance. The `Data.fs` went down to 2.5 GB
and memory usage dropped from 8-9 GB to 3 GB with the same ZODB cache
size. The migration itself took little more than one hour.
* Compatibility tests with Plone 3.x and 4.0 are provided via a buildbot
(http://blobot.zitc.de/). The latter currently [changeset:29981 still uses
ZODB 3.8.3], as newer versions (i.e. 3.9.x) break the necessary setup.
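
The reported migration figures can be expressed in relative terms with a simple calculation on the numbers above. The `Data.fs` shrank by about 84 %. Memory usage fell by roughly 62 to 67 %, depending on whether 8 or 9 GB is taken as the baseline. The payload data did not vanish; it moved into the blob storage, so the `Data.fs` figure describes the object database file, not total disk usage.

```python
def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Figures reported above for the site with about 7,000 "File" items (GB).
data_fs = reduction_percent(16, 2.5)  # Data.fs size
memory_low = reduction_percent(8, 3)  # memory, taking 8 GB as the baseline
memory_high = reduction_percent(9, 3)  # memory, taking 9 GB as the baseline

print(round(data_fs, 1), round(memory_low, 1), round(memory_high, 1))
# 84.4 62.5 66.7
```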

== Participants ==

- Andreas Zeidler (IRC nickname: <witsch>)
--
Comment(by witsch):
updated the status wrt tests on plone 4
--
Ticket URL: <http://dev.plone.org/plone/ticket/7822#comment:57>
Plone <http://plone.org>
Plone Content Management System