[Product-Developers] The most efficient way to store 270 AT fields?

Souheil CHELFOUH trollfot at gmail.com
Wed Jan 7 11:26:36 UTC 2009


You can also check out the project I'm working on, Spear, a lightweight
and already working mini-framework for content types:
http://tracker.trollfot.org/browser/projects/spear.example/trunk/spear/example

2009/1/5 Mikko Ohtamaa <mikko+plone at redinnovation.com>:
>
>
>
> Andreas Jung wrote:
>> On 05.01.2009 14:04 Uhr, Martin Aspeli wrote:
>>> Mikko Ohtamaa wrote:
>>>> Hi,
>>>>
>>>> We are facing a problem where we need to store 270 fields per item. The
>>>> fields are laboratory measurements of a patient - 40 measurement values
>>>> for 7 timepoints. The fields need to be accessed per timepoint, per
>>>> measurement, and all fields for one patient at once. There will be over
>>>> 10000 patients, distributed under different hospital items (tree-like,
>>>> for permission reasons). Data is not accessed for two patients at once,
>>>> so we don't need to scale the catalog.
>>>>
>>>> So I am curious about how we make Plone scale well for this scenario.
>>>>
>>>> - The overhead of a field in an AT schema? Should we use the normal
>>>> storage backend (Python object value), or can we compress our field
>>>> values into a list/dict to make it faster using a custom storage backend?
>>>>
>>>> - The wake-up overhead of an AT object? Should we distribute our fields
>>>> across several ZODB objects, e.g. per timepoint, or just stick all values
>>>> in one ZODB object? All fields per patient are needed at once on some views.
>>>>
>>>> - One big Zope object vs. a few smaller Zope objects?
>>> I wouldn't store this in the ZODB, at least not only in the ZODB. Values
>>> like this are better stored in an RDBMS, modelled e.g. with a 40-column
>>> table (ick) used 7 times (one for each time point) for each patient.
>>
>> This is possibly overhead - especially with collective.tin. Storing the
>> data within some BTree data structure should scale fine and requires a
>> lot less effort than using collective.tin.
>
>>True. I was assuming he would need more complex data querying, though.
>>Anything that requires non-trivial joins is probably better served by an
>>RDBMS backend and SQL.
>
> First, thank you to everyone for the very insightful replies!
>
> I can fill in a few gaps and have a few more questions:
>
> - Archetypes is used as a generic framework to provide function structures,
> but all view and edit pages will be customized in any case - we are not
> going to dump the whole schema on the edit page at once - the generated
> HTML and widget code will be optimized later on.
>
> - We will use ore.contentmirror to mirror the data to an SQL database for
> data mining. We don't need real-time mirroring or real-time queries, thus
> the cataloging is not an issue.
>
> - Objects are mostly write-once - data shouldn't need to be changed unless
> there has been an input error
>
> - We are probably going to split the data into objects by timepoint, as
> suggested
>
> - *Is it still desirable to have 7 smaller ZODB objects rather than one big
> object*? What is "the breaking point" in schema size at which
> AttributeStorage falls apart? We need to load 7 ZODB objects for the
> patient main view to render the table containing the patient summary data.
>
> - What should we keep in mind if we intend to replace AttributeStorage with
> a custom BTreeStorage?
>
> Cheers,
> Mikko
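
For the BTree option discussed in the quoted thread (one container per
patient rather than 270 separate AT fields), a minimal sketch could look
like the following. It assumes the BTrees package that ships with ZODB and
falls back to a plain dict when BTrees is not importable. MeasurementStore,
its method names, the 1..7 timepoint numbering and the measurement names are
all hypothetical; this is not an Archetypes storage backend, only an
illustration of the three access patterns (per timepoint, per measurement,
everything at once).

```python
try:
    from BTrees.OOBTree import OOBTree  # ZODB-aware mapping, if installed
except ImportError:
    OOBTree = dict  # plain-dict stand-in so the sketch runs without ZODB

TIMEPOINTS = range(1, 8)  # 7 timepoints, numbered 1..7 (assumed numbering)


class MeasurementStore(object):
    """Hypothetical per-patient container: timepoint -> {name: value}."""

    def __init__(self):
        self._data = OOBTree()

    def set_value(self, timepoint, name, value):
        """Record one measurement value for one timepoint."""
        if timepoint not in TIMEPOINTS:
            raise ValueError("unknown timepoint: %r" % (timepoint,))
        if timepoint not in self._data:
            # One inner mapping per timepoint, created on first write.
            self._data[timepoint] = OOBTree()
        self._data[timepoint][name] = value

    def by_timepoint(self, timepoint):
        """All measurements recorded at one timepoint, as a plain dict."""
        return dict(self._data.get(timepoint, {}))

    def by_measurement(self, name):
        """One measurement across all timepoints that have a value for it."""
        return dict(
            (tp, self._data[tp][name])
            for tp in self._data.keys()
            if name in self._data[tp]
        )

    def all_values(self):
        """Everything for the patient, e.g. for a summary table."""
        return dict((tp, dict(self._data[tp])) for tp in self._data.keys())


# Usage with made-up measurement names and values:
store = MeasurementStore()
store.set_value(1, "hemoglobin", 13.5)
store.set_value(1, "glucose", 5.4)
store.set_value(2, "hemoglobin", 12.9)
summary = store.all_values()
```

With one inner mapping per timepoint, the summary view reads the outer
mapping and up to 7 inner ones, while a single-timepoint view touches only
one of them. Whether that beats one large object in practice would need
measuring on real data.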



