[Product-Developers] The most efficient way to store 270 AT fields?

Mikko Ohtamaa mikko+plone at redinnovation.com
Mon Jan 5 13:36:31 UTC 2009




Andreas Jung wrote:
> On 05.01.2009 14:04 Uhr, Martin Aspeli wrote:
>> Mikko Ohtamaa wrote:
>>> Hi,
>>>
>>> We are facing a problem where we need to store 270 fields per item. The
>>> fields are laboratory measurements of a patient - 40 measurement values
>>> for 7 timepoints. The fields need to be accessed per timepoint, per
>>> measurement, and all fields for one patient at once. There will be over
>>> 10000 patients, distributed under different hospital items (tree-like,
>>> for permission reasons). Data is not accessed for two patients at once,
>>> so we don't need to scale the catalog.
>>>
>>> So I am curious about how we make Plone scale well for this scenario.
>>>
>>> - The overhead of a field in an AT schema? Should we use the normal
>>> storage backend (Python object value), or can we compress our field
>>> values into a list/dict to make it faster using a custom storage
>>> backend?
>>>
>>> - The wake-up overhead of an AT object? Should we distribute our fields
>>> across several ZODB objects, e.g. per timepoint, or just stick all
>>> values into one ZODB object? All fields for one patient are needed at
>>> once on some views.
>>>
>>> - One big Zope object vs. a few smaller Zope objects?
>> I wouldn't store this in the ZODB, at least not only in the ZODB. Values
>> like this are better stored in an RDBMS, modelled e.g. with a 40-column
>> table (ick) used 7 times (one for each time point) for each patient.
> 
> This is possibly overhead - especially with collective.tin. Storing the
> data within some BTree datastructure should scale fine and requires a
> lot less effort than using collective.tin.

> True. I was assuming he would need more complex data querying, though.
> Anything that requires non-trivial joins is probably better served by an
> RDBMS backend and SQL.

First, thank you everyone for the very insightful replies!

I can fill in a few gaps and have a few more questions:

- Archetypes is used as a generic framework to provide the basic
structures, but all view and edit pages will be customized in any case:
we are not going to dump the whole schema onto the edit page at once,
and the generated HTML and widget code will be optimized later on.
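
Something like this sketch is what we have in mind for the schema (the
measurement names here are placeholders; the real list has 40 entries
per timepoint):

    from Products.Archetypes.atapi import BaseSchema, Schema, \
        FixedPointField, DecimalWidget

    # Placeholder measurement names; the real schema has 40 x 7 fields.
    MEASUREMENTS = ('hemoglobin', 'glucose', 'creatinine')
    TIMEPOINTS = range(1, 8)

    fields = []
    for tp in TIMEPOINTS:
        for name in MEASUREMENTS:
            fields.append(FixedPointField(
                name='%s_tp%d' % (name, tp),
                widget=DecimalWidget(label='%s (timepoint %d)' % (name, tp)),
            ))

    PatientSchema = BaseSchema.copy() + Schema(tuple(fields))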

- We will use ore.contentmirror to mirror the data to an SQL database for
data mining. We don't need real-time mirroring or real-time queries, so
cataloging is not an issue.
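
On the SQL side, the kind of per-timepoint table Martin described could
look like this in SQLAlchemy (column names are placeholders, and this is
separate from whatever tables ore.contentmirror itself generates):

    from sqlalchemy import MetaData, Table, Column, Integer, Numeric

    metadata = MetaData()

    # One row per patient per timepoint, one column per measurement
    # (only 3 of the 40 measurement columns shown).
    measurements = Table('measurements', metadata,
        Column('patient_id', Integer, primary_key=True),
        Column('timepoint', Integer, primary_key=True),
        Column('hemoglobin', Numeric(10, 3)),
        Column('glucose', Numeric(10, 3)),
        Column('creatinine', Numeric(10, 3)),
        # ... remaining 37 measurement columns
    )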

- Objects are mostly write-once: the data shouldn't need to change unless
there has been an input error.

- We are probably going to split the data into objects by timepoint, as
suggested.

- *Is it still desirable to have 7 smaller ZODB objects rather than one
big object?* At what schema size does AttributeStorage start to fall
apart? We need to load 7 ZODB objects on the patient main view to render
the table containing the patient summary data (a sketch follows).
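
For that view we are thinking of something along these lines (names are
hypothetical); each timepoint sub-object gets woken up exactly once:

    # Hypothetical sketch: gather the 7 per-timepoint sub-objects of a
    # patient folder and build the summary table data in one pass.
    def patient_summary(patient, measurement_names):
        rows = []
        for timepoint in patient.objectValues():
            values = {}
            for name in measurement_names:
                field = timepoint.getField(name)
                values[name] = field.get(timepoint)
            rows.append((timepoint.getId(), values))
        rows.sort()
        return rows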

- What should we keep in mind if we intend to replace AttributeStorage
with a custom BTreeStorage?
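
To make the question concrete, here is a minimal sketch of what we mean,
along the lines of Andreas' BTree suggestion: all field values of one
object live in a single OOBTree under one (assumed) attribute name, so a
write dirties a small BTree bucket instead of repickling the whole
content object.

    from BTrees.OOBTree import OOBTree
    from Products.Archetypes.Storage import Storage

    class BTreeStorage(Storage):
        """Sketch only: keep every field value of an instance in one
        OOBTree hanging off an attribute with an assumed name."""

        _key = '_field_btree'  # assumed attribute name

        def _tree(self, instance, create=False):
            tree = getattr(instance, self._key, None)
            if tree is None and create:
                tree = OOBTree()
                setattr(instance, self._key, tree)
            return tree

        def get(self, name, instance, **kwargs):
            tree = self._tree(instance)
            if tree is None or name not in tree:
                # AT falls back to the field default on AttributeError
                raise AttributeError(name)
            return tree[name]

        def set(self, name, instance, value, **kwargs):
            self._tree(instance, create=True)[name] = value

        def unset(self, name, instance, **kwargs):
            tree = self._tree(instance)
            if tree is not None and name in tree:
                del tree[name]

Note that reading all 270 values then means loading a handful of BTree
buckets instead of one pickle, so full-record reads are not necessarily
faster than with one big object; the win is on partial reads and writes.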

Cheers,
Mikko
-- 
View this message in context: http://n2.nabble.com/The-most-efficient-way-to-store-270-AT-fields--tp2112645p2112902.html
Sent from the Product Developers mailing list archive at Nabble.com.




