

MarkLogic: need advice on an approach to aggregating documents


Hi MarkLogicians out there,
(See the edit below.)
I have the following challenge:
I have sociodemographic zip code data from a flat relational table.
Following good practice, I have created one document per row. Each document (row) holds roughly 400 values, structured in 7 categories of 40 variables; each variable has 4 to 7 segments.
<doc id="1011AB">
  <cat>
    <var>
      <seg>25</seg>
    </var>
  </cat>
</doc>
There are 500,000 documents like these. We need to aggregate them from the 6-digit level to a higher zip code level (4 digits), which gives around 40,000 documents.
We have working code for aggregating one segment per document. Now I am looking for a way to aggregate from the 6-digit level to the 4-digit level. The aggregation is basically a calculation of weighted averages.
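As a rough illustration of the weighted average (written in Python rather than XQuery, so this is not MarkLogic code), the sketch below assumes each six-digit code carries a weight, such as a household count, and that the four-digit code is simply the first four characters of the six-digit code. The helper name `weighted_average` and the sample numbers are hypothetical.

```python
def weighted_average(pairs):
    """Return sum(w * x) / sum(w) for a list of (value, weight) pairs.

    The weight per six-digit code (e.g. number of households) is an
    assumption; it is not part of the data shown in the question.
    """
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("total weight must be non-zero")
    return sum(x * w for x, w in pairs) / total_weight


# One segment value per six-digit code, each with an assumed weight.
segment_by_p6 = {
    "1011AB": (25.0, 100),
    "1011AC": (35.0, 300),
}
p4 = "1011"
# Collect the six-digit entries that belong to this four-digit code.
pairs = [vw for code, vw in segment_by_p6.items() if code[:4] == p4]
print(weighted_average(pairs))  # prints 32.5
```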
My question:
Is there an elegant way to take a 6-digit-level document as a template and fill it in, or do I need to build the 4-digit-level document from scratch?
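One way to read the template idea, sketched in Python with `xml.etree.ElementTree` (not MarkLogic): deep-copy a six-digit document, change its id, and overwrite every `<seg>` value. The `name` attributes used to match values are a hypothetical addition; the real documents would need some equivalent way to identify each category, variable, and segment. `fill_template` is likewise a hypothetical name.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical six-digit template; the name attributes are an assumption.
TEMPLATE_XML = """
<doc id="1011AB">
  <cat name="Woning">
    <var name="Woontype">
      <seg name="De Veelbelovende Starter">25</seg>
    </var>
  </cat>
</doc>
"""


def fill_template(template, new_id, values):
    """Return a deep copy of `template` with a new id and new <seg> values.

    `values` maps (category, variable, segment) name tuples to numbers.
    """
    doc = copy.deepcopy(template)  # the template itself is not modified
    doc.set("id", new_id)
    for cat in doc.findall("cat"):
        for var in cat.findall("var"):
            for seg in var.findall("seg"):
                key = (cat.get("name"), var.get("name"), seg.get("name"))
                seg.text = str(values[key])
    return doc


template = ET.fromstring(TEMPLATE_XML.strip())
p4_doc = fill_template(
    template, "1011", {("Woning", "Woontype", "De Veelbelovende Starter"): 32.5}
)
print(ET.tostring(p4_doc, encoding="unicode"))
```

Because the copy is deep, the template stays untouched and can be reused for every four-digit code.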
=============== EDIT ===================
OK, so now I have a map in which we created a joined key, with a corresponding value per key, like this:
<map:map xmlns:map="http://marklogic.com/xdmp/map" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="Consumententypes (sub):::Type 6, gezin met jongste kind 6+::: gezin met jongste kind 6+">
    <map:value xsi:type="xs:double">0</map:value>
  </map:entry>
  <map:entry key="Woning:::Woontype:::De Veelbelovende Starter">
    <map:value xsi:type="xs:double">7.48</map:value>
  </map:entry>
</map:map>
I want to recreate the document by decomposing each key of the form "category:::variable:::segment" into the structure above and adding the map:value as the element value.
Question: What is the best way to build the document? Do I create a node in memory, fill it with elements, and then insert it into MarkLogic, or do I create an empty document and add things as I go along (which seems slow)?
hugo
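Decomposing the joined keys into the nested structure can be sketched in Python as follows (an illustration, not XQuery). It builds the whole tree in memory and returns it, so the document would be written once rather than grown node by node. Storing the key parts in `name` attributes and stripping surrounding whitespace from each part (the third part of the first sample key starts with a space) are assumptions of this sketch; `build_doc` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def build_doc(doc_id, flat):
    """Build <doc><cat><var><seg/></var></cat></doc> from a flat key/value map.

    `flat` maps "category:::variable:::segment" strings to numbers, like the
    map:map in the question.
    """
    doc = ET.Element("doc", id=doc_id)
    cats = {}   # category name -> <cat> element
    vars_ = {}  # (category, variable) -> <var> element
    for key, value in flat.items():
        # Split the joined key; stripping whitespace is an assumption.
        category, variable, segment = (p.strip() for p in key.split(":::"))
        if category not in cats:
            cats[category] = ET.SubElement(doc, "cat", name=category)
        if (category, variable) not in vars_:
            vars_[(category, variable)] = ET.SubElement(
                cats[category], "var", name=variable
            )
        seg = ET.SubElement(vars_[(category, variable)], "seg", name=segment)
        seg.text = str(value)
    return doc


flat = {
    "Consumententypes (sub):::Type 6, gezin met jongste kind 6+::: gezin met jongste kind 6+": 0.0,
    "Woning:::Woontype:::De Veelbelovende Starter": 7.48,
}
doc = build_doc("1011", flat)
print(ET.tostring(doc, encoding="unicode"))
```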
For a single-threaded approach, I would start by creating a map where the keys are six-digit codes and the values are the segments. You might be able to do this using http://docs.marklogic.com/cts:value-co-occurrences with the "map" option. Depending on the details, that might mean one cts:value-co-occurrences call per category, variable, or segment. I'm being vague because I don't see how those fit together in your use case.
Once you have your six-digit map(s), use them to build four-digit map(s). That means looping through the six-digit keys and pushing new values into the four-digit map(s). Then you're ready to serialize the four-digit map(s) to new XML documents. That should be easy if the structure of your four-digit map entries is close to the final XML format. Write a simple XQuery function that takes a four-digit map and a code, and inserts the new document.
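The six-digit-to-four-digit step described above might look like this in Python (a sketch; in MarkLogic the dictionaries would be map:map objects). It assumes a weight per six-digit code and that the four-digit code is the first four characters; `roll_up` and the sample data are hypothetical. Keeping running sums of weight × value and of weight per key means each six-digit code is visited only once, and each resulting inner map corresponds to one four-digit document.

```python
from collections import defaultdict


def roll_up(p6_values, p6_weights):
    """Aggregate six-digit maps into four-digit maps of weighted averages.

    p6_values:  {six_digit_code: {"cat:::var:::seg": value}}
    p6_weights: {six_digit_code: weight}  (assumed, e.g. household counts)
    Keys whose total weight is zero are left out of the result.
    """
    sums = defaultdict(lambda: defaultdict(float))     # p4 -> key -> sum(w * x)
    weights = defaultdict(lambda: defaultdict(float))  # p4 -> key -> sum(w)
    for p6, entries in p6_values.items():
        p4 = p6[:4]  # assumed: four-digit code is the six-digit prefix
        w = p6_weights[p6]
        for key, value in entries.items():
            sums[p4][key] += w * value
            weights[p4][key] += w
    return {
        p4: {k: sums[p4][k] / weights[p4][k] for k in sums[p4] if weights[p4][k] > 0}
        for p4 in sums
    }


key = "Woning:::Woontype:::De Veelbelovende Starter"
p6_values = {
    "1011AB": {key: 25.0},
    "1011AC": {key: 35.0},
    "1012AA": {key: 10.0},
}
p6_weights = {"1011AB": 100, "1011AC": 300, "1012AA": 5}
p4_maps = roll_up(p6_values, p6_weights)
print(p4_maps)
```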
You might also think about concurrency using the Task Server. You could read all the six-digit codes from a lexicon, starting tasks that each process the six-digit codes corresponding to N four-digit codes. Done correctly, that should be faster than one giant map. It's important to avoid any overlap in four-digit codes between the tasks, so that you don't have lock contention when inserting the new four-digit documents.
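The no-overlap rule can be sketched in Python: group six-digit codes by their four-digit prefix and assign each prefix to exactly one batch, so no two tasks ever write the same four-digit document. A thread pool stands in for the Task Server here (in MarkLogic the batches would be spawned as tasks, for example with xdmp:spawn-function); `partition_p4`, `process_batch`, and the sample codes are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_p4(p6_codes, n_batches):
    """Split six-digit codes into batches that never share a four-digit code.

    Every six-digit code with a given four-digit prefix lands in the same
    batch, so each four-digit document is owned by exactly one task.
    """
    p4_codes = sorted({code[:4] for code in p6_codes})
    owner = {p4: i % n_batches for i, p4 in enumerate(p4_codes)}
    batches = [[] for _ in range(n_batches)]
    for code in p6_codes:
        batches[owner[code[:4]]].append(code)
    return [b for b in batches if b]  # drop empty batches


def process_batch(batch):
    # Placeholder for the real work: aggregate and insert four-digit docs.
    # Here it only reports which four-digit codes this task would write.
    return sorted({code[:4] for code in batch})


codes = ["1011AB", "1011AC", "1012AA", "1013XY", "1013XZ"]
batches = partition_p4(codes, 2)
with ThreadPoolExecutor(max_workers=2) as pool:
    owned = list(pool.map(process_batch, batches))
print(owned)
```

Round-robin assignment over sorted prefixes is just one simple choice; any mapping works as long as each four-digit prefix has a single owner.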
