

MarkLogic: need advice on an approach to aggregating documents


Hi MarkLogicians out there,
(See the edit below.)
I have the following challenge:
I have socio-demographic zip code data from a flat relational table.
Following good practice, I have created one document per row. Each document (row) holds roughly 400 values structured in 7 categories of 40 variables; for each variable there are 4 to 7 segments.
<doc id="1011AB">
  <cat>
    <var>
      <seg>25</seg>
    </var>
  </cat>
</doc>
There are 500,000 documents like this. We need to aggregate the 6-digit level to a higher zip code level (4 digits), around 40,000 documents.
We have working code for aggregating one segment per document. Now I am looking for a solution to aggregate the 6-digit level to the 4-digit level. The aggregation is basically a calculation of weighted averages.
My question:
Is there an elegant way to take a 6-digit-level document as a template and fill it in, or do I need to build the 4-digit-level document from scratch?
=============== EDIT ===================
OK, so now I have a map in which we created a joined key, like this:
<map:map xmlns:map="http://marklogic.com/xdmp/map" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="Consumententypes (sub):::Type 6, gezin met jongste kind 6+::: gezin met jongste kind 6+">
    <map:value xsi:type="xs:double">0</map:value>
  </map:entry>
  <map:entry key="Woning:::Woontype:::De Veelbelovende Starter">
    <map:value xsi:type="xs:double">7.48</map:value>
  </map:entry>
  ...
</map:map>
with a corresponding value per key.
I want to recreate the document by decomposing the key "category:::variable:::segment" into the structure above and adding the map:value as an element value.
Question: What is the best way to build the document? Do I create a node, fill it with elements, and then insert it into ML, or do I make an empty document and add content as I go along (which does not seem very fast)?
hugo
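To make the "build a node, then insert it once" option concrete, here is a minimal, language-neutral sketch in Python rather than XQuery. It assumes every key has exactly three parts separated by ":::", and it stores the category, variable, and segment names in a hypothetical `name` attribute, because the sample document above does not show where those names live. The function name `build_doc` is also made up for this illustration.

```python
import xml.etree.ElementTree as ET

SEP = ":::"  # separator used in the joined keys


def build_doc(zip_code, values):
    """Build one <doc> element in memory from a flat {joined_key: value} mapping.

    The 'name' attributes are an assumption; adapt them to the real schema.
    """
    doc = ET.Element("doc", {"id": zip_code})
    cats = {}   # category name -> <cat> element
    vars_ = {}  # (category, variable) -> <var> element
    for key, value in values.items():
        # Keys look like "category:::variable:::segment"; strip stray spaces.
        category, variable, segment = (part.strip() for part in key.split(SEP))
        cat = cats.get(category)
        if cat is None:
            cat = cats[category] = ET.SubElement(doc, "cat", {"name": category})
        var = vars_.get((category, variable))
        if var is None:
            var = vars_[(category, variable)] = ET.SubElement(
                cat, "var", {"name": variable})
        seg = ET.SubElement(var, "seg", {"name": segment})
        seg.text = str(value)
    return doc


sample = {
    "Woning:::Woontype:::De Veelbelovende Starter": 7.48,
    "Consumententypes (sub):::Type 6, gezin met jongste kind 6+::: gezin met jongste kind 6+": 0.0,
}
sample_doc = build_doc("1011", sample)
print(ET.tostring(sample_doc, encoding="unicode"))
```

The whole tree is assembled in memory and serialized once, so the database would see a single write per document (in MarkLogic, one `xdmp:document-insert` call) instead of many small node updates.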
For a single-threaded approach, I would start by creating a map where the keys are six-digit codes and the values are the segments. You might be able to do this using http://docs.marklogic.com/cts:value-co-occurrences with the map option. Depending on the details that might mean one cts:value-co-occurrences call per category or variable or segment. I'm being vague because I don't see how those fit together in your use-case.
Once you have your six-digit map(s), use them to build four-digit map(s). That means looping through the six-digit keys and pushing new values into the four-digit map(s). Then you're ready to serialize the four-digit map(s) to new XML documents. That should be easy if the structure of your four-digit map entries is close to the final XML format. Write a simple XQuery function that takes a four-digit map and a code, and inserts the new document.
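The roll-up from six-digit maps to four-digit maps could look roughly like the sketch below, written in Python as a language-neutral illustration. It rests on two assumptions that the question does not confirm: the four-digit code is the first four characters of the six-character code (as in `1011AB`), and each six-digit code has one weight (for example a household count) that applies to all of its values. The names `roll_up`, `six_digit`, and `weights` are hypothetical.

```python
from collections import defaultdict


def roll_up(six_digit, weights, prefix_len=4):
    """Aggregate six-digit values to four-digit weighted averages.

    six_digit: {zip6: {joined_key: value}}
    weights:   {zip6: weight}   (assumed: one weight per six-digit code)
    returns:   {zip4: {joined_key: weighted average}}
    """
    weighted_sums = defaultdict(lambda: defaultdict(float))
    weight_totals = defaultdict(lambda: defaultdict(float))
    for zip6, values in six_digit.items():
        zip4 = zip6[:prefix_len]  # assumed relation between the two levels
        w = weights[zip6]
        for key, value in values.items():
            weighted_sums[zip4][key] += w * value
            weight_totals[zip4][key] += w
    # Divide each weighted sum by the total weight seen for that key.
    return {
        zip4: {
            key: total / weight_totals[zip4][key]
            for key, total in sums.items()
            if weight_totals[zip4][key] > 0
        }
        for zip4, sums in weighted_sums.items()
    }


example = roll_up(
    {"1011AB": {"k": 10.0}, "1011AC": {"k": 20.0}, "1012AA": {"k": 5.0}},
    {"1011AB": 1, "1011AC": 3, "1012AA": 2},
)
print(example)  # {'1011': {'k': 17.5}, '1012': {'k': 5.0}}
```

Keeping the running sums and the running weights per key means a key that is missing from some six-digit documents is still averaged only over the documents that actually contain it.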
You might also think about concurrency using the Task Server. You could read all the six-digit codes from a lexicon, starting tasks that each process the six-digit codes corresponding to N four-digit codes. Done correctly that should be faster than one giant map. It's important to avoid any overlap in four-digit codes between the tasks, so that you don't have lock contention when inserting the new four-digit documents.
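The no-overlap requirement can be met by grouping the six-digit codes by their four-digit prefix before cutting them into batches. The Python sketch below only shows that partitioning step; it again assumes the four-digit code is the first four characters of the six-digit code, and `batches_by_prefix` is a made-up name. In MarkLogic each batch would then be handed to its own task, for example with `xdmp:spawn`.

```python
from itertools import groupby


def batches_by_prefix(zip6_codes, prefixes_per_batch, prefix_len=4):
    """Yield lists of six-digit codes; codes sharing a prefix stay together."""
    ordered = sorted(zip6_codes)  # sorting makes equal prefixes contiguous
    groups = [list(g) for _, g in
              groupby(ordered, key=lambda code: code[:prefix_len])]
    for start in range(0, len(groups), prefixes_per_batch):
        batch = []
        for group in groups[start:start + prefixes_per_batch]:
            batch.extend(group)
        yield batch


codes = ["1012AA", "1011AB", "1013ZZ", "1011AC"]
for batch in batches_by_prefix(codes, prefixes_per_batch=2):
    print(batch)
# ['1011AB', '1011AC', '1012AA']
# ['1013ZZ']
```

Because a four-digit prefix never appears in more than one batch, no two tasks would try to write the same four-digit document.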
