

A peer-to-peer and privacy-aware data mining/aggregation algorithm: is it possible?


Suppose I have a network of N nodes, each with a unique identity (e.g. public key) communicating with a central-server-less protocol (e.g. DHT, Kad). Each node stores a variable V. With reference to e-voting as an easy example, that variable could be the name of a candidate.
Now I want to execute an "aggregation" function on all V variables available in the network. With reference to e-voting example, I want to count votes.
My question is completely theoretical (I have to prove a statement, details at the end of the question), so please don't focus on e-voting and all of its security aspects. Do I have to say it again? Don't tell me that "a node may have any number of identities by generating more keys", "IPs can be traced back" etc., because that's another matter.
Let's see the distributed aggregation only from the privacy point of view.
THE question
Is it possible, in the general case, for a node to compute a function of variables stored at other nodes without learning which value belongs to which node's identity? Have researchers designed such a privacy-aware distributed algorithm?
I'm only dealing with privacy aspects, not general security!
Current thoughts
My current answer is no, so I claim that a central server, which obtains all the Vs and processes them without storing them, is necessary, and that there are more legal than technical means to ensure that no individual node's data is stored or retransmitted by that server. I'm asking you to prove that this statement is false :)
In the e-voting example, I think it's impossible to count how many people voted for Alice and Bob without asking all the nodes, one by one "Hey, who do you vote for?"
Real case
I'm doing research in the Personal Data Store (PDS) field. Suppose you store your call log in the PDS and somebody wants to compute statistics about the phone calls (i.e. mean duration, number of calls per day, variance, standard deviation) without learning either aggregate or individual data points about any single user (that is, nobody must learn whom I call, nor my own mean call duration).
If a trusted broker exists, and everybody trusts it, that node can expose a double getMeanCallDuration() API that first invokes CallRecord[] getCalls() on every PDS in the network and then computes statistics over all rows. Without the central trusted broker, having each PDS expose double getMyMeanCallDuration() isn't statistically usable (the mean of the per-node means is generally not the mean of all calls...) and, most importantly, ties each value to the identity of a single user.
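To make the "mean of the means" point concrete, here is a small sketch with made-up call durations for three hypothetical PDS nodes (the node names and numbers are assumptions for illustration only). It shows that averaging the per-node means gives a different result from the pooled mean, and that the pooled mean and variance only need three sums across nodes: the count, the sum, and the sum of squares. It says nothing yet about how to compute those sums privately.

```python
from statistics import fmean, pvariance

# Hypothetical call durations (seconds) held by three PDS nodes.
calls = {
    "pds_a": [60.0, 120.0],
    "pds_b": [30.0],
    "pds_c": [300.0, 240.0, 180.0, 90.0],
}

# Naive approach: average the per-node means (each node weighs the same).
mean_of_means = fmean(fmean(d) for d in calls.values())

# Pooled approach: only three network-wide sums are needed.
n = sum(len(d) for d in calls.values())                  # total number of calls
s = sum(sum(d) for d in calls.values())                  # total duration
ss = sum(sum(x * x for x in d) for d in calls.values())  # total of squared durations

pooled_mean = s / n
pooled_var = ss / n - pooled_mean ** 2  # population variance

print(mean_of_means, pooled_mean, pooled_var)
```

So the statistical problem reduces to summation: if the network can add up values without exposing the individual terms, the mean, variance and standard deviation follow.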
Yes, it is possible. There is work that addresses exactly this problem, under some assumptions. Check the following paper: Privacy, efficiency & fault tolerance in aggregate computations on massive star networks.
You can perform some computation (for example, summing) over a group of nodes at another node without the participating nodes having to reveal any data to each other, or even to the node doing the computation. After the computation, everyone learns the result (but no one learns any individual data besides their own, which they already knew anyway). The paper describes the protocol and proves its security (and the protocol itself gives you the privacy level I just described).
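To illustrate the general idea of summing without revealing inputs, here is a minimal sketch using plain additive secret sharing. This is a generic textbook technique swapped in for illustration, not the protocol from the paper above. The names `MODULUS`, `make_shares` and `secure_sum` are hypothetical, all "nodes" are simulated in one process, and the sketch assumes honest-but-curious participants that do not collude.

```python
import secrets

# Assumed public modulus, larger than any possible total.
MODULUS = 2**61 - 1

def make_shares(value, n_parties):
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(values):
    """Simulate the exchange: node i sends its j-th share to node j."""
    n = len(values)
    # Row i holds the shares created by node i; column j is what node j receives.
    share_matrix = [make_shares(v, n) for v in values]
    # Each node adds up the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(share_matrix[i][j] for i in range(n)) % MODULUS
        for j in range(n)
    ]
    # Anyone can add the published partial sums to obtain the total.
    return sum(partial_sums) % MODULUS

# Five nodes, each holding 1 (vote for Alice) or 0 (vote for Bob).
print(secure_sum([1, 0, 1, 1, 0]))
```

Each single share is uniformly random, so a node that sees one share from a peer learns nothing about that peer's value; the price is that every node has to talk to every other node, and colluding nodes can pool their shares.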
As for protecting the identity of the nodes, so that their value cannot be linked to their identity, that is another problem. You could use anonymous credentials (check this: https://idemix.wordpress.com/2009/08/18/quick-intro-to-credentials/) or something similar to show that you are who you are without revealing your identity (in a distributed scenario).
The catch with this protocol is that you need a semi-trusted node to do the computation. A fully distributed protocol (for example, in a P2P network) is not that easy, though. The problem is not a lack of storage (you can have a DHT, for example), but that you need to replace the trusted or semi-trusted node with the network itself, and that is where the issues start: who does the computation? Why that node and not another? What if some nodes collude? Etc.
How about when each node publishes two sets of data x and y, such that
x - y = v
Assuming that I can emit x and y independently, you can correctly compute the overall mean and sum, while every single message is largely worthless.
So for the voting example and candidates X, Y, Z, I might have one identity publishing x as the vote
+2 -1 +3
and my second identity publishing -y as the vote:
-2 +2 -3
Adding the two messages gives 0 +1 0, that is, one vote for Y.
But of course you cannot verify that I didn't vote multiple times anymore.
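The splitting idea above can be sketched as follows. The helper names `split_vote` and `tally`, the `spread` parameter and the sample votes are assumptions for illustration. The sketch also assumes that the x and y messages really cannot be linked to each other, which is the hard part in practice.

```python
import random

CANDIDATES = ["X", "Y", "Z"]

def split_vote(choice, rng, spread=10):
    """Return (x, y) with x - y equal to the one-hot vote vector v."""
    v = [1 if c == choice else 0 for c in CANDIDATES]
    # Random mask; small bounded integers are used here only for readability.
    x = [rng.randint(-spread, spread) for _ in CANDIDATES]
    y = [xi - vi for xi, vi in zip(x, v)]
    return x, y

def tally(x_messages, y_messages):
    """Sum of all x minus sum of all y, without pairing x and y messages."""
    totals = [0] * len(CANDIDATES)
    for x in x_messages:
        totals = [t + xi for t, xi in zip(totals, x)]
    for y in y_messages:
        totals = [t - yi for t, yi in zip(totals, y)]
    return dict(zip(CANDIDATES, totals))

rng = random.Random()
votes = ["X", "Y", "Y", "Z", "Y"]
pairs = [split_vote(choice, rng) for choice in votes]
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]
rng.shuffle(xs)  # the order carries no information
rng.shuffle(ys)
print(tally(xs, ys))
```

Because the tally only needs the two totals, the messages never have to be matched up again. Note that bounded masks like these leak some information (an extreme value of x narrows down y), so a more careful variant would draw the masks uniformly modulo a large number.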
