{"id":416,"date":"2003-12-05T00:28:34","date_gmt":"2003-12-05T05:28:34","guid":{"rendered":"https:\/\/rojisan.com\/blog\/2003\/12\/privacy-and-data-mining\/"},"modified":"2003-12-05T00:28:34","modified_gmt":"2003-12-05T05:28:34","slug":"privacy-and-data-mining","status":"publish","type":"post","link":"https:\/\/rojisan.com\/blog\/2003\/12\/privacy-and-data-mining\/","title":{"rendered":"privacy and data mining"},"content":{"rendered":"<p><a href=\"http:\/\/www.almaden.ibm.com\/cs\/people\/ragrawal\/\" class=\"broken_link\">rakesh agrawal<\/a> stole my stuff!<\/p>\n<p>ok,  not really.  but i think i can see this person in my future.<\/p>\n<p>i&#8217;ve been working on a few things that involve the preservation of privacy in a large collection of data that can still be analyzed.  i ran through a couple ideas &#8211; generally:<\/p>\n<p><strong>hashing<\/strong>,  where the data is manipulated permanently before it&#8217;s analyzed (but that can destroy relevant information)<br \/>\n<strong>black-box queries<\/strong>, where you can ask a question, but you don&#8217;t get to see the raw data (but that can put a real screw to reproducing results, and so confirming valid work)<br \/>\n<strong>compartmentalization<\/strong>, where only data important to the analysis is made available (but that means multiple analyses might piece together private information)<br \/>\n<strong>randomization<\/strong>, where the data is randomized as a set, and statistically-relevant results are still valid (but this generally means a big raw data set)<\/p>\n<p>well, i didn&#8217;t really come to any conclusions, except that any of these methods might be useful depending on the circumstances.  
in the particular circumstances i&#8217;m thinking about, the randomization approach seems the most useful.<\/p>\n<p>&#8230;off into the wild internet i go and, amazingly enough, it&#8217;s been done.<\/p>\n<p><a href=\"http:\/\/www.abc.net.au\/rn\/science\/buzz\/stories\/s803493.htm\">here<\/a> is an interview version.  <a href=\"http:\/\/www.acm.org\/sigmod\/record\/issues\/0309\/D15.rakesh-final-final.pdf\">this<\/a>  [pdf] is one from acm.  (see his <a href=\"http:\/\/www.almaden.ibm.com\/cs\/people\/ragrawal\/\" class=\"broken_link\">page<\/a> for papers).<\/p>\n<p>the combination of protecting individual privacy and building an enormous database that can be combed (well, raked) for trends and historical comparisons is critical to improving my diet.  i&#8217;m glad i don&#8217;t have to invent this wheel.<\/p>\n<p>so this is all old news to me &#8211; why bring it up?  rakesh was recently <a href=\"http:\/\/www.marketwire.com\/mw\/release_html_b1?release_id=59759\">honored<\/a> by scientific american in its list of the top 50 contributors and contributions to science and technology.  so he&#8217;s going to be a really popular guy now.<\/p>\n<p>i just thought i&#8217;d get a number now&#8230; save me a place!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>rakesh agrawal stole my stuff! ok, not really. but i think i can see this person in my future. i&#8217;ve been working on a few things that involve the preservation of privacy in a large collection of data that can still be analyzed. 
i ran through a couple ideas &#8211; generally: hashing, where the data [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[],"_links":{"self":[{"href":"https:\/\/rojisan.com\/blog\/wp-json\/wp\/v2\/posts\/416"}],"collection":[{"href":"https:\/\/rojisan.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rojisan.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rojisan.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/rojisan.com\/blog\/wp-json\/wp\/v2\/comments?post=416"}],"version-history":[{"count":0,"href":"https:\/\/rojisan.com\/blog\/wp-json\/wp\/v2\/posts\/416\/revisions"}],"wp:attachment":[{"href":"https:\/\/rojisan.com\/blog\/wp-json\/wp\/v2\/media?parent=416"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rojisan.com\/blog\/wp-json\/wp\/v2\/categories?post=416"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rojisan.com\/blog\/wp-json\/wp\/v2\/tags?post=416"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}