{"id":4845,"date":"2019-07-23T16:30:14","date_gmt":"2019-07-23T15:30:14","guid":{"rendered":"https:\/\/www.blopig.com\/blog\/?p=4845"},"modified":"2019-07-30T15:53:05","modified_gmt":"2019-07-30T14:53:05","slug":"searching-through-large-databases-with-bloom-filter","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2019\/07\/searching-through-large-databases-with-bloom-filter\/","title":{"rendered":"Searching through large databases with bloom filter"},"content":{"rendered":"\n<p>Searching through large databases is often a linear time problem. Here I compare the performance of applying a bloom filter and using the regular std::find command in C++:Codes are from:&nbsp;<a href=\"https:\/\/codereview.stackexchange.com\/questions\/179135\/bloom-filter-implementation-in-c\">https:\/\/codereview.stackexchange.com\/questions\/179135\/bloom-filter-implementation-in-c<\/a><code><\/code><\/p>\n\n\n\n<!--more-->\n\n\n\n<pre class=\"wp-block-preformatted\">Simple example\nStrings in the database:\nHello World!\nsweet potato\nC++\nalpha\nbeta\ngamma\nNumber of strings in the database: 6\nNumber of strings in the test set: 10\nStrings to look for:\nBloom filter takes (ms):51\nRegular search takes takes (ms):39\n<\/pre>\n\n\n\n<p><code><\/code><\/p>\n\n\n\n<p> <br>Then I tested it on Ligity-like hashes in the form of AAA000, where AAA were the three vertex types and 000 were the three edge length indices. I generate a list of random hashes in this configuration and varied the number of prefixes &#8211; where the first n characters are the same.&nbsp; <\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"625\" height=\"115\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2019\/07\/plot.png?resize=625%2C115&#038;ssl=1\" alt=\"\" class=\"wp-image-4846\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2019\/07\/plot.png?resize=1024%2C189&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2019\/07\/plot.png?resize=300%2C55&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2019\/07\/plot.png?resize=768%2C142&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2019\/07\/plot.png?resize=624%2C115&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2019\/07\/plot.png?w=1250&amp;ssl=1 1250w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2019\/07\/plot.png?w=1875&amp;ssl=1 1875w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>Applying it to the Ligity-like hashes reinforces two ideas: a) having a good hashing function that adequately separate out items are important b) bloom filter requires the prior overhead in encoding the hashes, but this overhead helps save a significant amount of time when a large number of searches were carried out.&nbsp;<br><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Searching through large databases is often a linear time problem. Here I compare the performance of applying a bloom filter and using the regular std::find command in C++:Codes are from:&nbsp;https:\/\/codereview.stackexchange.com\/questions\/179135\/bloom-filter-implementation-in-c<\/p>\n","protected":false},"author":49,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[29],"tags":[],"ppma_author":[533],"class_list":["post-4845","post","type-post","status-publish","format-standard","hentry","category-code"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":533,"user_id":49,"is_guest":0,"slug":"catherine","display_name":"Wing Ki (Catherine) Wong","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/c620f4eb25c8d5c8efdae64e95c4ec436da2bc2d6654ab7085ad1a822e3b4341?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/4845","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/49"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=4845"}],"version-history":[{"count":3,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/4845\/revisions"}],"predecessor-version":[{"id":4880,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/4845\/revisions\/4880"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=4845"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=4845"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=4845"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=4845"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}