{"id":14028,"date":"2026-03-23T15:42:08","date_gmt":"2026-03-23T15:42:08","guid":{"rendered":"https:\/\/www.blopig.com\/blog\/?p=14028"},"modified":"2026-03-25T10:23:38","modified_gmt":"2026-03-25T10:23:38","slug":"misconduct-bias-or-benign-a-case-of-missing-angstroms","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2026\/03\/misconduct-bias-or-benign-a-case-of-missing-angstroms\/","title":{"rendered":"Misconduct, Bias or Benign? A Case of Missing \u00c5ngstr\u00f6ms"},"content":{"rendered":"\n<p class=\"\"><strong>An \u00c5ngstr\u00f6m<\/strong><\/p>\n\n\n\n<p class=\"\">An \u00c5ngstr\u00f6m (\u00c5) is a unit of length equal to 10<sup>\u221210<\/sup> metres: one ten-billionth of a metre. It sits at a comfortable scale for the atomic world, with the diameter of a hydrogen atom and the length of a chemical bond both conveniently measured in \u00c5ngstr\u00f6ms.<\/p>\n\n\n\n<p class=\"\">It is not a unit of the International System of Units (Syst\u00e8me International d&#8217;Unit\u00e9s, SI). In fact, it has been formally deprecated in favour of the nanometre (1 \u00c5 = 0.1 nm), and standards bodies such as NIST and the BIPM discourage its use. Yet in structural biology and chemistry, crystallography, and materials science, the \u00c5ngstr\u00f6m persists: partly out of stubbornness, I would say, but mostly out of convenience. Saying a protein structure was solved at 2.1 \u00c5 feels natural in a way that 0.21 nm does not.<\/p>\n\n\n\n<p class=\"\">So we keep using it. 
And because we keep using it, we inherit its quirks and history.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p class=\"\"><strong>Preamble: Missing Z-values in Medical Research<\/strong><\/p>\n\n\n\n<p class=\"\">Before I share the particular quirk this post is about, let us discuss a different but, I hope, recognisably similar anomaly.<\/p>\n\n\n\n<p class=\"\">In medical statistics, Barnett &amp; Wren (2019) examined decades of Medline papers and noticed something odd: missing Z-values. More precisely, there was a suspicious scarcity of results just shy of conventional significance thresholds.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/Z_plot.png?ssl=1\"><img decoding=\"async\" width=\"2400\" height=\"1800\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/Z_plot.png?fit=625%2C469&amp;ssl=1\" alt=\"\" class=\"wp-image-14030\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/Z_plot.png?w=2400&amp;ssl=1 2400w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/Z_plot.png?resize=300%2C225&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/Z_plot.png?resize=1024%2C768&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/Z_plot.png?resize=768%2C576&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/Z_plot.png?resize=1536%2C1152&amp;ssl=1 1536w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/Z_plot.png?resize=2048%2C1536&amp;ssl=1 2048w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/Z_plot.png?resize=624%2C468&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/Z_plot.png?w=1250&amp;ssl=1 1250w, 
https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/Z_plot.png?w=1875&amp;ssl=1 1875w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/figure>\n\n\n\n<p class=\"\">This is the statistical analogue of a cliff edge: results pile up just on the significant side of p &lt; 0.05, and thin out abruptly at p <math><semantics><mo lspace=\"0em\" rspace=\"0em\">\u2265<\/mo><annotation encoding=\"application\/x-tex\">\\geq<\/annotation><\/semantics><\/math> 0.05 (corresponding to z-values of \u00b11.96). You have probably seen this before: a critical value of 0.05 is by and large the default statistical threshold in pretty much every major scientific field.<\/p>\n\n\n\n<p class=\"\">Two well-known mechanisms explain this:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li class=\"\"><strong>Publication bias:<\/strong> non-significant results are less likely to be published<\/li>\n\n\n\n<li class=\"\"><strong>P-hacking \/ &#8220;researcher degrees of freedom&#8221;: <\/strong>analyses are nudged until they cross significance thresholds<\/li>\n<\/ol>\n\n\n\n<p class=\"\">van Zwet &amp; Cator (2021) describe this as the &#8220;significance filter&#8221;, where only results that pass a threshold survive to be seen. The consequence is a distorted distribution: not a smooth continuum, but one with suspicious gaps and spikes. This raises a broader question:<\/p>\n\n\n\n<p class=\"\">When humans report numbers, do we see the underlying truth, or just the thresholds that we (and, we believe, others) care about? 
Either way, it does not help that the scientific literature abounds with positive results and contains very few negative ones (at least in terms of significance).<\/p>\n\n\n\n<p class=\"\"><strong>The RCSB-PDB<\/strong><\/p>\n\n\n\n<p class=\"\">With that question in mind, let us turn to structural biology.<\/p>\n\n\n\n<p class=\"\">Using the <a href=\"https:\/\/www.blopig.com\/blog\/2025\/09\/exploring-the-protein-data-bank-programmatically\/\" data-type=\"link\" data-id=\"https:\/\/www.blopig.com\/blog\/2025\/09\/exploring-the-protein-data-bank-programmatically\/\">RCSB Protein Data Bank API<\/a>, I retrieved 199,761 protein structures solved by X-ray diffraction. For each, I extracted the refined resolution, the canonical measure of structural quality for the model, as reported by the depositing authors.<\/p>\n\n\n\n<p class=\"\">At first glance, everything looks reassuringly normal.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram-scaled.png?ssl=1\"><img decoding=\"async\" width=\"2560\" height=\"2048\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram-scaled.png?fit=625%2C500&amp;ssl=1\" alt=\"\" class=\"wp-image-14032\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram-scaled.png?w=2560&amp;ssl=1 2560w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram-scaled.png?resize=300%2C240&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram-scaled.png?resize=1024%2C819&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram-scaled.png?resize=768%2C614&amp;ssl=1 768w, 
https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram-scaled.png?resize=1536%2C1229&amp;ssl=1 1536w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram-scaled.png?resize=2048%2C1638&amp;ssl=1 2048w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram-scaled.png?resize=624%2C499&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram-scaled.png?w=1250&amp;ssl=1 1250w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram-scaled.png?w=1875&amp;ssl=1 1875w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"\">The mean resolution is <math><semantics><mo lspace=\"0em\" rspace=\"0em\">\u2248<\/mo><annotation encoding=\"application\/x-tex\">\\approx<\/annotation><\/semantics><\/math>2.11 \u00c5<\/li>\n\n\n\n<li class=\"\">The distribution is unimodal, centred around <math><semantics><mo lspace=\"0em\" rspace=\"0em\">\u2248<\/mo><annotation encoding=\"application\/x-tex\">\\approx<\/annotation><\/semantics><\/math>2 \u00c5<\/li>\n\n\n\n<li class=\"\">There is a long tail toward poorer resolutions<\/li>\n<\/ul>\n\n\n\n<p class=\"\">This aligns with intuition: 2.0 \u00c5 is often considered the mark of a &#8220;good&#8221; structure, fine enough to resolve side chains reliably, yet coarse enough to be experimentally tractable. The 2.0 \u00c5 cutoff is very much analogous to p &lt; 0.05: it is used not only in determining structures, but also in measuring algorithm performance in areas such as drug discovery where, much like the p-value, it is sometimes abused.<\/p>\n\n\n\n<p class=\"\"><strong>The Illusion of Smoothness<\/strong><\/p>\n\n\n\n<p class=\"\">When we bin the data coarsely (0.2 \u00c5 bins, Figure A), the distribution looks smooth, almost Gaussian-like around its peak. 
But smoothness is a function of resolution: in this case not of the X-ray data, but of the visualisation.<\/p>\n\n\n\n<p class=\"\"><strong>Zooming In<\/strong><\/p>\n\n\n\n<p class=\"\">Now we increase the granularity (Figure B): 0.01 \u00c5 bins, focusing on the 2.0\u20132.5 \u00c5 range. Do you see it too? The smooth distribution fractures.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram_zoom-scaled.png?ssl=1\"><img decoding=\"async\" width=\"2560\" height=\"2048\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram_zoom-scaled.png?fit=625%2C500&amp;ssl=1\" alt=\"\" class=\"wp-image-14034\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram_zoom-scaled.png?w=2560&amp;ssl=1 2560w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram_zoom-scaled.png?resize=300%2C240&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram_zoom-scaled.png?resize=1024%2C819&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram_zoom-scaled.png?resize=768%2C614&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram_zoom-scaled.png?resize=1536%2C1229&amp;ssl=1 1536w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram_zoom-scaled.png?resize=2048%2C1638&amp;ssl=1 2048w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram_zoom-scaled.png?resize=624%2C499&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram_zoom-scaled.png?w=1250&amp;ssl=1 1250w, 
https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/03\/resolution_histogram_zoom-scaled.png?w=1875&amp;ssl=1 1875w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/figure>\n\n\n\n<p class=\"\">Instead of a continuous curve, we see distinct spikes:<br>2.00 \u00c5<br>2.05 \u00c5<br>2.10 \u00c5<br>2.15 \u00c5<br>2.20 \u00c5<br>\u2026 and so on<\/p>\n\n\n\n<p class=\"\"><strong>What\u2019s Going On?<\/strong><\/p>\n\n\n\n<p class=\"\">For me, several hypotheses present themselves:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li class=\"\"><strong>Rounding and Reporting Conventions.<\/strong> Crystallographic refinement pipelines often produce resolutions with limited precision. Authors may round to two decimal places, round to &#8220;nice&#8221; increments, or report conservative values.<\/li>\n\n\n\n<li class=\"\"><strong>Software Defaults and Pipelines.<\/strong> If widely used tools output resolution in fixed increments, this introduces systematic quantisation across the entire field.<\/li>\n\n\n\n<li class=\"\"><strong>Psychological Thresholds.<\/strong> Just as statistics has p = 0.05, structural biology has its own soft thresholds: &#8220;~2.0 \u00c5&#8221; = good, &#8220;&lt;2.5 \u00c5&#8221; = acceptable, &#8220;&lt;3.0 \u00c5&#8221; = usable. If a structure refines to 2.04 \u00c5, is it reported as 2.04, or nudged to 2.00 or 2.05? <span style=\"text-decoration: underline\">Even without misconduct, human preference for round numbers can shape distributions.<\/span><\/li>\n\n\n\n<li class=\"\"><strong>Selection and Filtering.<\/strong> Structures just above key thresholds may be deprioritised for deposition, less likely to be written up, or filtered during curation. This would mirror the significance filter in statistics.<\/li>\n<\/ol>\n\n\n\n<p class=\"\"><strong>Same, Same, but Different?<\/strong><\/p>\n\n\n\n<p class=\"\">Now reconsider the Z-value distribution from the medical literature. 
We see (well, I do) the same signatures: depletion, inflation just beyond thresholds (p-values of 0.05 and 0.01 there, 2.0 \u00c5 here), and asymmetry driven by selection.<\/p>\n\n\n\n<p class=\"\">So is it bias, and is it benign? There are strong benign explanations: instrument precision limits, software discretisation, historical reporting standards, &#8220;harmless&#8221; rounding and so on.<\/p>\n\n\n\n<p class=\"\">But regardless, we end up with systematic artefacts that affect meta-analyses, will ultimately influence ML models trained on structural data, and shape our intuition about what \u201ctypical\u201d data and thresholds\/cutoffs look like.<\/p>\n\n\n\n<p class=\"\"><strong>The Broader Point<\/strong><\/p>\n\n\n\n<p class=\"\">This is not really about \u00c5ngstr\u00f6ms (well, it is). What I wanted to demonstrate was more about measurement, reporting, and the quiet ways human choices imprint themselves onto data. We like thresholds. We like round numbers. We like categories. We love a decision boundary. Nature does not.<\/p>\n\n\n\n<p class=\"\"><strong>Citations<\/strong><\/p>\n\n\n\n<p class=\"\">Barnett AG, Wren JD. Examination of CIs in health and medical journals from 1976 to 2019: an observational study. BMJ Open. 2019;9:e032506. doi: 10.1136\/bmjopen-2019-032506<\/p>\n\n\n\n<p class=\"\">van Zwet EW, Cator EA. The significance filter, the winner&#8217;s curse and the need to shrink. Statistica Neerlandica. 2021;75:437\u2013452. https:\/\/doi.org\/10.1111\/stan.12241<\/p>\n","protected":false},"excerpt":{"rendered":"<p>An \u00c5ngstr\u00f6m An \u00c5ngstr\u00f6m (\u00c5) is a unit of length equal to 10\u221210 metres; one ten-billionth of a metre. It sits at a comfortable scale for the atomic world, with the diameter of a hydrogen atom, the length of a chemical bond, all measured in \u00c5ngstr\u00f6m. 
It is not an International System of Units (Syst\u00e8me [&hellip;]<\/p>\n","protected":false},"author":148,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[123,361,621,341,228,278,265],"tags":[],"ppma_author":[904],"class_list":["post-14028","post","type-post","status-publish","format-standard","hentry","category-commentary","category-data-science","category-data-visualization","category-databases","category-protein-structure","category-statistics","category-x-ray-crystallography"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":904,"user_id":148,"is_guest":0,"slug":"hasson","display_name":"Alexander 
Hasson","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/a18b0a615703cb20a58475f18f331eebaef289fadb6f4794f53d0f5a15f464c4?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/14028","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/148"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=14028"}],"version-history":[{"count":3,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/14028\/revisions"}],"predecessor-version":[{"id":14037,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/14028\/revisions\/14037"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=14028"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=14028"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=14028"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=14028"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}