{"id":12793,"date":"2025-09-03T08:17:48","date_gmt":"2025-09-03T07:17:48","guid":{"rendered":"https:\/\/www.blopig.com\/blog\/?p=12793"},"modified":"2025-09-03T08:17:49","modified_gmt":"2025-09-03T07:17:49","slug":"understand-large-codebases-faster-using-gitingest","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2025\/09\/understand-large-codebases-faster-using-gitingest\/","title":{"rendered":"Understand Large Codebases Faster Using GitIngest"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Often as researchers we have to deal with large and ugly codebases &#8211; this is not new, I know. But fear not: we now have large language models (LLMs) like ChatGPT and friends, which make things a little faster! In this blog post I will show you how to use <a href=\"https:\/\/gitingest.com\/\">GitIngest<\/a> to understand a codebase <em>even<\/em> faster using your favourite LLM.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">No more copy-pasting files individually, writing a paragraph to explain the directory structure or, even worse, relying on an LLM to use web search to find the codebase. As the codebase grows, these methods become less reliable. GitIngest makes a whole codebase prompt-friendly &#8211; one prompt will be all you need!<\/p>\n\n\n\n<!--more-->\n\n\n\n<p class=\"wp-block-paragraph\">Simply take your favourite GitHub repo URL (publicly available, ideally) and replace &#8220;hub&#8221; with &#8220;ingest&#8221; in the domain name. See the example below.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">On GitIngest, you can access all of the code as plain text, as well as the directory structure of the repo. 
Take only the structure, or take parts of the code, and feed them into your favourite LLM along with your questions or the things you&#8217;d like to understand better.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">As a bonus, you get the token count and some extra information!<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I like Gemini 2.5 Pro for this task as it has a context window of 1 million tokens.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Happy coding!<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># your favourite GitHub URL<br><br>https:\/\/github.com\/google-research\/google-research\/tree\/master\/mol_dqn<br><br># replace \"hub\" with \"ingest\"<br><br>https:\/\/gitingest.com\/google-research\/google-research\/tree\/master\/mol_dqn<br><br># an estimated token count for feeding all of the code into an LLM is also given - this might be important for a model like Claude.<br><br>Estimated tokens: 133.7k<br><br># example directory structure output (I won't paste the code output or I'll break the blog)<br><br>Directory structure:<br>\u2514\u2500\u2500 mol_dqn\/<br>    \u251c\u2500\u2500 README.md<br>    \u251c\u2500\u2500 requirements.txt<br>    \u251c\u2500\u2500 chemgraph\/<br>    \u2502   \u251c\u2500\u2500 __init__.py<br>    \u2502   \u251c\u2500\u2500 all_800_mols.json<br>    \u2502   \u251c\u2500\u2500 multi_obj_opt.py<br>    \u2502   \u251c\u2500\u2500 multi_obj_opt_test.py<br>    \u2502   \u251c\u2500\u2500 optimize_logp.py<br>    \u2502   \u251c\u2500\u2500 optimize_logp_of_800_molecules.py<br>    \u2502   \u251c\u2500\u2500 optimize_logp_of_800_molecules_test.py<br>    \u2502   \u251c\u2500\u2500 optimize_qed.py<br>    \u2502   \u251c\u2500\u2500 optimize_qed_test.py<br>    \u2502   \u251c\u2500\u2500 target_sas.py<br>    \u2502   \u251c\u2500\u2500 target_sas_eval.ipynb<br>    \u2502   \u251c\u2500\u2500 target_sas_test.py<br>    \u2502   \u251c\u2500\u2500 configs\/<br>    \u2502   \u2502   \u251c\u2500\u2500 bootstrap_dqn.json<br>    \u2502   \u2502   \u251c\u2500\u2500 
bootstrap_dqn_opt_800.json<br>    \u2502   \u2502   \u251c\u2500\u2500 bootstrap_dqn_step1.json<br>    \u2502   \u2502   \u251c\u2500\u2500 bootstrap_dqn_step2.json<br>    \u2502   \u2502   \u251c\u2500\u2500 multi_obj_dqn.json<br>    \u2502   \u2502   \u251c\u2500\u2500 naive_dqn.json<br>    \u2502   \u2502   \u251c\u2500\u2500 naive_dqn_opt_800.json<br>    \u2502   \u2502   \u2514\u2500\u2500 target_sas.json<br>    \u2502   \u2514\u2500\u2500 dqn\/<br>    \u2502       \u251c\u2500\u2500 __init__.py<br>    \u2502       \u251c\u2500\u2500 deep_q_networks.py<br>    \u2502       \u251c\u2500\u2500 deep_q_networks_test.py<br>    \u2502       \u251c\u2500\u2500 molecules.py<br>    \u2502       \u251c\u2500\u2500 molecules_test.py<br>    \u2502       \u251c\u2500\u2500 run_dqn.py<br>    \u2502       \u251c\u2500\u2500 run_dqn_test.py<br>    \u2502       \u251c\u2500\u2500 py\/<br>    \u2502       \u2502   \u251c\u2500\u2500 __init__.py<br>    \u2502       \u2502   \u251c\u2500\u2500 molecules.py<br>    \u2502       \u2502   \u2514\u2500\u2500 molecules_test.py<br>    \u2502       \u2514\u2500\u2500 tensorflow_core\/<br>    \u2502           \u251c\u2500\u2500 __init__.py<br>    \u2502           \u2514\u2500\u2500 core.py<br>    \u251c\u2500\u2500 experimental\/<br>    \u2502   \u251c\u2500\u2500 deep_q_networks_noise.py<br>    \u2502   \u251c\u2500\u2500 eval_800_mols.py<br>    \u2502   \u251c\u2500\u2500 max_qed_with_sim.py<br>    \u2502   \u251c\u2500\u2500 multi_obj.py<br>    \u2502   \u251c\u2500\u2500 multi_obj_gen.py<br>    \u2502   \u251c\u2500\u2500 multi_obj_opt.py<br>    \u2502   \u251c\u2500\u2500 optimize_logp.py<br>    \u2502   \u251c\u2500\u2500 optimize_qed.py<br>    \u2502   \u251c\u2500\u2500 optimize_qed_final_reward.py<br>    \u2502   \u251c\u2500\u2500 optimize_qed_max_steps.py<br>    \u2502   \u251c\u2500\u2500 optimize_qed_noise.py<br>    \u2502   \u251c\u2500\u2500 optimize_qed_t.py<br>    \u2502   \u251c\u2500\u2500 optimize_weight_noise.py<br>    
\u2502   \u2514\u2500\u2500 target_logp.py<br>    \u2514\u2500\u2500 plot\/<br>        \u251c\u2500\u2500 drug_20_smiles.json<br>        \u251c\u2500\u2500 episode_length_qed.json<br>        \u251c\u2500\u2500 plot.py<br>        \u251c\u2500\u2500 q_values_20.json<br>        \u2514\u2500\u2500 target_sas_results.csv<br><br><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Often as researchers we have to deal with large and ugly codebases &#8211; this is not new, I know. But fear not: we now have large language models (LLMs) like ChatGPT and friends, which make things a little faster! In this blog post I will show you how to use GitIngest to do this even faster [&hellip;]<\/p>\n","protected":false},"author":128,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[633,29],"tags":[726,196,798,806,871],"ppma_author":[802],"class_list":["post-12793","post","type-post","status-publish","format-standard","hentry","category-ai","category-code","tag-chat-gpt","tag-github","tag-llms","tag-prompt-engineering","tag-research-skills"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":802,"user_id":128,"is_guest":0,"slug":"sanaz","display_name":"Sanaz 
Kazeminia","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/d7ee2fbf2cb52aaa1856ad4e395733a6a561811dad16c2ae3b60b3b8d5f6c68c?s=96&d=mm&r=g","author_category":"","user_url":"","last_name":"Kazeminia","first_name":"Sanaz","job_title":"","description":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/12793","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/128"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=12793"}],"version-history":[{"count":5,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/12793\/revisions"}],"predecessor-version":[{"id":12963,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/12793\/revisions\/12963"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=12793"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=12793"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=12793"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=12793"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}