# AGENTS.md

## Purpose

This repo is a research knowledge base for submodular-function-related literature.

The goal is not only to store PDFs, but to turn each paper into:

1. one canonical paper note
2. reusable concept notes
3. updated synthesis notes for the relevant group

Default expectation:

- one reading pass should be as deep as possible
- do not stop at a shallow intake if the paper is important enough to understand
- capture algorithm ideas, proof ideas, and important caveats while reading

This file is here so future Codex threads can continue the work immediately without rediscovering the workflow.

## Read This First

Before doing any work, read these files in this order:

1. `README.md`
2. `inbox/intake-index.md`
3. `syntheses/submodular.md`
4. `syntheses/k-submodular.md`
5. `syntheses/dr-submodular.md`

If the task is about one specific paper, also read that paper's `metadata.md` and `summary.md` first.

If the task is about a draft inside `paper-drafts/`, also read:

- `templates/paper-draft-structure.md`
- that draft's `NOTE.MD` first if it exists

## Repository Model

The repo is intentionally simple.

```text
SubResearch/
|- paper-drafts/
|- inbox/
|- papers/
|  |- submodular/
|  |- k-submodular/
|  |- dr-submodular/
|  `- mixed/
|- concepts/
|- syntheses/
|- templates/
`- scripts/
```

### Meaning of each top-level folder

- `inbox/`
  raw input area for new PDFs, TeX sources, appendices, and duplicates
- `paper-drafts/`
  writing area for draft papers authored in this repo
- `papers/`
  canonical per-paper folders
- `concepts/`
  reusable knowledge extracted from papers
- `syntheses/`
  running overview notes per main group
- `templates/`
  note templates
- `scripts/`
  helper script(s), especially `new-paper.ps1`

### Draft-note convention for `paper-drafts/`

Project-wide draft structure is defined in:

- `templates/paper-draft-structure.md`

For each paper draft folder inside `paper-drafts/`, maintain a progress note in:

- `NOTE.MD`

This note should be treated as the first place to look for the current state of the draft.

Typical uses of `NOTE.MD`:

- current narrative / active theorem route
- what sections are stable vs still exploratory
- experiment status
- open proof gaps
- next writing tasks

When working on a draft in `paper-drafts/`:

1. read `NOTE.MD` first if it exists
2. after meaningful draft changes, update `NOTE.MD`
3. use `NOTE.MD` to avoid rediscovering the same context in future threads

### Draft subsection convention for `paper-drafts/`

When writing a new algorithm subsection or a new exploratory-method subsection inside `paper-drafts/`, prefer the following order and keep it consistent unless the user asks for a different structure:

1. main idea of the algorithm, briefly
2. pseudocode
3. short description of the main steps
4. new notation needed for the proof
5. proof sketch
6. lemmas
7. main theorem
8. parameter discussion with explicit approximation values when possible

This convention is especially useful for exploratory branches:

- it keeps the algorithmic story readable before all constants are finalized
- it localizes missing proof ingredients clearly
- it makes future Codex threads much faster to continue

Treat `templates/paper-draft-structure.md` as the default architecture for all new draft folders and new technical sections.

## Core Workflow

For every new paper:

1. inspect the file in `inbox/`
2. determine whether it is:
   - a new canonical paper
   - a duplicate of an existing canonical paper
   - a survey / handbook note instead of a primary research paper
3. choose one `primary group`:
   - `submodular`
   - `k-submodular`
   - `dr-submodular`
   - `mixed`
4. create or reuse a canonical folder:
   - `papers//YYYY-firstauthor-short-title/`
5. ensure that folder contains:
   - `paper.pdf`
   - `metadata.md`
   - `summary.md`
   - optional `source/`
6. read the paper carefully enough to capture:
   - the exact problem setting
   - the main theorem(s)
   - the algorithmic idea
   - the proof / analysis idea
   - key lemmas
   - assumptions, edge cases, and limitations
7. extract reusable content into `concepts/`
8. update the appropriate file in `syntheses/`
9. update `inbox/intake-index.md`
10. update `index.html` at the repo root

## Important Working Rules

### 1. Do not over-classify

Do not create deep folder trees like `algorithms/constraints/guarantees/...`.

This repo uses:

- one primary group
- secondary tags in `metadata.md`
- concept extraction in `concepts/`
- synthesis updates in `syntheses/`

This is deliberate because many papers are cross-over papers.

### 2. One paper, one canonical note

Each paper should have exactly one canonical folder in `papers/`.

If the same paper appears multiple times in `inbox/`:

- do not create multiple paper folders
- record the duplicate mapping in `inbox/intake-index.md`
- mention duplicate input files inside the canonical `metadata.md` when useful

### 3. Keep `inbox/` as raw intake

Do not delete input files from `inbox/` unless the user explicitly asks for cleanup.

The current convention is:

- keep raw files in `inbox/`
- copy the canonical file into `papers/.../paper.pdf`

### 3b. Keep `paper-drafts/` separate from literature storage

Use `paper-drafts/` only for draft papers authored in this repo.

- do not place canonical literature notes there
- do not treat it as a replacement for `inbox/` or `papers/`
- final canonical literature storage still belongs in `papers/`

### 4. Concepts should be reusable

Only create a concept note if it is reusable across multiple papers or likely to matter later.

Good concept-note candidates:

- definitions
- notation
- standard lemmas
- recurring techniques
- relations between models

Bad candidates:

- one-off comments that only matter for a single paper summary

### 5. Read deeply on first contact

Avoid the pattern:

- read paper once shallowly
- come back later to re-read the whole PDF for algorithm details
- come back again for proof structure

Preferred pattern:

- when processing a paper for the first time, read it carefully enough to capture the full research value of that pass
- write down the algorithm idea, not only the theorem statement
- write down the proof strategy, not only the final ratio
- record the important lemmas, what they are used for, and where the argument is delicate
- record any assumptions, hidden constants, oracle assumptions, monotonicity assumptions, or normalization assumptions
- note what is genuinely reusable later

If the paper is very long, still do one serious canonical pass and leave explicit "need deeper proof check" notes only for the unresolved parts.

### 6. Synthesis notes are not paper dumps

When updating `syntheses/*.md`, prefer:

- milestones
- recurring techniques
- entry-point papers
- open directions
- cross-links between branches

Do not just append paper names without structure.

### 7. Keep `index.html` in sync

The repo root contains `index.html`.
It is the fastest human-facing entry point for the collection.

Whenever you:

- add a new canonical paper
- move a paper to another primary group
- change an important status such as `processed-first-pass` -> `processed-deep`
- add a concept or synthesis note that should be visible from the front page

update `index.html` in the same task.

At minimum, each canonical paper shown in `index.html` should expose links to:

- `summary.md`
- `metadata.md`
- `paper.pdf`

## Current State

The inbox has already been triaged once.

Canonical mappings are recorded in:

- `inbox/intake-index.md`

Current canonical papers already created:

- `papers/submodular/1970-edmonds-submodular-functions-matroids-polyhedra/`
- `papers/submodular/1978-nemhauser-wolsey-fisher-maximizing-submodular-set-functions/`
- `papers/submodular/1998-bar-ilan-kortsarz-peleg-generalized-submodular-cover/`
- `papers/submodular/2000-schrijver-combinatorial-submodular-minimization/`
- `papers/submodular/2007-iwata-submodular-function-minimization-survey/`
- `papers/submodular/2007-mccormick-on-submodular-function-minimization/`
- `papers/submodular/2013-iyer-bilmes-submodular-cover-knapsack-constraints/`
- `papers/submodular/2014-badanidiyuru-et-al-streaming-submodular-maximization/`
- `papers/submodular/2019-liu-et-al-greedy-strategies-survey/`
- `papers/k-submodular/2014-ward-zivny-maximizing-k-submodular-functions/`
- `papers/dr-submodular/2015-soma-yoshida-generalized-submodular-cover-integer-lattice/`

Existing concept notes already created:

- `concepts/submodularity.md`
- `concepts/polymatroid.md`
- `concepts/greedy-cardinality-approximation.md`
- `concepts/submodular-function-minimization.md`
- `concepts/k-submodularity.md`
- `concepts/dr-submodularity-on-integer-lattice.md`

## Priority Papers

Two papers are currently considered the most foundational originals in the repo:

1. `inbox/AN ANALYSIS OF APPROXIMATIONS FOR.pdf`
2. `inbox/submodular-functions-matroids-and-certain-polyhedra-53q1tge8w1.pdf`

Their canonical folders are:

- `papers/submodular/1978-nemhauser-wolsey-fisher-maximizing-submodular-set-functions/`
- `papers/submodular/1970-edmonds-submodular-functions-matroids-polyhedra/`

If a future thread needs to deepen the knowledge base, these are the best starting point.

## Metadata Conventions

Each `metadata.md` should contain at least:

- `Title`
- `Authors`
- `Year`
- `Venue`
- `Primary group`
- `Secondary tags`
- `Problem`
- `Main guarantee`
- `Key techniques`
- `Status`
- `Tags`
- `Inbox source`

Use `Status` to encode useful state such as:

- `processed-deep`
- `processed-first-pass`
- `survey`
- `foundational-original`
- `venue-year-to-verify`

## Summary Conventions

Each `summary.md` should aim to answer:

1. what problem does the paper study?
2. why does it matter?
3. what are the main results?
4. what is the core algorithmic idea?
5. what is the proof / analysis strategy?
6. what are the key techniques and lemmas?
7. what assumptions, caveats, or failure points matter?
8. what should be extracted to `concepts/`?
9. what should be updated in `syntheses/`?

Default target from now on:

- a deep canonical note
- enough detail that a later thread usually does not need to re-read the whole PDF
- summary should preserve theorem statements at a high level, but also the route to them

For important papers, `summary.md` should include at least:

- problem setup
- main theorem(s)
- algorithm outline
- proof skeleton
- delicate steps / caveats
- reusable ingredients

## When Doing Deeper Work

If the user asks for deeper reading of a paper:

1. read the existing `metadata.md`
2. read the existing `summary.md`
3. reopen the paper PDF
4. upgrade the note instead of creating parallel notes
5. enrich existing concept notes if possible
6. only create a new concept file when the idea is clearly distinct

If a paper is only `processed-first-pass`, prioritize upgrading it to `processed-deep`.

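
A minimal sketch of how a thread might check a `metadata.md` against the field list in "Metadata Conventions" and spot notes that are still `processed-first-pass`. This helper is hypothetical and not part of `scripts/`. It assumes each field sits on its own line as `Field: value` (optionally behind a list marker, backticks, or bold markers) and that `Status` may hold several comma-separated markers; neither assumption is confirmed by this file, so adjust the parsing to the real note layout.

```python
# Hypothetical helper, not part of this repo's scripts/ folder.

# Field names copied from the "Metadata Conventions" list in this file.
REQUIRED_FIELDS = [
    "Title", "Authors", "Year", "Venue", "Primary group", "Secondary tags",
    "Problem", "Main guarantee", "Key techniques", "Status", "Tags",
    "Inbox source",
]


def parse_metadata(text):
    """Return a dict of field name -> value for lines shaped like 'Field: value'.

    ASSUMPTION: one field per line; list markers, backticks, and bold
    markers around the field name or value are decoration and are dropped.
    """
    fields = {}
    for line in text.splitlines():
        clean = line.strip().lstrip("-* ").replace("`", "").replace("**", "")
        if ":" not in clean:
            continue
        name, value = clean.split(":", 1)
        fields[name.strip()] = value.strip()
    return fields


def missing_fields(fields):
    """List the required field names that the parsed metadata lacks."""
    return [name for name in REQUIRED_FIELDS if name not in fields]


def needs_deep_pass(fields):
    """True if the Status field still contains processed-first-pass.

    ASSUMPTION: several status markers are separated by commas.
    """
    statuses = [s.strip() for s in fields.get("Status", "").split(",")]
    return "processed-first-pass" in statuses


if __name__ == "__main__":
    sample = "- Title: Example paper\n- Status: processed-first-pass, survey\n"
    parsed = parse_metadata(sample)
    print("missing:", missing_fields(parsed))
    print("upgrade to processed-deep next:", needs_deep_pass(parsed))
```

Running such a check before deeper work gives a quick list of notes to upgrade without reopening every `metadata.md` by hand.
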
## Duplicate Handling

Known duplicates already identified:

- `inbox/sub-modular.pdf`
  duplicate of the Nemhauser-Wolsey-Fisher paper
- `inbox/submodular.pdf`
  duplicate of `inbox/submodular survey.pdf`

Always record future duplicates in `inbox/intake-index.md`.

## Script Usage

There is a helper script:

- `scripts/new-paper.ps1`

Use it when convenient, but note:

- direct script execution may be blocked by PowerShell execution policy
- if needed, call it with:
  `powershell -ExecutionPolicy Bypass -File .\scripts\new-paper.ps1 -Group -Slug `

## Style Guidance For Future Codex Threads

- Keep all text files ASCII if possible.
- Prefer updating existing notes over creating overlapping files.
- Be explicit when year / venue is uncertain.
- If a paper is a survey, say so clearly.
- If a file is a scan, reprint, or duplicate, say so clearly.
- Keep the repo useful for retrieval, not just for archiving.
- Record proof ideas in plain language, not only theorem labels.
- Record algorithm intuition, not only pseudocode-level steps.
- Record "what to be careful about" so future threads avoid rereading for the same subtlety.

## Good Next Steps

If a future thread needs a high-value continuation, good options are:

1. deepen the two foundational originals
2. create comparison notes for:
   - cover lineage
   - greedy lineage
   - minimization lineage
3. verify uncertain venue/year metadata
4. process new PDFs added to `inbox/`

After any of those changes, check whether `index.html` also needs an update.
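
One possible way to run that final check is sketched below: it walks the canonical paper folders, reports any that lack `paper.pdf`, `metadata.md`, or `summary.md`, and flags files that `index.html` does not mention. The function name `audit_papers` is hypothetical, and the check assumes `index.html` links to files by repo-relative paths with forward slashes, which this file does not state. Treat it as a starting point, not an existing repo script.

```python
# Hypothetical audit sketch, not an existing script in this repo.
from pathlib import Path

# Files every canonical folder should contain, and that index.html should link to.
REQUIRED_FILES = ("paper.pdf", "metadata.md", "summary.md")
# Primary groups listed in the core workflow.
GROUPS = ("submodular", "k-submodular", "dr-submodular", "mixed")


def audit_papers(root):
    """Return {folder: [problems]} for canonical paper folders under root."""
    root = Path(root)
    index_path = root / "index.html"
    index_text = index_path.read_text(encoding="utf-8") if index_path.exists() else ""
    report = {}
    for group in GROUPS:
        group_dir = root / "papers" / group
        if not group_dir.is_dir():
            continue
        for folder in sorted(p for p in group_dir.iterdir() if p.is_dir()):
            rel = folder.relative_to(root).as_posix()
            problems = [
                f"missing {name}"
                for name in REQUIRED_FILES
                if not (folder / name).is_file()
            ]
            # ASSUMPTION: index.html uses repo-relative, forward-slash links.
            problems += [
                f"index.html has no link to {name}"
                for name in REQUIRED_FILES
                if f"{rel}/{name}" not in index_text
            ]
            if problems:
                report[rel] = problems
    return report


if __name__ == "__main__":
    # Run from the repo root; prints one line per folder that needs attention.
    for folder, problems in audit_papers(".").items():
        print(folder + ": " + "; ".join(problems))
```

An empty report means every canonical folder has its three files and is mentioned in `index.html`; it does not verify that the notes themselves are up to date.
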