# AGENTS.md

## Purpose

This repo is a research knowledge base for literature related to submodular functions.

The goal is not only to store PDFs, but to turn each paper into:

1. one canonical paper note
2. reusable concept notes
3. updated synthesis notes for the relevant group

Default expectation:

- one reading pass should be as deep as possible
- do not stop at a shallow intake when the paper is important enough to understand properly
- capture algorithm ideas, proof ideas, and important caveats while reading

This file is here so future Codex threads can continue the work immediately without rediscovering the workflow.

## Read This First

Before doing any work, read these files in this order:

1. `README.md`
2. `inbox/intake-index.md`
3. `syntheses/submodular.md`
4. `syntheses/k-submodular.md`
5. `syntheses/dr-submodular.md`

If the task is about one specific paper, also read that paper's `metadata.md` and `summary.md` first.

If the task is about a draft inside `paper-drafts/`, also read:

- `templates/paper-draft-structure.md`
- that draft's `NOTE.MD` first if it exists

## Repository Model

The repo is intentionally simple.

```text
SubResearch/
|- paper-drafts/
|- inbox/
|- papers/
|  |- submodular/
|  |- k-submodular/
|  |- dr-submodular/
|  `- mixed/
|- concepts/
|- syntheses/
|- templates/
`- scripts/
```

### Meaning of each top-level folder

- `inbox/`
  raw input area for new PDFs, TeX sources, appendices, and duplicates
- `paper-drafts/`
  writing area for draft papers authored in this repo
- `papers/`
  canonical per-paper folders
- `concepts/`
  reusable knowledge extracted from papers
- `syntheses/`
  running overview notes per main group
- `templates/`
  note templates
- `scripts/`
  helper script(s), especially `new-paper.ps1`

### Draft-note convention for `paper-drafts/`

Project-wide draft structure is defined in:

- `templates/paper-draft-structure.md`

For each paper draft folder inside `paper-drafts/`, maintain a progress note in:

- `NOTE.MD`

Treat this note as the first place to look for the current state of the draft.

Typical uses of `NOTE.MD`:

- current narrative / active theorem route
- what sections are stable vs still exploratory
- experiment status
- open proof gaps
- next writing tasks

When working on a draft in `paper-drafts/`:

1. read `NOTE.MD` first if it exists
2. after meaningful draft changes, update `NOTE.MD`
3. use `NOTE.MD` to avoid rediscovering the same context in future threads

### Draft subsection convention for `paper-drafts/`

When writing a new algorithm subsection or a new exploratory-method subsection inside `paper-drafts/`, prefer the following order and keep it consistent unless the user asks for a different structure:

1. main idea of the algorithm, briefly
2. pseudocode
3. short description of the main steps
4. new notation needed for the proof
5. proof sketch
6. lemmas
7. main theorem
8. parameter discussion with explicit approximation values when possible

This convention is especially useful for exploratory branches:

- it keeps the algorithmic story readable before all constants are finalized
- it localizes missing proof ingredients clearly
- it makes future Codex threads much faster to continue

Treat `templates/paper-draft-structure.md` as the default architecture for all new draft folders and new technical sections.

## Core Workflow

For every new paper:

1. inspect the file in `inbox/`
2. determine whether it is:
   - a new canonical paper
   - a duplicate of an existing canonical paper
   - a survey / handbook note instead of a primary research paper
3. choose one `primary group`:
   - `submodular`
   - `k-submodular`
   - `dr-submodular`
   - `mixed`
4. create or reuse a canonical folder:
   - `papers/<group>/YYYY-firstauthor-short-title/`
5. ensure that folder contains:
   - `paper.pdf`
   - `metadata.md`
   - `summary.md`
   - optional `source/`
6. read the paper carefully enough to capture:
   - the exact problem setting
   - the main theorem(s)
   - the algorithmic idea
   - the proof / analysis idea
   - key lemmas
   - assumptions, edge cases, and limitations
7. extract reusable content into `concepts/`
8. update the appropriate file in `syntheses/`
9. update `inbox/intake-index.md`
10. update `index.html` at the repo root
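
Steps 4 and 5 above can be sketched as a small check. This is a minimal illustration, not part of the repo: the helper names `slugify`, `canonical_folder`, and `missing_files` are hypothetical, and the slug rule (lowercase, dashes between words) is an assumption inferred from the existing folder names. Existing folders sometimes include several author names, so treat the generated name as a suggestion.

```python
from pathlib import Path
import re

# Primary groups and required files, as listed in the workflow above.
GROUPS = {"submodular", "k-submodular", "dr-submodular", "mixed"}
REQUIRED_FILES = ("paper.pdf", "metadata.md", "summary.md")


def slugify(text: str) -> str:
    """Lowercase the text and join alphanumeric runs with single dashes."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def canonical_folder(root: Path, group: str, year: int,
                     first_author: str, short_title: str) -> Path:
    """Build papers/<group>/YYYY-firstauthor-short-title/ under root."""
    if group not in GROUPS:
        raise ValueError(f"unknown primary group: {group}")
    name = f"{year}-{slugify(first_author)}-{slugify(short_title)}"
    return root / "papers" / group / name


def missing_files(folder: Path) -> list[str]:
    """Return the required files that are not yet present in the folder."""
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

A thread could call `missing_files(folder)` after intake and treat a non-empty result as "step 5 is not finished yet". The optional `source/` directory is deliberately not checked.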

## Important Working Rules

### 1. Do not over-classify

Do not create deep folder trees like `algorithms/constraints/guarantees/...`.

This repo uses:

- one primary group
- secondary tags in `metadata.md`
- concept extraction in `concepts/`
- synthesis updates in `syntheses/`

This is deliberate because many papers are cross-over papers.

### 2. One paper, one canonical note

Each paper should have exactly one canonical folder in `papers/`.

If the same paper appears multiple times in `inbox/`:

- do not create multiple paper folders
- record the duplicate mapping in `inbox/intake-index.md`
- mention duplicate input files inside the canonical `metadata.md` when useful
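
As a first pass at spotting repeated inputs, byte-identical files can be grouped by content hash. The sketch below is an assumption about how one might do this, not an existing repo script, and the function names are hypothetical. It only catches exact copies: scans, reprints, and re-downloaded versions of the same paper will hash differently and still need a manual check.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(inbox: Path) -> list[list[Path]]:
    """Group files directly inside inbox/ that have identical contents."""
    by_digest = defaultdict(list)
    for path in sorted(inbox.iterdir()):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only groups with more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group is a candidate duplicate mapping to record in `inbox/intake-index.md`; nothing is deleted or moved.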

### 3. Keep `inbox/` as raw intake

Do not delete input files from `inbox/` unless the user explicitly asks for cleanup.

The current convention is:

- keep raw files in `inbox/`
- copy the canonical file into `papers/.../paper.pdf`

### 3b. Keep `paper-drafts/` separate from literature storage

Use `paper-drafts/` only for draft papers authored in this repo.

- do not place canonical literature notes there
- do not treat it as a replacement for `inbox/` or `papers/`
- final canonical literature storage still belongs in `papers/`

### 4. Concepts should be reusable

Only create a concept note if it is reusable across multiple papers or likely to matter later.

Good concept-note candidates:

- definitions
- notation
- standard lemmas
- recurring techniques
- relations between models

Bad candidates:

- one-off comments that only matter for a single paper summary

### 5. Read deeply on first contact

Avoid the pattern:

- read paper once shallowly
- come back later to re-read the whole PDF for algorithm details
- come back again for proof structure

Preferred pattern:

- when processing a paper for the first time, read it carefully enough to capture its full research value in that pass
- write down the algorithm idea, not only the theorem statement
- write down the proof strategy, not only the final ratio
- record the important lemmas, what they are used for, and where the argument is delicate
- record any assumptions, hidden constants, oracle assumptions, monotonicity assumptions, or normalization assumptions
- note what is genuinely reusable later

If the paper is very long, still do one serious canonical pass and leave explicit "need deeper proof check" notes only for the unresolved parts.

### 6. Synthesis notes are not paper dumps

When updating `syntheses/*.md`, prefer:

- milestones
- recurring techniques
- entry-point papers
- open directions
- cross-links between branches

Do not just append paper names without structure.

### 7. Keep `index.html` in sync

The repo root contains `index.html`.

It is the fastest human-facing entry point for the collection.

Whenever you:

- add a new canonical paper
- move a paper to another primary group
- change an important status such as `processed-first-pass` -> `processed-deep`
- add a concept or synthesis note that should be visible from the front page

update `index.html` in the same task.

At minimum, each canonical paper shown in `index.html` should expose links to:

- `summary.md`
- `metadata.md`
- `paper.pdf`
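
The minimum link set can be illustrated with a small generator. The actual markup of `index.html` is not specified here, so the `<li>` layout below is purely an assumption, and `paper_entry` is a hypothetical helper; adapt the output to whatever structure the page already uses.

```python
from html import escape

# The three links every canonical paper should expose.
LINK_TARGETS = ("summary.md", "metadata.md", "paper.pdf")


def paper_entry(folder: str, title: str) -> str:
    """Render one list item linking to the three canonical files."""
    base = escape(folder.rstrip("/"), quote=True)
    links = " | ".join(
        f'<a href="{base}/{name}">{name}</a>' for name in LINK_TARGETS
    )
    return f"<li>{escape(title)}: {links}</li>"
```

For example, `paper_entry("papers/k-submodular/2014-ward-zivny-maximizing-k-submodular-functions/", "Maximizing k-submodular functions")` yields a single `<li>` element with three relative links.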

## Current State

The inbox has already been triaged once.

Canonical mappings are recorded in:

- `inbox/intake-index.md`

Current canonical papers already created:

- `papers/submodular/1970-edmonds-submodular-functions-matroids-polyhedra/`
- `papers/submodular/1978-nemhauser-wolsey-fisher-maximizing-submodular-set-functions/`
- `papers/submodular/1998-bar-ilan-kortsarz-peleg-generalized-submodular-cover/`
- `papers/submodular/2000-schrijver-combinatorial-submodular-minimization/`
- `papers/submodular/2007-iwata-submodular-function-minimization-survey/`
- `papers/submodular/2007-mccormick-on-submodular-function-minimization/`
- `papers/submodular/2013-iyer-bilmes-submodular-cover-knapsack-constraints/`
- `papers/submodular/2014-badanidiyuru-et-al-streaming-submodular-maximization/`
- `papers/submodular/2019-liu-et-al-greedy-strategies-survey/`
- `papers/k-submodular/2014-ward-zivny-maximizing-k-submodular-functions/`
- `papers/dr-submodular/2015-soma-yoshida-generalized-submodular-cover-integer-lattice/`

Existing concept notes already created:

- `concepts/submodularity.md`
- `concepts/polymatroid.md`
- `concepts/greedy-cardinality-approximation.md`
- `concepts/submodular-function-minimization.md`
- `concepts/k-submodularity.md`
- `concepts/dr-submodularity-on-integer-lattice.md`

## Priority Papers

Two papers are currently considered the most foundational originals in the repo:

1. `inbox/AN ANALYSIS OF APPROXIMATIONS FOR.pdf`
2. `inbox/submodular-functions-matroids-and-certain-polyhedra-53q1tge8w1.pdf`

Their canonical folders are:

- `papers/submodular/1978-nemhauser-wolsey-fisher-maximizing-submodular-set-functions/`
- `papers/submodular/1970-edmonds-submodular-functions-matroids-polyhedra/`

If a future thread needs to deepen the knowledge base, these are the best starting points.

## Metadata Conventions

Each `metadata.md` should contain at least:

- `Title`
- `Authors`
- `Year`
- `Venue`
- `Primary group`
- `Secondary tags`
- `Problem`
- `Main guarantee`
- `Key techniques`
- `Status`
- `Tags`
- `Inbox source`

Use `Status` to encode useful state such as:

- `processed-deep`
- `processed-first-pass`
- `survey`
- `foundational-original`
- `venue-year-to-verify`
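
A quick completeness check for a `metadata.md` might look like the sketch below. The on-disk layout of the fields is not fixed by this file, so the code assumes each field sits on its own line as `Field: value`, optionally preceded by a list marker or wrapped in backticks or asterisks. `missing_fields` is a hypothetical helper, not an existing script.

```python
import re

# Required fields, in the order listed above.
REQUIRED_FIELDS = (
    "Title", "Authors", "Year", "Venue", "Primary group", "Secondary tags",
    "Problem", "Main guarantee", "Key techniques", "Status", "Tags",
    "Inbox source",
)

# Assumed line shape: optional "- " or "* " marker, field name, then a colon.
FIELD_LINE = re.compile(r"\s*(?:[-*]\s+)?([^:]+):")


def missing_fields(text: str) -> list[str]:
    """Return the required fields that have no 'Field:' line in the text."""
    present = set()
    for line in text.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            # Drop surrounding whitespace, backticks, and bold markers.
            present.add(match.group(1).strip().strip("`*"))
    return [field for field in REQUIRED_FIELDS if field not in present]
```

Field names are matched exactly, so `Tags` and `Secondary tags` are tracked separately.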

## Summary Conventions

Each `summary.md` should aim to answer:

1. what problem does the paper study?
2. why does it matter?
3. what are the main results?
4. what is the core algorithmic idea?
5. what is the proof / analysis strategy?
6. what are the key techniques and lemmas?
7. what assumptions, caveats, or failure points matter?
8. what should be extracted to `concepts/`?
9. what should be updated in `syntheses/`?

Default target from now on:

- a deep canonical note
- enough detail that a later thread usually does not need to re-read the whole PDF
- theorem statements preserved at a high level, together with the route to them

For important papers, `summary.md` should include at least:

- problem setup
- main theorem(s)
- algorithm outline
- proof skeleton
- delicate steps / caveats
- reusable ingredients

## When Doing Deeper Work

If the user asks for deeper reading of a paper:

1. read the existing `metadata.md`
2. read the existing `summary.md`
3. reopen the paper PDF
4. upgrade the note instead of creating parallel notes
5. enrich existing concept notes if possible
6. only create a new concept file when the idea is clearly distinct

If a paper is only `processed-first-pass`, prioritize upgrading it to `processed-deep`.

## Duplicate Handling

Known duplicates already identified:

- `inbox/sub-modular.pdf`
  duplicate of the Nemhauser-Wolsey-Fisher paper
- `inbox/submodular.pdf`
  duplicate of `inbox/submodular survey.pdf`

Always record future duplicates in `inbox/intake-index.md`.

## Script Usage

There is a helper script:

- `scripts/new-paper.ps1`

Use it when convenient, but note:

- direct script execution may be blocked by PowerShell execution policy
- if needed, call it with:
  `powershell -ExecutionPolicy Bypass -File .\scripts\new-paper.ps1 -Group <group> -Slug <slug>`

## Style Guidance For Future Codex Threads

- Keep all text files ASCII if possible.
- Prefer updating existing notes over creating overlapping files.
- Be explicit when year / venue is uncertain.
- If a paper is a survey, say so clearly.
- If a file is a scan, reprint, or duplicate, say so clearly.
- Keep the repo useful for retrieval, not just for archiving.
- Record proof ideas in plain language, not only theorem labels.
- Record algorithm intuition, not only pseudocode-level steps.
- Record "what to be careful about" so future threads avoid rereading for the same subtlety.
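
The ASCII guideline above is easy to check mechanically. The following is a minimal sketch that assumes notes are stored as UTF-8; `non_ascii_lines` is a hypothetical helper name.

```python
from pathlib import Path


def non_ascii_lines(path: Path) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that contain non-ASCII characters."""
    hits = []
    text = path.read_text(encoding="utf-8")
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.isascii():
            hits.append((number, line))
    return hits
```

Since the rule says "if possible", a hit is a prompt to review the line (for example, an author name with diacritics may be worth keeping), not an automatic error.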

## Good Next Steps

If a future thread needs a high-value continuation, good options are:

1. deepen the two foundational originals
2. create comparison notes for:
   - cover lineage
   - greedy lineage
   - minimization lineage
3. verify uncertain venue/year metadata
4. process new PDFs added to `inbox/`

After any of those changes, check whether `index.html` also needs an update.