VulnerabilityHistoryProject/recidivism

recidivism

Utilities for downloading OSV data, enriching vulnerabilities with a recidivism metric, and cloning referenced source repositories locally.

Configuration

Copy the default config and edit your local paths:

cp recidivism.default.ini recidivism.ini

Both scripts read settings from recidivism.ini. If that file is missing, the scripts print guidance and fall back to recidivism.default.ini.
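The fallback described above can be sketched as follows. This is a minimal illustration rather than the repository's actual loader: the function name `load_config` and the `directory` parameter are hypothetical, and the wording of the guidance message is an assumption.

```python
import configparser
import sys
from pathlib import Path


def load_config(directory="."):
    """Read recidivism.ini, falling back to recidivism.default.ini.

    Hypothetical helper: the real scripts may structure this differently.
    """
    base = Path(directory)
    local = base / "recidivism.ini"
    default = base / "recidivism.default.ini"

    if local.exists():
        chosen = local
    else:
        # Print guidance, then continue with the shipped defaults.
        print(
            f"{local.name} not found; copy {default.name} to {local.name} "
            "and edit your local paths. Using defaults for now.",
            file=sys.stderr,
        )
        chosen = default

    parser = configparser.ConfigParser()
    with open(chosen, encoding="utf-8") as handle:
        parser.read_file(handle)
    return parser
```

Using `read_file` on an opened file, rather than `ConfigParser.read`, means a missing default file raises an error instead of silently producing an empty configuration.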

Scripts

1) Download + enrich OSV vulnerabilities

python scripts/enrich_osv_recidivism.py \
  --output data/osv_recidivism.jsonl

This script:

  • downloads the OSV dump (OSV-all.zip by default),
  • extracts all vulnerabilities,
  • computes a recidivism metric using CWE recurrence and repository/fix history,
  • appends recidivism details to each vulnerability and writes JSONL output.
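The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual logic: it assumes CWE identifiers are stored under `database_specific.cwe_ids` (as in GitHub-sourced OSV records), the function names are hypothetical, and the `recidivism` payload shown here is a placeholder that covers only CWE recurrence and ignores repository/fix history.

```python
import json
import zipfile
from collections import Counter


def iter_osv_records(zip_file):
    """Yield every vulnerability stored as a .json member of an OSV dump."""
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.endswith(".json"):
                yield json.loads(archive.read(name))


def cwe_ids(record):
    # Assumption: CWEs live in database_specific.cwe_ids; other sources may differ.
    return (record.get("database_specific") or {}).get("cwe_ids") or []


def enrich(zip_file, out):
    """Write one JSON line per vulnerability with a 'recidivism' field added."""
    # Holding all records in memory keeps the sketch short; a full dump
    # would likely need a two-pass or streaming approach.
    records = list(iter_osv_records(zip_file))
    counts = Counter(cwe for rec in records for cwe in set(cwe_ids(rec)))
    for rec in records:
        # Placeholder metric: how many OTHER records share each CWE.
        rec["recidivism"] = {
            "cwe_repeat_counts": {cwe: counts[cwe] - 1 for cwe in cwe_ids(rec)}
        }
        out.write(json.dumps(rec) + "\n")
    return len(records)
```

`zip_file` may be a path or a binary file object, and `out` is any text stream, for example a file opened for writing at `data/osv_recidivism.jsonl`.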

2) Clone OSV referenced repositories

python scripts/clone_osv_repositories.py \
  --osv-dir data/osv_dump \
  --target-dir data/repos \
  --update-existing

This script scans OSV vulnerabilities for GitHub source references and clones/updates local copies for research workflows (organized as <target-dir>/<owner>/<repo>).
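As a rough sketch of the scanning step, the snippet below collects GitHub repositories from an OSV record and maps each one to its `<target-dir>/<owner>/<repo>` location. It relies on the OSV schema fields `references[].url` and `affected[].ranges[].repo` (for `GIT` ranges). The helper names, the URL pattern, and the list of excluded path prefixes are assumptions, and the actual script may look at different fields.

```python
import re
from pathlib import Path

GITHUB_REPO = re.compile(
    r"^https?://github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)"
)
# github.com paths that are not user or organization names (assumed list).
NOT_OWNERS = {"advisories"}


def github_repos(record):
    """Return sorted (owner, repo) pairs referenced by one OSV record."""
    urls = [ref.get("url", "") for ref in record.get("references", [])]
    for affected in record.get("affected", []):
        for rng in affected.get("ranges", []):
            if rng.get("type") == "GIT":
                urls.append(rng.get("repo", ""))

    found = set()
    for url in urls:
        match = GITHUB_REPO.match(url)
        if not match:
            continue
        owner, repo = match.group(1), match.group(2)
        if owner in NOT_OWNERS:
            continue
        if repo.endswith(".git"):
            repo = repo[: -len(".git")]
        found.add((owner, repo))
    return sorted(found)


def clone_path(target_dir, owner, repo):
    """Location of the local copy: <target-dir>/<owner>/<repo>."""
    return Path(target_dir) / owner / repo
```

Deduplicating through a set means a repository referenced by both a commit URL and a `GIT` range is cloned only once.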

3) Delete skipped/empty repositories

python scripts/cleanup_empty_repos.py --path data/repos --dry-run

The script cleanup_empty_repos.py deletes empty repository directories left behind by the cloning step. These correspond to repositories that no longer exist or have been made private. The command above performs a dry run and makes no permanent changes.

python scripts/cleanup_empty_repos.py --path data/repos --yes

This command runs the script for real and removes the empty directories without prompting for confirmation.
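The difference between a dry run and a real run can be illustrated with a short sketch. It assumes that "empty" means a `<path>/<owner>/<repo>` directory with no entries at all. The function names and the `dry_run` parameter are hypothetical and are not taken from the script.

```python
from pathlib import Path


def find_empty_repos(root):
    """Return <root>/<owner>/<repo> directories that contain no entries."""
    empty = []
    for repo_dir in sorted(Path(root).glob("*/*")):
        if repo_dir.is_dir() and not any(repo_dir.iterdir()):
            empty.append(repo_dir)
    return empty


def cleanup(root, dry_run=True):
    """Report empty repository directories; delete them unless dry_run is set."""
    targets = find_empty_repos(root)
    for repo_dir in targets:
        if dry_run:
            print(f"would remove {repo_dir}")
        else:
            # rmdir refuses to delete non-empty directories, which acts
            # as a safety net if something was written in the meantime.
            repo_dir.rmdir()
    return targets
```

Defaulting `dry_run` to `True` makes deletion an explicit opt-in.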

4) Generate individual recidivism score files

python scripts/generate_recidivism_scores.py \
  --input data/osv_recidivism.jsonl \
  --output-dir data/scores

This script:

  • scans osv_recidivism.jsonl for all vulnerabilities,
  • calculates a recidivism score for each vulnerability based on CWE/repository recurrence,
  • writes one JSON file per vulnerability to data/scores/<vulnerability_id>.json, containing:
    • list of CWEs and repositories referenced in the vulnerability
    • CWE and repository repeat counts
    • raw recidivism score
    • base and adjusted severity scores
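A minimal sketch of the per-vulnerability output is shown below. The scoring formula is not documented here, so the raw score in this example (the sum of all repeat counts) is purely illustrative. The input field names `cwes` and `repositories`, the output key names, and the function name are also assumptions. Base and adjusted severity are left out because their computation is not described.

```python
import json
from collections import Counter
from pathlib import Path


def write_score_files(jsonl_path, output_dir):
    """Write <output_dir>/<vulnerability_id>.json for every input record."""
    lines = Path(jsonl_path).read_text(encoding="utf-8").splitlines()
    records = [json.loads(line) for line in lines if line.strip()]

    # Count how many records mention each CWE and each repository.
    cwe_counts = Counter(c for r in records for c in set(r.get("cwes", [])))
    repo_counts = Counter(
        p for r in records for p in set(r.get("repositories", []))
    )

    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for rec in records:
        cwes = rec.get("cwes", [])
        repos = rec.get("repositories", [])
        # A repeat is an occurrence in some OTHER record, hence the "- 1".
        cwe_repeats = {c: cwe_counts[c] - 1 for c in cwes}
        repo_repeats = {p: repo_counts[p] - 1 for p in repos}
        score = {
            "id": rec["id"],
            "cwes": cwes,
            "repositories": repos,
            "cwe_repeat_counts": cwe_repeats,
            "repository_repeat_counts": repo_repeats,
            # Illustrative formula only, not the project's metric.
            "raw_recidivism_score": sum(cwe_repeats.values())
            + sum(repo_repeats.values()),
        }
        path = out_dir / f"{rec['id']}.json"
        path.write_text(json.dumps(score, indent=2), encoding="utf-8")
    return len(records)
```

Calling `write_score_files("data/osv_recidivism.jsonl", "data/scores")` would mirror the command-line invocation shown earlier, provided the input lines carry the assumed fields.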
