GML-2088: Add regression testing framework with evaluation metrics#43
GML-2088: Add regression testing framework with evaluation metrics#43prinskumar-tigergraph wants to merge 1 commit into
Conversation
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
| sys.stdout.flush() | ||
| sys.stderr.flush() | ||
| os._exit(1 if any(r.error for r in results) else 0) |
There was a problem hiding this comment.
Suggestion: Make the evaluator fail when a question is unanswered or any metric fails, not only when the HTTP query itself errors. Otherwise a regression run can exit successfully even though hallucination/correctness checks were skipped or failed for every row. [possible issue, importance: 7]
| sys.stdout.flush() | |
| sys.stderr.flush() | |
| os._exit(1 if any(r.error for r in results) else 0) | |
| sys.stdout.flush() | |
| sys.stderr.flush() | |
| exit_code = 1 if any( | |
| r.error or not r.answered_question or r.metric_errors | |
| for r in results | |
| ) else 0 | |
| os._exit(exit_code) |
| ingest_info = client.post(f"/ui/{graphname}/create_ingest", json_body={ | ||
| "data_source": "server", | ||
| "data_source_config": {"data_path": folder_path}, | ||
| "loader_config": {}, | ||
| "file_format": "json", | ||
| }) |
There was a problem hiding this comment.
Suggestion: Do not hard-code file_format to json when data/ may contain other document types. Derive and validate the extension before creating the ingest job so non-JSON datasets do not get ingested with the wrong loader. [possible issue, importance: 7]
| ingest_info = client.post(f"/ui/{graphname}/create_ingest", json_body={ | |
| "data_source": "server", | |
| "data_source_config": {"data_path": folder_path}, | |
| "loader_config": {}, | |
| "file_format": "json", | |
| }) | |
| extensions = { | |
| os.path.splitext(fp)[1].lstrip(".").lower() | |
| for fp in doc_files | |
| if os.path.splitext(fp)[1] | |
| } | |
| if len(extensions) != 1: | |
| raise SetupError( | |
| f"data/ must contain one file type for ingest; found: {sorted(extensions)}" | |
| ) | |
| file_format = extensions.pop() | |
| ingest_info = client.post(f"/ui/{graphname}/create_ingest", json_body={ | |
| "data_source": "server", | |
| "data_source_config": {"data_path": folder_path}, | |
| "loader_config": {}, | |
| "file_format": file_format, | |
| }) |
| LOAD_OUTPUT=$(docker exec -e PYTHONUNBUFFERED=1 "${CONTAINER}" \ | ||
| python /code/tests/regression/setup_graph.py --load-exported "${SETUP_ARGS[@]}" | tee /dev/tty) |
There was a problem hiding this comment.
Suggestion: Avoid writing to /dev/tty while capturing LOAD_OUTPUT; many CI and non-interactive shells do not have a controlling TTY, causing tee to fail under set -euo pipefail. Tee to stderr instead so output is still visible while stdout remains capturable. [possible issue, importance: 8]
| LOAD_OUTPUT=$(docker exec -e PYTHONUNBUFFERED=1 "${CONTAINER}" \ | |
| python /code/tests/regression/setup_graph.py --load-exported "${SETUP_ARGS[@]}" | tee /dev/tty) | |
| LOAD_OUTPUT=$(docker exec -e PYTHONUNBUFFERED=1 "${CONTAINER}" \ | |
| python /code/tests/regression/setup_graph.py --load-exported "${SETUP_ARGS[@]}" | tee /dev/stderr) |
PR Type
Tests, Enhancement
Description
Add GraphRAG regression evaluation framework
Score hallucination confidence with reasons
Automate graph setup and evaluation runs
Mount regression datasets into container
Diagram Walkthrough
File Walkthrough
1 files
Return hallucination confidence with reasoning5 files
Add GraphRAG regression evaluation CLIAdd regression graph setup automationAdd containerized evaluation runner scriptAdd exported graph evaluation runnerAdd containerized graph setup runner1 files
Bump pyTigerGraph minimum dependency version1 files
Mount regression test directories into container