
Add meaningful multivariate subsequence search tutorial #1138

Open
omkar-334 wants to merge 1 commit into stumpy-dev:main from omkar-334:topk-search-example

@omkar-334

This PR adds a new tutorial inspired by Eamonn Keogh's multivariate top-k subsequence search example. The tutorial uses a self-contained synthetic sensor dataset to show why raw multivariate distances can be misleading, then demonstrates z-normalized stumpy.match and query-time channel selection for meaningful top-k matches.
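The two ideas in the description, per-channel z-normalization and choosing channels at query time, can be sketched as a brute-force top-k search in plain NumPy. This is a hypothetical illustration, not `stumpy.match` and not the notebook's code: the function name `topk_multivariate_search`, averaging the per-channel distances, and the `m // 4` exclusion zone around each accepted match are assumptions made for the example (`sliding_window_view` needs NumPy 1.20 or later).

```python
import numpy as np


def _znorm(x):
    """Z-normalize along the last axis; constant windows become all zeros."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    sigma = np.where(sigma == 0, 1.0, sigma)  # avoid dividing by zero
    return (x - mu) / sigma


def topk_multivariate_search(Q, T, k, channels=None, normalize=True):
    """Return up to k (start_index, distance) pairs, best match first.

    Q: query of shape (d, m); T: time series of shape (d, n).
    channels: optional list of row indices to keep (query-time channel selection).
    normalize: z-normalize the query and every window, per channel.
    """
    Q = np.asarray(Q, dtype=float)
    T = np.asarray(T, dtype=float)
    if channels is not None:
        Q, T = Q[channels], T[channels]
    m = Q.shape[1]
    # All length-m windows of every channel: shape (d, n - m + 1, m).
    W = np.lib.stride_tricks.sliding_window_view(T, m, axis=1)
    if normalize:
        Q, W = _znorm(Q), _znorm(W)
    # Per-channel Euclidean distance, then averaged over the selected channels.
    D = np.linalg.norm(W - Q[:, np.newaxis, :], axis=2).mean(axis=0)

    excl = max(1, m // 4)  # assumed exclusion zone around each accepted match
    matches = []
    while len(matches) < k and np.isfinite(D).any():
        idx = int(np.argmin(D))
        matches.append((idx, float(D[idx])))
        # Block the neighbourhood so the next match is not a trivial shift.
        D[max(0, idx - excl) : idx + excl + 1] = np.inf
    return matches


# Hypothetical usage: two channels in very different units that share one
# sine-shaped event starting at index 50, plus low-level noise elsewhere.
rng = np.random.default_rng(0)
m = 20
shape = np.sin(2 * np.pi * np.arange(m) / m)
T = rng.normal(0.0, 0.01, size=(2, 200))
T[0, 50 : 50 + m] += 100.0 + 5.0 * shape  # offset and rescaled copy
T[1, 50 : 50 + m] += shape
Q = np.vstack([shape, shape])

print(topk_multivariate_search(Q, T, k=3))                # both channels
print(topk_multivariate_search(Q, T, k=1, channels=[1]))  # channel 1 only
```

Because each channel is z-normalized before comparison, the offset and rescaled copy in channel 0 still matches the query's shape. Passing `channels` simply drops rows before any distance is computed. STUMPY's own `stumpy.match` (z-normalized by default) does not use a brute-force comparison like this one; the sketch only makes the idea concrete.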

Validation:

  • Executed the notebook end to end with nbconvert.
  • Rendered the tutorial to HTML.
  • Built the Sphinx docs and confirmed the new tutorial executes and renders.

Fixes #1137

omkar-334 requested a review from seanlaw as a code owner on June 9, 2026 03:41
gitnotebooks Bot commented Jun 9, 2026

Found 1 changed notebook. Review the changes at https://app.gitnotebooks.com/stumpy-dev/stumpy/pull/1138

seanlaw (Contributor) commented Jun 9, 2026

@omkar-334 Thank you for your PR. Please allow me some time to review and provide comments.

seanlaw (Contributor) left a comment

@omkar-334 Thank you for taking the time to submit this PR. The goal of the original issue is to reproduce exactly the Whale example from Keogh's tutorial, since it is the clearest and most illustrative example, rather than to use a synthetic dataset. If that is not possible (e.g., if the data is not publicly available), then we should reach out to the author for the data or, otherwise, close this issue.

"import stumpy\n",
"import matplotlib.pyplot as plt\n",
"\n",
"for style in (\"./stumpy.mplstyle\", \"docs/stumpy.mplstyle\"):\n",

Contributor

This is inconsistent with all other notebooks. Please use the same one-liner as the other notebooks.

"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating Flight-like Sensor Data\n",

Contributor

For the overwhelming majority of our tutorials, we aim to reproduce the exact published work of the original authors rather than use a synthetic dataset. The primary goal is to reproduce the published figures exactly; demonstrating how to achieve this with STUMPY is secondary, maybe tertiary. Based on Keogh's tutorial, I would target the Whale example, as it is more realistic and truly multidimensional. If that example is not available, then we should conclude that it is not possible to reproduce the work and close this issue.

"shade_known_windows(axs, known_events, m)\n",
"axs[-1].set_xlabel(\"Time\")\n",
"axs[0].set_title(\"Synthetic Multivariate Sensor Data\")\n",
"plt.show()"

Contributor

Have you actually executed all cells of this notebook? Without the plots, the tutorial is meaningless and difficult (if not impossible) to review.

"\n",
"Suppose that we are searching through a multivariate time series collected from a flight. The channels have different physical units: altitude, airspeed, outside temperature, and hydraulic pressure. Three short maneuver windows share the same shape in the first three channels. The pressure channel, however, contains a large pressure pulse in the query window and in one unrelated window.\n",
"\n",
"This is intentionally constructed so that raw distance can prefer the pressure-only distractor over the true repeated maneuvers."

Contributor

Generating synthetic data makes this tutorial too complicated for our average user to comprehend.

"\n",
"axs[-1].set_xlabel(\"Relative time\")\n",
"axs[0].set_title(\"Multivariate Query Window\")\n",
"plt.show()"

Contributor

Please show the plot.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Meaningful Fast Top-k Subsequence Search for Multivariate Time Series

2 participants