Beyond Closed-Set Assumptions: Benchmarking and Adapting Few-Shot Action Recognition for Open-Set Scenarios

Qualitative Examples

The interactive demo offers qualitative comparisons between the Softmax baseline and its improvement, FR-Disc, on open-set tasks for two datasets, SSv2 and NTURGBD.

The accept score is the confidence of the prediction, expressed as a percentage; it corresponds to \( \hat{u}_i \) in the figure below.

The baseline tends to assign higher confidence scores to unknown queries, while FR-Disc produces lower scores, indicating better discrimination.
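To make the role of the accept score concrete, the following minimal sketch shows how such a score could be turned into an accept/reject decision. The function name `open_set_decision`, the use of a score in [0, 1] rather than a percentage, and the threshold of 0.5 are all illustrative assumptions, not values taken from the paper.

```python
from typing import Optional


def open_set_decision(candidate_class: int, accept_score: float,
                      threshold: float = 0.5) -> Optional[int]:
    """Return the candidate class if the query is accepted, else None.

    `accept_score` is assumed to lie in [0, 1] (divide a percentage by 100).
    `threshold` is a hypothetical operating point chosen for illustration.
    """
    if not 0.0 <= accept_score <= 1.0:
        raise ValueError("accept_score must lie in [0, 1]")
    # Queries scoring below the threshold are rejected as unknown.
    return candidate_class if accept_score >= threshold else None
```

Under this rule, a method that gives unknown queries lower accept scores rejects more of them at the same threshold.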


Abstract

Few-Shot Action Recognition (FS-AR) has shown promising results but is often limited by a closed-set assumption that fails in real-world open-set scenarios.

While Few-Shot Open-Set (FSOS) recognition is well-established for images, its extension to spatio-temporal video data remains underexplored.

To address this, we propose an architectural extension based on a Feature-Residual Discriminator (FR-Disc), adapting previous work on skeletal data to the more complex video domain.

Extensive experiments on five datasets demonstrate that while common open-set techniques provide only marginal gains, our FR-Disc significantly enhances unknown rejection capabilities without compromising closed-set accuracy, setting a new state-of-the-art for FSOS-AR.

Video

Methods

Overview of the open-set techniques considered in our analysis.

We consider Maximum-Logit-Score and Entropy-Open-Set as Implicit open-set techniques, and Garbage-Class and FR-Disc as Explicit open-set techniques.
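As a rough illustration of what an implicit technique computes, the sketch below derives two confidence scores directly from the classifier logits of a query, with no extra rejection module. This page does not specify how Maximum-Logit-Score or Entropy-Open-Set are implemented, so the function names and the entropy normalization are assumptions; in particular, the entropy score is a generic stand-in and may differ from the formulation used in the paper.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_logit_score(logits: List[float]) -> float:
    """Confidence as the largest raw logit (higher means more confident)."""
    return max(logits)


def entropy_score(logits: List[float]) -> float:
    """Confidence as 1 minus the normalized softmax entropy, in [0, 1].

    A flat distribution (maximum entropy) gives a score near 0, while a
    peaked distribution gives a score near 1.
    """
    if len(logits) < 2:
        raise ValueError("need at least two classes")
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(len(probs))
```

Explicit techniques, by contrast, add a dedicated component for rejection, such as an extra class or a discriminator, instead of reusing the closed-set logits.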

As backbone models, we consider STRM and SAFSAR.

Correlation Analysis

Previous works claim that, in the image domain, closed-set and open-set performance are correlated.

We performed a similar analysis in the action recognition domain.

Our findings demonstrate that, for the two models considered, there is a linear correlation between closed-set and open-set performance, as also shown in the plot on the right.
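A linear relationship of this kind is commonly quantified with the Pearson correlation coefficient. The data points behind the plot are not listed on this page, so the sketch below is only an illustration: it pairs the 5-shot FS ACC and AUROC values of the five Softmax rows from the SAFSAR table in Quantitative Results. That pairing is an assumption made for the example and may not match the points used in the paper's analysis.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# SAFSAR, Softmax, 5-shot, in the order
# Diving48, SSv2, NTURGBD, HMDB51, UCF101.
fs_acc = [74.12, 74.08, 91.58, 79.68, 98.32]  # closed-set accuracy
auroc = [71.49, 77.05, 91.44, 81.79, 98.03]   # open-set AUROC

r = pearson_r(fs_acc, auroc)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 indicates that higher closed-set accuracy goes together with higher open-set performance on these points.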

Quantitative Results

The FR-Disc row is highlighted, and cell colors show performance relative to the Softmax baseline.


Combined results for SAFSAR in 5-way 1-shot and 5-shot settings.

Dataset	OS-Method	FS ACC 1-shot	FS ACC 5-shot	OS ACC 1-shot	OS ACC 5-shot	AUROC 1-shot	AUROC 5-shot	AUPR 1-shot	AUPR 5-shot	OSCR 1-shot	OSCR 5-shot
Diving48	Softmax	63.49	74.12	64.16	65.36	68.48	71.49	69.71	72.07	59.86	66.15
	EOS	64.43	72.80	62.91	64.94	68.64	74.60	69.71	75.06	60.36	67.43
	GC	65.01	76.32	64.89	69.36	70.18	75.82	71.38	76.01	60.63	69.13
	FR-Disc	68.83	78.58	66.22	70.29	71.28	76.55	72.28	76.32	63.46	71.04
SSv2	Softmax	62.11	74.08	62.29	69.61	70.39	77.05	73.02	78.80	60.72	69.35
	EOS	62.97	73.84	65.08	69.90	71.56	79.60	73.74	81.18	61.53	70.38
	GC	62.47	76.24	64.06	70.89	69.20	77.62	70.47	76.92	59.26	70.20
	FR-Disc	63.37	77.88	66.56	73.52	72.18	81.56	74.98	82.96	62.12	73.18
NTURGBD	Softmax	88.31	91.58	79.90	81.45	87.76	91.44	88.35	91.30	81.45	84.83
	EOS	87.63	91.86	80.10	82.18	88.17	91.89	88.96	92.19	81.34	85.11
	GC	89.07	92.40	81.78	81.51	89.30	89.37	88.94	88.63	81.93	84.03
	FR-Disc	89.97	95.54	82.95	86.53	89.78	94.31	89.82	94.26	83.12	88.31
HMDB51	Softmax*	65.29	79.68	64.44	72.40	70.76	81.79	73.87	83.33	62.23	74.26
	EOS*	69.19	80.18	66.23	72.48	74.60	81.45	77.13	83.50	65.74	74.61
	GC	62.85	76.74	60.19	64.30	68.99	76.45	71.87	78.93	60.49	70.42
	FR-Disc	72.38	85.17	68.87	76.99	77.48	87.94	80.30	89.75	68.79	80.15
UCF101	Softmax*	95.04	98.32	80.59	91.25	94.55	98.03	95.04	98.32	88.31	91.57
	EOS*	94.84	98.78	81.30	89.32	95.18	98.31	95.73	98.50	88.47	91.97
	GC	79.98	86.33	59.88	57.49	75.15	81.91	77.29	82.99	70.85	77.43
	FR-Disc	95.72	99.28	86.82	91.52	95.19	98.89	96.23	99.08	89.01	92.49

* Methods marked with an asterisk were trained for 1K iterations to prevent overfitting on that dataset.
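Since the cell colors encode performance relative to the Softmax baseline, the same comparison can be recovered from the numbers by subtracting the baseline row from a method row. The sketch below does this for the NTURGBD rows of the SAFSAR table above; the helper name `deltas` is an illustrative choice, not part of the paper's code.

```python
from typing import List

# Column order of the SAFSAR table: each metric has a 1-shot and a 5-shot value.
COLUMNS = [f"{metric} {shot}"
           for metric in ("FS ACC", "OS ACC", "AUROC", "AUPR", "OSCR")
           for shot in ("1-shot", "5-shot")]

# NTURGBD rows copied from the SAFSAR table.
softmax_row = [88.31, 91.58, 79.90, 81.45, 87.76, 91.44, 88.35, 91.30, 81.45, 84.83]
fr_disc_row = [89.97, 95.54, 82.95, 86.53, 89.78, 94.31, 89.82, 94.26, 83.12, 88.31]


def deltas(method: List[float], baseline: List[float]) -> List[float]:
    """Per-column difference (method minus baseline), in percentage points."""
    if len(method) != len(baseline):
        raise ValueError("rows must have the same number of columns")
    return [round(m - b, 2) for m, b in zip(method, baseline)]


for name, delta in zip(COLUMNS, deltas(fr_disc_row, softmax_row)):
    print(f"{name}: {delta:+.2f}")
```

A positive difference means the method improves on the baseline for that column.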

Results for STRM in 5-way 5-shot settings.

Dataset	OS-Method	FS ACC	OS ACC	AUROC	AUPR	OSCR
Diving48	Softmax	74.74	52.91	58.92	63.25	63.61
	EOS	73.58	60.51	67.01	69.02	65.63
	GC	73.22	53.76	71.71	72.56	66.09
	FR-Disc	77.74	64.92	75.73	75.72	70.00
SSv2	Softmax	65.25	53.98	61.94	66.33	60.77
	EOS	63.61	59.91	64.29	69.09	60.65
	GC	43.05	50.68	58.15	62.45	47.06
	FR-Disc	65.51	58.99	70.82	73.59	62.52
NTURGBD	Softmax	95.14	50.85	74.69	78.26	80.28
	EOS	92.08	75.80	85.74	86.32	83.30
	GC	85.71	59.29	80.27	80.02	76.84
	FR-Disc	93.28	80.76	92.27	91.95	86.11
HMDB51	Softmax	75.92	52.18	70.49	65.29	67.34
	EOS	75.18	64.99	75.18	70.58	69.33
	GC	71.14	51.10	78.85	73.85	65.96
	FR-Disc	75.50	56.87	77.59	75.20	69.11
UCF101	Softmax	95.48	51.20	86.79	89.39	87.40
	EOS	95.05	75.71	90.35	92.42	86.87
	GC	95.58	63.21	93.35	95.34	83.71
	FR-Disc	95.76	63.47	93.60	94.56	88.37
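For readers unfamiliar with the AUROC column, the following minimal sketch computes it from per-query accept scores as the probability that a randomly chosen known query receives a higher score than a randomly chosen unknown query, with ties counted as one half. Treating known queries as the positive class is an assumption of this example, and the scores in the usage lines are made-up toy values, not results from the paper.

```python
from typing import Sequence


def auroc(known_scores: Sequence[float], unknown_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    Returns a value in [0, 1]; multiply by 100 to compare with the tables.
    """
    if not known_scores or not unknown_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for k in known_scores:
        for u in unknown_scores:
            if k > u:
                wins += 1.0   # known query ranked above unknown query
            elif k == u:
                wins += 0.5   # ties contribute one half
    return wins / (len(known_scores) * len(unknown_scores))


# Toy accept scores, for illustration only.
toy_known = [0.9, 0.8, 0.4]
toy_unknown = [0.3, 0.5]
print(f"AUROC = {100 * auroc(toy_known, toy_unknown):.2f}")
```

Because AUROC depends only on how scores are ranked, it measures separation between known and unknown queries without fixing a rejection threshold.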

A Baseline Study and Benchmark for Few-Shot Open-Set Action Recognition with Feature Residual Discrimination