A Baseline Study and Benchmark for Few-Shot Open-Set Action Recognition with Feature Residual Discrimination

Qualitative Examples

Choose a dataset (SSv2 or NTURGBD) to view qualitative comparisons between the Softmax baseline and its improved variant, FR-Disc, on open-set tasks.

The accept score is the confidence of the prediction expressed as a percentage; it corresponds to \( \hat{u}_i \) in the figure below.

For unknown queries, the baseline tends to assign high accept scores, whereas FR-Disc assigns lower ones, indicating better discrimination between known and unknown classes.
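As a minimal sketch of how an accept score like the one in the examples could be obtained for the Softmax baseline, the snippet below assumes the baseline uses the maximum softmax probability over the N = 5 support classes as its confidence, scaled to a percentage. The helper names (`accept_score_msp`, `decide`) and the 50% rejection threshold are hypothetical and not taken from the paper; FR-Disc produces its score differently and is not reproduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def accept_score_msp(logits):
    """Return (candidate_class, accept_score_percent), assuming the score is
    the maximum softmax probability over the support classes."""
    probs = softmax(logits)
    candidate = max(range(len(probs)), key=probs.__getitem__)
    return candidate, 100.0 * probs[candidate]


def decide(logits, threshold=50.0):
    """Accept the query as its candidate class if the score reaches the
    (hypothetical) threshold; otherwise reject it as unknown."""
    candidate, score = accept_score_msp(logits)
    return candidate if score >= threshold else "unknown"


# One query scored against a 5-way support set.
print(accept_score_msp([2.0, 0.5, 0.1, 0.3, 0.2]))
```

A query whose logits are nearly flat across the five support classes gets a score close to 20%, which is why a threshold on this value can serve as a simple rejection rule.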

[Interactive example: a 5-way support set (Action 1 to Action 5) and a query clip. For each query, the page shows its true class and, for both the Softmax baseline and FR-Disc, the candidate class and accept score (%).]

Abstract

Few-Shot Action Recognition (FS-AR) has shown promising results but is often limited by a closed-set assumption that fails in real-world open-set scenarios.

While Few-Shot Open-Set (FSOS) recognition is well-established for images, its extension to spatio-temporal video data remains underexplored.

To address this, we propose an architectural extension based on a Feature-Residual Discriminator (FR-Disc), adapting previous work on skeletal data to the more complex video domain.

Extensive experiments on five datasets demonstrate that while common open-set techniques provide only marginal gains, our FR-Disc significantly enhances unknown rejection capabilities without compromising closed-set accuracy, setting a new state-of-the-art for FSOS-AR.

Methods

Overview of the open-set techniques considered in our analysis.

We consider Maximum-Logit-Score and Entropy-Open-Set (EOS) as implicit open-set techniques, and Garbage-Class (GC) and FR-Disc as explicit open-set techniques.
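The sketch below illustrates, under stated assumptions, how the implicit and Garbage-Class scores are commonly computed from a query's logits. Maximum-Logit-Score is taken as the largest raw logit; Entropy-Open-Set is assumed to follow the entropic open-set loss, which trains on unknown samples by pushing their softmax towards uniform; Garbage-Class is assumed to append an extra logit trained on unknown samples, with 1 - p(garbage) as the accept score. All function names are hypothetical, and FR-Disc is not sketched because its architecture is not detailed here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def max_logit_score(logits):
    """Implicit: the accept score is the largest raw (unnormalised) logit."""
    return max(logits)


def entropic_open_set_loss(logits, target=None):
    """Assumed reading of EOS as a training loss.
    Known query (target = class index): standard cross-entropy.
    Unknown query (target=None): average negative log-probability over all
    known classes, minimised when the softmax is uniform."""
    logp = log_softmax(logits)
    if target is None:
        return -sum(logp) / len(logp)
    return -logp[target]


def garbage_class_accept(logits_with_garbage):
    """Explicit: the last logit belongs to an extra 'garbage' class.
    Returns (best known class, accept score = 1 - p(garbage))."""
    logp = log_softmax(logits_with_garbage)
    p_garbage = math.exp(logp[-1])
    known = logits_with_garbage[:-1]
    candidate = max(range(len(known)), key=known.__getitem__)
    return candidate, 1.0 - p_garbage
```

The implicit scores reuse the N-way classifier as is, while the garbage class adds an output that must be trained with unknown samples, which is the distinction drawn above between implicit and explicit techniques.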

As backbone models, we consider STRM and SAFSAR.

[Figure: Methods overview]

Correlation Analysis

Previous works report that, in the image domain, closed-set and open-set performance are correlated.

We performed a similar analysis in the action recognition domain.

Our findings show that, for both considered models, closed-set and open-set performance are linearly correlated, as shown in the plot on the right.

[Figure: Correlation analysis plot]

Quantitative Results

The FR-Disc row is highlighted, and cell colors show performance relative to the Softmax baseline.

Combined results for SAFSAR in 5-way 1-shot and 5-shot settings.

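Dataset | OS-Method | FS ACC 1-shot | FS ACC 5-shot | OS ACC 1-shot | OS ACC 5-shot | AUROC 1-shot | AUROC 5-shot | AUPR 1-shot | AUPR 5-shot | OSCR 1-shot | OSCR 5-shot
Diving48 | Softmax | 63.49 | 74.12 | 64.16 | 65.36 | 68.48 | 71.49 | 69.71 | 72.07 | 59.86 | 66.15
Diving48 | EOS | 64.43 | 72.80 | 62.91 | 64.94 | 68.64 | 74.60 | 69.71 | 75.06 | 60.36 | 67.43
Diving48 | GC | 65.01 | 76.32 | 64.89 | 69.36 | 70.18 | 75.82 | 71.38 | 76.01 | 60.63 | 69.13
Diving48 | FR-Disc | 68.83 | 78.58 | 66.22 | 70.29 | 71.28 | 76.55 | 72.28 | 76.32 | 63.46 | 71.04
SSv2 | Softmax | 62.11 | 74.08 | 62.29 | 69.61 | 70.39 | 77.05 | 73.02 | 78.80 | 60.72 | 69.35
SSv2 | EOS | 62.97 | 73.84 | 65.08 | 69.90 | 71.56 | 79.60 | 73.74 | 81.18 | 61.53 | 70.38
SSv2 | GC | 62.47 | 76.24 | 64.06 | 70.89 | 69.20 | 77.62 | 70.47 | 76.92 | 59.26 | 70.20
SSv2 | FR-Disc | 63.37 | 77.88 | 66.56 | 73.52 | 72.18 | 81.56 | 74.98 | 82.96 | 62.12 | 73.18
NTURGBD | Softmax | 88.31 | 91.58 | 79.90 | 81.45 | 87.76 | 91.44 | 88.35 | 91.30 | 81.45 | 84.83
NTURGBD | EOS | 87.63 | 91.86 | 80.10 | 82.18 | 88.17 | 91.89 | 88.96 | 92.19 | 81.34 | 85.11
NTURGBD | GC | 89.07 | 92.40 | 81.78 | 81.51 | 89.30 | 89.37 | 88.94 | 88.63 | 81.93 | 84.03
NTURGBD | FR-Disc | 89.97 | 95.54 | 82.95 | 86.53 | 89.78 | 94.31 | 89.82 | 94.26 | 83.12 | 88.31
HMDB51 | Softmax* | 65.29 | 79.68 | 64.44 | 72.40 | 70.76 | 81.79 | 73.87 | 83.33 | 62.23 | 74.26
HMDB51 | EOS* | 69.19 | 80.18 | 66.23 | 72.48 | 74.60 | 81.45 | 77.13 | 83.50 | 65.74 | 74.61
HMDB51 | GC | 62.85 | 76.74 | 60.19 | 64.30 | 68.99 | 76.45 | 71.87 | 78.93 | 60.49 | 70.42
HMDB51 | FR-Disc | 72.38 | 85.17 | 68.87 | 76.99 | 77.48 | 87.94 | 80.30 | 89.75 | 68.79 | 80.15
UCF101 | Softmax* | 95.04 | 98.32 | 80.59 | 91.25 | 94.55 | 98.03 | 95.04 | 98.32 | 88.31 | 91.57
UCF101 | EOS* | 94.84 | 98.78 | 81.30 | 89.32 | 95.18 | 98.31 | 95.73 | 98.50 | 88.47 | 91.97
UCF101 | GC | 79.98 | 86.33 | 59.88 | 57.49 | 75.15 | 81.91 | 77.29 | 82.99 | 70.85 | 77.43
UCF101 | FR-Disc | 95.72 | 99.28 | 86.82 | 91.52 | 95.19 | 98.89 | 96.23 | 99.08 | 89.01 | 92.49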

* Methods marked with an asterisk were trained for 1K iterations to prevent overfitting on that dataset.
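For reference, two of the reported open-set metrics can be sketched from their standard definitions: AUROC as the probability that a known query receives a higher accept score than an unknown one, and OSCR as the area under the curve of correct classification rate (known queries that are both accepted and correctly classified) versus false positive rate (unknown queries accepted) as the rejection threshold varies. This is a minimal pure-Python sketch; the paper's exact evaluation code and tie handling may differ, and the function names are hypothetical.

```python
def auroc(known_scores, unknown_scores):
    """Probability that a random known query scores higher than a random
    unknown query (ties count as 1/2). O(n*m), fine for a sketch."""
    wins = 0.0
    for k in known_scores:
        for u in unknown_scores:
            if k > u:
                wins += 1.0
            elif k == u:
                wins += 0.5
    return wins / (len(known_scores) * len(unknown_scores))


def oscr(known_scores, known_correct, unknown_scores):
    """Area under the CCR-vs-FPR curve.
    CCR(t): fraction of known queries correctly classified AND scoring >= t.
    FPR(t): fraction of unknown queries scoring >= t."""
    n_known, n_unknown = len(known_scores), len(unknown_scores)
    thresholds = sorted(set(known_scores) | set(unknown_scores), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing accepted
    for t in thresholds:
        ccr = sum(1 for s, ok in zip(known_scores, known_correct)
                  if ok and s >= t) / n_known
        fpr = sum(1 for s in unknown_scores if s >= t) / n_unknown
        points.append((fpr, ccr))
    # Trapezoidal rule over consecutive (FPR, CCR) points.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Because OSCR only credits known queries that are also classified correctly, it is bounded above by closed-set accuracy, whereas AUROC ignores the class prediction and measures known/unknown separation alone.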