I am currently on the job market and actively searching for new opportunities! If I may be of help, please reach out at rajatmodi62@gmail.com. Many thanks for visiting.
Mount Everest

Representation Supersedeth Scaling.

This does not mean that we should not scale AI. It just means that we should scale it intelligently. Brute-force scaling wastes compute. A good representation can help scaled models find an optimal solution faster. Hence, representations are still important.

Publications

2020 - 2030

Sky2Ground

Introduced the new problem of cross-view camera localization given aerial, ground, and satellite imagery of outdoor scenes. Includes a new dataset of 54 scenes. Proposed SkyNet, an architecture for 3D point cloud generation, setting a strong baseline for the field. Grateful for the support from IARPA WRIVA grant.

STATUS: CVPR 2026
[PAPER]

Layer Query Networks

Introduces constant-time feature extraction from any layer in a DNN. Operates in O(1) time, compared to O(depth) for sequential extraction. 15% speedup on ImageNet with 12% relative accuracy improvement. Grateful for the support from IARPA WRIVA grant.

STATUS: Under Review
[PAPER]

Foundational Models for Video Understanding

Comprehensive survey of over 200 video foundational models and evaluation metrics across 14 video tasks.

STATUS: Under Review
[PAPER] [GITHUB]

Asynchronous Perception Machine(s)

First working implementation of the GLOM architecture. 10x faster than ViT-B/16 and performs 2% better than state-of-the-art OpenCLIP ViT-H on ImageNet. Spiritual successor to capsule networks. Grateful for the support from IARPA WRIVA grant.

VENUE: NeurIPS 2024
[PAPER] [POSTER] [CODE] [PATENT]

On Occlusions in Video Action Detection

First benchmark studying occlusions in spatio-temporal video action detection. Introduced 5 new datasets and surpassed VideoCapsuleNet by 32.3%.

VENUE: NeurIPS 2023
[PAPER] [POSTER] [CODE]

Video Action Detection: Analyzing Limitations

Dataset containing multiple people performing temporally challenging actions. Featured in the official CVPR workshop.

VENUE: CVPRW 2022
[PAPER] [DATASET] [CHALLENGE]
2010 - 2020

Steganography Using Wavelets

Algorithm that constrains stego images to the higher-frequency ranges of the input images, supported by statistical analysis.

VENUE: IEEE IEMENTech 2017
[PAPER]
PATENTS

Asynchronous Perception Machine

First working implementation of GLOM. Recently filed as an A1 patent application with the USPTO.

Externally Guided Multi Domain Personalization

Invented attribute selection and sampling vectors in GANs to achieve personalized user recommendations.

Context Resolution in Autonomous Systems

System using cross-stitch units to merge user inputs into a unified representation.
