I am currently on the job market and actively searching for new opportunities! If my research can contribute in any way, please reach out at rajatmodi62@gmail.com.

2017-2025
[1] Asynchronous Perception Machine(s): First working implementation of the GLOM architecture. It was previously only a theoretical idea. Our implementation is 10x faster than ViT-B/16 and performs 2% better than state-of-the-art OpenCLIP ViT-H on ImageNet. Designed as a spiritual successor to capsule networks.
Venue: NeurIPS 2024
Links: [Paper] [Poster] [Code] [Patent]

[2] Layer Query Networks: Introduces constant-time feature extraction across any layer in a DNN. While existing networks are sequential ($O(\text{depth})$), LQNs operate in $O(1)$ time. Achieves a 15% speedup on ImageNet with a 12% relative accuracy improvement.
Status: ICLR 2026 (Under Review)
Links: [Paper]

[3] On Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes: First benchmark studying occlusions in spatio-temporal video action detection. Introduced 5 new datasets and surpassed VideoCapsuleNet by 32.3%.
Venue: NeurIPS 2023
Links: [Paper] [Poster] [Code]

[4] Video Action Detection: Analyzing Limitations And Challenges: A dataset containing multiple people performing temporally challenging actions (e.g., opening/closing doors). Featured in the official CVPR workshop.
Venue: CVPRW 2022
Links: [Paper] [Dataset] [Challenge]

[5] Sky2Ground: First dataset consisting of real and synthetic ground/satellite/aerial images for 54 outdoor scenes with camera poses. Proposed SkyNet, a 3D point cloud generation architecture using novel curriculum-inspired strategies. Supported by a $500k grant from IARPA.
Status: CVPR 2026 (Under Review)
Links: [Paper]

[6] Foundational Models for Video Understanding: A Survey: First survey of over 200 video foundational models and evaluation metrics across 14 video tasks.
Status: ACM Computing Surveys (Under Review)
Links: [Paper] [GitHub]

[7] Steganography Using Wavelets with Statistical Performance Analysis: An algorithm that forces a stego image to lie within the “higher-frequency” range of a given input image.
Venue: IEEE IEMENTech 2017
Links: [Paper]

Patents

[1] Asynchronous Perception Machine: APM is the first implementation to get GLOM working. Recently filed as an A1 patent application with the USPTO by UCF.
Links: [Patent Application]

[2] Externally Guided Multi Domain Personalization: Invented attribute selection and sampling vectors in GANs to help achieve personalized user recommendations.
Links: [US A1 Patent]

[3] Context Resolution in Autonomous Systems: Proposes a system using cross-stitch units to merge user inputs into a unified representation.
Links: [US A2 Patent]

A lot more still needs to be done.