Papers

Souper-Model How Simple Arithmetic Unlocks State-of-the-Art LLM Performance

Black-Box On-Policy Distillation of Large Language Models

Depth Anything 3 Recovering the Visual Space from Any Views

LeJEPA Provable and Scalable Self-Supervised Learning Without the Heuristics

Scaling Latent Reasoning via Looped Language Models

Kosmos An AI Scientist for Autonomous Discovery

Emu3.5 Native Multimodal Models are World Learners

Context Engineering 2.0 - The Context of Context Engineering

Kimi Linear An Expressive, Efficient Attention Architecture

Exploring Conditions for Diffusion Models in Robotic Control

A Survey of Data Agents Emerging Paradigm or Overstated Hype

Real Deep Research for AI, Robotics and Beyond

The Free Transformer

A Definition of AGI

FineVision Open Data Is All You Need

DeepSeek-OCR Contexts Optical Compression

Detect Anything via Next Point Prediction

MCPMark A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use

The Dragon Hatchling The Missing Link between the Transformer and Models of the Brain

VFF-Net Evolving forward–forward algorithms into convolutional neural networks for enhanced computational insights

Diffusion Transformers with Representation Autoencoders

Training-Free Group Relative Policy Optimization

Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought

Meta-Awareness Enhances Reasoning Models Self-Alignment Reinforcement Learning

Agent Learning via Early Experience

FAST-DLLM V2 Efficient Block-Diffusion LLM

Less is More Recursive Reasoning with Tiny Networks

Efficient Multi-modal Large Language Models via Progressive Consistency Distillation

CoDA Agentic Systems for Collaborative Data Visualization

Video models are zero-shot learners and reasoners

Soft Tokens, Hard Truths

Sharing is Caring Efficient LM Post-Training with Collective RL Experience Sharing

Why Language Models Hallucinate

Hunyuan3D Studio End-to-End AI Pipeline for Game-Ready 3D Asset Generation

DINOv3

Prefix-Tuning Optimizing Continuous Prompts for Generation

ImageNet Classification with Deep Convolutional Neural Networks

You Only Look Once, Unified Real-Time Object Detection

Attention Is All You Need

EXAONE 4.0 Unified Large Language Models Integrating Non-reasoning and Reasoning Modes

A Crowd Pose Annotation Dataset for Accurate Multi-Person Pose Recognition in Crowded Scenes
