Papers
Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos
Video2GUI - Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining
Self-Manager - Parallel Agent Loop for Long-form Deep Research
The Era of Agentic Organization - Learning to Organize with Language Models
ProgramBench - Can Language Models Rebuild Programs From Scratch
World Action Models - The Next Frontier in Embodied AI
Model-Agnostic Lifelong LLM Safety via Externalized Attack-Defense Co-Evolution
FlowCompile - An Optimizing Compiler for Structured LLM Workflows
LongMemEval-V2 - Evaluating Long-Term Agent Memory Toward Experienced Colleagues
OmniNFT - Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation
Nested Learning - The Illusion of Deep Learning Architectures
SenseNova-U1 - Unifying Multimodal Understanding and Generation with NEO-unify Architecture
Predictive Maps of Multi-Agent Reasoning - A Successor-Representation Spectrum for LLM Communication Topologies
Soohak - A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs
Subliminal Learning - Language Models Transmit Behavioral Traits via Hidden Signals in Data
HeavySkill - Heavy Thinking as the Inner Skill in Agentic Harness
KisMATH - Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning
NVIDIA Nemotron-Personas-Korea
Recursive Multi-Agent Systems
World-R1 - Reinforcing 3D Constraints for Text-to-Video Generation
Dive into Claude Code - The Design Space of AI Agent Systems
Think in Strokes, Not Pixels - Process-Driven Image Generation via Interleaved Reasoning
Representation Alignment for Just Image Transformers is not Easier than You Think
MA-EgoQA - Question Answering over Egocentric Videos from Multiple Embodied Agents
OpenClaw-RL - Train Any Agent Simply by Talking
Thinking to Recall - How Reasoning Unlocks Parametric Knowledge in LLMs
OmniLottie - Generating Vector Animations via Parameterized Lottie Tokens
SkillOrchestra - Learning to Route Agents via Skill Transfer
Kimi k2.5 - A 2-Million-Token Multimodal Agent
Deep Delta Learning
Seedream 4.0 - Toward Next-generation Multimodal Image Generation
Souper-Model - How Simple Arithmetic Unlocks State-of-the-Art LLM Performance
Black-Box On-Policy Distillation of Large Language Models
Depth Anything 3 - Recovering the Visual Space from Any Views
LeJEPA - Provable and Scalable Self-Supervised Learning Without the Heuristics
Kosmos - An AI Scientist for Autonomous Discovery
Context Engineering 2.0 - The Context of Context Engineering
Emu3.5 - Native Multimodal Models are World Learners
Kimi Linear - An Expressive, Efficient Attention Architecture
Exploring Conditions for Diffusion Models in Robotic Control
A Survey of Data Agents - Emerging Paradigm or Overstated Hype
Real Deep Research for AI, Robotics and Beyond
The Free Transformer
A Definition of AGI
FineVision - Open Data Is All You Need
DeepSeek-OCR - Contexts Optical Compression
Detect Anything via Next Point Prediction
The Dragon Hatchling - The Missing Link between the Transformer and Models of the Brain
MCPMark - A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
VFF-Net - Evolving forward–forward algorithms into convolutional neural networks for enhanced computational insights
Diffusion Transformers with Representation Autoencoders
Training-Free Group Relative Policy Optimization
Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought
Meta-Awareness Enhances Reasoning Models - Self-Alignment Reinforcement Learning
Agent Learning via Early Experience
FAST-DLLM V2 - Efficient Block-Diffusion LLM
Less is More - Recursive Reasoning with Tiny Networks
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
CoDA - Agentic Systems for Collaborative Data Visualization
Video models are zero-shot learners and reasoners
Soft Tokens, Hard Truths
Sharing is Caring - Efficient LM Post-Training with Collective RL Experience Sharing
Why Language Models Hallucinate
Hunyuan3D Studio - End-to-End AI Pipeline for Game-Ready 3D Asset Generation
DINOv3
Prefix-Tuning - Optimizing Continuous Prompts for Generation
You Only Look Once - Unified, Real-Time Object Detection
Attention Is All You Need
EXAONE 4.0 - Unified Large Language Models Integrating Non-reasoning and Reasoning Modes
A Crowd Pose Annotation Dataset for Accurate Multi-Person Pose Estimation in Crowded Scenes