Capability
Reward Model Training From Pairwise Human Preference Comparisons
16 artifacts provide this capability.
Top Matches
via “reward model training for reinforcement learning from human feedback (RLHF)”
InternLM: Shanghai AI Lab's multilingual foundation model.
Unique: InternLM ships pre-trained reward models that can be fine-tuned on domain-specific preference data, cutting training time relative to training a reward model from scratch; it integrates with XTuner for efficient fine-tuning.
vs others: More accessible than building a custom reward model from scratch; comparable to OpenAI's reward-modeling approach, but with full transparency and the ability to customize for specific domains.
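The capability above centers on one objective: given pairs of responses where a human marked one as preferred, train a scalar reward model so the preferred response scores higher. A minimal sketch of that Bradley-Terry pairwise loss follows; the toy linear model, synthetic data, and all variable names are illustrative assumptions, not InternLM's or XTuner's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each "response" is a 3-d feature vector, and the
# (hidden) true reward is a linear function of those features.
true_w = np.array([1.0, 2.0, -1.0])
X = rng.normal(size=(200, 3))          # 200 candidate responses
scores = X @ true_w                     # ground-truth quality

# Build pairwise human-preference comparisons: (chosen, rejected).
pairs = []
for _ in range(500):
    i, j = rng.integers(0, len(X), size=2)
    if scores[i] == scores[j]:          # skip ties (i == j)
        continue
    pairs.append((i, j) if scores[i] > scores[j] else (j, i))

# Linear reward model r(x) = w @ x, trained with the Bradley-Terry
# loss: -log sigmoid(r(chosen) - r(rejected)).
w = np.zeros(3)
lr = 0.1
for _ in range(200):
    grad = np.zeros(3)
    for i, j in pairs:
        diff = (X[i] - X[j]) @ w
        p = 1.0 / (1.0 + np.exp(-diff))     # P(chosen preferred)
        grad += (p - 1.0) * (X[i] - X[j])   # grad of -log sigmoid
    w -= lr * grad / len(pairs)

# The learned reward should rank the comparison pairs like a human.
agree = np.mean([(X[i] - X[j]) @ w > 0 for i, j in pairs])
print(f"pairwise accuracy: {agree:.2f}")
```

In practice the linear model is replaced by a language model with a scalar head, and the same loss is optimized over (prompt, chosen, rejected) triples; fine-tuning from a pre-trained reward model, as the entry notes, mainly changes the starting point of `w`, not the objective.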