Low-light image enhancement through deep learning can improve noise reduction and visibility. However, existing methods often lack the ability to perform semantic-level, quantitative brightness adjustments, limiting their capacity for personalized lighting control. To address these limitations, we propose a novel framework that utilizes a Large Language Model (LLM) capable of interpreting natural language prompts to identify target objects and specify brightness modifications. The framework then employs a Retinex-based Reasoning Segment (RRS) module to generate accurate target localization masks. Concurrently, a Text-based Brightness Controllable (TBC) module applies precise brightness adjustments based on the natural language input. To integrate these components seamlessly, we introduce an Adaptive Contextual Compensation (ACC) module, which synthesizes multi-source input conditions and guides a conditional diffusion model to perform accurate lighting adjustments while maintaining overall image coherence. Experimental results on benchmark datasets demonstrate the system's superior performance in enhancing visibility, maintaining natural color balance, and amplifying fine details without introducing artifacts. Our framework also exhibits strong generalization, enabling complex, semantic-level, personalized lighting adjustments through natural language interaction across diverse scenarios.
Results of five tasks, labeled A through E: Tasks A and B aim to decrease brightness, while the others aim to increase it, covering a range of scenes such as stage performances, everyday environments, and medical images. Each task modifies the brightness of a masked object or area (main character, lady, blackboard, side representing evil, left lung) by a percentage ranging from 10% to 40%.
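The per-task operation described above, scaling the brightness of a masked region by a signed percentage, can be sketched with plain NumPy. This is a minimal illustration of the target behavior only, not the authors' diffusion-based method; the function name and interface are our own.

```python
import numpy as np

def adjust_masked_brightness(image, mask, percent):
    """Scale the brightness of masked pixels by a signed percentage.

    image:   float array in [0, 1], shape (H, W, C)
    mask:    boolean array, shape (H, W), True where the target lies
    percent: e.g. +40 brightens the region by 40%, -10 darkens it by 10%
    """
    out = image.copy()
    scale = 1.0 + percent / 100.0
    # Boolean (H, W) mask selects the masked pixels across all channels.
    out[mask] = np.clip(out[mask] * scale, 0.0, 1.0)
    return out

# Example: brighten a 2x2 patch of a uniform gray image by 40%.
img = np.full((4, 4, 3), 0.5)
m = np.zeros((4, 4), dtype=bool)
m[:2, :2] = True
res = adjust_masked_brightness(img, m, 40)
# Masked pixels become 0.7; unmasked pixels stay at 0.5.
```

A learned approach such as the framework described here additionally keeps the edited region consistent with the surrounding scene, which a naive per-pixel scale like this cannot do.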
The application of natural language processing enables complex lighting adjustments in images. In each task, natural language instructions drive the lighting adjustments for both the target and the background.
Face detection performance in low-light conditions across different methods. The figure presents visual comparisons of face detection using various enhancement techniques combined with DSFD. "Ours + DSFD" delivers the clearest and most accurate results compared to the raw input, EnlightenGAN, KinD++, LLflow, and ZeroDCE.