Installing LLaMA-Factory Locally on Windows
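<p>The table below lists the dependencies and versions officially recommended by LLaMA-Factory. On Linux, use the recommended versions from the table. On Windows, however, the Windows builds of these components are less complete than their Linux counterparts, so for compatibility and to save time (installing published wheels rather than compiling locally), this guide does not follow the official recommendations exactly.</p><p><img alt="image" width="335" height="559" loading="lazy" src="https://img2024.cnblogs.com/blog/109287/202509/109287-20250902103927045-1671073221.png" class="lazyload"></p>
<p>The detailed steps for installing LLaMA-Factory locally on Windows follow.</p>
<h3>1. Update the GPU driver (an NVIDIA GPU is recommended)</h3>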
<ol start="1">
<li>
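<p class="ds-markdown-paragraph">Go to the NVIDIA driver download page.</p>
</li>
<li>
<p class="ds-markdown-paragraph">Select your GPU model and download the latest Game Ready Driver or Studio Driver.</p>
</li>
<li>
<p class="ds-markdown-paragraph">Run the installer, choose "Custom installation" and "Perform a clean installation", then restart the computer when it finishes.</p>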
</li>
</ol>
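<p>Installing LLaMA-Factory on Windows requires Windows builds of PyTorch, bitsandbytes, and FlashAttention. The latter two are only needed for quantized LoRA and FlashAttention-2 respectively.</p>
<h3>2. Install the CUDA Toolkit</h3>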
<ol start="1">
<li>
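<p class="ds-markdown-paragraph">Choose the CUDA version based on the versions of PyTorch, bitsandbytes, and FlashAttention you plan to use, because some combinations are incompatible. For example, each bitsandbytes release requires specific PyTorch and CUDA Toolkit versions, and each PyTorch release supports only certain CUDA Toolkit versions, so do not blindly install the latest CUDA. (This guide uses CUDA 12.1: https://developer.nvidia.com/cuda-12-1-0-download-archive?target_os=Windows&target_arch=x86_64&target_version=11&target_type=exe_local)</p>
</li>
<li>
<p class="ds-markdown-paragraph">Go to the CUDA Toolkit download page and select the version that matches PyTorch (for example, 12.1), the operating system (Windows), the architecture (x86_64), and the installer type (<code>exe</code>). Older CUDA releases are available at https://developer.nvidia.com/cuda-toolkit-archive.</p>
</li>
<li>
<p class="ds-markdown-paragraph">Run the installer, choose the "Custom" installation, and keep the default selection with all components checked.</p>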
</li>
</ol>
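<p>Once the installer finishes, it can help to confirm which CUDA Toolkit release is actually on the PATH. The following is a minimal sketch, not part of the original guide: it assumes <code>nvcc --version</code> prints a line containing "release X.Y", and the helper names <code>parse_nvcc_release</code> and <code>installed_cuda_release</code> are made up for illustration.</p>

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return the CUDA release (e.g. "12.1") found in `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run nvcc if it is on PATH and return its release string, otherwise None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    print(f"CUDA Toolkit release: {release}")
    print("Matches this guide (12.1):", release == "12.1")
```

<p>If the script reports <code>None</code>, open a new terminal first, since the PATH changes made by the installer only apply to newly started shells.</p>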
<p><img alt="image" loading="lazy" src="https://img2024.cnblogs.com/blog/109287/202509/109287-20250902232451966-1278833279.png" class="lazyload"></p>
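<h3>3. Install Conda</h3>
<p>Installing LLaMA-Factory pulls in a large number of Python packages and other components. Using Conda effectively avoids problems caused by Python version conflicts.</p>
<ol>
<li>Download Conda. Either the Distribution installer or the Miniconda installer works (Download Success | Anaconda).</li>
<li>Initialize the environment variables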
<div class="cnblogs_code">
<pre>conda init</pre>
</div>
</li>
<li>Create a Conda virtual environment that uses Python 3.10
<div class="cnblogs_code">
<pre># Create a Python 3.10 environment
conda create -n llama-factory python=3.10
# Activate the environment
conda activate llama-factory</pre>
</div>
</li>
</ol>
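<p>Every later step assumes the <code>llama-factory</code> environment is active, so a quick check before installing anything can save time. The sketch below is an illustration rather than part of the original guide: it relies on the <code>CONDA_DEFAULT_ENV</code> variable that Conda sets on activation, and the function name <code>check_environment</code> is hypothetical.</p>

```python
import os
import sys


def check_environment(version_info, environ,
                      expected_env="llama-factory", expected_python=(3, 10)):
    """Return a list of problems; an empty list means the environment looks right."""
    problems = []
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(f"active Conda environment is {active!r}, expected {expected_env!r}")
    found = tuple(version_info[:2])
    if found != expected_python:
        found_text = ".".join(str(part) for part in found)
        expected_text = ".".join(str(part) for part in expected_python)
        problems.append(f"Python is {found_text}, expected {expected_text}")
    return problems


if __name__ == "__main__":
    issues = check_environment(sys.version_info, os.environ)
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Environment looks right.")
```

<p>Passing the version and environment mapping in as arguments keeps the check easy to try with made-up values before running it against the real shell.</p>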
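<h3>4. Install Visual Studio Build Tools</h3>
<p>If Visual Studio is already installed, there is no need to install the Build Tools separately.</p>
<h3><span style="font-size: 1.17em">5. Install PyTorch</span></h3>
<p>Check which versions PyTorch supports on the PyTorch website, then install a PyTorch build compatible with your CUDA version (here, PyTorch 2.5.1+cu121).</p>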
<div class="cnblogs_code">
<pre>pip <span style="color: rgba(0, 0, 255, 1)">install</span> torch torchvision torchaudio --index-url https:<span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)">download.pytorch.org/whl/cu121</span></pre>
</div>
<p><img alt="image" loading="lazy" src="https://img2024.cnblogs.com/blog/109287/202509/109287-20250902094521651-621908631.png" class="lazyload"></p>
<p>Use the following script to verify that PyTorch installed successfully.</p>
<div class="cnblogs_code">
<pre>import torch

# 1. Print the PyTorch version
print(f"PyTorch Version: {torch.__version__}")
# 2. Print the CUDA version PyTorch was built with (12.1 is expected here)
print(f"PyTorch CUDA Version: {torch.version.cuda}")
# 3. The key step: check whether CUDA is available
print(f"CUDA Available: {torch.cuda.is_available()}")
# 4. If it is available, print GPU information
if torch.cuda.is_available():
    print(f"Number of GPUs: {torch.cuda.device_count()}")
    print(f"Current GPU Name: {torch.cuda.get_device_name(0)}")
    print(f"Current GPU Index: {torch.cuda.current_device()}")
    # 5. Run a simple tensor operation to test functionality (example values)
    x = torch.tensor([1.0, 2.0, 3.0]).cuda()
    y = torch.tensor([4.0, 5.0, 6.0]).cuda()
    z = x + y
    print(f"Tensor computation on GPU: {z}")
    print(f"Tensor device: {z.device}")</pre>
</div>
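<p>The version string printed in step 1 carries a build tag after the plus sign (<code>cu121</code> in <code>2.5.1+cu121</code>), and comparing that tag with the installed CUDA Toolkit release is a cheap consistency check. The helpers below are a small sketch with hypothetical names (<code>split_torch_version</code>, <code>cuda_tag</code>). They only manipulate strings, so they can be tried without PyTorch installed.</p>

```python
def split_torch_version(version):
    """Split "2.5.1+cu121" into ("2.5.1", "cu121"); the tag is None if absent."""
    base, _, tag = version.partition("+")
    return base, (tag or None)


def cuda_tag(cuda_release):
    """Turn a CUDA release such as "12.1" into the wheel tag "cu121"."""
    return "cu" + cuda_release.replace(".", "")


if __name__ == "__main__":
    base, tag = split_torch_version("2.5.1+cu121")
    print(base, tag, tag == cuda_tag("12.1"))
```

<p>In a real session, pass <code>torch.__version__</code> to <code>split_torch_version</code>. A tag other than <code>cu121</code> (for example <code>cpu</code>) suggests the wheel did not come from the cu121 index used above.</p>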
<h3>6. Install bitsandbytes <span style="color: rgba(255, 0, 0, 1)">(skip this step if you do not need quantized LoRA)</span></h3>
<p>Browse the released wheel files at Release Wheels · jllllll/bitsandbytes-windows-webui · GitHub, choose a bitsandbytes version compatible with the installed CUDA Toolkit (12.1) and PyTorch (2.5.1+cu121), then download the wheel file and install it.</p>
<div class="cnblogs_code">
<pre>pip <span style="color: rgba(0, 0, 255, 1)">install</span> bitsandbytes-<span style="color: rgba(128, 0, 128, 1)">0.41</span>.<span style="color: rgba(128, 0, 128, 1)">1</span>-py3-none-win_amd64.whl</pre>
</div>
<p>Use the following script to verify that bitsandbytes installed successfully.</p>
<div class="cnblogs_code">
<pre>import bitsandbytes as bnb

# Importing bitsandbytes makes it load its CUDA library and report the CUDA version it was built against.
# If the import succeeds without errors, it has usually found a matching CUDA environment.
# A more direct check: create a quantized layer and see whether it raises an error.
try:
    # Try to create a 4-bit quantized layer
    linear = bnb.nn.Linear4bit(10, 20)
    print("✅ bitsandbytes is installed and CUDA is working!")
    print("   It is using the same CUDA environment as PyTorch.")
except Exception as e:
    print(f"❌ Error: {e}")</pre>
</div>
<h3>7. Install flash-attention (lldacing/flash-attention-windows-wheel · Hugging Face) <span style="color: rgba(255, 0, 0, 1)">(skip this step if you do not need FlashAttention-2)</span></h3>
<p>First check whether Releases · kingbri1/flash-attention offers a prebuilt wheel compatible with your local CUDA Toolkit (12.1) and PyTorch (2.5.1+cu121). If it does, download and install it directly. If not, build the wheel locally with the following steps:</p>
<ol>
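<li>Clone the <span class="AppHeader-context-item-label " data-target="context-region-crumb.labelElement">flash-attention</span> source code locally (Dao-AILab/flash-attention: Fast and memory-efficient exact attention).</li>
<li>Choose the code version to build based on your setup (for example, the CUDA Toolkit and PyTorch versions). This guide uses v2.7.0.post2.</li>
<li>Build the wheel with the WindowsWhlBuilder_cuda.bat file provided in lldacing/flash-attention-windows-wheel · Hugging Face. Set the <code>CUDA_ARCH</code> parameter according to your GPU model. Each NVIDIA GPU has its own compute capability, written as <code>major.minor</code> and usually shortened to an integer (for example, 8.9 becomes 89). Query it with the following command: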
<div class="cnblogs_code">
<pre>nvidia-smi --query-gpu=name,compute_cap --format=csv</pre>
</div>
</li>
</ol>
<p><img alt="image" loading="lazy" src="https://img2024.cnblogs.com/blog/109287/202509/109287-20250903162559331-557543274.png" class="lazyload"></p>
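<p>Turning the reported compute capability into a <code>CUDA_ARCH</code> value only requires dropping the dot. The sketch below parses the CSV text printed by the <code>nvidia-smi</code> command above. The function name <code>cuda_arch_values</code> and the sample GPU row are illustrative assumptions, not output captured from the author's machine.</p>

```python
import csv
import io


def cuda_arch_values(csv_text):
    """Return (gpu_name, cuda_arch) pairs, e.g. [("NVIDIA GeForce RTX 4090", "89")]."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    rows = [row for row in reader if row]
    pairs = []
    for name, compute_cap in rows[1:]:  # rows[0] is the "name, compute_cap" header
        pairs.append((name.strip(), compute_cap.strip().replace(".", "")))
    return pairs


if __name__ == "__main__":
    sample = "name, compute_cap\nNVIDIA GeForce RTX 4090, 8.9\n"
    for name, arch in cuda_arch_values(sample):
        print(f'{name}: CUDA_ARCH="{arch}"')
```

<p>To use it on a real machine, save the command output to a file or capture it with <code>subprocess.run</code> and pass the text to the function.</p>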
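<p> 4. Run the script in the "Native Tools Command Prompt for Visual Studio". Make sure the Conda virtual environment created earlier (llama-factory) is activated, because the build uses the CUDA, PyTorch, and Python versions available in that environment.</p>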
<div class="cnblogs_code">
<pre>WindowsWhlBuilder_cuda.bat CUDA_ARCH=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">89</span><span style="color: rgba(128, 0, 0, 1)">"</span> FORCE_CXX11_ABI=TRUE</pre>
</div>
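<p>Depending on machine performance, the build may take anywhere from tens of minutes to several hours (it took me 7 hours). The name of the resulting wheel encodes the versions it was built for. In 'flash_attn-2.7.0.post2+cu121torch2.5.1cxx11abiFALSE-cp310-cp310-win_amd64.whl', for example, the flash-attention version is 2.7.0.post2, the CUDA version is 12.1, the torch version is 2.5.1, and the Python version is 3.10.</p>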
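<p>The same naming scheme can be read programmatically, which is handy when choosing among several prebuilt wheels. This is a rough sketch that assumes the filename layout shown above. The pattern and the function name <code>describe_wheel</code> are illustrative and may not cover every wheel published for flash-attention.</p>

```python
import re

# Layout assumed from the example filename in this guide.
WHEEL_PATTERN = re.compile(
    r"^flash_attn-([^+]+)\+cu(\d+)torch([\d.]+)cxx11abi(TRUE|FALSE)-cp(\d+)-cp\d+-(.+)\.whl$"
)


def describe_wheel(filename):
    """Return the versions encoded in a flash_attn wheel name, or None if it does not match."""
    match = WHEEL_PATTERN.match(filename)
    if match is None:
        return None
    version, cuda, torch_version, abi, py, platform = match.groups()
    return {
        "flash_attn": version,
        "cuda": f"{cuda[:-1]}.{cuda[-1]}",   # "121" becomes "12.1"
        "torch": torch_version,
        "cxx11abi": abi == "TRUE",
        "python": f"{py[0]}.{py[1:]}",       # "310" becomes "3.10"
        "platform": platform,
    }


if __name__ == "__main__":
    name = "flash_attn-2.7.0.post2+cu121torch2.5.1cxx11abiFALSE-cp310-cp310-win_amd64.whl"
    print(describe_wheel(name))
```

<p>Comparing the returned <code>cuda</code>, <code>torch</code>, and <code>python</code> fields with the local environment before running <code>pip install</code> catches a mismatched wheel early.</p>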
<p>Finally, install flash-attention from the built wheel.</p>
<div class="cnblogs_code">
<pre>pip <span style="color: rgba(0, 0, 255, 1)">install</span> flash_attn-<span style="color: rgba(128, 0, 128, 1)">2.7</span>.<span style="color: rgba(128, 0, 128, 1)">0</span>.post2+cu121torch2.<span style="color: rgba(128, 0, 128, 1)">5</span>.1cxx11abiFALSE-cp310-cp310-win_amd64.whl</pre>
</div>
<p>Use the following script to verify that flash-attention installed successfully.</p>
<div class="cnblogs_code">
<pre>import torch
import flash_attn

print("=" * 50)
print("Verifying environment configuration")
print("=" * 50)
print(f"PyTorch version: {torch.__version__}")
print(f"PyTorch CUDA version: {torch.version.cuda}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU device: {torch.cuda.get_device_name(0)}")
print(f"\nFlashAttention version: {flash_attn.__version__}")
print("\n✅ Verification succeeded! FlashAttention is installed and imports correctly.")
print("   It is using the CUDA 12.1 from your PyTorch environment.")
# Optional: run a simple forward pass (in case you are worried about runtime errors)
print("\nRunning a simple computation test...")
try:
    dim = 64
    # flash_attn_func expects tensors shaped (batch, seqlen, nheads, headdim)
    q = torch.randn(1, 8, 128, dim, device='cuda', dtype=torch.float16)
    k = torch.randn(1, 8, 128, dim, device='cuda', dtype=torch.float16)
    v = torch.randn(1, 8, 128, dim, device='cuda', dtype=torch.float16)
    output = flash_attn.flash_attn_func(q, k, v, causal=True)
    print("✅ Computation test passed! The FlashAttention CUDA kernel works.")
except Exception as e:
    print(f"❌ Computation test failed: {e}")</pre>
</div>
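<h3>8. Install LLaMA-Factory</h3>
<p>Clone the LLaMA-Factory source code (hiyouga/LLaMA-Factory: Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)) and install it by following the provided documentation (Installation - LLaMA Factory). The core installation command is:</p>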
<div class="cnblogs_code">
<pre>pip <span style="color: rgba(0, 0, 255, 1)">install</span> -e <span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">.</span><span style="color: rgba(128, 0, 0, 1)">"</span></pre>
</div>
<p>Start the web UI:</p>
<div class="cnblogs_code">
<pre>llamafactory-cli webui</pre>
</div>
<p>Open the web UI at http://localhost:7860/. All done!</p>
<p><img alt="image" loading="lazy" src="https://img2024.cnblogs.com/blog/109287/202509/109287-20250904002552017-103119264.png" class="lazyload"></p>
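<p>If the page does not load, a first diagnostic is whether anything is listening on the web UI port at all. The following is a minimal sketch using only the standard library. The function name <code>is_port_open</code> is hypothetical, and port 7860 is taken from the URL above.</p>

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Web UI reachable:", is_port_open("localhost", 7860))
```

<p>A result of <code>False</code> usually means <code>llamafactory-cli webui</code> is not running or is still starting up. It says nothing about whether the UI itself works once it is reachable.</p>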
Source: https://www.cnblogs.com/Tiger-Lu/p/19069444