【Java深度学习】PyTorch On Java 系列课程第十六章 32 ：PyTorch Java生态扩展llama.cpp TensorRT-LLM[PyTorch Java 硕士研一课程]

Veggie张海寧

235人浏览 · 2026-03-30 13:00:24

Veggie张海寧 · 2026-03-30 13:00:24 发布

在这里插入图片描述
做过自动驾驶的都知道Open3D ，但是它现在只支持Python 和cpp ，虽然它后端是cpp写的，我之前曾给他们github 官方仓库提issue 希望他们支持java，但是他们很决绝的答复我：【java 从来不在我们的考虑范围之内！我们永远也不会开发java 的sdk ！凭什么让我们来支持Java】这大概就是java 的尴尬地位，但是我们有javacpp 这一杀手【贱】的存在，你不支持没关系，只要你的代码是开源且是cpp 写的，我就能让你被迫支持上java。这不我们最近的实验就是使用javacpp 利用Claude 大模型，把Open3D 的底裤给扒下来了，从此我们在jvm 平台也可以用上 Open3D了，当然我们也很贱，又跑到Open3d 原issue 回复官方，我们已经让 Open3D的最新版本支持上了 Java！
未来希望更多的朋友能够使用javacpp 和claude 来把一些cpp 重要的工具转译到java 生态中，这种造轮子是值得提倡去做了，极大的丰富和对齐了java在一些高性能工具上的不足。
我们给大家看几个使用示例
1

Markdown
复制

package org.open3d.examples;

import org.open3d.*;
import static org.open3d.global.Open3D.*;

/**
 * Example 1: Read and Write Point Cloud
 *
 * Demonstrates loading a point cloud from a file (PLY/PCD/XYZ),
 * inspecting basic properties, and writing it back to a different format.
 */
public class Example1_ReadWritePointCloud {

public static void main(String[] args) {
if (args.length < 1) {
            System.out.println("Usage: Example1_ReadWritePointCloud <input_file> [output_file]");
System.out.println("  Supported formats: .ply, .pcd, .xyz, .xyzn, .xyzrgb, .pts");
System.out.println("\nExample:");
System.out.println("  java Example1_ReadWritePointCloud bunny.ply output.pcd");
return;
}

        String inputFile = args[0];
String outputFile = args.length > 1 ? args[1] : "output.ply";

System.out.println("=== Open3D JavaCPP - Read/Write Point Cloud ===");
System.out.println("Input:  " + inputFile);
System.out.println("Output: " + outputFile);

// Read point cloud from file
PointCloud pcd = new PointCloud();
boolean readSuccess = ReadPointCloud(inputFile, pcd);

if (!readSuccess) {
            System.err.println("Failed to read point cloud from: " + inputFile);
return;
}

// Print basic info
System.out.println("\n--- Point Cloud Info ---");
System.out.println("Has points:  " + pcd.HasPoints());
System.out.println("Has normals: " + pcd.HasNormals());
System.out.println("Has colors:  " + pcd.HasColors());
System.out.println("Is empty:    " + pcd.IsEmpty());

// Get bounding box
double[] minBound = pcd.GetMinBound();
double[] maxBound = pcd.GetMaxBound();
System.out.printf("Min bound: [%.3f, %.3f, %.3f]%n", minBound[0], minBound[1], minBound[2]);
System.out.printf("Max bound: [%.3f, %.3f, %.3f]%n", maxBound[0], maxBound[1], maxBound[2]);

// Write point cloud to output file
boolean writeSuccess = WritePointCloud(outputFile, pcd);

if (writeSuccess) {
            System.out.println("\nPoint cloud written to: " + outputFile);
} else {
            System.err.println("\nFailed to write point cloud to: " + outputFile);
}

        System.out.println("\n=== Done ===");
}
}

Markdown
复制

package org.open3d.examples;

import org.open3d.*;
import static org.open3d.global.Open3D.*;

/**
 * Example 4: ICP Registration
 *
 * Demonstrates registering two point clouds using Iterative Closest Point (ICP).
 */
public class Example4_ICPRegistration {

public static void main(String[] args) {
if (args.length < 2) {
            System.out.println("Usage: Example4_ICPRegistration <source.ply> <target.ply>");
System.out.println("  Registers the source point cloud to the target using ICP.");
return;
}

        String sourceFile = args[0];
String targetFile = args[1];

System.out.println("=== Open3D JavaCPP - ICP Registration ===");
System.out.println("Source: " + sourceFile);
System.out.println("Target: " + targetFile);

// Read source and target point clouds
PointCloud source = new PointCloud();
PointCloud target = new PointCloud();
boolean s1 = ReadPointCloud(sourceFile, source);
boolean s2 = ReadPointCloud(targetFile, target);
if (!s1 || !s2) {
            System.err.println("Failed to read point clouds.");
return;
}

        System.out.println("\n--- Source: has points = " + source.HasPoints());
System.out.println("--- Target: has points = " + target.HasPoints());

// Estimate normals (needed for point-to-plane ICP)
KDTreeSearchParamKNN searchParam = new KDTreeSearchParamKNN(30);
source.EstimateNormals(searchParam, true);
target.EstimateNormals(searchParam, true);
System.out.println("Normals estimated for both clouds.");

double maxCorrespondenceDistance = 0.05;

// Evaluate initial alignment
System.out.println("\n--- Evaluate Initial Alignment ---");
RegistrationResult evalResult = EvaluateRegistration(source, target, maxCorrespondenceDistance);
System.out.printf("Initial Fitness:     %.6f%n", evalResult.fitness_());
System.out.printf("Initial Inlier RMSE: %.6f%n", evalResult.inlier_rmse_());

// Point-to-point ICP
System.out.println("\n--- Point-to-Point ICP ---");
TransformationEstimationPointToPoint estimationP2P =
                new TransformationEstimationPointToPoint(false);
ICPConvergenceCriteria criteria =
                new ICPConvergenceCriteria(1e-6, 1e-6, 30);

RegistrationResult resultP2P = RegistrationICP(
                source, target, maxCorrespondenceDistance,
new double[]{1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1}, // identity 4x4
estimationP2P, criteria);

System.out.printf("P2P Fitness:     %.6f%n", resultP2P.fitness_());
System.out.printf("P2P Inlier RMSE: %.6f%n", resultP2P.inlier_rmse_());

// Point-to-plane ICP
System.out.println("\n--- Point-to-Plane ICP ---");
TransformationEstimationPointToPlane estimationP2L =
                new TransformationEstimationPointToPlane();

RegistrationResult resultP2L = RegistrationICP(
                source, target, maxCorrespondenceDistance,
new double[]{1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1},
                estimationP2L, criteria);

System.out.printf("P2L Fitness:     %.6f%n", resultP2L.fitness_());
System.out.printf("P2L Inlier RMSE: %.6f%n", resultP2L.inlier_rmse_());

System.out.println("\n=== Done ===");
}
}

字少事大，全量编译后的Nvidia 大模型推理部署工具TensorRT-LLM 及开源大模型推理部署工具Llama.CPP 全量编译加入Java大家庭，标志Java从此拥有了高效大模型推理工具的基底，从此大家可以尽情的像使用Spark Flink 一般做大模型推理服务了。

当然Java只是一层皮，实际干活的还是cpp，这要感谢来自天才开源组织ByteDeco 强大的自动化编译工具 JavaCpp，java 作为 cpp on jvm 的使命和亲缘关系，在AI 时代永不会落伍，并可以反哺 Scala 和Kotlin Clojure Groovy等Jvm上的小弟们。
我们将在下一步继续转译 Open3D 和Torch-TensorRT 到java 家庭中

package example;

import org.bytedeco.javacpp.*;
import org.bytedeco.tensorrt_llm.*;
import org.bytedeco.tensorrt_llm.*;
import org.bytedeco.tensorrt_llm.*;
import org.bytedeco.tensorrt_llm.*;
import org.bytedeco.tensorrt_llm.global.TRTLLM;

/**
 * 示例 2: Qwen3 离线批量推理
 *
 * 批量提交多个请求，一次性获取所有结果。适合离线批处理场景。
 *
 * 使用场景:
 * - 大规模数据标注
 * - 离线内容生成
 * - 批量文本摘要
 * - 离线翻译
 */
public class Qwen3BatchInference {

    private static final int EOS_TOKEN_ID = 151645;
    private static final int PAD_TOKEN_ID = 151643;

    public static void main(String[] args) throws Exception {
        String engineDir = args.length > 0 ? args[0] : "/path/to/qwen3-engine";

        System.out.println("=== TensorRT-LLM Qwen3 离线批量推理 ===");

        // ============================================
        // 1. 配置 - 针对批量推理优化
        // ============================================
        KvCacheConfig kvCacheConfig = new KvCacheConfig();
        kvCacheConfig.setEnableBlockReuse(true);
        kvCacheConfig.setFreeGpuMemoryFraction(0.9f); // 批量推理可以用更多内存

        SchedulerConfig schedulerConfig = new SchedulerConfig();
        // GUARANTEED_NO_EVICT: 保证请求不会被中途驱逐，适合离线批处理
        // schedulerConfig.setCapacitySchedulerPolicy(
        //     TRTLLM.kGUARANTEED_NO_EVICT
        // );

        ExecutorConfig executorConfig = new ExecutorConfig();
        executorConfig.setKvCacheConfig(kvCacheConfig);
        executorConfig.setSchedulerConfig(schedulerConfig);
        executorConfig.setEnableChunkedContext(true);
        executorConfig.setMaxBeamWidth(1);

        // ============================================
        // 2. 加载引擎
        // ============================================
        Executor executor = new Executor(
                new BytePointer(engineDir), TRTLLM.ModelType.kDECODER_ONLY, executorConfig
        );
        System.out.println("✅ 引擎加载成功");

        // ============================================
        // 3. 构造批量请求
        // ============================================
        String[] prompts = {
                "请用三句话总结量子力学的核心思想。",
                "Java和Python的主要区别是什么？",
                "写一首关于春天的五言绝句。",
                "解释什么是Transformer架构。",
                "TensorRT-LLM 的主要优势有哪些？"
        };

        // 默认的 SamplingConfig
        SamplingConfig samplingConfig = new SamplingConfig();
        samplingConfig.setTemperature(new FloatPointer(1).put(0.3f));  // 低温度，输出更确定性
        samplingConfig.setTopP(new FloatPointer(1).put(0.95f));

        // 存放所有请求的 ID
        long[] requestIds = new long[prompts.length];

        for (int i = 0; i < prompts.length; i++) {
            // 实际使用中，这里应该用 Qwen3Tokenizer 对 prompt 进行 tokenize
            // 这里用占位 token IDs
            int[] tokenizedPrompt = tokenize(prompts[i]);

            IntPointer inputTokens = new IntPointer(tokenizedPrompt.length);
            for (int j = 0; j < tokenizedPrompt.length; j++) {
                inputTokens.put(j, tokenizedPrompt[j]);
            }

            Request request = new Request(inputTokens, 256);  // 最大生成 256 tokens
            request.setStreaming(false);                       // 非流式，一次返回完整结果
            request.setSamplingConfig(samplingConfig);
            request.setEndId(EOS_TOKEN_ID);
            request.setPadId(PAD_TOKEN_ID);

            // 提交请求
            requestIds[i] = executor.enqueueRequest(request);
            System.out.printf("  提交请求 [%d]: %s... (requestId=%d)%n",
                    i, prompts[i].substring(0, Math.min(20, prompts[i].length())), requestIds[i]);
        }
        System.out.println("\n✅ 所有 " + prompts.length + " 个请求已提交");

        // ============================================
        // 4. 收集所有结果
        // ============================================
        System.out.println("\n等待推理完成...\n");

        int completedCount = 0;
        boolean[] completed = new boolean[prompts.length];

        while (completedCount < prompts.length) {
            // 轮询检查是否有响应
            for (int i = 0; i < prompts.length; i++) {
                if (completed[i]) continue;

                // 检查该请求是否有响应就绪
                LongPointer reqIdPtr = new LongPointer(1);
                reqIdPtr.put(0, requestIds[i]);
                int numReady = executor.getNumResponsesReady(reqIdPtr);

                if (numReady > 0) {
                    // TODO: 调用 awaitResponses 获取结果
                    // Response response = executor.awaitResponses(requestIds[i]).get(0);
                    // Result result = response.getResult();
                    //
                    // 解码输出 tokens:
                    // String output = detokenize(result.getOutputTokenIds());
                    // System.out.printf("--- 请求 [%d] 完成 ---%n", i);
                    // System.out.printf("  输入: %s%n", prompts[i]);
                    // System.out.printf("  输出: %s%n%n", output);

                    completed[i] = true;
                    completedCount++;
                    System.out.printf("  ✅ 请求 [%d] 完成 (requestId=%d)%n", i, requestIds[i]);
                }
            }

            if (completedCount < prompts.length) {
                Thread.sleep(50); // 等待 50ms 再检查
            }
        }

        System.out.println("\n✅ 所有请求完成! 共处理 " + prompts.length + " 个请求");

        // ============================================
        // 5. 关闭
        // ============================================
        executor.shutdown();
        System.out.println("✅ Executor 已关闭");
    }

    /**
     * 模拟 tokenize - 实际使用中请调用 Qwen3 Tokenizer
     * 推荐使用 HuggingFace tokenizers 的 Java 绑定
     * 例如: https://github.com/huggingface/tokenizers (Rust + JNI)
     */
    private static int[] tokenize(String text) {
        // 占位实现 - 实际应使用 Qwen3 的 tokenizer
        // 可以通过以下方式实现:
        // 1. 使用 tokenizers-java (HuggingFace 的 Rust tokenizer 的 Java 绑定)
        // 2. 使用 Python subprocess 调用 transformers tokenizer
        // 3. 使用 JNI 调用 sentencepiece
        return new int[]{151644, 882, 198, 100, 200, 300, 151645, 198, 151644, 77091, 198};
    }
}

https://edu.csdn.net/learn/39067/627173?utm_source=2019755004

汇聚全球AI编程工具，助力开发者即刻编程。

更多推荐

Memory——让 AI 助手跨会话记住你的偏好

AI编程社区

MC-038 | 多模型协作：让不同模型各司其职

AI编程社区

2026实测｜Claude Code平价替代深度对比，国产AI原生IDE平替方案

我有个不太主流的对比维度：AI 编程工具生成的代码好不好读。有些工具写出来的代码只有机器能理解。5 款对比。刚毕业做全栈开发半年，日常靠vibe coding完成Flask后端接口开发，前段时间公司缩减云服务预算，每月几百美元的Claude Code账单压力陡增，开始系统性寻找平价替代。最先上手的就是字节跳动出品TRAE，据公开报道，已有大量国内开发者用户在使用TRAE，它基础版免费，不用一上来就