【Java深度学习】PyTorch On Java 系列课程 第十六章 32 :PyTorch Java生态扩展llama.cpp TensorRT-LLM[PyTorch Java 硕士研一课程]

做过自动驾驶的都知道Open3D ,但是它现在只支持Python 和cpp ,虽然它后端是cpp写的,我之前曾给他们github 官方仓库提issue 希望他们支持java,但是他们很决绝的答复我:【java 从来不在我们的考虑范围之内!我们永远也不会开发java 的sdk ! 凭什么让我们来支持Java】 这大概就是java 的尴尬地位,但是我们有javacpp 这一杀手【贱】的存在,你不支持没关系,只要你的代码是开源且是cpp 写的,我就能让你被迫支持上java。这不 我们最近的实验就是使用javacpp 利用Claude 大模型,把Open3D 的底裤给扒下来了,从此我们在jvm 平台也可以用上 Open3D了,当然我们也很贱,又跑到Open3d 原issue 回复官方,我们已经让 Open3D的最新版本支持上了 Java!
未来希望更多的朋友能够使用javacpp 和claude 来把一些cpp 重要的工具转译到java 生态中,这种造轮子是值得提倡去做了,极大的丰富和对齐了java在一些高性能工具上的不足。
我们给大家看几个使用示例
1
Markdown
复制
package org.open3d.examples;
import org.open3d.*;
import static org.open3d.global.Open3D.*;
/**
* Example 1: Read and Write Point Cloud
*
* Demonstrates loading a point cloud from a file (PLY/PCD/XYZ),
* inspecting basic properties, and writing it back to a different format.
*/
public class Example1_ReadWritePointCloud {
public static void main(String[] args) {
if (args.length < 1) {
System.out.println("Usage: Example1_ReadWritePointCloud <input_file> [output_file]");
System.out.println(" Supported formats: .ply, .pcd, .xyz, .xyzn, .xyzrgb, .pts");
System.out.println("\nExample:");
System.out.println(" java Example1_ReadWritePointCloud bunny.ply output.pcd");
return;
}
String inputFile = args[0];
String outputFile = args.length > 1 ? args[1] : "output.ply";
System.out.println("=== Open3D JavaCPP - Read/Write Point Cloud ===");
System.out.println("Input: " + inputFile);
System.out.println("Output: " + outputFile);
// Read point cloud from file
PointCloud pcd = new PointCloud();
boolean readSuccess = ReadPointCloud(inputFile, pcd);
if (!readSuccess) {
System.err.println("Failed to read point cloud from: " + inputFile);
return;
}
// Print basic info
System.out.println("\n--- Point Cloud Info ---");
System.out.println("Has points: " + pcd.HasPoints());
System.out.println("Has normals: " + pcd.HasNormals());
System.out.println("Has colors: " + pcd.HasColors());
System.out.println("Is empty: " + pcd.IsEmpty());
// Get bounding box
double[] minBound = pcd.GetMinBound();
double[] maxBound = pcd.GetMaxBound();
System.out.printf("Min bound: [%.3f, %.3f, %.3f]%n", minBound[0], minBound[1], minBound[2]);
System.out.printf("Max bound: [%.3f, %.3f, %.3f]%n", maxBound[0], maxBound[1], maxBound[2]);
// Write point cloud to output file
boolean writeSuccess = WritePointCloud(outputFile, pcd);
if (writeSuccess) {
System.out.println("\nPoint cloud written to: " + outputFile);
} else {
System.err.println("\nFailed to write point cloud to: " + outputFile);
}
System.out.println("\n=== Done ===");
}
}
Markdown
复制
package org.open3d.examples;
import org.open3d.*;
import static org.open3d.global.Open3D.*;
/**
* Example 4: ICP Registration
*
* Demonstrates registering two point clouds using Iterative Closest Point (ICP).
*/
public class Example4_ICPRegistration {
public static void main(String[] args) {
if (args.length < 2) {
System.out.println("Usage: Example4_ICPRegistration <source.ply> <target.ply>");
System.out.println(" Registers the source point cloud to the target using ICP.");
return;
}
String sourceFile = args[0];
String targetFile = args[1];
System.out.println("=== Open3D JavaCPP - ICP Registration ===");
System.out.println("Source: " + sourceFile);
System.out.println("Target: " + targetFile);
// Read source and target point clouds
PointCloud source = new PointCloud();
PointCloud target = new PointCloud();
boolean s1 = ReadPointCloud(sourceFile, source);
boolean s2 = ReadPointCloud(targetFile, target);
if (!s1 || !s2) {
System.err.println("Failed to read point clouds.");
return;
}
System.out.println("\n--- Source: has points = " + source.HasPoints());
System.out.println("--- Target: has points = " + target.HasPoints());
// Estimate normals (needed for point-to-plane ICP)
KDTreeSearchParamKNN searchParam = new KDTreeSearchParamKNN(30);
source.EstimateNormals(searchParam, true);
target.EstimateNormals(searchParam, true);
System.out.println("Normals estimated for both clouds.");
double maxCorrespondenceDistance = 0.05;
// Evaluate initial alignment
System.out.println("\n--- Evaluate Initial Alignment ---");
RegistrationResult evalResult = EvaluateRegistration(source, target, maxCorrespondenceDistance);
System.out.printf("Initial Fitness: %.6f%n", evalResult.fitness_());
System.out.printf("Initial Inlier RMSE: %.6f%n", evalResult.inlier_rmse_());
// Point-to-point ICP
System.out.println("\n--- Point-to-Point ICP ---");
TransformationEstimationPointToPoint estimationP2P =
new TransformationEstimationPointToPoint(false);
ICPConvergenceCriteria criteria =
new ICPConvergenceCriteria(1e-6, 1e-6, 30);
RegistrationResult resultP2P = RegistrationICP(
source, target, maxCorrespondenceDistance,
new double[]{1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1}, // identity 4x4
estimationP2P, criteria);
System.out.printf("P2P Fitness: %.6f%n", resultP2P.fitness_());
System.out.printf("P2P Inlier RMSE: %.6f%n", resultP2P.inlier_rmse_());
// Point-to-plane ICP
System.out.println("\n--- Point-to-Plane ICP ---");
TransformationEstimationPointToPlane estimationP2L =
new TransformationEstimationPointToPlane();
RegistrationResult resultP2L = RegistrationICP(
source, target, maxCorrespondenceDistance,
new double[]{1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1},
estimationP2L, criteria);
System.out.printf("P2L Fitness: %.6f%n", resultP2L.fitness_());
System.out.printf("P2L Inlier RMSE: %.6f%n", resultP2L.inlier_rmse_());
System.out.println("\n=== Done ===");
}
}
字少事大,全量编译后的Nvidia 大模型推理部署工具TensorRT-LLM 及开源 大模型推理部署工具Llama.CPP 全量编译加入Java大家庭,标志Java从此拥有了高效大模型推理工具的基底,从此大家可以尽情的像使用Spark Flink 一般 做大模型推理服务了。
当然Java只是一层皮,实际干活的还是cpp,这要感谢来自天才开源组织ByteDeco 强大的自动化编译工具 JavaCpp,java 作为 cpp on jvm 的使命和亲缘关系,在AI 时代永不会落伍,并可以反哺 Scala 和Kotlin Clojure Groovy等Jvm上的小弟们。
我们将在下一步继续 转译 Open3D 和Torch-TensorRT 到java 家庭中
package example;
import org.bytedeco.javacpp.*;
import org.bytedeco.tensorrt_llm.*;
import org.bytedeco.tensorrt_llm.*;
import org.bytedeco.tensorrt_llm.*;
import org.bytedeco.tensorrt_llm.*;
import org.bytedeco.tensorrt_llm.global.TRTLLM;
/**
* 示例 2: Qwen3 离线批量推理
*
* 批量提交多个请求,一次性获取所有结果。适合离线批处理场景。
*
* 使用场景:
* - 大规模数据标注
* - 离线内容生成
* - 批量文本摘要
* - 离线翻译
*/
public class Qwen3BatchInference {
private static final int EOS_TOKEN_ID = 151645;
private static final int PAD_TOKEN_ID = 151643;
public static void main(String[] args) throws Exception {
String engineDir = args.length > 0 ? args[0] : "/path/to/qwen3-engine";
System.out.println("=== TensorRT-LLM Qwen3 离线批量推理 ===");
// ============================================
// 1. 配置 - 针对批量推理优化
// ============================================
KvCacheConfig kvCacheConfig = new KvCacheConfig();
kvCacheConfig.setEnableBlockReuse(true);
kvCacheConfig.setFreeGpuMemoryFraction(0.9f); // 批量推理可以用更多内存
SchedulerConfig schedulerConfig = new SchedulerConfig();
// GUARANTEED_NO_EVICT: 保证请求不会被中途驱逐,适合离线批处理
// schedulerConfig.setCapacitySchedulerPolicy(
// TRTLLM.kGUARANTEED_NO_EVICT
// );
ExecutorConfig executorConfig = new ExecutorConfig();
executorConfig.setKvCacheConfig(kvCacheConfig);
executorConfig.setSchedulerConfig(schedulerConfig);
executorConfig.setEnableChunkedContext(true);
executorConfig.setMaxBeamWidth(1);
// ============================================
// 2. 加载引擎
// ============================================
Executor executor = new Executor(
new BytePointer(engineDir), TRTLLM.ModelType.kDECODER_ONLY, executorConfig
);
System.out.println("✅ 引擎加载成功");
// ============================================
// 3. 构造批量请求
// ============================================
String[] prompts = {
"请用三句话总结量子力学的核心思想。",
"Java和Python的主要区别是什么?",
"写一首关于春天的五言绝句。",
"解释什么是Transformer架构。",
"TensorRT-LLM 的主要优势有哪些?"
};
// 默认的 SamplingConfig
SamplingConfig samplingConfig = new SamplingConfig();
samplingConfig.setTemperature(new FloatPointer(1).put(0.3f)); // 低温度,输出更确定性
samplingConfig.setTopP(new FloatPointer(1).put(0.95f));
// 存放所有请求的 ID
long[] requestIds = new long[prompts.length];
for (int i = 0; i < prompts.length; i++) {
// 实际使用中,这里应该用 Qwen3Tokenizer 对 prompt 进行 tokenize
// 这里用占位 token IDs
int[] tokenizedPrompt = tokenize(prompts[i]);
IntPointer inputTokens = new IntPointer(tokenizedPrompt.length);
for (int j = 0; j < tokenizedPrompt.length; j++) {
inputTokens.put(j, tokenizedPrompt[j]);
}
Request request = new Request(inputTokens, 256); // 最大生成 256 tokens
request.setStreaming(false); // 非流式,一次返回完整结果
request.setSamplingConfig(samplingConfig);
request.setEndId(EOS_TOKEN_ID);
request.setPadId(PAD_TOKEN_ID);
// 提交请求
requestIds[i] = executor.enqueueRequest(request);
System.out.printf(" 提交请求 [%d]: %s... (requestId=%d)%n",
i, prompts[i].substring(0, Math.min(20, prompts[i].length())), requestIds[i]);
}
System.out.println("\n✅ 所有 " + prompts.length + " 个请求已提交");
// ============================================
// 4. 收集所有结果
// ============================================
System.out.println("\n等待推理完成...\n");
int completedCount = 0;
boolean[] completed = new boolean[prompts.length];
while (completedCount < prompts.length) {
// 轮询检查是否有响应
for (int i = 0; i < prompts.length; i++) {
if (completed[i]) continue;
// 检查该请求是否有响应就绪
LongPointer reqIdPtr = new LongPointer(1);
reqIdPtr.put(0, requestIds[i]);
int numReady = executor.getNumResponsesReady(reqIdPtr);
if (numReady > 0) {
// TODO: 调用 awaitResponses 获取结果
// Response response = executor.awaitResponses(requestIds[i]).get(0);
// Result result = response.getResult();
//
// 解码输出 tokens:
// String output = detokenize(result.getOutputTokenIds());
// System.out.printf("--- 请求 [%d] 完成 ---%n", i);
// System.out.printf(" 输入: %s%n", prompts[i]);
// System.out.printf(" 输出: %s%n%n", output);
completed[i] = true;
completedCount++;
System.out.printf(" ✅ 请求 [%d] 完成 (requestId=%d)%n", i, requestIds[i]);
}
}
if (completedCount < prompts.length) {
Thread.sleep(50); // 等待 50ms 再检查
}
}
System.out.println("\n✅ 所有请求完成! 共处理 " + prompts.length + " 个请求");
// ============================================
// 5. 关闭
// ============================================
executor.shutdown();
System.out.println("✅ Executor 已关闭");
}
/**
* 模拟 tokenize - 实际使用中请调用 Qwen3 Tokenizer
* 推荐使用 HuggingFace tokenizers 的 Java 绑定
* 例如: https://github.com/huggingface/tokenizers (Rust + JNI)
*/
private static int[] tokenize(String text) {
// 占位实现 - 实际应使用 Qwen3 的 tokenizer
// 可以通过以下方式实现:
// 1. 使用 tokenizers-java (HuggingFace 的 Rust tokenizer 的 Java 绑定)
// 2. 使用 Python subprocess 调用 transformers tokenizer
// 3. 使用 JNI 调用 sentencepiece
return new int[]{151644, 882, 198, 100, 200, 300, 151645, 198, 151644, 77091, 198};
}
}
更多推荐



所有评论(0)