Questions tagged [cuda] - Page 1

Володимир Дідик

Asked: 2024-11-05 03:51:47 +0000 UTC

Program size for each GPU core

7

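Programming in the CUDA environment (6 lectures): https://www.youtube.com/watch?v=Oqebkc0NO_8

I watched all 6 lectures, but I still have not figured out where in GPU memory the program for a GPU core is stored, or what limits apply to its size. Everywhere, only data memory is discussed: data is sent there, data is sent back. Isn't the core's program itself also sent to the GPU? I found zero information about this.

Could someone please share their knowledge?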

alvahtin

Asked: 2024-03-07 03:18:14 +0000 UTC

How do I create a .so library from CUDA code?

6

I am trying to build a shared library for Linux from CUDA code. I compile it with:

nvcc CudaLib.cu -shared -o CudaLib.so -arch=native

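I get this error:

relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC

Evidently the -fPIC flag is needed at compile time, but nvcc does not accept it. If I write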

nvcc CudaLib.cu -shared -fPIC -o CudaLib.so -arch=native

then I get a different error:

Unknown option '-fPIC'
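
One commonly used approach, given here as an untested sketch and not a confirmed fix for this exact setup: nvcc has an `-Xcompiler` option (long form `--compiler-options`) that forwards a flag to the host compiler, so `-fPIC` can be passed that way instead of directly. The helper name `build_nvcc_command` is hypothetical; it only assembles the command line from the question with that one change.

```python
def build_nvcc_command(source="CudaLib.cu", output="CudaLib.so"):
    """Assemble an nvcc command line that forwards -fPIC to the host compiler."""
    return [
        "nvcc", source,
        "-shared",
        "-Xcompiler", "-fPIC",  # nvcc hands the argument after -Xcompiler to the host compiler
        "-o", output,
        "-arch=native",
    ]


command = build_nvcc_command()
# Print the command for inspection; pass the list to subprocess.run to execute it.
print(" ".join(command))
```

Whether this resolves the relocation error depends on the toolchain and on how the library is linked, so treat it as a starting point.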

Alexander Pozharskii

Asked: 2020-01-14 04:27:05 +0000 UTC

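How can pooling/unpooling with indices be optimized in Theano?

1

The task is to reproduce, as accurately as possible, the behavior of the SpatialMaxPooling and SpatialMaxUnpooling layers in Theano.

Here, SpatialMaxUnpooling fills only those "cells" that correspond to the indices of the maxima found by the matching SpatialMaxPooling.

For example, here is the input image:

SpatialMaxPooling stores, for each 2x2 region, the pixel with the maximum value together with its index.

SpatialMaxUnpooling then sets values only at the pixels corresponding to those indices. That is, the output will be: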
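
A minimal plain-Python sketch of this behavior on a small made-up 4x4 image. The sample values and the function names `max_pool_2x2` and `max_unpool_2x2` are assumptions for illustration, not taken from the question or from any library. It also assumes the index of a maximum is its position inside the 2x2 window, numbered row by row from 0 to 3.

```python
def max_pool_2x2(image):
    """Return (values, indices) for non-overlapping 2x2 windows.

    Each index is the position of the maximum inside its window,
    numbered row by row: 0 1 on the top row, 2 3 on the bottom row.
    """
    values, indices = [], []
    for r in range(0, len(image), 2):
        value_row, index_row = [], []
        for c in range(0, len(image[0]), 2):
            window = [image[r][c], image[r][c + 1],
                      image[r + 1][c], image[r + 1][c + 1]]
            best = max(range(4), key=window.__getitem__)  # first maximum wins on ties
            value_row.append(window[best])
            index_row.append(best)
        values.append(value_row)
        indices.append(index_row)
    return values, indices


def max_unpool_2x2(values, indices):
    """Put each pooled value back at its recorded position; zeros elsewhere."""
    height, width = len(values) * 2, len(values[0]) * 2
    output = [[0] * width for _ in range(height)]
    for r, (value_row, index_row) in enumerate(zip(values, indices)):
        for c, (value, index) in enumerate(zip(value_row, index_row)):
            output[2 * r + index // 2][2 * c + index % 2] = value
    return output


# Made-up input, not the image from the question
image = [
    [1, 2, 5, 6],
    [3, 4, 8, 7],
    [9, 1, 2, 3],
    [1, 1, 4, 0],
]
values, indices = max_pool_2x2(image)
restored = max_unpool_2x2(values, indices)
# values   == [[4, 8], [9, 4]]
# indices  == [[3, 2], [0, 2]]
# restored == [[0, 0, 0, 0], [0, 4, 8, 0], [9, 0, 0, 0], [0, 0, 4, 0]]
```

Only one cell per 2x2 window is non-zero after unpooling, which is the property the Theano code has to reproduce.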

I put together the following implementation:

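import theano.tensor as T  # provides max_and_argmax, set_subtensor and eq

def pooling2d_2x2(self, x):
    # Split each spatial axis into (blocks, 2) so every 2x2 window gets its own pair of axes
    reshaped = x.reshape([
        x.shape[0], x.shape[1], x.shape[2] // 2, 2, x.shape[3] // 2, 2
    ])
    # Reduce over the two window axes (3 and 5) to get per-window maxima and their indices
    max_values, max_indices = T.max_and_argmax(reshaped, (3,5,))
    return max_values, max_indices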

def unpooling2d_2x2(self, pooled, indices):
    tmp_shape = [pooled.shape[0], pooled.shape[1], pooled.shape[2], 2, pooled.shape[3], 2]
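    # Upsample the pooled values so each one fills its whole 2x2 window
    resized = pooled.repeat(2, 2).repeat(2, 3)
    pooled_reshaped = resized.reshape(tmp_shape)
    # Upsample the argmax indices the same way
    indices_repeaten = indices.repeat(2, 2).repeat(2, 3).reshape(tmp_shape)
    # Start from zeros, then keep each value only at the window position that matches its index
    result = pooled_reshaped * 0.0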
    result = T.set_subtensor(result[:, :, :, 0, :, 0],
                             pooled_reshaped[:, :, :, 0, :, 0] * T.eq(indices_repeaten[:, :, :, 0, :, 0], 0))
    result = T.set_subtensor(result[:, :, :, 0, :, 1],
                             pooled_reshaped[:, :, :, 0, :, 1] * T.eq(indices_repeaten[:, :, :, 0, :, 1], 1))
    result = T.set_subtensor(result[:, :, :, 1, :, 0],
                             pooled_reshaped[:, :, :, 1, :, 0] * T.eq(indices_repeaten[:, :, :, 1, :, 0], 2))
    result = T.set_subtensor(result[:, :, :, 1, :, 1],
                             pooled_reshaped[:, :, :, 1, :, 1] * T.eq(indices_repeaten[:, :, :, 1, :, 1], 3))
    result_shape = [pooled.shape[0], pooled.shape[1], pooled.shape[2] * 2, pooled.shape[3] * 2]
    return result.reshape(result_shape)

But it is not particularly fast (incidentally, I would also welcome advice on how to profile it). Hence the question: what can be improved here?
