Parallel Algorithms and Architectures 2018
Sheet 6 (Warp Intrinsics)
Task 1 (Segmented Prefix Scan)
Assignment
Scaffold Head
#include <iostream>
#include <cstdint>
#include <assert.h>

///////////////////////////////////////////////////////////////////////////////
// IGNORE THESE HELPERS (taken from https://github.com/gravitino/cudahelpers)
///////////////////////////////////////////////////////////////////////////////

// safe division (rounds x/y up to the next integer)
#define SDIV(x,y)(((x)+(y)-1)/(y))

// error macro
#define CUERR { \
    cudaError_t err; \
    if ((err = cudaGetLastError()) != cudaSuccess) { \
        std::cout << "CUDA error: " << cudaGetErrorString(err) << " : " \
                  << __FILE__ << ", line " << __LINE__ << std::endl; \
        exit(1); \
    } \
}

// convenient timers
#define TIMERSTART(label) \
    cudaEvent_t start##label, stop##label; \
    float time##label; \
    cudaEventCreate(&start##label); \
    cudaEventCreate(&stop##label); \
    cudaEventRecord(start##label, 0);

#define TIMERSTOP(label) \
    cudaEventRecord(stop##label, 0); \
    cudaEventSynchronize(stop##label); \
    cudaEventElapsedTime(&time##label, start##label, stop##label); \
    std::cout << "#" << time##label \
              << " ms (" << #label << ")" << std::endl;

// fills Data with the values 0, 1, 2, ..., n-1 using a grid-stride loop
template <typename index_t, typename value_t>
__global__ void iota_kernel(value_t * Data, index_t n) {

    const uint64_t thid   = blockDim.x*blockIdx.x+threadIdx.x;
    const uint64_t stride = blockDim.x*gridDim.x;

    for (uint64_t index = thid; index < n; index += stride)
        Data[index] = index;
}
Scaffold Foot
int main () {

    const uint64_t n = 1UL << 28;
    typedef uint64_t value_t;

    // segments have width 32 (one warp), so n must be a multiple of 32
    assert(n % 32 == 0);

    value_t * Data = nullptr, * result = nullptr, * Result = nullptr;
    cudaMalloc    (&Data,   sizeof(value_t)*n);                         CUERR
    cudaMalloc    (&Result, sizeof(value_t)*n);                         CUERR
    cudaMallocHost(&result, sizeof(value_t)*n);                         CUERR

    TIMERSTART(iota_kernel)
    iota_kernel<<<1024, 1024>>>(Data, n);                               CUERR
    TIMERSTOP(iota_kernel)

    TIMERSTART(segmented_prefix_scan)
    segmented_prefix_scan<<<1024, 1024>>>(Data, Result, n);
    TIMERSTOP(segmented_prefix_scan)

    TIMERSTART(D2H)
    cudaMemcpy(result, Result, sizeof(value_t)*n,
               cudaMemcpyDeviceToHost);                                 CUERR
    TIMERSTOP(D2H)

    // check the result: every 32-wide segment must hold the inclusive
    // prefix sums of its iota values, i.e. sum_{k=lower}^{index} k
    for (uint64_t index = 0; index < n; index++) {
        const value_t lower       = (index/32)*32;
        const value_t offset      = lower*(lower-1)/2;
        const value_t uncorrected = index*(index+1)/2;

        if (result[index] != uncorrected-offset) {
            std::cout << "error at position: " << index << std::endl;
            break;
        }
    }

    cudaFree(Data);                                                     CUERR
    cudaFree(Result);                                                   CUERR
    cudaFreeHost(result);                                               CUERR

    std::cout << "CUDA programming is fun!" << std::endl;
}
Start time:
Thu 12 Apr 2018 16:11:00
End time:
Fri 01 Mar 2019 12:00:00
General test timeout:
10.0 seconds
Tests
Comment prefix
#
Given input
Expected output
CUDA programming is fun!