High Performance Computing 2018
Sheet 3 (Stochastic PI, Shallow Deep Learning)
Task 2 (Softmax Regression)
Assignment
Scaffold Head
#include "include/hpc_helpers.hpp" // timers
#include "include/binary_IO.hpp"   // load images

#include <algorithm> // std::max
#include <cmath>     // std::exp
#include <cstdint>   // uint64_t
#include <iostream>  // std::cout
#include <limits>    // numerical limits of data types
#include <vector>    // std::vector
#include <mpi.h>     // MPI support

// softmax forward pass
template <
    typename value_t,
    typename index_t>
void softmax_regression(
    value_t * input,
    value_t * output,
    value_t * weights,
    value_t * bias,
    index_t   n_input,
    index_t   n_output) {

    // compute the affine part z_j = sum_k W_jk * x_k + b_j
    for (index_t j = 0; j < n_output; j++) {
        value_t accum = value_t(0);
        for (index_t k = 0; k < n_input; k++)
            accum += weights[j*n_input+k]*input[k];
        output[j] = accum + bias[j];
    }

    value_t norm = value_t(0);
    value_t mu = std::numeric_limits<value_t>::lowest();

    // compute mu = max_j(z_j) for numerical stability
    for (index_t index = 0; index < n_output; index++)
        mu = std::max(mu, output[index]);

    // compute y_j = exp(z_j - mu)
    for (index_t j = 0; j < n_output; j++)
        output[j] = std::exp(output[j]-mu);

    // compute the normalizer Z = sum_j y_j
    for (index_t j = 0; j < n_output; j++)
        norm += output[j];

    // compute y_j / Z
    for (index_t j = 0; j < n_output; j++)
        output[j] /= norm;
}

// argmax function
template <
    typename value_t,
    typename index_t>
index_t argmax(
    value_t * neurons,
    index_t   n_units) {

    index_t arg = 0;
    value_t max = std::numeric_limits<value_t>::lowest();

    for (index_t j = 0; j < n_units; j++) {
        const value_t val = neurons[j];
        if (val > max) {
            arg = j;
            max = val;
        }
    }

    return arg;
}

// determine the number of correctly classified entries
template <
    typename value_t,
    typename index_t>
index_t correctly_classified(
    value_t * input,
    value_t * label,
    value_t * weights,
    value_t * bias,
    index_t   num_entries,
    index_t   num_features,
    index_t   num_classes) {

    index_t counter = index_t(0);

    for (index_t i = 0; i < num_entries; i++) {

        value_t output[num_classes];
        const uint64_t input_off = i*num_features;
        const uint64_t label_off = i*num_classes;

        softmax_regression(input+input_off, output,
                           weights, bias,
                           num_features, num_classes);

        counter += argmax(output, num_classes) ==
                   argmax(label+label_off, num_classes);
    }

    return counter;
}

int main(int argc, char * argv[]) {

    MPI::Init(argc, argv);

    const uint64_t num_ranks = MPI::COMM_WORLD.Get_size();
    const uint64_t rank      = MPI::COMM_WORLD.Get_rank();

    const uint64_t num_features = 28*28; // number of pixels
    const uint64_t num_classes  = 10;    // number of classes
    const uint64_t num_entries  = 65000; // number of images

    const bool is_root = rank == 0;      // am I the root process?

    // memory is only allocated on the root node
    std::vector<float> input(num_entries*num_features*is_root);
    std::vector<float> label(num_entries*num_classes*is_root);
    std::vector<float> weights(num_classes*num_features);
    std::vector<float> bias(num_classes);

    // data is exclusively loaded on the root node
    if (is_root) {
        load_binary(input.data(),   input.size(),   "./data/X.bin");
        load_binary(label.data(),   label.size(),   "./data/Y.bin");
        load_binary(weights.data(), weights.size(), "./data/A.bin");
        load_binary(bias.data(),    bias.size(),    "./data/b.bin");
    }
Scaffold Foot
    // `count` is the total number of correctly classified images; it must
    // be computed by your code between the scaffold head and this foot
    // accuracies are exclusively reported on the root node
    if (is_root) {
        std::cout << "# classification accuracy: "
                  << double(count)/num_entries;
        std::cout << "\nParallel programming is "
                  << (count == 60174 ? "fun!" : "error-prone!")
                  << std::endl;
    }

    MPI::Finalize();
}
Start time:
Mon 22 Oct 2018 10:51:00
End time:
Fri 01 Mar 2019 12:00:00
General test timeout:
10.0 seconds
Tests
Command line arguments:
4
Comment prefix:
#
Given input:
Expected output:
Parallel programming is fun!