C++ Weekly With Jason Turner
  • 508 videos
  • 8,169,596 views
C++ Weekly - Ep 435 - Easy GPU Programming With AdaptiveCpp (68x Faster!)
☟☟ Awesome T-Shirts! Sponsors! Books! ☟☟
Upcoming Workshop: Understanding Object Lifetime, C++ On Sea, July 2, 2024
► cpponsea.uk/2024/sessions/understanding-object-lifetime-for-efficient-and-safer-cpp.html
Upcoming Workshop: C++ Best Practices, NDC TechTown, Sept 9-10, 2024
► ndctechtown.com/workshops/c-best-practices/4ceb8f7cf86c
Upcoming Workshop: Applied constexpr: The Power of Compile-Time Resources, C++ Under The Sea, October 10, 2024
► cppunderthesea.nl/workshops/
Episode details: github.com/lefticus/cpp_weekly/issues/26
Code Sample: github.com/lefticus/cpp_weekly/blob/master/parallel_algorithms/game_of_life.cpp
T-SHIRTS AVAILABLE!
► The best C++ T-Shirts anywhere! my-store-d16a2f.creator-sp...
8,443 views

Videos

C++ Weekly - Ep 434 - GCC's Amazing NEW (2024) -Wnrvo
13K views · 19 hours ago
C++ Weekly - Ep 433 - C++'s First New Floating Point Types in 40 Years!
15K views · 14 days ago
C++ Weekly - Ep 432 - Why constexpr Matters
13K views · 21 days ago
Travel Vancouver, BC, Canada: C++ Training Travelog
1.6K views · 28 days ago
C++ Weekly - Ep 431 - CTAD for NTTP
8K views · 28 days ago
C++ Weekly - Ep 430 - How Short String Optimizations Work
14K views · 1 month ago
C++ Weekly - Ep 429 - C++26's Parameter Pack Indexing
9K views · 1 month ago
C++ Weekly - Ep 428 - C++23's Coroutine Support: std::generator
8K views · 1 month ago
C++ Weekly - Ep 427 - Simple Generators Without Coroutines
10K views · 1 month ago
C++ Weekly - Ep 426 - Lambdas As State Machines
14K views · 2 months ago
C++ Weekly - Ep 425 - Using string_view, span, and Pointers Safely!
12K views · 2 months ago
C++ Weekly - Ep 424 - .reset vs →reset()
11K views · 2 months ago
C++ Weekly - Ep 423 - Complete Guide to Attributes Through C++23
9K views · 2 months ago
C++ Weekly - Ep 422 - Moving from C++20 to C++23
9K views · 3 months ago
C++ Weekly - Ep 421 - You're Using optional, variant, pair, tuple, any, and expected Wrong!
27K views · 3 months ago
C++ Weekly - Ep 420 - Moving From C++17 to C++20 (More constexpr!)
8K views · 3 months ago
C++ Weekly - Ep 419 - The Important Parts of C++23
13K views · 3 months ago
CS101++ - What Are Open Source and GitHub?
1.6K views · 3 months ago
CS101++ - What is `goto`?
1.6K views · 3 months ago
C++ Weekly - Ep 418 - Moving From C++14 to C++17
7K views · 3 months ago
CS101++ - What is a `for` Loop?
1.2K views · 4 months ago
CS101++ - What is a `do` Loop?
1.1K views · 4 months ago
C++ Weekly - Ep 417 - Turbocharge Your Build With Mold?
6K views · 4 months ago
CS101++ - What Are `if`/`else` Statements?
1.3K views · 4 months ago
CS101++ - What is a `while` Loop?
1.4K views · 4 months ago
C++ Weekly - Ep 416 - Moving From C++11 to C++14
6K views · 4 months ago
CS101++ - What Are Truth Tables?
1.8K views · 4 months ago
CS101++ - What Are Computability and the Halting Problem?
2.2K views · 4 months ago
C++ Weekly - Ep 415 - Moving From C++98 to C++11
8K views · 4 months ago

Comments

  • @mytech6779
    @mytech6779 5 hours ago

    OpenSYCL was also known as hipSYCL (for AMD's HIP GPU framework), if that helps anyone trying to look up information. And the switch of SYCL from version numbers like v2.3 to revision years (e.g. 2020) also marked a complete change in the entire standard, which was based largely on OpenCL and is now fully independent; the year revisions are also intended as a way to stay somewhat aligned with ISO C++ revisions.

  • @anon_y_mousse
    @anon_y_mousse 8 hours ago

    It is most curious to me that, using the same standard library implementation, the two compilers produce such wildly different results. I would assume it has something to do with how the code is optimized: clang expects a particular organizational structure based on how its authors think optimization should work, and gcc uses something totally different that doesn't mesh when clang is set to accommodate gcc's library. Although I'd bet that when I see your video it'll be something completely different.

  • @geto6242
    @geto6242 10 hours ago

    New to the channel. This is an instant subscribe. Thanks!

  • @PopescuAlexandruCristian
    @PopescuAlexandruCristian 12 hours ago

    Imagine the lack of skill you must have to use some garbage like this. 68 times what a CPU does is pocket change for a GPU, if you are not a Java programmer.

    • @Illuhad
      @Illuhad 12 hours ago

      Dude... this was done on an APU, not a powerful dedicated GPU. Memory bandwidth there is the same as on a CPU. And the application is hardly a benchmark; there are a couple of things in there that are not ideal and could be optimized. It's a simple example...

    • @Spielix
      @Spielix 8 hours ago

      Using abstractions is not about a lack of skill, but about using your time as a developer efficiently. Once you have a working implementation you can start to benchmark/profile and optimize the actual bottlenecks instead of wasting time on reinventing the wheel.

  • @vasylzaichenko3253
    @vasylzaichenko3253 12 hours ago

    Just FYI, I have an Asus ROG Flow X13 (AMD CPU + Nvidia GPU), and it is much easier to build Intel's oneAPI LLVM toolchain, both for Windows and WSL2, with just the CUDA toolkit installed. From my POV it is the easiest way to start playing with DPC++.

  • @Reneg973
    @Reneg973 19 hours ago

    Is there a way to sort that array at compile time to be able to use lower_bound for lookup?
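
    (For readers wondering how that would look: a minimal sketch, assuming C++20, where std::sort is usable in constant expressions. The table contents and names here are made up for illustration.)

```cpp
#include <algorithm>
#include <array>

// Sort a lookup table at compile time (std::sort is constexpr since C++20).
constexpr std::array<int, 5> make_sorted_table() {
    std::array<int, 5> values{42, 7, 19, 3, 88};
    std::sort(values.begin(), values.end());
    return values;
}

// Sorted once, at compile time.
constexpr auto table = make_sorted_table();

// Binary search at runtime (or in another constexpr context).
bool contains(int value) {
    const auto it = std::lower_bound(table.begin(), table.end(), value);
    return it != table.end() && *it == value;
}
```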

  • @stevesimpson5994
    @stevesimpson5994 20 hours ago

    Can you mix RVO and NRVO?
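
    (A short sketch of how the two interact, based on the general language rules rather than anything shown in the episode; all names are illustrative.)

```cpp
#include <string>

// RVO (returning a prvalue) is guaranteed since C++17.
std::string make_rvo() {
    return std::string("temporary");  // constructed directly in the caller
}

// NRVO (returning a named local) is optional, which is what -Wnrvo reports on.
std::string make_nrvo() {
    std::string s = "named";
    s += " value";
    return s;                         // usually elided, but not guaranteed
}

// "Mixing": each return statement is considered on its own, but two different
// named locals returned on different paths cannot both occupy the caller's
// return slot, so NRVO is typically lost here.
std::string make_mixed(bool flag) {
    std::string a = "a";
    std::string b = "b";
    if (flag) return a;  // likely a move, not an elision
    return b;
}
```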

  • @AusSkiller
    @AusSkiller 23 hours ago

    I wonder how the performance of this compares to just writing a fragment shader to do the computation. Honestly, I'm pretty surprised by how slow it is compared to what I would normally expect on a GPU; I was expecting well under 10ms per iteration at 10,000x10,000. Then again, maybe it is limited by memory bandwidth, especially on an integrated GPU. Also, I tend to have pretty high-end GPUs, so my expectations are probably a little high for an integrated one.

    • @Illuhad
      @Illuhad 16 hours ago

      Yeah, an APU will have a memory bandwidth of something like 30 GB/s, depending on the exact configuration... Also, keep in mind that these are not pure kernel timings but host-side timings; there might be offloading latencies, initial data transfer costs, etc. included as well. The code also seems optimized more for teaching than for perf: e.g., if I see it correctly, it does not generate the indices of the cells on the fly (using e.g. an iota view) but stores them in memory, which is not needed, so it has to move more data than just the 10000x10000 grid and the associated stencil. AdaptiveCpp has been shown to deliver competitive perf on large HPC GPUs compared to other models like CUDA.

  • @greyfade
    @greyfade 1 day ago

    FYI: Distrowatch is frequently botted and distros can pay to have their rankings artificially inflated. Don't trust it for rankings.

  • @catlolis
    @catlolis 1 day ago

    the compiler should just do it!!!!!!!!!!!

  • @matrixstuff3512
    @matrixstuff3512 1 day ago

    I'd love to hear your thoughts, as a fresh user, comparing this with Kokkos.

  • @bevanweiss522
    @bevanweiss522 1 day ago

    It would have been good to see the graph continue for a few more size iterations 'bigger' on the right. It hasn't clearly shown the intersection between GPU and CPU, where it appears the GPU 'bowl' is on the way back up. Perhaps it was just going to level out around the Clang CPU curve (suggesting there is some high-intensity CPU load associated with the larger grids, perhaps virtual memory paging, which I suspect is not CPU-parallelizable).

    • @Illuhad
      @Illuhad 1 day ago

      Yep, the hardware used was an APU. APUs/iGPUs typically have fairly limited amounts of dedicated memory, so I suspect that for the larger problem sizes we reach that limit and virtual memory shenanigans start. Apart from caching effects at small problem sizes, we usually don't see AdaptiveCpp slow down for larger problems as long as you remain within VRAM capacity.

  • @toast_on_toast1270
    @toast_on_toast1270 1 day ago

    Will definitely be checking this out. For some parallelisable problems at my job, I went ahead and used Vulkan, essentially by modifying the compute shader example on the Vulkan website, and saw something like a 100x speedup. However, it's pretty complicated to use: you have to write the shader code, compile it at runtime, and manage dispatch, memory, and synchronization. It's a pretty long way from standard C++, and there's a lot to go wrong. If AdaptiveCpp can even come close to the performance I am getting with Vulkan, it's worth a shot, because it will simplify the codebase significantly. I would like to see how well it handles complex tasks, for example whether it can chunk through a lot of trig operations on data quickly, and how efficiently it handles the dispatch of successive "draw calls".

  • @Antagon666
    @Antagon666 1 day ago

    Lemme guess: Clang vectorizes the code, which is useless in this case.

  • @dsecrieru
    @dsecrieru 1 day ago

    Aren't there race conditions when parallel processing a cell's neighborhood?

    • @Antagon666
      @Antagon666 1 day ago

      Nope, since you don't modify the input buffer, and the outputs are unique.

    • @toast_on_toast1270
      @toast_on_toast1270 1 day ago

      If the output of the cell at Tn depends only on its neighbours at Tn-1, then no: you can have multiple reads on the same data and not cause a race condition. If, on the other hand, the output of a given cell depends on the output of the other cells, then GPU programming is not the tool for you.

    • @Spielix
      @Spielix 8 hours ago

      @toast_on_toast1270 "then GPU programming is not the tool for you" is a bit of hyperbole. First of all, this problem is not specific to GPUs, and second, there are solutions for parallel in-place updates, like coloring (see red-black Gauss-Seidel).
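
    (For reference, a sketch of the race-free, double-buffered update pattern this thread describes. This is not the episode's code, which is linked above; all names are illustrative, and it assumes both buffers are pre-sized to width*height.)

```cpp
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

// Each parallel task reads only the previous generation `curr` and writes
// one unique cell of `next`, so there is no data race.
void life_step(const std::vector<int>& curr, std::vector<int>& next,
               int width, int height) {
    std::vector<int> indices(static_cast<std::size_t>(width) * height);
    std::iota(indices.begin(), indices.end(), 0);

    std::for_each(std::execution::par_unseq, indices.begin(), indices.end(),
                  [&](int cell) {
        const int x = cell % width;
        const int y = cell / width;
        int alive_neighbors = 0;
        for (int dy = -1; dy <= 1; ++dy) {
            for (int dx = -1; dx <= 1; ++dx) {
                if (dx == 0 && dy == 0) continue;
                const int nx = (x + dx + width) % width;    // wrap at edges
                const int ny = (y + dy + height) % height;
                alive_neighbors += curr[static_cast<std::size_t>(ny) * width + nx];
            }
        }
        const bool alive = curr[cell] != 0;
        next[cell] = (alive_neighbors == 3 || (alive && alive_neighbors == 2)) ? 1 : 0;
    });
}
```

    Between generations the caller swaps the two buffers, so every step stays read-only with respect to the previous grid.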

  • @kikeekik
    @kikeekik 1 day ago

    AFAIK, acpp uses OpenMP or OpenCL as CPU backends, not TBB

    • @victotronics
      @victotronics 1 day ago

      It has backends for OpenMP, CUDA, and HIP.

    • @Illuhad
      @Illuhad 1 day ago

      This is true. However, the parallel STL implementations in libstdc++ and libc++ rely on TBB. I think what was done here was to compare against acpp used as a regular host compiler, with the PSTL from libstdc++ and no offloading; that path goes through TBB due to libstdc++ internals. You can also use AdaptiveCpp to run PSTL on the CPU via AdaptiveCpp's CPU support (OpenMP or OpenCL, as you say), but that was not the focus here, I think.

  • @SillyOrb
    @SillyOrb 1 day ago

    8:47 Just a minor nitpick: twice as fast isn't the same as twice faster. With that out of the way, that's curious; it would make for a good follow-up.

  • @darkmagic543
    @darkmagic543 1 day ago

    Not bad, although it seems usable only in very specific scenarios? If you want to squeeze out maximum performance, you would just use something like CUDA, which gives you more control. If your task is a bit more complex, such that using simple standard algorithms would be hacky, it would also not be a great solution: what about concurrency/synchronization? So basically you would use it only if you have a simple problem and you want to quickly speed it up a little, but don't want to put in the work to speed it up more?

    • @VFPn96kQT
      @VFPn96kQT 1 day ago

      CUDA works on Nvidia GPUs only. SYCL is generic and compiles to CUDA, ROCm, SPIR-V, or OpenMP.

    • @Illuhad
      @Illuhad 1 day ago

      AdaptiveCpp also supports the SYCL programming model, which you can mix and match with standard C++ algorithms. SYCL exposes much more control, on a similar level to CUDA. For example, you could start developing your application with C++ standard algorithms and then, if you find performance bottlenecks, optimize those bits in SYCL.

  • @callmeray7705
    @callmeray7705 1 day ago

    I remember reading on a Vulkan blog (might not have been a blog, idk, it was a while ago) that the sweet spot was around 2 million concurrent floating-point operations, so I'm glad to see a proven sweet spot with a similar number.

    • @mytech6779
      @mytech6779 6 hours ago

      With which hardware?

  • @avramlevitter6150
    @avramlevitter6150 1 day ago

    I'm curious what the performance is when running a natively written CUDA version of this code, and how it stacks up against the AdaptiveCpp versions. In general, I find that these "write once, compile for anywhere" systems tend to be useful only when not targeting a specific architecture actually is your use case; once you know you're going to be running on a particular architecture, it's almost always better to write something natively for it. I know AdaptiveCpp's claim is that it can even beat native code, but that's something I'd like to see benchmarks on.

    • @kikeekik
      @kikeekik 1 day ago

      There are benchmarks; look on Google Scholar. In the benchmarks my team did, SYCL DPC++ was ~20% slower than CUDA, but that was 3 years ago, and things have changed a lot in the last few years.

  • @mjKlaim
    @mjKlaim 1 day ago

    Wow, I wasn't aware of that heterogeneous compiler! I'll make a note to play with it someday, maybe mixing it with the new std::execution library }:D

  • @gast128
    @gast128 1 day ago

    GPUs are cool, though be aware of applicability and the memory-transfer overhead. Microsoft used to offer C++ AMP, which was a nice library for offloading calculations to an accelerator. Unfortunately they withdrew that library.

  • @markusasennoptchevich2037
    @markusasennoptchevich2037 1 day ago

    There is a reason systems programmers don't like C++ for low-level stuff.

    • @sqlexp
      @sqlexp 4 hours ago

      Skill issues.

  • @TsvetanDimitrov1976
    @TsvetanDimitrov1976 1 day ago

    The fact that this is pure C++ code is actually quite impressive. It reminds me of C++ AMP back in 2011. Still, this kind of solution leaves a ton of performance on the table compared to hand-writing it with Vulkan or DX12 compute shaders, so I'm not really sure it's the right way forward for heterogeneous computing. I'd rather have the GPU vendors conform to a common ISA, so that we can program the GPUs directly instead of going through multiple layers of (black-box) abstractions.

    • @victotronics
      @victotronics 1 day ago

      Given that NVIDIA dominates the market, they are not interested in a common ISA. But SYCL & Kokkos are such common ways of writing for multiple GPU brands.

    • @Illuhad
      @Illuhad 1 day ago

      AdaptiveCpp's C++ standard parallelism offloading is not intended for folks who are willing to hand-write shader code. It's for people who have a C++ application, want to remain at a high abstraction level, and perhaps get some speedup just by recompiling. If you want more control, AdaptiveCpp also supports SYCL as a programming model, which exposes much more control to users, and you can mix both models in the same app: e.g., start at a high level, then move to SYCL if you want to optimize some kernel in particular. A common ISA for GPUs is... extremely unrealistic. Architectures are way too different, and vendors can't even agree on a common IR. AdaptiveCpp, by the way, supports a unified IR and code representation across all its targets (CPUs as well as GPUs).

    • @TsvetanDimitrov1976
      @TsvetanDimitrov1976 15 hours ago

      @Illuhad I totally agree, it's a great tool. My comment was more along the lines of a possible way for C++ to go into GPU programming while still being as close to the metal as possible. And for that to be possible, we definitely need at least a stable ISA from each vendor, even if it's not common between NVIDIA/AMD/Intel/etc. I don't mind writing all the shaders and the surrounding infrastructure code, but I imagine a future where I could just write pure C++ code instead of HLSL, GLSL, Metal, etc. and leave all that work to the compiler, without giving up control or performance.

    • @Illuhad
      @Illuhad 15 hours ago

      @TsvetanDimitrov1976 But you can do this with AdaptiveCpp. It has a unified code representation based on LLVM IR, which is then JIT-compiled at runtime for GPUs from all vendors. And while what you saw in this video was fairly high-level, AdaptiveCpp also allows way more low-level control if you like; the SYCL model it supports is on a similar abstraction level to CUDA, so it might be pretty close to what you want... Aligning ISAs would require aligning hardware architectures, and when you say ISA, I'm not sure you really mean ISA. For example, NVIDIA does not even have a well-documented, stable ISA: their ISA (SASS) is proprietary and changes with each GPU generation. What NVIDIA has is an intermediate representation (IR) for all their GPUs, called PTX. AdaptiveCpp gives you an intermediate representation across all the GPUs.

    • @TsvetanDimitrov1976
      @TsvetanDimitrov1976 14 hours ago

      @Illuhad "For example NVIDIA does not even have a well-documented stable ISA. Their ISA (SASS) is proprietary and changes with each GPU version." That's exactly what I hate about it. I'd rather program the hardware than the OS/driver/whatever abstraction on top of the driver. This is the loss of control/performance I am talking about. I'm a game dev, so it's probably a very niche opinion, but I want total control over memory allocation, scheduling, and executing the code on the GPU. I kind of get it through Vulkan/DX12/etc., but that's at least two levels of indirection which I'd rather not have. ANY "magic" runtime is a non-starter, be it a driver, an API, or some state machine that assumes how I want to use the GPU. I hope that clears up my stance.

  • @literallynull
    @literallynull 1 day ago

    Hey Jason, what do you think of Intel's DPC++?

    • @victotronics
      @victotronics 1 day ago

      That's basically the same as SYCL, but without this ultra-cool trick of converting range algorithms.

    • @Spielix
      @Spielix 8 hours ago

      @victotronics In turn, Intel has its own version of parallel STL-like algorithms, called oneDPL; it's basically Intel's answer to Thrust/rocThrust. Being able to just use the STL algorithms is pretty cool, but in many situations these libraries bring extra features to the table as well, like segmented reductions, scans, and sorts.

  • @mathieu564
    @mathieu564 1 day ago

    I haven't seen this video yet, but it would be great if this kind of video were done more often on C++ Weekly. It's the kind of video that would be valuable even if done badly, because the subject is not well covered. "How to use C++ with X and Y" would really be great for viewers.

  • @fredhair
    @fredhair 1 day ago

    Regarding Linux distros, I'd recommend EndeavourOS, which is based on Arch and comes in multiple flavours (each with a different desktop & software); by default I think it's a very up-to-date KDE Plasma. The ISO comes with a graphical installer, and the distro itself seems very stable despite being quite bleeding-edge. Pacman is a good wrapper around the native Arch package manager, and if you really want some GUI software installer you can use 'Discover', though using the terminal is recommended.

  • @AxWarhawk
    @AxWarhawk 1 day ago

    Wait until he learns about Celerity 😉

  • @BC_Geoff
    @BC_Geoff 1 day ago

    Just get a Compass card; it's actually cheaper than just using your credit card. And you get a nice souvenir to take home with you.

  • @victotronics
    @victotronics 2 days ago

    AdaptiveCpp is indeed an open implementation of SYCL. Unfortunately, SYCL is not an easy system to program in. There is also Kokkos (from Sandia Labs), which also targets multiple backends and is (imnsho) easier to program, and then there is OpenMP with offloading. With the exception of OpenMP, they are all "data parallel" systems, much like CUDA; in fact, if you squint a little, they all look so similar to CUDA that you could probably convert them automatically. Intel has such a tool for their version of SYCL. But that's the story for standard SYCL. The fact that you didn't have to change your code means that (and this I didn't know, and it is very cool!) AdaptiveCpp apparently translates C++ range algorithms into the (ugly, ugly) underlying SYCL code. I think this is specific to the ACPP compiler, and not behavior mandated by the SYCL standard. Cool.

    • @Spielix
      @Spielix 1 day ago

      "Not easy to program in" is relative. As someone used to CUDA, SYCL actually looks quite nice to program in to me, and I think it compares quite favorably to OpenCL. On the other hand, I have little experience actually using SYCL (or OpenCL) beyond reading some samples, so take these opinions with a grain of salt. Nvidia nowadays provides an HPC toolkit including the nvc++ compiler (formerly PGI's HPC C++ compiler) that can also offload stdpar algorithms (only to Nvidia hardware). It basically just interfaces to Nvidia's Thrust library and replaces heap memory with CUDA UVM, i.e. memory that is accessible from both CPU and GPU. Allocations become quite expensive, though.

    • @victotronics
      @victotronics 1 day ago

      @Spielix I've never written OpenCL, but it looks awful. SYCL & CUDA are both "apply this point function over this range". However, SYCL, unlike Kokkos (or CUDA), insists on making the task queue explicit, which needlessly complicates writing the kernels.

    • @Spielix
      @Spielix 8 hours ago

      @victotronics I guess you mean writing the host-side orchestration of kernels, buffers, and so on? It doesn't seem to influence how one writes the kernel functions themselves. And with SYCL 2020's USM and terse syntax, it seems to me they fixed the worst boilerplate; I just looked at Intel's small SYCL tutorial, which mentions those 2020 features at the end. While CUDA streams are somewhat optional, you still want to use them in serious code for asynchronicity/overlapping multiple operations, so generally I don't see a problem with explicit queues, other than the amount of boilerplate with the classic syntax and buffers.

    • @mytech6779
      @mytech6779 5 hours ago

      @Spielix Old SYCL was basically a fancified OpenCL frontend. With the change from version numbers to year releases, the entire specification changed, so SYCL is now a standalone standard no longer sitting on the OpenCL backend. The year releases are also intended as a way to stay more clearly aligned with the ISO C++ revision cycle.
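
    (For readers unfamiliar with the explicit-queue style debated in this thread: a minimal SYCL 2020 sketch, generic SYCL rather than anything AdaptiveCpp-specific, in contrast to a one-line std::transform with an execution policy. The values are illustrative.)

```cpp
#include <sycl/sycl.hpp>
#include <vector>

int main() {
    std::vector<float> data(1024, 1.0f);
    sycl::queue q;  // explicit task queue; picks a default device
    {
        // Buffer wraps host memory for the duration of this scope.
        sycl::buffer<float> buf(data.data(), sycl::range<1>(data.size()));
        q.submit([&](sycl::handler& h) {
            sycl::accessor acc(buf, h, sycl::read_write);
            // The kernel itself is just "apply this point function over a range".
            h.parallel_for(sycl::range<1>(data.size()),
                           [=](sycl::id<1> i) { acc[i] *= 2.0f; });
        });
    }  // buffer destructor waits for the kernel and writes results back
}
```

    The queue, handler, and buffer scope are exactly the host-side orchestration being discussed; the kernel lambda itself stays small.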

  • @paulluckner411
    @paulluckner411 3 days ago

    Have you tried to constexpr it?

  • @Silmarieni1
    @Silmarieni1 3 days ago

    @cppweekly You are required by YouTube to select the box "My video contains paid promotion" when that's the case, such as here with the CLion promotion.

    • @cppweekly
      @cppweekly 3 days ago

      I thought I had. I went back and double-checked my videos a few days ago. Sorry about that!

    • @cppweekly
      @cppweekly 3 days ago

      Yes, I just double-checked: the box is definitely checked on this video, and it was already checked before I looked at it.

  • @sirhenrystalwart8303
    @sirhenrystalwart8303 4 days ago

    I'm way more sold on the benefit of constexpr/consteval after seeing this.

    • @cppweekly
      @cppweekly 3 days ago

      Yeah, I'm still refining the way I present this stuff. This is a really good argument for it!

  • @pancio-ciancio
    @pancio-ciancio 6 days ago

    It's out of the scope of the video, but I don't know where else to ask. CLion reports unreachable code in a few places when I use constexpr in my math library. I copy-pasted part of it into Compiler Explorer, made sure I use the same toolchain, enabled all warnings, and it doesn't report anything. So, am I doing something wrong? I cannot find any CLion forum that solves this problem.

  • @stefanmilenkovic4715
    @stefanmilenkovic4715 6 days ago

    Why would you use polymorphism?

  • @anon_y_mousse
    @anon_y_mousse 6 days ago

    It is kind of annoying if the compiler can't correctly optimize away "temporaries", though. In my compiler I didn't even leave that kind of logic to the optimization stage; I added it as a basic part of compilation instead.

  • @VictorYarema
    @VictorYarema 7 days ago

    Awesome episode; I don't regret watching it. Now I can better remember this specific pitfall. The worst emotion now is frustration caused by the so-called "standard", which was supposed to make everyone's life easier instead of making really bad things harder to spot.

  • @jamesburgess9101
    @jamesburgess9101 7 days ago

    Maybe C++ is getting simpler after all!

  • @richardblain4783
    @richardblain4783 8 days ago

    6:44 Why would you ever need to rely on NRVO, other than for efficiency? The only situation I can think of is if you're trying to return a non-copyable, non-movable object from a function. But in that case the compiler could not return the object, and you'd get an error diagnostic with or without a -Wnrvo compiler option.

    • @garyesser1007
      @garyesser1007 7 days ago

      Exactly as you said: for efficiency.
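
    (A sketch of the case raised above, relying on C++17's guaranteed copy elision for prvalues; the type is hypothetical.)

```cpp
// A non-copyable, non-movable type can still be returned as a prvalue,
// because guaranteed elision (C++17) needs no copy or move at all.
struct Pinned {
    Pinned() = default;
    Pinned(const Pinned&) = delete;
    Pinned& operator=(const Pinned&) = delete;
};

Pinned make_pinned() {
    return Pinned{};   // OK: guaranteed elision
    // Pinned p;
    // return p;       // error: NRVO is only an optimization, so returning a
                       // named local would still require a copy/move ctor
}
```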

  • @KhalilEstell
    @KhalilEstell 8 days ago

    This is super neat. Great video, Jason!

  • @zergeyn
    @zergeyn 8 days ago

    Feels like C++ is getting worse and worse. Now it's not OK to move objects out of functions. Yay :(

    • @simonmaracine4721
      @simonmaracine4721 7 days ago

      It has never been okay to move objects out of functions. C++ is not getting worse, but quite the opposite.

  • @X_Baron
    @X_Baron 8 days ago

    If you actually don't want elision, you can disable the warning by wrapping the return value in std::move (when the returned type has a suitable constructor). The intent may not be obvious to the reader, though...
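
    (A sketch of the workaround described above; note that compilers may then flag the explicit move with a diagnostic such as -Wpessimizing-move instead.)

```cpp
#include <string>
#include <utility>

std::string no_elision() {
    std::string s = "hello";
    return std::move(s);  // forces the move constructor and suppresses NRVO,
                          // so -Wnrvo stays quiet; the intent is non-obvious
}
```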

  • @piotrbatko172
    @piotrbatko172 8 days ago

    Wouldn't it be cool to have an [[rvo]] attribute for functions? It could report an error if the compiler is not able to do it.

    • @keris3920
      @keris3920 8 days ago

      I have had a lot of discussions about this exact idea lately. I want one for tail calls as well.

  • @cubeman5303
    @cubeman5303 8 days ago

    Great addition, though I feel like as an attribute it wouldn't get in the way as much as a global warning does.

  • @PaulMetalhero
    @PaulMetalhero 8 days ago

    "MSVC compilers down" ... who cares?

    • @keris3920
      @keris3920 8 days ago

      I care

    • @MattGodbolt
      @MattGodbolt 6 days ago

      They're back now, either way

    • @keris3920
      @keris3920 6 days ago

      @MattGodbolt Thank you for the amazing tool!

    • @MattGodbolt
      @MattGodbolt 6 days ago

      @keris3920 You're welcome; honestly, the main thing I do these days is admin and devops. I have a team of volunteers who do all the actual work :)

  • @literallynull
    @literallynull 8 days ago

    I still prefer MinGW and Clang over MSVC for a reason...

    • @TrueWodzu
      @TrueWodzu 4 days ago

      How is this relevant to the video? He is using GCC :)

  • @kimhyunpil
    @kimhyunpil 8 days ago

    Thank you 👍👍

  • @Raspredval1337
    @Raspredval1337 8 days ago

    This would've been useful to me like a week ago. I had to pull up one of your older videos about std::optional/std::variant and RVO to get it 'right'.

  • @austinbachurski7906
    @austinbachurski7906 8 days ago

    Awesome!

  • @nmmm2000
    @nmmm2000 8 days ago

    Is this included in -Wall or -Wpedantic?