08:35 < bridge> Who even fast fires here xd
08:36 < bridge> gumo ^.^
08:38 < bridge> I am currently working on priority jobs, but I don't know how I should write the unit test for it:
```
TEST_F(Jobs, PriorityOvertake)
{
	SEMAPHORE sphore;
	sphore_init(&sphore);
	int Start = 7;
	auto pJob = std::make_shared<CJob>([&] {
		Start += 3;
	});
	auto pPriorityJob = std::make_shared<CJob>([&] {
		Start *= 7;
	});

	Add(pJob);
	Add(pPriorityJob);
	EXPECT_EQ(pJob->State(), IJob::STATE_QUEUED);
	EXPECT_EQ(pPriorityJob->State(), IJob::STATE_QUEUED);
	sphore_signal(&sphore);
	sphore_wait(&sphore);
	sphore_destroy(&sphore);
	EXPECT_EQ(pJob->State(), IJob::STATE_DONE);
	EXPECT_EQ(pPriorityJob->State(), IJob::STATE_DONE);
	EXPECT_EQ(Start, 52);
}
```
08:38 < bridge> If I just had 1 thread, I could prove that the sequence in the job queue is right, but I don't have direct access to the queue. This code does not work like this, I guess I don't understand semaphores?
08:51 < bridge> There was also a solution with
```
while(pJob->State() != IJob::STATE_DONE)
{
	// yay, busy loop...
	thread_yield();
}
```
08:51 < bridge> which passed the test, but! I don't know if that is luck, depending on which thread finishes first, or if they are actually executed in sequence
08:54 < bridge> thread_yield just does a sleep(0) on windows, now I am not even sure if this would be OS independent
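08:55 < bridge> fwiw, here is a sketch of how I'd expect the semaphore version to look: each job signals when it is done and the test waits once per job, so nothing busy loops. Untested, and `CJob` as a std::function wrapper is an assumption:
```
TEST_F(Jobs, PriorityOvertake)
{
	SEMAPHORE Done;
	sphore_init(&Done);
	int Start = 7;
	auto pJob = std::make_shared<CJob>([&] {
		Start += 3;
		sphore_signal(&Done);
	});
	auto pPriorityJob = std::make_shared<CJob>([&] {
		Start *= 7;
		sphore_signal(&Done);
	});

	Add(pJob);
	Add(pPriorityJob);
	sphore_wait(&Done); // one wait per job; the jobs signal, the test never does
	sphore_wait(&Done);
	sphore_destroy(&Done);
	// 52 == 7 * 7 + 3 only happens if the priority job overtook;
	// plain FIFO order would give (7 + 3) * 7 == 70.
	EXPECT_EQ(Start, 52);
}
```
08:55 < bridge> caveat: this only proves the overtake if the pool runs a single worker thread, with more workers the two lambdas can interleave on Start. And I'd assert on Start rather than State() right after the waits, since the pool presumably flips the state to DONE only after Run() returns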
08:56 < bridge> Why do we need priority tasks?
08:56 < bridge> I knew this question would come
08:56 < bridge> Better invest your time into async
08:57 < bridge> do we have any async functionality?
08:57 < bridge> No, and we also don't have any chance to pause a running task
08:58 < bridge> If u really need tasks run directly, simply push them to front
08:59 < bridge> But a few running http tasks can still block the whole runtime
09:01 < bridge> what my priority implementation currently does is putting jobs at the top of the task queue (only behind other priority tasks). They are not like critical tasks that should run immediately, but jobs that should not wait too long
09:03 < bridge> do they make everything stuck or do they stop eventually (maybe with a timeout?)
09:13 < bridge> I guess I can use std::async easily
09:13 < bridge> They have a timeout. Always depends on the internet connection
09:14 < bridge> Not really
09:14 < bridge> U need async io
09:14 < bridge> Can it? I think the current curl_multi implementation directs all http tasks to one thread. They are no longer "real jobs"
09:15 < bridge> Sounds like you want a priority queue implemented with a heap
09:15 < bridge> Why not. What the curl backend does is unrelated to how our code waits for it to finish
09:16 < bridge> ah, I see where this comes from, this would actually be overkill 😄
09:16 < bridge> I thought you were worried about all the workers being busy with http jobs. But in any case we usually never hold up the main thread waiting for a job to end
09:16 < bridge> Especially a http one
09:17 < bridge> I'm worried about exactly that
09:17 < bridge> Not the main thread, but priority tasks
09:17 < bridge> Which is why i don't think we need logic for it
09:17 < bridge> The http jobs aren't real jobs. So ignore them. They don't really use real workers
09:17 < bridge> It didn't solve the underlying issue
09:17 < bridge> But yes, other jobs could take all the workers
09:18 < bridge> How so?
09:18 < bridge> Do they repush themselves to the job queue?
09:18 < bridge> I don't get what you mean, I call std::async and get a future object, and check the future object regularly
09:19 < bridge> They just are a completely different thing. They get routed to a single http handling thread
09:19 < bridge> I don't think they even conform to the IJob interface anymore
09:19 < bridge> So you want to repush the task?
09:19 < bridge> That would surprise me
09:20 < bridge> I wrote it, so it would surprise me if it were otherwise
09:20 < bridge> Afaik the job runtime handles job after job until finished
09:20 < bridge> No green threads, nothing
09:20 < bridge> Just in parallel
09:21 < bridge> only if it failed?
09:21 < bridge> What interface do they use then? How does a skin job first download and then process as one job?
09:21 < bridge> https://github.com/ddnet/ddnet/blob/master/src%2Fengine%2Fshared%2Fhttp.cpp#L798
09:21 < bridge> This is what handles http requests. It has its own queue and stuff. Nothing to do with CJobs and CJobPool
09:21 < bridge> maybe if it failed?
09:22 < bridge> No, how do you wait for async to complete without blocking the job queue
09:22 < bridge> You cannot just check it from time to time. That would still block a thread in the pool
09:22 < bridge> U either repush or some other thing
09:23 < bridge> OK but i assume u still often use them combined with jobs?
09:23 < bridge> okay what happens with the job queue if I use std::async outside of it? Shouldn't it just make its own thread?
09:23 < bridge> To process whatever you downloaded async
09:24 < bridge> You usually start a job after you finish a request, if the processing is slow. I don't think we ever busy wait for a http request in a job
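09:24 < bridge> for the record, my understanding of the std::async route, as a self-contained sketch (DownloadSkin is a made-up stand-in for the actual request): std::launch::async runs the callable on its own thread, outside the job pool, and a zero-timeout wait_for is a non-blocking poll you can do once per frame
```
#include <chrono>
#include <future>
#include <string>
#include <thread>

// Made-up stand-in for an http request.
static std::string DownloadSkin()
{
	std::this_thread::sleep_for(std::chrono::milliseconds(50));
	return "skin data";
}

int main()
{
	// Runs on its own thread, so no job-pool worker is tied up.
	std::future<std::string> Result = std::async(std::launch::async, DownloadSkin);

	// "check the future object regularly": a zero-timeout wait_for
	// never blocks, so this check can live in the main loop.
	while(Result.wait_for(std::chrono::seconds(0)) != std::future_status::ready)
	{
		// ... run a frame of the main loop ...
		std::this_thread::sleep_for(std::chrono::milliseconds(1));
	}
	std::string Data = Result.get(); // hand off to a processing job from here
	(void)Data;
}
```
09:25 < bridge> one caveat: a future from std::async blocks in its destructor until the task finished, so you can't just drop it to cancel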
09:25 < bridge> Btw implementing an async runtime, not very easy. I'd suggest you just don't do that. That was my plan to replace the CJobPool as well, very painful
09:27 < bridge> Oh, ok
09:28 < bridge> If you do really want to mess with it, folly is a great thing to get inspiration from
09:28 < bridge> But it would defs make it easier to deal with tasks.. Instead of first doing http and then starting a job, you could cleanly do it in one task.
09:28 < bridge> But yeah dunno in cpp
09:28 < bridge> Somehow it's always hard in cpp
09:29 < bridge> https://github.com/facebook/folly/blob/main/folly%2Ffibers%2FREADME.md
09:30 < bridge> I've also thought about just using folly btw. Not a horrible idea, quite a well designed library
09:32 < bridge> But anyway, as for priority tasks i don't see any benefit in pushing them behind other priority tasks instead of to the front. Most likely they already started before you can push a second one, and either way you probably want all your priority tasks finished before continuing whatever you are doing anyway
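09:32 < bridge> for reference, the "front, but behind other priority jobs" variant is basically a deque with a priority prefix. A minimal sketch, with IJob standing in for the real interface, nothing like the actual CJobPool code:
```
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>

class IJob
{
public:
	virtual ~IJob() = default;
	virtual void Run() = 0;
};

class CSketchJobQueue
{
	std::mutex m_Lock;
	std::deque<std::shared_ptr<IJob>> m_Queue;
	size_t m_NumPriority = 0; // length of the priority prefix at the front

public:
	void Add(std::shared_ptr<IJob> pJob, bool Priority)
	{
		const std::lock_guard<std::mutex> Guard(m_Lock);
		if(Priority) // behind existing priority jobs, ahead of everything else
			m_Queue.insert(m_Queue.begin() + m_NumPriority++, std::move(pJob));
		else
			m_Queue.push_back(std::move(pJob));
	}

	std::shared_ptr<IJob> Pop() // workers take from the front
	{
		const std::lock_guard<std::mutex> Guard(m_Lock);
		if(m_Queue.empty())
			return nullptr;
		if(m_NumPriority > 0)
			m_NumPriority--;
		std::shared_ptr<IJob> pJob = std::move(m_Queue.front());
		m_Queue.pop_front();
		return pJob;
	}
};
```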
09:33 < bridge> Besides that, initializing map layers in parallel should be done really carefully. The review is not easy if you don't have a good design, since multi threading is always hard
09:34 < bridge> I don't know why a thread lock on the gpu upload should not be enough
09:34 < bridge> Best would be to not have mutable references between two layers
09:34 < bridge> Tasks
09:34 < bridge> Because pushing a command might flush the graphics queue
09:35 < bridge> You might end up calling it from a different thread
09:35 < bridge> I'm really not convinced to call graphics calls from a job task
09:36 < bridge> That adds like infinite complexity
09:36 < bridge> Is it even allowed?
09:36 < bridge> You are allowed to shoot yourself
09:36 < bridge> to be clear, I only want to do it for initialization, not the rendering itself
09:37 < bridge> the goal is to make the initialization less blocking, so I can introduce render layers smartly into the editor
09:38 < bridge> If you want the initializing to be sync, use a scoped threadpool or smth. But still I'd not make graphics calls from them
09:38 < bridge> No like is it even defined behaviour? I thought both vk and gl required all calls to happen from one thread (on macos gl even wanted everything on the main thread for a while)
09:38 < bridge> Prepare the buffers and upload all at once
09:38 < bridge> In the main thread
09:38 < bridge> I know, you are not really fond of it, I just want to test if it _can_ be done
09:38 < bridge> You can try what you want, but please use tsan and do some edge cases like filled up cmd buffers
09:39 < bridge> I don't want to review it btw
09:39 < bridge> That's like a self kill
09:39 < bridge> Do we even have tsan annotations in enough places for it to work? 😄
09:39 < bridge> why am I making this so complicated, I could just move the whole thing into the job queue on the editor side
09:39 < bridge> We don't need them. Tsan just works
09:40 < bridge> I distinctly remember it just not working with our lock wrappers and some other issue with smart pointers
09:40 < bridge> Might need TSAN_OPTIONS=ignore_noninstrumented_modules=1
09:41 < bridge> Idk, i used it a few times on ddnet and it found many issues. But i can't speak for the whole code base. At least it had no false positives, if you mean that
09:42 < bridge> Cool, nice to hear it does just work
09:42 < bridge> I wanted to use it while doing the http thing, but it just kept screaming at me, not wanting to compile
09:43 < bridge> Oh weird. Used clang?
09:43 < bridge> I think i tested a few weeks ago on taters client
09:44 < bridge> I don't remember but I likely did. It's been years now
09:44 < bridge> <0xdeen> Thanks @01000111g !
09:45 < bridge> If it does just work I might run the http thing through it. I did do my best to make it race-free. But who knows, the eye is not that great at finding those
09:58 < bridge> rust ftw
11:28 < bridge> idk what's happening again, getting these internal comipler errors
11:29 < bridge> maybe I should use a compiler and not a comipler 🤔
12:23 < bridge> @jupeyy_keks is it possible that most of the time rendering my quads is spent in rasterization? hear me out: I switched out the fragment shader with a simple return of a color, with negligible performance improvement (below 5% in my measurements). Then I reset it and moved the quads offscreen, so that they get clipped -> nearly 80% improvement
12:23 < bridge> Like what? does this really mean that 5% is spent in the fragment shader, 78% in rasterization, 17% combined in vertex shader + cpu stuff?
12:27 < bridge> all the quads are the same, onscreen, and cover 1% of the screen
12:35 < bridge> is that unreasonable test data? what the hell am I doing wrong o.o
12:54 < bridge> could anyone help with why the rust build fails on mingw?
12:55 < bridge> = note: Warning: corrupt .drectve at end of def file
12:55 < bridge> Warning: .drectve `-exclude-symbols:_ZN4core10intrinsics19copy_nonoverlapping18precondition_check17h553e1dbcd4616456E ' unrecognized
13:07 < bridge> Sure it's possible, but I'd still say that so few quads should not result in that low performance
13:08 < bridge> Like i dunno how u push your vertices. Do you batch them, do you use instances?
13:08 < bridge> I batch them, no instances, indexed
13:09 < bridge> That should be rather fast then
13:10 < bridge> For reference: zooming out on a 4000x1000 map (e.g. arctic frost) on ddnet will also render around 8 million triangles
13:10 < bridge> I batch them, (no instances), indexed
13:10 < bridge> Yes
13:11 < bridge> Ddnet uses no instances
13:11 < bridge> Indexed draw
13:11 < bridge> On a buffer
13:11 < bridge> wanted to clarify that I do index, that "no" was ambiguous ^^
13:11 < bridge> yea
13:11 < bridge> same :/
13:12 < bridge> Ddnet is even worse, since it actually does multiple draw calls to render the same buffer
13:12 < bridge> Do u only use vertex and fragment shaders?
13:13 < bridge> yep
13:13 < bridge> And what does renderdoc say in the performance test?
13:14 < bridge> How long in ms a draw call took on the gpu
13:15 < bridge> maybe the 1% screen coverage is too unrealistic. 10% of width and 10% of height might be too much. that could explain why the rasterization takes so much time
13:16 < bridge> for some reason today I get different benchmark results on the 1060 3GB I access via ssh
13:17 < bridge> now it says around ~35 million sprites per second
13:17 < bridge> sounds impressive, but what exactly is the sprite?
13:17 < bridge> an individual tile?
13:17 < bridge> 1% of the screen
13:17 < bridge> That already sounds better
13:18 < bridge> so ~350,000 times the entire screen if added together
13:18 < bridge> oh
13:18 < bridge> Do you update the render buffer every frame?
13:19 < bridge> yes
13:19 < bridge> gonna double-check tho
13:19 < bridge> That still sounds most expensive
13:19 < bridge> Do you update it or recreate it?
13:20 < bridge> difficult question: I update it, but afaik wgpu uses ring buffers
13:20 < bridge> By default for all buffers?
13:22 < bridge> > On a high level, what write_buffer does is finding staging space internally, filling it, and issuing a copy_buffer_to_buffer on the queue
13:22 < bridge> https://github.com/gpuweb/gpuweb/discussions/1428
13:22 < bridge> from 2021
13:23 < bridge> Anyway. To me this is still very sus. I'd still question your benchmarking. I'd look at renderdoc and nvtop to see if it's really the gpu bottlenecking
13:23 < bridge> That is only for the staging buffer tho ig
13:23 < bridge> tru
13:23 < bridge> The gpu buffer would still make sure the previous frame finished rendering
13:24 < bridge> hm yea, would probably not make sense otherwise, due to bind groups etc
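13:24 < bridge> in GL terms (where ddnet lives) that concern is exactly why per-frame streamed buffers get orphaned. A minimal sketch, assuming a loader like glad provides the GL declarations:
```
#include <glad/glad.h>

// Streaming new vertices every frame without stalling on the previous
// frame: glBufferData with nullptr "orphans" the old storage, the driver
// releases it once the GPU is done with it, and the copy goes into fresh
// memory. wgpu's write_buffer staging ring solves the same problem internally.
void StreamVertices(GLuint BufferId, const void *pData, GLsizeiptr Size)
{
	glBindBuffer(GL_ARRAY_BUFFER, BufferId);
	glBufferData(GL_ARRAY_BUFFER, Size, nullptr, GL_STREAM_DRAW);
	glBufferSubData(GL_ARRAY_BUFFER, 0, Size, pData);
}
```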
13:25 < bridge> yaml has 22 ways to write true or false
13:27 < bridge> the other week I saw some marvel show use hackertyper output verbatim on a computer screen
13:33 < bridge> Wdym who fast fires? I do! I instantly make two kills once I unfreeze. And there is no reload timer because I did not miss
13:33 < bridge> Ah yeah
13:34 < bridge> Epic?
13:45 < bridge> I dunno
13:46 < bridge> For solo fng maybe. In team it's less tactical maybe
13:47 < bridge> With many tees u can probs just hold xd
14:11 < bridge> You can't hold. Because you have to hit. And hitting frozen tees does not count as a hit. In a team you also sometimes get into 1v5 situations
18:19 < bridge> @jupeyy_keks a CRenderLayer is technically not a component, but it inherits the component interface. If I want to move it in order to use it for the editor, what would be the best place? I guess engine/gfx/?
18:26 < bridge> Yeah dunno, if the map format is engine too
18:27 < bridge> I'm not entirely convinced by our project structure anyway, just ask robyte xd
18:48 < bridge> Maybe a new `src/game/map` folder for the map rendering and logic. Maps shouldn't really be in the engine I think. I have a WIP branch to move `IMap` from the engine to the game side because the engine should not be aware of the map items but only of the datafile format in general. The editor on the other hand should not be a special engine component but ideally only a gameclient component, which would allow the console and F-keys for binds to be used
20:13 < bridge> https://plf.inf.ethz.ch/research/pldi25-tree-borrows.html
20:13 < bridge> > our evaluation on the 30 000 most widely used Rust crates shows that Tree Borrows rejects 54% fewer test cases than Stacked Borrows does. Additionally, we prove (in Rocq) that it retains most of the Stacked Borrows optimizations and also enables important new ones, notably read-read reorderings.
20:13 < bridge> cc @learath2 @jupeyy_keks
20:14 < bridge> Well my question is always: what about the performance
20:14 < bridge> https://www.ralfj.de/blog/2025/07/07/tree-borrows-paper.html
20:14 < bridge> it says in the text
20:14 < bridge> same optimizations as stacked borrows, or more
20:27 < bridge> I wouldn't think the same optimizations would imply the same performance
21:37 < bridge> Interesting, have to look into detail what that means
22:17 < bridge> you can do it like this too:
```zig
const b: @Vector(4, f32) = .{ 5, 6, 7, 8 };
```
22:53 < bridge> @pioooooo if it's higher than our minimum, should we upgrade?
22:53 < bridge> it's np to update the commit hash to latest, but i thought that would make half of the cmake versions ddnet supports not work
22:56 < bridge> I would just upgrade ours as well and see if it gets merged :P
23:00 < bridge> :tear:
23:00 < bridge> id rather keep the old one
23:00 < bridge> if deen's machine is ancient
23:44 < bridge> @robyt3 here?
23:57 < bridge> Gonna sleep now
23:58 < bridge> gn8 🙂