The timeout occurs because the device receives the wrong information from the host during the transfer and stops the operation. Since the device doesn't reply (because the command was stopped), the USB operation on the host times out. The device isn't really 'detached' from the USB, but the library considers it to be, so you have to unplug and replug the device to get everything back into a known state. I designed it that way deliberately, so that there is a clear 'stop' point if anything goes wrong.
The fact that turning off the debug makes it work is interesting... Presumably the debug increases the load and makes the operations slower, but it shouldn't slow them down to anywhere near as long as 3 seconds.
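If you want to check how close the transfer actually gets to the 3 second timeout, one option is to time the blocking call on the host with debug on and then off. The following is only a rough sketch in Python: `time_transfer` is a helper I've made up for illustration, and `fake_bulk_transfer` is a hypothetical stand-in for whatever call performs the real transfer in your application (it is not a function from the library).

```python
import time

TIMEOUT_SECONDS = 3.0  # the host-side timeout discussed above


def time_transfer(transfer, *args):
    """Run a blocking transfer callable and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = transfer(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_bulk_transfer(payload, delay=0.05):
    """Hypothetical stand-in for the real blocking transfer call.

    It just sleeps for `delay` seconds and reports how many bytes it 'sent'.
    """
    time.sleep(delay)
    return len(payload)


if __name__ == "__main__":
    # The delays are invented values, used only to show the comparison.
    for label, delay in (("debug off", 0.01), ("debug on", 0.05)):
        sent, elapsed = time_transfer(fake_bulk_transfer, bytes(64), delay)
        margin = TIMEOUT_SECONDS - elapsed
        print(f"{label}: sent {sent} bytes in {elapsed:.3f}s "
              f"({margin:.3f}s short of the timeout)")
```

If the measured time with debug enabled is still a small fraction of 3 seconds, then plain slowness is unlikely to be the cause and something else is stopping the command.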
There is a possibility that the flood of debug requests backing up whilst the bulk transfer is performed is enough to make the next command fail (a buffer overflowing somewhere, perhaps), but I have never seen this occur when testing. The library uses blocking read and write calls, so it shouldn't be returning until the lower-level Windows functions have completed.
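To make the buffer-overflow idea concrete, here is a toy model in Python. It is purely illustrative: I don't know that any buffer in the library, the Windows HID stack, or the firmware behaves like this, and the names (`BoundedBuffer`, `simulate`) and the capacity of 8 are invented for the example. It only shows how debug requests queued up during a long transfer could leave no room for the next command.

```python
from collections import deque


class BoundedBuffer:
    """A fixed-capacity FIFO that rejects new items once it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.dropped = 0

    def push(self, item):
        """Queue an item; return False (and count a drop) if the buffer is full."""
        if len(self.items) >= self.capacity:
            self.dropped += 1
            return False
        self.items.append(item)
        return True


def simulate(capacity, debug_requests_during_transfer):
    """Queue debug requests while a transfer is 'busy', then try one command.

    Returns (command_accepted, debug_requests_dropped).
    """
    buf = BoundedBuffer(capacity)
    # Nothing drains the buffer while the bulk transfer is in progress.
    for i in range(debug_requests_during_transfer):
        buf.push(("debug", i))
    debug_dropped = buf.dropped
    command_accepted = buf.push(("command", 0))
    return command_accepted, debug_dropped


if __name__ == "__main__":
    print(simulate(8, 4))   # a few debug requests: the command still fits
    print(simulate(8, 20))  # a flood of debug requests: the command is rejected
```

In this model the command only fails once the backlog reaches the buffer's capacity, which would fit with the problem disappearing when debug is turned off.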
Out of curiosity, how modern is your PC? Perhaps a slower PC would reproduce the issue. If you have a low-spec PC, it might be worth trying the tests on another box.
It's a shame you don't have a ready-made board so we could know for sure that the software is at fault.
I have to admit I'm running low on ideas... without something I can reproduce, it's hard to think of what to do to track down the issue.

On a side note, I'm starting to believe that the debug mechanism was not a great idea anyway. It would have been better to make the device enumerate as both a USB HID and a CDC (serial port) device and run the debug over the CDC, since then the debug could be collected without the need to constantly poll the device (I could also make including the CDC optional when compiling the device firmware). Perhaps over the summer (when I have some time away from 'real' work) I will implement it.
