The problem with the Xoro players (and similar devices) is that the main CPU/chipset only has hardware support for some older video codecs.
A codec is a routine that encodes or decodes a digital data stream; here we care about decoding the audio and video streams.
Codecs that the hardware does not support must be handled in software, and this is a heavy task that the CPU cannot do in real time for many of the newer video codecs.
We are dealing with a CPU from Ingenic: the JZ4755 @ 400 MHz or the JZ4760 @ 600 MHz (depending on the model).
The CPU decodes the data and sends it to the screen at a slower rate than is needed for smooth video.
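
To put numbers on "slower than needed", here is a small Python sketch of the per-frame time budget. The 25 fps frame rate and the 80 ms decode time are made-up illustration values, not measurements from a Xoro or from either Ingenic chip, and the function names are only for this example.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to decode and show one frame, in milliseconds."""
    return 1000.0 / fps


def achievable_fps(decode_ms: float) -> float:
    """Frame rate the player can sustain if every frame takes decode_ms."""
    return 1000.0 / decode_ms


def keeps_up(decode_ms: float, fps: float) -> bool:
    """True if software decoding fits inside the per-frame budget."""
    return decode_ms <= frame_budget_ms(fps)


if __name__ == "__main__":
    video_fps = 25.0   # assumed frame rate of the file (hypothetical)
    decode_ms = 80.0   # assumed software decode time per frame (hypothetical)

    print(f"budget per frame: {frame_budget_ms(video_fps):.1f} ms")
    print(f"achievable rate:  {achievable_fps(decode_ms):.1f} fps")
    print(f"real-time:        {keeps_up(decode_ms, video_fps)}")
```

With these assumed values the budget is 40 ms per frame, so a decoder that needs 80 ms per frame can only deliver half the frames in time, which shows up as stuttering video or audio drifting out of sync.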