Fast PDF Creation Using Memory Streams
Memory streams are the in-memory equivalent of file streams: instead of storing data in a file on a device, they keep it in memory, which is much faster. This is very useful for server applications that create, for example, PDF documents on the fly and send them via HTTP to client browsers. Since such documents are only temporary, there is no need to store them permanently on disk. In such a case it is much more efficient to create the documents in memory, send them to the client and free the memory afterwards.
In this article we discuss an efficient way of implementing memory streams, which we call "Chunked Memory Streams". Our reporting engine and PDF library, Virtual Print Engine, provides this type of memory stream.
Implementation of fast PDF Memory Streams
Basically, a Memory Stream is implemented as a class that provides methods like Read(), Write(), Seek(), etc. When an object of this class is instantiated, it internally allocates a buffer of a given block size, say 16 KB. The write method writes into this buffer. If the buffer size is exceeded, the memory is reallocated. Reallocation means that a new buffer is allocated with the old size plus one additional block, i.e. the buffer grows from 16 KB to 32 KB, later to 48 KB, and so on. The crux is that the contents of the old buffer must be copied to the new buffer (after which the old buffer is deleted). This costs a lot of performance, and every further reallocation costs more than the one before, because more data has to be copied each time.
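To make the cost of this growth strategy concrete, the following minimal sketch adds up how many bytes get copied while a buffer grows one block at a time. The function name and its parameters are hypothetical and chosen for illustration only; it models the scheme described above, not any particular library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: total number of bytes copied while a buffer grows
// by one fixed-size block at a time until it can hold totalBytes.
std::uint64_t bytesCopiedByReallocation(std::uint64_t totalBytes,
                                        std::uint64_t blockSize)
{
    assert(blockSize > 0);
    std::uint64_t capacity = blockSize;  // initial buffer: one block
    std::uint64_t copied = 0;
    while (capacity < totalBytes) {
        copied += capacity;    // the full old buffer moves to the new one
        capacity += blockSize; // the new buffer is one block larger
    }
    return copied;
}
```

With 16 KB blocks, growing to 64 KB copies 16 + 32 + 48 = 96 KB. Growing to 1 MB copies 31.5 MB in total, more than thirty times the size of the stream itself.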
The solution to overcome this problem is simple, but requires some additional coding: instead of reallocating the whole buffer, the buffer is organized in chunks of a given block size. Pointers to those chunks are stored in a vector.
So we have a class StreamChunk, which represents a single block of data within a memory stream. The class MemoryStream has a member m_arChunks, which is a vector of pointers to StreamChunk objects. Initially, the vector contains only one pointer to a single StreamChunk. The write method writes to this chunk. If the size of the chunk is exceeded, a new chunk is created and its pointer is added to the vector. Further writes are redirected to the new chunk. In detail the implementation is a bit more complex, and therefore more error-prone, than the simple solution that uses reallocations. Because of that the implementation requires thorough unit testing, but it pays off when it comes to optimal performance.
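The design described above can be sketched roughly as follows. Only the names StreamChunk, MemoryStream, m_arChunks and the Read/Write/Seek methods come from the text; the signatures, the other members, the use of std::unique_ptr and the ChunkCount() helper are assumptions made for this illustration, and this is not the actual Virtual Print Engine code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// One fixed-size block of stream data.
class StreamChunk {
public:
    explicit StreamChunk(std::size_t size) : m_data(new unsigned char[size]) {}
    unsigned char* data() { return m_data.get(); }
private:
    std::unique_ptr<unsigned char[]> m_data;
};

class MemoryStream {
public:
    explicit MemoryStream(std::size_t blockSize = 16 * 1024)
        : m_blockSize(blockSize)
    {
        assert(blockSize > 0);
        // Initially the vector holds a single chunk.
        m_arChunks.push_back(std::make_unique<StreamChunk>(m_blockSize));
    }

    // Writes count bytes at the current position and returns count.
    std::size_t Write(const void* src, std::size_t count)
    {
        const unsigned char* in = static_cast<const unsigned char*>(src);
        std::size_t remaining = count;
        while (remaining > 0) {
            const std::size_t index = m_pos / m_blockSize;
            const std::size_t offset = m_pos % m_blockSize;
            // Last chunk is full: append a new one. Existing chunks stay
            // where they are; only the small pointer vector may grow.
            if (index == m_arChunks.size())
                m_arChunks.push_back(std::make_unique<StreamChunk>(m_blockSize));
            const std::size_t n = std::min(remaining, m_blockSize - offset);
            std::memcpy(m_arChunks[index]->data() + offset, in, n);
            in += n;
            m_pos += n;
            remaining -= n;
        }
        m_size = std::max(m_size, m_pos);
        return count;
    }

    // Reads up to count bytes from the current position and returns
    // the number of bytes actually read.
    std::size_t Read(void* dst, std::size_t count)
    {
        unsigned char* out = static_cast<unsigned char*>(dst);
        const std::size_t total = std::min(count, m_size - m_pos);
        std::size_t remaining = total;
        while (remaining > 0) {
            const std::size_t index = m_pos / m_blockSize;
            const std::size_t offset = m_pos % m_blockSize;
            const std::size_t n = std::min(remaining, m_blockSize - offset);
            std::memcpy(out, m_arChunks[index]->data() + offset, n);
            out += n;
            m_pos += n;
            remaining -= n;
        }
        return total;
    }

    // Moves to an absolute position, clamped to the end of the stream.
    void Seek(std::size_t pos) { m_pos = std::min(pos, m_size); }

    std::size_t Size() const { return m_size; }
    std::size_t ChunkCount() const { return m_arChunks.size(); }

private:
    std::size_t m_blockSize;
    std::size_t m_pos = 0;   // current read/write position
    std::size_t m_size = 0;  // number of valid bytes in the stream
    std::vector<std::unique_ptr<StreamChunk>> m_arChunks;
};
```

A read or write that crosses a chunk boundary is simply split into one memcpy per chunk. This boundary handling is where most of the extra complexity sits, so it is the natural place to focus unit tests.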
Chunked PDF Memory Streams have two major advantages
First of all, no memory is copied. With a reallocating buffer, each copy operation moves more data than the one before, so the total copying cost grows roughly with the square of the stream size.
Second, each reallocation temporarily requires about twice the stream size in memory during the copy operation: one block for the old buffer and one block for the new buffer. This becomes a huge drawback when a stream reaches a significant size. Especially in server environments, where many PDF documents might be created simultaneously, performance suffers badly once the operating system runs out of real memory and starts swapping virtual memory to disk. Assuming that the PDF documents created by your server application are all of about the same size, chunked streams let you create roughly twice as many documents simultaneously on the same machine without running out of memory.
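The memory argument can be checked with a little arithmetic. The two helper functions below are hypothetical and only model the two schemes described in this article; they ignore allocator overhead and the small pointer vector of the chunked stream.

```cpp
#include <cassert>
#include <cstdint>

// Peak bytes held by a reallocating stream at the moment it grows to
// `blocks` blocks: the old buffer (blocks - 1) and the new buffer (blocks)
// exist side by side while the contents are copied.
std::uint64_t peakBytesReallocating(std::uint64_t blocks, std::uint64_t blockSize)
{
    assert(blocks > 0);
    return (blocks - 1) * blockSize + blocks * blockSize;
}

// Peak bytes held by a chunked stream of the same size: just the chunks.
std::uint64_t peakBytesChunked(std::uint64_t blocks, std::uint64_t blockSize)
{
    return blocks * blockSize;
}
```

For a 10 MB stream built from 640 blocks of 16 KB, the reallocating variant briefly holds 1279 blocks (just under 20 MB), while the chunked variant never holds more than the 10 MB of actual data.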