DEFLATE compression (RFC 1951).
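The problem is that in Delphi I don't have access to any DEFLATE compression library. The one thing we do have is ZLIB compression code (RFC 1950). It even ships with Delphi, and there are half a dozen other implementations floating around.
Internally, ZLIB also uses DEFLATE for compression. So I want to do what everyone has done: use the Delphi zlib library for its DEFLATE compression functionality.
The problem is that ZLIB adds a 2-byte prefix and a 4-byte trailer to the deflated data: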
    [CMF]                              1 byte
    [FLG]                              1 byte
    [...deflate compressed data...]
    [Adler-32 checksum]                4 bytes
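Before blindly discarding two bytes, it can be worth confirming that they really are a plain zlib header. The following is a minimal sketch based on the header rules in RFC 1950; the function name `IsPlainZlibHeader` is hypothetical and not part of any Delphi zlib unit. It checks that the compression method is DEFLATE, that the FCHECK bits make the 16-bit value a multiple of 31, and that no preset dictionary is flagged (if one were, a 4-byte DICTID would follow the header and the prefix would be 6 bytes rather than 2).

```delphi
// Hypothetical helper, assuming the RFC 1950 header layout.
// Returns True when CMF/FLG form a valid zlib header that uses
// DEFLATE and does not announce a preset dictionary.
function IsPlainZlibHeader(CMF, FLG: Byte): Boolean;
const
  CM_DEFLATE = 8;    // compression method lives in the low 4 bits of CMF
  FDICT_MASK = $20;  // bit 5 of FLG: a 4-byte DICTID follows the header
begin
  Result := ((CMF and $0F) = CM_DEFLATE)
    and ((Integer(CMF) * 256 + FLG) mod 31 = 0) // FCHECK rule
    and ((FLG and FDICT_MASK) = 0);
end;
```

For example, the common header bytes `$78 $9C` pass this check, since `$789C` is 30876, which is 31 * 996.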
So what I need is a way to compress data using the standard TCompressionStream (or TZCompressionStream, or TZCompressionStreamEx, depending on which source code you are using):
    procedure CompressDataToTargetStream(sourceStream: TStream; targetStream: TStream);
    var
      compressor: TCompressionStream;
    begin
      compressor := TCompressionStream.Create(clDefault, targetStream); //clDefault = CompressionLevel
      try
        compressor.CopyFrom(sourceStream, 0); //Count = 0 copies the entire source stream
      finally
        compressor.Free;
      end;
    end;
This works, except that it writes out the leading 2 bytes and the trailing 4 bytes; I need to strip those.
So I wrote a TByteEaterStream:
    TByteEaterStream = class(TStream)
    public
      constructor Create(TargetStream: TStream; LeadingBytesToEat, TrailingBytesToEat: Integer);
    end;
For example:
    procedure CompressDataToTargetStream(sourceStream: TStream; targetStream: TStream);
    var
      byteEaterStream: TByteEaterStream;
      compressor: TCompressionStream;
    begin
      byteEaterStream := TByteEaterStream.Create(targetStream, 2, 4); //2 leading bytes, 4 trailing bytes
      try
        compressor := TCompressionStream.Create(clDefault, byteEaterStream); //clDefault = CompressionLevel
        try
          compressor.CopyFrom(sourceStream, 0); //Count = 0 copies the entire source stream
        finally
          compressor.Free;
        end;
      finally
        byteEaterStream.Free;
      end;
    end;
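This stream overrides the Write method. Eating the first 2 bytes is trivial. The trick is eating the trailing 4 bytes.
The eater stream holds a 4-byte array, and I always keep the last four bytes of every write in that buffer. When the eater stream is destroyed, the trailing four bytes are destroyed with it.
The problem is that shuffling a few million writes through this buffer kills performance. The typical use upstream is: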
    for each of a million data rows
      stream.Write(s[1], Length(s)); //30-90 character string
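I absolutely do not want the upstream user to have to indicate that "the end is near". I just want it to be faster.
The question
Watching a stream of bytes flow past, what is the best way to hold back the last four bytes, given that you don't know which write will be the last?
The code I'm fixing wrote the entire compressed version into a TStringStream, and then grabbed just the 900 MB minus 6 bytes to get at the inner DEFLATE data: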
    cs := TStringStream.Create('');
    //....write compressed data to cs
    S := Copy(CS.DataString, 3, Length(CS.DataString) - 6);
Except that this runs the user out of memory. Initially I changed it to write to a TFileStream, after which I could perform the same trick.
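For reference, the file-based version of that trick might look roughly like the sketch below. The helper name `CopyInnerDeflate` and its parameters are hypothetical; it assumes the zlib-framed data has already been written to a file, and it simply skips the 2-byte header and stops 4 bytes short of the end.

```delphi
uses
  Classes, SysUtils;

// Hypothetical helper: copies the raw DEFLATE payload out of a file that
// holds a complete zlib stream (2-byte header, payload, 4-byte Adler-32).
procedure CopyInnerDeflate(const ZlibFileName: string; Target: TStream);
const
  HEADER_BYTES = 2;
  TRAILER_BYTES = 4;
var
  src: TFileStream;
  payload: Int64;
begin
  src := TFileStream.Create(ZlibFileName, fmOpenRead or fmShareDenyWrite);
  try
    payload := src.Size - HEADER_BYTES - TRAILER_BYTES;
    // Guard the empty case: CopyFrom with Count = 0 would rewind the
    // source and copy the whole file, header and trailer included.
    if payload <= 0 then
      Exit;
    src.Position := HEADER_BYTES;
    Target.CopyFrom(src, payload);
  finally
    src.Free;
  end;
end;
```

This avoids holding everything in memory, but it still needs the intermediate file and a second pass over the data.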
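But I want the better solution: the stream solution. I want the data to go into the final stream already compressed, without any intermediate storage.
My implementation
Not that it helps, since I'm not even asking for a solution that necessarily uses an adapter stream to do the trimming: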
    type
      TByteEaterStream = class(TStream)
      private
        FTargetStream: TStream;
        FTargetStreamOwnership: TStreamOwnership;
        FLeadingBytesToEat: Integer;
        FTrailingBytesToEat: Integer;
        FLeadingBytesRemaining: Integer;
        FBuffer: array of Byte;
        FValidBufferLength: Integer;
      public
        constructor Create(TargetStream: TStream; LeadingBytesToEat, TrailingBytesToEat: Integer;
          StreamOwnership: TStreamOwnership = soReference);
        destructor Destroy; override;
        class procedure SelfTest;
        procedure Flush;
        function Read(var Buffer; Count: Longint): Longint; override;
        function Write(const Buffer; Count: Longint): Longint; override;
        function Seek(Offset: Longint; Origin: Word): Longint; override;
      end;

    { TByteEaterStream }

    constructor TByteEaterStream.Create(TargetStream: TStream; LeadingBytesToEat, TrailingBytesToEat: Integer;
      StreamOwnership: TStreamOwnership = soReference);
    begin
      inherited Create;

      //User requested state
      FTargetStream := TargetStream;
      FTargetStreamOwnership := StreamOwnership;
      FLeadingBytesToEat := LeadingBytesToEat;
      FTrailingBytesToEat := TrailingBytesToEat;

      //internal housekeeping
      FLeadingBytesRemaining := FLeadingBytesToEat;
      SetLength(FBuffer, FTrailingBytesToEat);
      FValidBufferLength := 0;
    end;

    destructor TByteEaterStream.Destroy;
    begin
      //Whatever is still held in FBuffer is the tail, and is deliberately never written
      if FTargetStreamOwnership = soOwned then
        FTargetStream.Free;
      FTargetStream := nil;

      inherited;
    end;

    procedure TByteEaterStream.Flush;
    begin
      //Push the held-back bytes downstream; called once newer bytes prove they are not the tail
      if FValidBufferLength > 0 then
      begin
        FTargetStream.Write(FBuffer[0], FValidBufferLength);
        FValidBufferLength := 0;
      end;
    end;

    function TByteEaterStream.Write(const Buffer; Count: Longint): Longint;
    var
      newStart: PByte;
      totalCount: Integer;
      overflow: Integer;
      bytesToWrite: Integer;
    begin
      Result := Count;

      if Count = 0 then
        Exit;

      if FLeadingBytesRemaining > 0 then
      begin
        //Eat one leading byte, then recurse on the rest
        newStart := PByte(@Buffer);
        Inc(newStart);
        Dec(Count);
        Dec(FLeadingBytesRemaining);
        Result := Self.Write(newStart^, Count) + 1; //tell the upstream guy that we wrote it
        Exit;
      end;

      if FTrailingBytesToEat > 0 then
      begin
        if Count < FTrailingBytesToEat then
        begin
          //There's less bytes incoming than an entire buffer
          //But the buffer might overfloweth
          totalCount := FValidBufferLength + Count;

          //If it could all fit in the buffer, then let it
          if totalCount <= FTrailingBytesToEat then
          begin
            Move(Buffer, FBuffer[FValidBufferLength], Count);
            FValidBufferLength := totalCount;
          end
          else
          begin
            //We're going to overflow the buffer.
            //Purge from the buffer the amount that would get pushed
            overflow := totalCount - FTrailingBytesToEat;
            FTargetStream.Write(FBuffer[0], overflow);

            //Shuffle the buffer down (overlapped move)
            FValidBufferLength := FValidBufferLength - overflow;
            Move(FBuffer[overflow], FBuffer[0], FValidBufferLength);

            //Append the incoming data after what is left
            Move(Buffer, FBuffer[FValidBufferLength], Count);
            FValidBufferLength := FValidBufferLength + Count;
          end;
        end
        else if Count = FTrailingBytesToEat then
        begin
          //The incoming bytes exactly fill the buffer. Flush what we have and eat the incoming amounts
          Flush;
          Move(Buffer, FBuffer[0], FTrailingBytesToEat);
          FValidBufferLength := FTrailingBytesToEat; //we "wrote" n bytes
        end
        else
        begin
          //Count is greater than trailing buffer eat size
          Flush;

          //Write the data that is definitely not to be eaten
          bytesToWrite := Count - FTrailingBytesToEat;
          FTargetStream.Write(Buffer, bytesToWrite);

          //Buffer the remainder
          newStart := PByte(@Buffer);
          Inc(newStart, bytesToWrite);
          Move(newStart^, FBuffer[0], FTrailingBytesToEat);
          FValidBufferLength := FTrailingBytesToEat;
        end;
      end
      else
        FTargetStream.Write(Buffer, Count); //no trailing bytes to hold back; pass straight through
    end;

    function TByteEaterStream.Seek(Offset: Longint; Origin: Word): Longint;
    begin
      //what does it mean if they want to seek around when i'm supposed to be eating data?
      //i don't know; so results are, by definition, undefined. Don't use at your own risk
      Result := FTargetStream.Seek(Offset, Origin);
    end;

    function TByteEaterStream.Read(var Buffer; Count: Longint): Longint;
    begin
      //what does it mean if they want to read back bytes when i'm supposed to be eating data?
      //i don't know; so results are, by definition, undefined. Don't use at your own risk
      Result := FTargetStream.Read({var}Buffer, Count);
    end;

    class procedure TByteEaterStream.SelfTest;

      procedure CheckEquals(Expected, Actual: string; Message: string);
      begin
        if Actual <> Expected then
          raise Exception.CreateFmt('TByteEaterStream self-test failed. Expected "%s", but was "%s". Message: %s',
            [Expected, Actual, Message]);
      end;

      //AnsiString so that Length counts bytes, also on Unicode versions of Delphi
      procedure Test(const InputString: AnsiString; ExpectedString: string);
      var
        s: TStringStream;
        eater: TByteEaterStream;
      begin
        s := TStringStream.Create('');
        try
          eater := TByteEaterStream.Create(s, 2, 4, soReference); //2 leading bytes, 4 trailing bytes
          try
            eater.Write(InputString[1], Length(InputString));
          finally
            eater.Free;
          end;
          CheckEquals(ExpectedString, s.DataString, string(InputString));
        finally
          s.Free;
        end;
      end;

    begin
      Test('1', '');
      Test('11', '');
      Test('113', '');
      Test('1133', '');
      Test('11333', '');
      Test('113333', '');
      Test('11H3333', 'H');
      Test('11He3333', 'He');
      Test('11Hel3333', 'Hel');
      Test('11Hell3333', 'Hell');
      Test('11Hello3333', 'Hello');
      Test('11Hello,3333', 'Hello,');
      Test('11Hello, W3333', 'Hello, W');
      Test('11Hello, Wo3333', 'Hello, Wo');
      Test('11Hello, Wor3333', 'Hello, Wor');
      Test('11Hello, Worl3333', 'Hello, Worl');
      Test('11Hello, World3333', 'Hello, World');
      Test('11Hello, World!3333', 'Hello, World!');
    end;
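Solution
So, I suggest doing this:
>Use a stream adapter that does buffering.
>Eating the leading bytes is easy. Just send the first two bytes into oblivion.
>After that, write incoming bytes into the buffer, and when the buffer needs to be flushed, flush everything in it except the last four bytes.
>When flushing, copy the four unflushed bytes to the start of the buffer so that you don't lose them.
>When the stream is closed, flush it as you would any buffered stream, using the same flush technique as before so that the last four bytes are held back. At that point you know those are the last four bytes of the stream.
The one requirement of this approach is that the buffer must be larger than the number of trailing bytes to strip.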
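The steps above could be sketched roughly as follows. This is a minimal, untuned illustration rather than a definitive implementation: the unit name, the class name `TBufferedByteEaterStream`, the `BufferSize` parameter and its 64 KB default are all assumptions made for the example. `Seek` reports the number of bytes accepted so far and never moves anything, on the assumption that the compressor only reads and restores `Position`.

```delphi
unit BufferedByteEater; // hypothetical unit name

interface

uses
  Classes;

type
  // Hypothetical class: buffers writes and holds back the last N bytes.
  TBufferedByteEaterStream = class(TStream)
  private
    FTarget: TStream;
    FLeadingRemaining: Integer;
    FTrailing: Integer;
    FBuffer: array of Byte;
    FUsed: Integer;     // valid bytes currently in FBuffer
    FAccepted: Longint; // total bytes handed to Write so far
    procedure FlushKeepingTail;
  public
    constructor Create(Target: TStream; LeadingBytesToEat,
      TrailingBytesToEat: Integer; BufferSize: Integer = 65536);
    destructor Destroy; override;
    function Read(var Buffer; Count: Longint): Longint; override;
    function Write(const Buffer; Count: Longint): Longint; override;
    function Seek(Offset: Longint; Origin: Word): Longint; override;
  end;

implementation

constructor TBufferedByteEaterStream.Create(Target: TStream; LeadingBytesToEat,
  TrailingBytesToEat: Integer; BufferSize: Integer);
begin
  inherited Create;
  FTarget := Target;
  FLeadingRemaining := LeadingBytesToEat;
  FTrailing := TrailingBytesToEat;
  if BufferSize <= FTrailing then
    BufferSize := FTrailing + 1; // buffer must be larger than the tail
  SetLength(FBuffer, BufferSize);
end;

destructor TBufferedByteEaterStream.Destroy;
begin
  FlushKeepingTail; // what stays behind is the real tail; it is dropped
  inherited;
end;

procedure TBufferedByteEaterStream.FlushKeepingTail;
var
  n: Integer;
begin
  n := FUsed - FTrailing; // bytes that can no longer be part of the tail
  if n <= 0 then
    Exit;
  FTarget.WriteBuffer(FBuffer[0], n);
  if FTrailing > 0 then
    Move(FBuffer[n], FBuffer[0], FTrailing); // slide the tail to the front
  FUsed := FTrailing;
end;

function TBufferedByteEaterStream.Write(const Buffer; Count: Longint): Longint;
var
  src: PByte;
  n: Integer;
begin
  Result := 0;
  if Count <= 0 then
    Exit;
  Result := Count; // eaten bytes are reported as written
  Inc(FAccepted, Count);
  src := @Buffer;
  if FLeadingRemaining > 0 then
  begin
    n := FLeadingRemaining; // skip as many leading bytes as are available
    if n > Count then
      n := Count;
    Inc(src, n);
    Dec(Count, n);
    Dec(FLeadingRemaining, n);
  end;
  while Count > 0 do
  begin
    if FUsed = Length(FBuffer) then
      FlushKeepingTail;
    n := Length(FBuffer) - FUsed; // room left in the buffer
    if n > Count then
      n := Count;
    Move(src^, FBuffer[FUsed], n);
    Inc(FUsed, n);
    Inc(src, n);
    Dec(Count, n);
  end;
end;

function TBufferedByteEaterStream.Read(var Buffer; Count: Longint): Longint;
begin
  Result := 0; // write-only adapter
end;

function TBufferedByteEaterStream.Seek(Offset: Longint; Origin: Word): Longint;
begin
  Result := FAccepted; // never moves; only reports the logical position
end;

end.
```

In this sketch the class would take the place of TByteEaterStream in CompressDataToTargetStream, created with 2 leading and 4 trailing bytes. The adapter must be freed before the target stream, because the final flush happens in its destructor. With a large buffer, each small upstream write is a single Move into memory, and the target stream is touched only once per buffer-full.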