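I have a problem with my own base64 encoding implementation. I ended up with the code below. I think it only works for text files that contain plain English letters. For example, a PDF file that has been encoded and then decoded differs from the original in individual characters.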
def base64Encode(data):
    alphabet = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P","Q","R","S","T","U","V","W","X","Y","Z","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","0","1","2","3","4","5","6","7","8","9","+","/"]
    bit_str = ""
    base64_str = ""
    for char in data:
        bin_char = bin(char).lstrip("0b")
        bin_char = bin_char.zfill(8)
        bit_str += bin_char
    brackets = [bit_str[x:x+6] for x in range(0,len(bit_str),6)]
    for bracket in brackets:
        if(len(bracket) < 6):
            bracket = bracket + (6-len(bracket))*"0"
        base64_str += alphabet[int(bracket,2)]
    # print(brackets[-4:])
    #if(bracket[-1:)
    #print(len(base64_str))
    #if(len(base64_str) != 76):
    #    base64_str += "="
    return base64_str
def base64Decode(text):
    alphabet = ["A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","0","1","2","3","4","5","6","7","8","9","+","/"]
    bit_str = ""
    text_str = ""
    for char in text:
        if char in alphabet:
            bin_char = bin(alphabet.index(char)).lstrip("0b")
            bin_char = bin_char.zfill(6)
            bit_str += bin_char
    brackets = [bit_str[x:x+8] for x in range(0,len(bit_str),8)]
    for bracket in brackets:
        text_str += chr(int(bracket,2))
    return text_str.encode("UTF-8")
w = open("encode.txt", "w")
with open("bla.txt", "rb") as f:
    byte = f.read(57)
    while byte:
        w.write(base64Encode(byte))
        w.write("\n")
        byte = f.read(57)
w.close()
f.close()
w = open("decode.txt", "wb")
with open("encode.txt", "r") as f:
    byte = f.read(77)
    while byte:
        w.write(base64Decode(byte))
        byte = f.read(77)
w.close()
f.close()
I think the line `return text_str.encode("UTF-8")` should not encode to UTF-8. However, if I leave only `return text_str`, I get an error: TypeError: 'str' does not support the buffer interface.
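A small check of what the UTF-8 step does to byte values above 127 may help here. This is only a sketch, assuming Python 3; the sample values are made up for illustration. `chr()` turns each 8-bit value into the code point with the same number, and UTF-8 writes every code point from 128 upward as two bytes, so the output no longer matches the input. The TypeError appears because a file opened with "wb" accepts bytes, not str.

```python
# Three sample byte values: 'A' plus two values above 127 (illustrative only).
values = [0x41, 0x9C, 0xC4]

# This mirrors what the decoder does: one chr() per 8-bit group.
text = "".join(chr(v) for v in values)

as_utf8 = text.encode("utf-8")      # code points >= 128 become two bytes each
as_latin1 = text.encode("latin-1")  # code points 0..255 map to exactly one byte
as_bytes = bytes(values)            # no intermediate text step at all

print(len(values), len(as_utf8))    # 3 5
print(as_latin1 == as_bytes)        # True
```

Pure ASCII input is unaffected, which would explain why English-only text files survive the round trip.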
bla.txt:
Phil Mercer reports on Cyclone Pam which has ravaged the Pacific nation of Vanuatu. Video courtesy of YouTube/Isso Nihmei at 350.org
Save the Children's Vanuatu country director Tom Skirrow said on Saturday: "The scene here this morning is complete devastation - houses are destroyed, trees are down, roads are blocked and people are wandering the streets looking for help.
ĄŚĆŹŻÓ
encode.txt
UGhpbCBNZXJjZXIgcmVwb3J0cyBvbiBDeWNsb25lIFBhbSB3aGljaCBoYXMgcmF2YWdlZCB0aGUg
UGFjaWZpYyBuYXRpb24gb2YgVmFudWF0dS4gVmlkZW8gY291cnRlc3kgb2YgWW91VHViZS9Jc3Nv
IE5paG1laSBhdCAzNTAub3JnDQoNClNhdmUgdGhlIENoaWxkcmVuJ3MgVmFudWF0dSBjb3VudHJ5
IGRpcmVjdG9yIFRvbSBTa2lycm93IHNhaWQgb24gU2F0dXJkYXk6ICJUaGUgc2NlbmUgaGVyZSB0
aGlzIG1vcm5pbmcgaXMgY29tcGxldGUgZGV2YXN0YXRpb24gLSBob3VzZXMgYXJlIGRlc3Ryb3ll
ZCwgdHJlZXMgYXJlIGRvd24sIHJvYWRzIGFyZSBibG9ja2VkIGFuZCBwZW9wbGUgYXJlIHdhbmRl
cmluZyB0aGUgc3RyZWV0cyBsb29raW5nIGZvciBoZWxwLg0KDQrEhMWaxIbFucW7w5M
decode.txt
Phil Mercer reports on Cyclone Pam which has ravaged the Pacific nation of Vanuatu. Video courtesy of YouTube/Isso Nihmei at 350.org
Save the Children's Vanuatu country director Tom Skirrow said on Saturday: "The scene here this morning is complete devastation - houses are destroyed, trees are down, roads are blocked and people are wandering the streets looking for help.
ÄÅÄŹŻÃ
The same text encoded by this page: http://www.motobit.com/util/base64-decoder-encoder.asp
UGhpbCBNZXJjZXIgcmVwb3J0cyBvbiBDeWNsb25lIFBhbSB3aGljaCBoYXMgcmF2YWdlZCB0aGUg
UGFjaWZpYyBuYXRpb24gb2YgVmFudWF0dS4gVmlkZW8gY291cnRlc3kgb2YgWW91VHViZS9Jc3Nv
IE5paG1laSBhdCAzNTAub3JnDQoNClNhdmUgdGhlIENoaWxkcmVuJ3MgVmFudWF0dSBjb3VudHJ5
IGRpcmVjdG9yIFRvbSBTa2lycm93IHNhaWQgb24gU2F0dXJkYXk6ICJUaGUgc2NlbmUgaGVyZSB0
aGlzIG1vcm5pbmcgaXMgY29tcGxldGUgZGV2YXN0YXRpb24gLSBob3VzZXMgYXJlIGRlc3Ryb3ll
ZCwgdHJlZXMgYXJlIGRvd24sIHJvYWRzIGFyZSBibG9ja2VkIGFuZCBwZW9wbGUgYXJlIHdhbmRl
cmluZyB0aGUgc3RyZWV0cyBsb29raW5nIGZvciBoZWxwLg0KDQrEhMWaxIbFucW7w5M=
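Everything is the same except for the "=", which I left out of my implementation because of the errors described at the start of this post.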
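For reference, the missing "=" can be added after the fact: standard Base64 pads the output with "=" until its length is a multiple of 4 characters. The sketch below assumes Python 3; the helper name `add_padding` is hypothetical and not part of the code above. It compares the result with the standard library's `base64.b64encode` for the last two bytes of the sample file.

```python
import base64

def add_padding(encoded):
    """Pad an unpadded Base64 string with '=' to a multiple of 4 characters."""
    return encoded + "=" * (-len(encoded) % 4)

# "w5M" is the unpadded encoding of the two bytes C3 93 (the trailing "Ó").
unpadded = "w5M"
padded = add_padding(unpadded)

print(padded)                                           # w5M=
print(padded == base64.b64encode(b"\xc3\x93").decode()) # True
```

Because a 57-byte chunk always encodes to exactly 76 characters, only the final, shorter chunk can ever need padding, so applying the helper to each chunk's output should be safe.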
And an example with a PDF file. The original file:
%PDF-1.5
%µµµµ
1 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(pl-PL) /StructTreeRoot 8 0 R/MarkInfo<</Marked true>>>>
endobj
2 0 obj
<</Type/Pages/Count 1/Kids[ 3 0 R] >>
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 5 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/MediaBox[ 0 0 595.32 841.92] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S/StructParents 0>>
endobj
4 0 obj
<</Filter/FlateDecode/Length 110>>
stream
xœUÌ
€@ྰï0QËÝ®Èiž?(†kb°hòý«ZD˜4ßÀΨ*;…¡xº ¨#“íªFrÄI!w…˜2ËQ81®D<™ÇS=Ó’léŠ82µ·>^åŒÊO- >[´SÀ
endstream
endobj
5 0 obj
<</Type/Font/Subtype/TrueType/Name/F1/BaseFont/ABCDEE+Calibri/Encoding/WinAnsiEncoding/FontDescriptor 6 0 R/FirstChar 32/LastChar 97/Widths 15 0 R>>
endobj
6 0 obj
<</Type/FontDescriptor/FontName/ABCDEE+Calibri/Flags 32/ItalicAngle 0/Ascent 750/Descent -250/CapHeight 750/AvgWidth 521/MaxWidth 1743/FontWeight 400/XHeight 250/StemV 52/FontBBox[ -503 -250 1240 750] /FontFile2 16 0 R>>
endobj
7 0 obj
After running the script:
%PDF-1.5
%µµµµ
1 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(pl-PL) /StructTreeRoot 8 0 R/MarkInfo<</Marked true>>>>
endobj
2 0 obj
<</Type/Pages/Count 1/Kids[ 3 0 R] >>
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 5 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/MediaBox[ 0 0 595.32 841.92] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S/StructParents 0>>
endobj
4 0 obj
<</Filter/FlateDecode/Length 110>>
stream
xUÌ
@ྰï0QËÝ®Èi?(kb°hòý«ZD4ßÀΨ*;¡xº ¨#íªFrÄI!w2ËQ81®D<ÇS=Ólé82µ·>^åÊO- >[´SÀ
endstream
endobj
5 0 obj
<</Type/Font/Subtype/TrueType/Name/F1/BaseFont/ABCDEE+Calibri/Encoding/WinAnsiEncoding/FontDescriptor 6 0 R/FirstChar 32/LastChar 97/Widths 15 0 R>>
endobj
6 0 obj
<</Type/FontDescriptor/FontName/ABCDEE+Calibri/Flags 32/ItalicAngle 0/Ascent 750/Descent -250/CapHeight 750/AvgWidth 521/MaxWidth 1743/FontWeight 400/XHeight 250/StemV 52/FontBBox[ -503 -250 1240 750] /FontFile2 16 0 R>>
endobj
7 0 obj
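For example, the differences are at the beginning of lines 15 and 16.
My goal is to load a file, encode it in base64, then decode it and get back an identical, usable file. I suppose the error is in reading or writing the data, or in the encoding. Any suggestions?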
I managed to get this working. Replacing .encode("UTF-8") with .encode("latin-1") works, at least for PDF files.
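An alternative that avoids the text step altogether is to build a bytes object directly from the 8-bit groups. The following is a minimal sketch, assuming Python 3; the names `ALPHABET` and `base64_decode` are illustrative and not taken from the code above. It also drops the leftover bits at the end of the input, which the original decoder appears to turn into one extra trailing byte whenever the encoded length is not a multiple of 4.

```python
import base64
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def base64_decode(text):
    """Decode Base64 text to bytes; characters outside the alphabet are skipped."""
    bits = "".join(format(ALPHABET.index(c), "06b") for c in text if c in ALPHABET)
    usable = len(bits) - len(bits) % 8  # ignore leftover padding bits
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))

# Round-trip check over every possible byte value, using the stdlib encoder.
data = bytes(range(256))
encoded = base64.b64encode(data).decode("ascii")
print(base64_decode(encoded) == data)  # True
```

Since "=" and newlines are not in the alphabet, they are skipped, so the function accepts both padded and unpadded input as well as the 76-character lines written to encode.txt.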