Response Body Encoding
Here is a snippet demonstrating an encoding issue with tgcrud responses.
Unless the encoding is set manually, the response's `.text` property prints `<title>Märchen</title>` with a garbled umlaut, because requests guesses the encoding wrong.
```python
import requests


def read_metadata(url: str, textgrid_uri: str, sid: str = ''):
    # stream=True defers downloading the response body until Response.content is accessed
    response = requests.get(url + '/' + textgrid_uri +
                            '/metadata?sessionId=' + sid, stream=True)
    # TODO implement error handling on http status not in 200..299
    return response


test = read_metadata('https://textgridlab.org/1.0/tgcrud/rest/', 'textgrid:s60f.0', '')
print(test.encoding)  # encoding guessed by requests
print(test.text)      # umlauts come out garbled
test.encoding = 'utf-8'
print(test.encoding)  # now 'utf-8'
print(test.text)      # umlauts are decoded correctly
```
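The garbling can be reproduced without a network call. This is a minimal sketch that assumes the body is UTF-8 and that the wrong guess is ISO-8859-1 (an assumption for illustration; the first `print(test.encoding)` in the snippet shows what requests actually picked). The sample body is made up from the title quoted earlier, not fetched from tgcrud.

```python
# Sample body (illustrative, not fetched from tgcrud), encoded as UTF-8.
body = '<title>Märchen</title>'.encode('utf-8')

# Decoding UTF-8 bytes with the wrong codec turns each multi-byte
# character into two unrelated characters.
garbled = body.decode('iso-8859-1')  # assumed wrong guess
correct = body.decode('utf-8')

print(garbled)  # <title>MÃ¤rchen</title>
print(correct)  # <title>Märchen</title>
```

Setting `response.encoding = 'utf-8'` before reading `.text` makes requests take the second path.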
The setting `response.encoding = 'utf-8'` has to be added to every response handler. (It gets worse when re-uploading a metadata file, as we do for `update`.)
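One way to avoid repeating the assignment in every handler might be a requests response hook on a shared session. The sketch below is not what the client currently does; `force_utf8` and `make_session` are hypothetical names, and it assumes every text response served through this session is UTF-8.

```python
import requests


def force_utf8(response, *args, **kwargs):
    """Response hook: override whatever encoding requests guessed.

    Assumption: all text bodies handled through this session are UTF-8.
    Only Response.text is affected; Response.content stays raw bytes.
    """
    response.encoding = 'utf-8'
    return response


def make_session():
    """Return a Session that applies force_utf8 to every response."""
    session = requests.Session()
    session.hooks['response'].append(force_utf8)
    return session
```

Handlers would then call `make_session().get(...)` instead of `requests.get(...)`, so the encoding is fixed in one place.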
Alternative: @sfunk, can tgcrud declare the encoding in an HTTP header, for example via the `charset` parameter of `Content-Type`?
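To check whether a response already declares an encoding, the `charset` parameter of its `Content-Type` header can be inspected. The following is a small standard-library sketch; `declared_charset` is a hypothetical helper, and the header values are examples only, not what tgcrud is known to send.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset named in a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg['Content-Type'] = content_type
    # get_content_charset() lower-cases the value and returns None
    # when the header carries no charset parameter.
    return msg.get_content_charset()


print(declared_charset('text/xml; charset=UTF-8'))  # utf-8
print(declared_charset('text/xml'))                 # None
```

With a requests response this would be called as `declared_charset(response.headers.get('Content-Type'))`; a `None` result means the client still has to set the encoding itself.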