1 Overview
JSON (JavaScript Object Notation) is a widely used lightweight data format. The json module in the Python standard library provides functions for processing JSON data.
A very commonly used basic data structure in Python is a dictionary. Its typical structure is as follows:
d = {
'a': 123,
'b': {
'x': ['A', 'B', 'C']
}
}
And the structure of JSON is as follows:
{
"a": 123,
"b": {
"x": ["A", "B", "C"]
}
}
As you can see, a Python dictionary and JSON are very close, and the main function provided by the json library in Python is the conversion between the two.
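As a quick illustration of that conversion, the following minimal sketch turns the dictionary shown above into JSON text and back again. The variable names are only illustrative.

```python
import json

# A nested dictionary like the one shown above
d = {'a': 123, 'b': {'x': ['A', 'B', 'C']}}

# Python dictionary -> JSON text
text = json.dumps(d)
print(text)

# JSON text -> Python dictionary
restored = json.loads(text)
print(restored == d)
```

The two functions used here, json.dumps and json.loads, are described in detail in the following sections.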
2 Read JSON
The json.loads method converts a str, bytes or bytearray object containing JSON data into a Python object, typically a dictionary. Its full interface signature is as follows:
json.loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
2.1 The simplest example
The most basic way to use json.loads is to pass it a str containing JSON data:
>>> json.loads('{"a": 123}')
{'a': 123}
Note: in Python, a str value can be enclosed in a pair of single quotes or a pair of double quotes:
>>> 'ABC' == "ABC"
True
Therefore, it is legal and equivalent to use single or double quotes when defining the str keys and values of a dictionary:
>>> {"a": 'ABC'} == {'a': "ABC"}
True
However, in JSON, strings can only be enclosed in double quotes, so every string in the JSON content processed by json.loads must use double quotes. Otherwise, a decoding error will occur:
>>> json.loads("{'a': 123}")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/decoder.py", line 355, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
If the Python string being processed is enclosed in double quotes, the double quotes in JSON need to be escaped:
>>> json.loads("{\"a\": 123}")
{'a': 123}
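When the input may be malformed, the decoding error can be caught rather than left to propagate. The sketch below assumes a hypothetical helper named try_parse; json.JSONDecodeError and its msg, lineno and colno attributes are part of the standard json module.

```python
import json

def try_parse(text):
    """Return the parsed value, or None if text is not valid JSON (hypothetical helper)."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.msg, err.lineno and err.colno describe where decoding failed
        print('invalid JSON: %s (line %d, column %d)' % (err.msg, err.lineno, err.colno))
        return None

ok = try_parse('{"a": 123}')    # valid: double-quoted key
bad = try_parse("{'a': 123}")   # invalid: single-quoted key
```

Returning None for invalid input is only one possible design; it cannot distinguish a failed parse from the JSON value null, so real code may prefer to re-raise or return a sentinel.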
2.2 bytes and bytearray data
The json.loads method can also handle bytes and bytearray objects whose content is JSON data:
>>> json.loads('{"a": 123}'.encode('UTF-8'))
{'a': 123}
>>> json.loads(bytearray('{"a": 123}', 'UTF-8'))
{'a': 123}
2.3 Encoding format
The encoding parameter of json.loads has no actual effect: it is ignored and deprecated, and it was removed entirely in Python 3.9.
Since a str in Python 3 is already decoded Unicode text, no decoding takes place when the s parameter is a str. In that case, the str cannot start with a BOM.
When the s parameter is bytes or bytearray, the json.loads method automatically determines whether the data is encoded as UTF-8, UTF-16 or UTF-32, and decodes it into a str object for subsequent processing.
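The following sketch illustrates both behaviors: bytes input in any of the three encodings is decoded automatically, while a str that starts with a BOM is rejected. The variable names are illustrative only.

```python
import json

text = '{"a": 123}'

# bytes input: the encoding is detected automatically
from_utf8 = json.loads(text.encode('utf-8'))
from_utf16 = json.loads(text.encode('utf-16'))
from_utf32 = json.loads(text.encode('utf-32'))

# str input that starts with a BOM character is rejected
try:
    json.loads('\ufeff' + text)
except json.JSONDecodeError as err:
    bom_error = err.msg
```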
2.4 Data type conversion
JSON can represent four primitive types of data:
string
number
boolean
null
and two structured types:
object
array
In the default implementation, the data conversion correspondence between JSON and Python is as follows:
JSON | Python |
---|---|
object | dict |
array | list |
string | str |
number (int) | int |
number (real) | float |
true | True |
false | False |
null | None |
The actual conversion is as follows:
>>> json.loads("""
... {
... "obj": {
... "str": "ABC",
... "int": 123,
... "float": -321.89,
... "bool_true": true,
... "bool_false": false,
... "null": null,
... "array": [1, 2, 3]
... }
... }""")
{'obj': {'str': 'ABC', 'int': 123, 'float': -321.89, 'bool_true': True, 'bool_false': False, 'null': None, 'array': [1, 2, 3]}}
For data of type number in JSON, the following points need to be noted:
The precision of a real number in JSON cannot exceed the precision of the float type in Python, otherwise precision is lost, as in the following example:
>>> json.loads('3.141592653589793238462643383279')
3.141592653589793
The JSON standard does not include the non-numeric value NaN, positive infinity Infinity or negative infinity -Infinity, but by default the json.loads method converts NaN, Infinity and -Infinity in a JSON string to float('nan'), float('inf') and float('-inf') in Python. Note that NaN, Infinity and -Infinity in the JSON must be correctly cased and spelled in full. Example below:
>>> json.loads('{"inf": Infinity, "nan": NaN, "ninf": -Infinity}')
{'inf': inf, 'nan': nan, 'ninf': -inf}
2.5 Custom JSON object conversion type
By default, json.loads converts object data in JSON to a dictionary; the object_hook parameter can be used to change the constructed object.
object_hook accepts a function whose input parameter is the dictionary converted from a JSON object, and whose return value is a custom object. As shown in the following example:
>>> class MyJSONObj:
...     def __init__(self, x):
...         self.x = x
...
>>> def my_json_obj_hook(data):
...     print('obj_hook data: %s' % data)
...     return MyJSONObj(data['x'])
...
>>> result = json.loads('{"x": 123}', object_hook=my_json_obj_hook)
obj_hook data: {'x': 123}
>>> type(result)
<class '__main__.MyJSONObj'>
>>> result.x
123
When the objects in JSON are nested, the json.loads method traverses the object tree depth-first and passes the object data of each layer to object_hook. The Python object constructed from a leaf-node JSON object is then passed, as one of the parent node's values, to the object_hook call for the parent node. For example:
>>> class MyJSONObj:
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...
>>> def my_json_obj_hook(data):
...     print('obj_hook data: %s' % data)
...     return MyJSONObj(**data)
...
>>> result = json.loads('{"x": {"x": 11, "y": 12}, "y": {"x": 21, "y":22}}', object_hook=my_json_obj_hook)
obj_hook data: {'x': 11, 'y': 12}
obj_hook data: {'x': 21, 'y': 22}
obj_hook data: {'x': <__main__.MyJSONObj object at 0x10417ef28>, 'y': <__main__.MyJSONObj object at 0x10417ed68>}
In addition to the object_hook parameter, there is also an object_pairs_hook parameter. This parameter can also be used to change the type of the Python object constructed by json.loads. The difference between this parameter and object_hook is that the input data received by the function is not a dictionary but a list of tuples. Each tuple has two elements: the first is the key in the JSON data, and the second is the value corresponding to this key. For example, for the JSON object
{
"a": 123,
"b": "ABC"
}
The corresponding input data is
[
('a', 123),
('b', 'ABC')
]
When both object_hook and object_pairs_hook are specified in a call to json.loads, object_pairs_hook takes priority and object_hook is ignored.
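A minimal sketch of object_pairs_hook in use is shown below. The hook names and the choice of collections.OrderedDict as the result type are assumptions made for illustration; the example also passes an object_hook to show that it is not called when object_pairs_hook is given.

```python
import json
from collections import OrderedDict

received = []

def my_pairs_hook(pairs):
    # pairs is a list of (key, value) tuples in document order
    received.append(pairs)
    return OrderedDict(pairs)

def my_obj_hook(data):
    # Not expected to run: object_pairs_hook takes priority
    raise AssertionError('object_hook should not be called')

result = json.loads('{"a": 123, "b": "ABC"}',
                    object_hook=my_obj_hook,
                    object_pairs_hook=my_pairs_hook)
```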
2.6 Custom JSON number conversion type
In the default implementation, real numbers in JSON are converted to the Python float type, and integers are converted to the int type. As with object_hook, we can specify custom conversion logic through the parse_float and parse_int parameters. The input parameter of each of these functions is the str representing the JSON real number or integer. In the following example, we convert real numbers to numpy.float64 and integers to numpy.int64:
>>> import numpy
>>> def my_parse_float(f):
...     print('%s(%s)' % (type(f), f))
...     return numpy.float64(f)
...
>>> def my_parse_int(i):
...     print('%s(%s)' % (type(i), i))
...     return numpy.int64(i)
...
>>> result = json.loads('{"i": 123, "f": 321.45}', parse_float=my_parse_float, parse_int=my_parse_int)
<class 'str'>(123)
<class 'str'>(321.45)
>>> type(result['i'])
<class 'numpy.int64'>
>>> type(result['f'])
<class 'numpy.float64'>
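The same mechanism can be used with the standard library alone. As a sketch, passing decimal.Decimal as parse_float keeps all the digits of a real number that would otherwise be rounded to float precision; the sample JSON text here is made up for illustration.

```python
import json
from decimal import Decimal

text = '{"pi": 3.141592653589793238462643383279, "n": 42}'

# Default behavior: the real number is rounded to float precision
default_result = json.loads(text)

# parse_float receives the original digits as a str, so Decimal keeps them all
exact_result = json.loads(text, parse_float=Decimal)

print(default_result['pi'])
print(exact_result['pi'])
```

Integers are unaffected here because only parse_float was overridden.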
2.6.1 Custom NaN, Infinity and -Infinity conversion types
Since standard JSON data does not support NaN, Infinity and -Infinity, parse_float will not receive these values. To customize the objects these values are converted to, use another parameter, parse_constant. In the following example, these values are also converted to the numpy.float64 type:
>>> def my_parse_constant(data):
...     print('%s(%s)' % (type(data), data))
...     return numpy.float64(data)
...
>>> result = json.loads('{"inf": Infinity, "nan": NaN, "ninf": -Infinity}', parse_constant=my_parse_constant)
<class 'str'>(Infinity)
<class 'str'>(NaN)
<class 'str'>(-Infinity)
>>> result['inf']
inf
>>> type(result['inf'])
<class 'numpy.float64'>
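Another possible use of parse_constant is to reject these non-standard values altogether. The sketch below assumes a hypothetical hook named reject_constant that raises an error whenever it is called.

```python
import json

def reject_constant(name):
    # name is the str 'NaN', 'Infinity' or '-Infinity'
    raise ValueError('non-standard JSON constant: %s' % name)

# Ordinary numbers never reach parse_constant
plain = json.loads('{"f": 1.5}', parse_constant=reject_constant)

try:
    json.loads('{"inf": Infinity}', parse_constant=reject_constant)
except ValueError as err:
    message = str(err)
```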
2.7 Non-object top-level values
According to the JSON specification, a JSON document may consist of a single value rather than a complete object. This value can be a string, a number, a boolean, null, or an array. In addition to the types given by the JSON specification, json.loads also accepts NaN, Infinity or -Infinity as the top-level value:
>>> json.loads('"hello"')
'hello'
>>> json.loads('123')
123
>>> json.loads('123.34')
123.34
>>> json.loads('true')
True
>>> json.loads('false')
False
>>> print(json.loads('null'))
None
>>> json.loads('[1, 2, 3]')
[1, 2, 3]
2.8 Duplicate key names
Within a single JSON object there should be no duplicate key names, but the JSON specification does not define how this situation is handled. In json.loads, when the JSON data contains duplicate key names, the later value overwrites the earlier one:
>>> json.loads('{"a": 123, "b": "ABC", "a": 321}')
{'a': 321, 'b': 'ABC'}
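If silently keeping the last value is not acceptable, duplicates can be detected with the object_pairs_hook parameter described in section 2.5, because the hook sees every key and value pair before a dictionary is built. The helper name reject_duplicates below is hypothetical.

```python
import json

def reject_duplicates(pairs):
    """Hypothetical hook: build a dict, but refuse repeated keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError('duplicate key: %r' % key)
        result[key] = value
    return result

unique = json.loads('{"a": 123, "b": "ABC"}', object_pairs_hook=reject_duplicates)

try:
    json.loads('{"a": 123, "b": "ABC", "a": 321}', object_pairs_hook=reject_duplicates)
except ValueError as err:
    duplicate_error = str(err)
```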
2.9 Processing JSON data files
When the JSON data is stored in a file, the json.load method can be used to read the data from the file and convert it to a Python object. The first parameter of json.load is the file object pointing to the JSON data file.
For example, suppose the file /tmp/data.json contains the following:
{
"a": 123,
"b": "ABC"
}
You can use the code in the following example to read and convert JSON data in a file:
>>> with open('/tmp/data.json') as jf:
...     json.load(jf)
...
{'a': 123, 'b': 'ABC'}
Besides file objects, any file-like object that implements the read method can be used as the fp parameter, such as io.StringIO in the following example:
>>> sio = io.StringIO('{"a": 123}')
>>> json.load(sio)
{'a': 123}
The meanings and usage of the other parameters of json.load are the same as those of json.loads above, and will not be repeated here.
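The following self-contained sketch reads a JSON file with an explicit text encoding. It writes to a temporary file whose path is generated at run time, so that it does not depend on /tmp/data.json existing; the use of tempfile is an assumption made only to keep the example runnable.

```python
import json
import os
import tempfile

# Write a small JSON document to a temporary file (the path is generated, not fixed)
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False, encoding='utf-8') as tmp:
    tmp.write('{"a": 123, "b": "ABC"}')
    path = tmp.name

try:
    # Opening with an explicit encoding avoids depending on the platform default
    with open(path, encoding='utf-8') as jf:
        data = json.load(jf)
finally:
    os.remove(path)
```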
3 Generate JSON
The json.dumps method converts a Python object to a string representing JSON data. Its full interface signature is as follows:
json.dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)
Its first parameter, obj, is the data object to be converted.
>>> json.dumps({'a': 123, 'b': 'ABC'})
'{"a": 123, "b": "ABC"}'
3.1 Encoding format
The ensure_ascii parameter of json.dumps controls how non-ASCII characters appear in the generated JSON string. The default value is True, in which case all non-ASCII characters are escaped. If you do not want automatic escaping, set it to False and the original characters are kept as they are. As in the example below:
>>> json.dumps({'数字': 123, '字符': '一二三'})
'{"\\u6570\\u5b57": 123, "\\u5b57\\u7b26": "\\u4e00\\u4e8c\\u4e09"}'
>>> json.dumps({'数字': 123, '字符': '一二三'}, ensure_ascii=False)
'{"数字": 123, "字符": "一二三"}'
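Both forms describe the same data; the choice affects only how the text looks and how large it is once encoded. The following sketch compares the two, with illustrative variable names.

```python
import json

data = {'数字': 123, '字符': '一二三'}

escaped = json.dumps(data)                       # ASCII-only text
readable = json.dumps(data, ensure_ascii=False)  # original characters kept

# Both strings decode to the same Python object
same = json.loads(escaped) == json.loads(readable) == data

# Size of each form once encoded as UTF-8
escaped_size = len(escaped.encode('utf-8'))
readable_size = len(readable.encode('utf-8'))
```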
3.2 Data Type Conversion
In the default implementation, the Python objects that json.dumps can process, and all the values they contain, must be of type dict, list, tuple, str, int, float, bool or None. The data conversion relationship between these types and JSON is as follows:
Python | JSON |
---|---|
dict | object |
list, tuple | array |
str | string |
int, float, int- & float-derived Enums | number |
True | true |
False | false |
None | null |
The actual conversion situation is as follows:
>>> json.dumps(
...     {
...         'str': 'ABC',
...         'int': 123,
...         'float': 321.45,
...         'bool_true': True,
...         'bool_false': False,
...         'none': None,
...         'list': [1, 2, 3],
...         'tuple': (12, 34)
...     }
... )
'{"str": "ABC", "int": 123, "float": 321.45, "bool_true": true, "bool_false": false, "none": null, "list": [1, 2, 3], "tuple": [12, 34]}'
Although the JSON standard does not support NaN, Infinity and -Infinity, the default implementation of json.dumps converts float('nan'), float('inf') and float('-inf') to the constants NaN, Infinity and -Infinity. As shown in the following example:
>>> json.dumps(
... {
... 'nan': float('nan'),
... 'inf': float('inf'),
... '-inf': float('-inf')
... }
... )
'{"nan": NaN, "inf": Infinity, "-inf": -Infinity}'
Since these constants may make the generated JSON string unreadable by other JSON implementations, the allow_nan parameter of json.dumps can be set to False to prevent this from happening. In that case, the json.dumps method raises an exception when these values appear in the Python object being processed.
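A minimal sketch of this strict mode, with allow_nan set to False, is shown below. The exception raised by the standard library in this situation is a ValueError; the variable names are illustrative.

```python
import json

data = {'inf': float('inf')}

# Default: the non-standard constant is written out
lenient = json.dumps(data)

# Strict: out-of-range float values are refused
try:
    json.dumps(data, allow_nan=False)
except ValueError as err:
    message = str(err)
```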
3.3 Circular references
The json.dumps method checks whether there are circular references in the Python object, and raises an exception if one is found. As shown in the following example:
>>> circular_obj = {}
>>> circular_obj['self'] = circular_obj
>>> circular_obj
{'self': {...}}
>>> json.dumps(circular_obj)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
ValueError: Circular reference detected
If you don't want json.dumps to check for circular references, you can set the check_circular parameter to False. A circular reference then leads to infinite recursion instead:
>>> json.dumps(circular_obj, check_circular=False)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py", line 238, in dumps
**kw).encode(obj)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
RecursionError: maximum recursion depth exceeded while encoding a JSON object
3.4 JSON string output format
The indent parameter of json.dumps controls the line wrapping and indentation of the JSON string.
The default value of indent is None. In this case, the JSON string has no line breaks or indentation. As shown below:
>>> print(json.dumps({'a': 123, 'b': {'x': 321, 'y': 'ABC'}}))
{"a": 123, "b": {"x": 321, "y": "ABC"}}
When indent is 0 or negative, the JSON string includes newlines but no indentation:
>>> print(json.dumps({'a': 123, 'b': {'x': 321, 'y': 'ABC'}}, indent=-1))
{
"a": 123,
"b": {
"x": 321,
"y": "ABC"
}
}
>>> print(json.dumps({'a': 123, 'b': {'x': 321, 'y': 'ABC'}}, indent=0))
{
"a": 123,
"b": {
"x": 321,
"y": "ABC"
}
}
When indent is a positive integer, in addition to newlines, each level of the object hierarchy is indented by the specified number of spaces:
>>> print(json.dumps({'a': 123, 'b': {'x': 321, 'y': 'ABC'}}, indent=2))
{
  "a": 123,
  "b": {
    "x": 321,
    "y": "ABC"
  }
}
indent can also be a str, in which case each level is indented with the content of that str, such as a tab \t:
>>> print(json.dumps({'a': 123, 'b': {'x': 321, 'y': 'ABC'}}, indent='\t'))
{
	"a": 123,
	"b": {
		"x": 321,
		"y": "ABC"
	}
}
Another parameter of json.dumps, separators, can be used to set the output separators. The value of this parameter should be a two-element tuple: the first value is the separator between members, and the second is the separator between a key and its value. Its default value is affected by the indent parameter above. When indent is None, the default value of separators is (', ', ': '), that is, there is a space after each separator. When indent is not None, the default value is (',', ': '), that is, only the separator between a key and its value is followed by a space, and the separator between members has no space because it is followed by a newline.
One possible use case for the separators parameter is to reduce the size of the JSON string by removing all non-essential formatting characters. In this case, you can set separators to (',', ':') and leave the indent parameter unset, or explicitly set it to None:
>>> print(json.dumps({'a': 123, 'b': {'x': 321, 'y': 'ABC'}}, indent=None, separators=(',', ':')))
{"a":123,"b":{"x":321,"y":"ABC"}}
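To make the effect on size concrete, the following sketch produces the same data in three layouts and checks that they all decode to the same object. The variable names are illustrative only.

```python
import json

data = {'a': 123, 'b': {'x': 321, 'y': 'ABC'}}

default_text = json.dumps(data)                         # (', ', ': ') separators
compact_text = json.dumps(data, separators=(',', ':'))  # no optional whitespace
pretty_text = json.dumps(data, indent=2)                # newlines and indentation

for label, text in [('default', default_text),
                    ('compact', compact_text),
                    ('pretty', pretty_text)]:
    # Formatting changes the length but not the decoded content
    print(label, len(text), json.loads(text) == data)
```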
3.5 Converting custom Python objects
The default implementation of json.dumps can only convert the built-in types listed in section 3.2. If you want to convert a custom object, you need to use the default parameter. This parameter accepts a function whose argument is the Python object to be converted and whose return value is an object that json.dumps can process, such as a dictionary, representing that Python object. The default function is called starting from the top level of the object reference tree and then layer by layer through the whole tree. Therefore, instead of implementing the traversal logic of the object tree yourself, you only need to process the object at the current level. As shown in the following example:
>>> class MyClass:
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...
>>> def my_default(o):
...     if isinstance(o, MyClass):
...         print('%s.y: %s' % (type(o), o.y))
...         return {'x': o.x, 'y': o.y}
...     # Any other unsupported type cannot be converted
...     raise TypeError('%r is not JSON serializable' % o)
...
>>> obj = MyClass(x=MyClass(x=1, y=2), y=11)
>>> json.dumps(obj, default=my_default)
<class '__main__.MyClass'>.y: 11
<class '__main__.MyClass'>.y: 2
'{"x": {"x": 1, "y": 2}, "y": 11}'
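The same approach works for other types that json.dumps does not handle by default. The sketch below assumes a hypothetical helper named encode_extra that represents a set as a sorted JSON array and raises TypeError for anything else it does not recognize.

```python
import json

def encode_extra(o):
    # Hypothetical helper: represent a set as a sorted JSON array
    if isinstance(o, set):
        return sorted(o)
    # Signal that this object cannot be converted
    raise TypeError('Object of type %s is not JSON serializable' % type(o).__name__)

text = json.dumps({'tags': {'b', 'a', 'c'}}, default=encode_extra)

try:
    json.dumps({'x': object()}, default=encode_extra)
except TypeError as err:
    unsupported = str(err)
```

Sorting the set is a design choice made here so that the output is stable; a set has no inherent order.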
3.6 Non-string type key names
In Python, any hashable object can be used as a dictionary key, while in the JSON specification only strings can be used as key names. The implementation of json.dumps therefore checks key types, but it expands the allowed range of key names: data of type str, int, float, bool and None can all be used as key names. However, when a key is not a str, it is converted to the corresponding str value. The following example:
>>> json.dumps(
... {
... 'str': 'str',
... 123: 123,
... 321.54: 321.54,
... True: True,
... False: False,
... None: None
... }
... )
'{"str": "str", "123": 123, "321.54": 321.54, "true": true, "false": false, "null": null}'
And when other types of key names appear, an exception is thrown by default:
>>> json.dumps({(1,2): 123})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
TypeError: keys must be a string
The skipkeys parameter of json.dumps can change this behavior. When skipkeys is set to True and an illegal key type is encountered, no exception is thrown and the key is skipped:
>>> json.dumps({(1,2): 123}, skipkeys=True)
'{}'
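Only the offending keys are dropped; the rest of the dictionary is still written. A small sketch with mixed key types, using made-up data:

```python
import json

data = {'ok': 1, (1, 2): 'tuple key', 3: 'int key'}

# The tuple key is skipped; the int key is converted to a str
text = json.dumps(data, skipkeys=True)
restored = json.loads(text)
print(restored)
```

Because skipped keys disappear silently, the output no longer round-trips to the original dictionary, which may or may not be acceptable depending on the use case.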
3.7 Generate JSON file
When you need to save the generated JSON data to a file, you can use the json.dump method. Compared with json.dumps, this method has one more parameter, fp, which is the file object used to save the JSON data. For example, the code in the following example
>>> with open('/tmp/data.json', mode='w') as jf:
...     json.dump({'a': 123}, jf)
...
writes the JSON data to the /tmp/data.json file. After the code is executed, the content of the file is
{"a": 123}
The json.dump method can also accept other file-like objects:
>>> sio = io.StringIO()
>>> json.dump({'a': 123}, sio)
>>> sio.getvalue()
'{"a": 123}'
The usage of the other parameters of json.dump is the same as that of json.dumps, and will not be repeated here.
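Each call to json.dump writes one complete JSON document, so calling it twice on the same file object (for example, on a file opened in append mode) produces content that cannot be read back as a single document. The sketch below demonstrates this with io.StringIO instead of a real file.

```python
import io
import json

sio = io.StringIO()
json.dump({'a': 123}, sio)
json.dump({'b': 456}, sio)   # second document appended to the same stream

combined = sio.getvalue()

try:
    json.loads(combined)
except json.JSONDecodeError as err:
    # The decoder stops after the first document and reports what follows
    error_message = err.msg
```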
4 JSON decoding and encoding class implementation
The four methods json.loads, json.load, json.dumps and json.dump complete their respective tasks through two classes, json.JSONDecoder and json.JSONEncoder. Therefore, these two classes can also be used directly to complete the functions described above:
>>> json.JSONDecoder().decode('{"a": 123}')
{'a': 123}
>>> json.JSONEncoder().encode({'a': 123})
'{"a": 123}'
The parameters of the four methods json.loads, json.load, json.dumps and json.dump are mainly passed to the constructors of json.JSONDecoder and json.JSONEncoder, so using these methods can meet most needs. When you need a custom subclass of json.JSONDecoder or json.JSONEncoder, you only need to pass the subclass to the cls parameter. These methods also have a **kw parameter: when the constructor of the custom class requires new parameters beyond the standard parameter list, **kw passes them on to that constructor.
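As a sketch of how cls and **kw work together, the following hypothetical PrefixEncoder subclass accepts an extra prefix parameter, which json.dumps forwards to its constructor. The class name, the prefix parameter and the way complex numbers are written out are all assumptions made for illustration.

```python
import json

class PrefixEncoder(json.JSONEncoder):
    """Hypothetical encoder that writes complex numbers as tagged strings."""

    def __init__(self, *, prefix='complex:', **kwargs):
        # prefix is a new parameter beyond the standard list; it arrives via **kw
        super().__init__(**kwargs)
        self.prefix = prefix

    def default(self, o):
        if isinstance(o, complex):
            return '%s%s' % (self.prefix, o)
        # Defer to the base class, which raises TypeError
        return super().default(o)

text = json.dumps({'z': 1 + 2j}, cls=PrefixEncoder, prefix='c=')
print(text)
```

Overriding the default method in a subclass plays the same role as passing a function to the default parameter, but keeps the conversion logic and its configuration together in one class.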