Say you have some JSON data like this (I’ll be using Go on purpose here):
jsonString := `{
  "number": "1600",
  "street_name": "Pennsylvania",
  "street_type": "Avenue"
}`
The challenge here is to build a machine that does four things:
- Validate that the JSON is valid syntax.
- Validate that the JSON is valid against a schema.
- If there is a problem with one part of the data, first attempt to fix it (see how the value for "number" is "1600", that’s a basic mistake, try to coerce it into an int of 1600).
- If that part of the bad data cannot be coerced, or if it’s invalid even after it is, fall back to a default value.
1) Validation is fairly easy.
The Go standard library can do it with json.Valid.
jsonString := `{
  "number": "1600",
  "street_name": "Pennsylvania",
  "street_type": "Avenue"
}`
isValid := json.Valid([]byte(jsonString))
if isValid {
	fmt.Println("JSON data is valid in most basic sense.")
} else {
	fmt.Println("ERROR! String is not valid JSON.")
}
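json.Valid only answers yes or no. If it would help to know where the syntax breaks, one option is to unmarshal into an empty interface and inspect the *json.SyntaxError that comes back. This is a minimal sketch; checkSyntax is a name made up for this example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// checkSyntax (hypothetical helper) reports whether data is syntactically
// valid JSON. When it is not, offset is the byte position reported by the
// decoder, or -1 if the error was not a syntax error.
func checkSyntax(data []byte) (ok bool, offset int64) {
	var v interface{}
	err := json.Unmarshal(data, &v)
	if err == nil {
		return true, 0
	}
	var syntaxErr *json.SyntaxError
	if errors.As(err, &syntaxErr) {
		return false, syntaxErr.Offset
	}
	return false, -1
}

func main() {
	good := []byte(`{"number": "1600"}`)
	bad := []byte(`{"number": "1600",}`) // trailing comma is not valid JSON

	ok, _ := checkSyntax(good)
	fmt.Println("good is valid:", ok)

	ok, offset := checkSyntax(bad)
	fmt.Println("bad is valid:", ok, "problem near byte", offset)
}
```

Note that this still says nothing about types, so "1600" as a string passes just fine here.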
2) Validation against schema can be done with a lib.
I don’t know if this is the only option, but gojsonschema works.
jsonString := `{
  "number": "1600",
  "street_name": "Pennsylvania",
  "street_type": "Avenue"
}`
schema := gojsonschema.NewStringLoader(`{
  "type": "object",
  "properties": {
    "number": { "type": "number" },
    "street_name": { "type": "string" },
    "street_type": { "enum": ["Street", "Avenue", "Boulevard"] }
  }
}`)
// Named document rather than json so it doesn't shadow the encoding/json package.
document := gojsonschema.NewStringLoader(jsonString)
result, err := gojsonschema.Validate(schema, document)
if err != nil {
	panic(err.Error())
}
if result.Valid() {
	fmt.Printf("The document is valid\n")
} else {
	fmt.Printf("The document is not valid. see errors :\n")
	for _, desc := range result.Errors() {
		fmt.Printf("- %s\n", desc)
	}
}
3) Fixing theoretically easy to fix data
Here’s that original JSON again:
jsonString := `{
  "number": "1600",
  "street_name": "Pennsylvania",
  "street_type": "Avenue"
}`
And note in the schema, we’re expecting a number: "number": { "type": "number" }. If we ran the validation in Step 2 above, we’d get:
> make run
go build -o main .
./main
The document is not valid. see errors :
- number: Invalid type. Expected: number, given: string
So that’s true for schema validation, but it would also be true if we tried to json.Unmarshal the data into a struct with strict types (which we definitely do). So in addition to the schema, we have a type which is also kind of a schema.
type address struct {
	Number     int    `json:"number"`
	StreetName string `json:"street_name"`
	StreetType string `json:"street_type"`
}
If we tried to parse the JSON now, we’d get a similar error to the schema checking:
var add address
err := json.Unmarshal([]byte(jsonString), &add)
if err != nil {
	fmt.Println(err)
}
> make run
go build -o main .
./main
json: cannot unmarshal string into Go struct field address.number of type int
The hope is that there is a way to run some kind of callback function to try to coerce the problematic bit of data into something that is valid. "1600" is so obviously just incorrectly a string that the callback would force it into an int and all would be well.
This is where I’m kinda stuck, and will update this post when it’s figured out.
- Can fastjson help? Its README says something about error handling but I don’t see how.
- Can mapstructure help? It says “This library is most useful when decoding values from some data stream (JSON, Gob, etc.) where you don’t quite know the structure of the underlying data until you read a part of it.” which seems like a good lead.
- Can validator help?
Update: You can provide Unmarshaling instructions for custom types
This article was very helpful.
Rather than an int like you want it, call it something else:
type FlexInt int
type FlexAddress struct {
	Number     FlexInt `json:"number"`
	StreetName string  `json:"street_name"`
	StreetType string  `json:"street_type"`
}
Now as Marko Mikulicic says:
All you have to do is implement the json.Unmarshaler interface.
So:
func (fi *FlexInt) UnmarshalJSON(b []byte) error {
	if b[0] != '"' {
		return json.Unmarshal(b, (*int)(fi))
	}
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return err
	}
	i, err := strconv.Atoi(s)
	if err != nil {
		return err
	}
	*fi = FlexInt(i)
	return nil
}
This is awfully clever I think.
- It checks the first character of the raw value of the FlexInt and if it’s not a double-quote mark (like is required for a JSON string), then it assumes it’s an int and Unmarshals it that way.
- Otherwise it tries to Unmarshal it as a string, and returns the error if that doesn’t work.
- Then it tries to coerce that string into an int with strconv.Atoi, and if that works, great.
- Errors are returned if nothing seems to work (could always try accounting for more situations).
So this handles trying to fix decently-easy-to-fix JSON type errors, and also gives an opportunity to just return some kind of default value if every attempt at fixing it fails.
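To make the "return a default if every attempt fails" idea concrete, here is a minimal sketch of a variant that never errors on the number. FlexIntOrDefault, defaultNumber, and parseNumber are names invented for this example, and hard-coding the fallback in Go is an assumption, not a recommendation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// defaultNumber is a hypothetical fallback, chosen only for illustration.
const defaultNumber = 1000

// FlexIntOrDefault (hypothetical) accepts a JSON integer or a string holding
// an integer, and falls back to defaultNumber for anything else.
type FlexIntOrDefault int

func (fi *FlexIntOrDefault) UnmarshalJSON(b []byte) error {
	// First try the value as a plain JSON integer.
	var i int
	if err := json.Unmarshal(b, &i); err == nil {
		*fi = FlexIntOrDefault(i)
		return nil
	}
	// Next try it as a string holding an integer, e.g. "1600".
	var s string
	if err := json.Unmarshal(b, &s); err == nil {
		if n, err := strconv.Atoi(s); err == nil {
			*fi = FlexIntOrDefault(n)
			return nil
		}
	}
	// Nothing worked: use the fallback instead of returning an error.
	*fi = defaultNumber
	return nil
}

type address struct {
	Number     FlexIntOrDefault `json:"number"`
	StreetName string           `json:"street_name"`
	StreetType string           `json:"street_type"`
}

// parseNumber decodes doc and returns just the street number.
func parseNumber(doc string) (int, error) {
	var a address
	err := json.Unmarshal([]byte(doc), &a)
	return int(a.Number), err
}

func main() {
	docs := []string{
		`{"number": 1600}`,
		`{"number": "1600"}`,
		`{"number": "sixteen hundred"}`,
		`{"number": [1, 6]}`,
	}
	for _, doc := range docs {
		n, err := parseNumber(doc)
		fmt.Println(doc, "=>", n, err)
	}
}
```

The trade-off is that bad data now gets replaced silently, so a real version would probably want to log or collect what it had to replace.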
4) Default / Fallback Values
One issue here is where to put the fallback values. If we know we’re exclusively dealing with JSON data, it seems like the JSON schema would be the place. That can look like:
{
  "type": "object",
  "properties": {
    "number": { "type": "number", "default": 1000 },
    "street_name": { "type": "string" },
    "street_type": { "enum": ["Street", "Avenue", "Boulevard"] }
  }
}
But the trouble here is that by the time we’re parsing/unmarshaling the data, that’s in Go, so we’d have to somehow come back to the schema and parse that and pluck the data out to use. Just seems weird.
Maybe we’ll have to do Step 3, then if we find the data to be unfixable, remove it, then run it back through a JSON schema situation where it puts default values back into the parsed data. Again something I don’t really know how to do, but seems plausible. Plus it does double duty. I would think this machine would optionally be able to put in default values. Not always, sometimes missing fields are better, but it could put in defaults on command.
Bonus: Not just JSON
I think JSON is the primary use case here, but not all data is passed around as JSON. Perhaps this machine could do the same kind of thing for data that is already in Go. Step 1 becomes irrelevant (Go code will just choke on invalid syntax), but the rest still matter. Can a struct have a schema with allowed values? So not just int but an int with min and max? Not just a string but a string with a valid set of ENUM values. Seems like that should be no huge problem. Can a struct with a value outside what the schema allows be fixed or reverted to a default value? Hopefully?
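One low-tech way to get there is a method on the struct that knows the rules. This is only a sketch: the limits, the fallback values, and the normalize method are all invented for the example.

```go
package main

import "fmt"

// The limits and fallbacks below are made up for this example.
const (
	minNumber         = 1
	maxNumber         = 99999
	defaultNumber     = 1000
	defaultStreetType = "Street"
)

var allowedStreetTypes = []string{"Street", "Avenue", "Boulevard"}

type address struct {
	Number     int
	StreetName string
	StreetType string
}

// normalize (hypothetical) resets every out-of-range or unknown field to its
// default and returns the names of the fields it had to reset.
func (a *address) normalize() []string {
	var reset []string
	if a.Number < minNumber || a.Number > maxNumber {
		a.Number = defaultNumber
		reset = append(reset, "Number")
	}
	known := false
	for _, t := range allowedStreetTypes {
		if a.StreetType == t {
			known = true
			break
		}
	}
	if !known {
		a.StreetType = defaultStreetType
		reset = append(reset, "StreetType")
	}
	return reset
}

func main() {
	a := address{Number: -5, StreetName: "Pennsylvania", StreetType: "Lane"}
	reset := a.normalize()
	fmt.Println("reset fields:", reset)
	fmt.Printf("%+v\n", a)
}
```

Returning the list of reset fields leaves it up to the caller whether a fallback is fine or worth complaining about.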
In this case, wouldn’t it make more sense to put the default values in the type definition rather than a JSON schema, so like this NOT REAL code:
type address struct {
	Number     int    `json:"number",default:1000`
	StreetName string `json:"street_name"`
	StreetType string `json:"street_type"`
}
Then if you need JSON schema also, you could generate it from this type? I’m already out of my depth here and this is doubly so, but also seems possible.
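Something close to that NOT REAL code can be made real with the reflect package, since Go lets you put arbitrary keys in struct tags. By convention the tag is written as space-separated key:"value" pairs, so the default would look like `default:"1000"`. The sketch below assumes a made-up default tag and a made-up applyTagDefaults helper, and only handles int and string fields.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// The default tag key is an assumption of this sketch; nothing in the
// standard library gives it meaning.
type address struct {
	Number     int    `json:"number" default:"1000"`
	StreetName string `json:"street_name"`
	StreetType string `json:"street_type" default:"Street"`
}

// applyTagDefaults (hypothetical) sets every zero-valued int or string field
// of the struct pointed to by ptr to the value of its default tag, if any.
func applyTagDefaults(ptr interface{}) error {
	v := reflect.ValueOf(ptr)
	if v.Kind() != reflect.Ptr || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("applyTagDefaults: want pointer to struct, got %T", ptr)
	}
	v = v.Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		def, ok := t.Field(i).Tag.Lookup("default")
		field := v.Field(i)
		if !ok || !field.CanSet() || !field.IsZero() {
			continue // no default, unexported, or already has a value
		}
		switch field.Kind() {
		case reflect.Int:
			n, err := strconv.Atoi(def)
			if err != nil {
				return fmt.Errorf("field %s: bad default %q: %w", t.Field(i).Name, def, err)
			}
			field.SetInt(int64(n))
		case reflect.String:
			field.SetString(def)
		}
	}
	return nil
}

func main() {
	a := address{StreetName: "Pennsylvania"}
	if err := applyTagDefaults(&a); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", a)
}
```

One catch: this can't tell a missing number from a number that is legitimately 0, because both are the zero value of int. Pointer fields would be one way around that.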
To a human it is (though a human could mistake “16OO”). I guess if the expected type is Int, you could have a routine that checks each character in the string to see if it matches the ASCII digits then cast it to an int if they do.
Regarding the bonus question, I don’t know Go at all but my hunch is structs are intentionally too primitive to contain something like a valid range for a typed value unless it’s as “NumberMin” and “NumberMax” ints and you write an .isValid() method that uses such values, if present.
If this isn’t hypothetical, the address number should probably remain a string, otherwise you’re in “Falsehoods programmers believe about addresses” territory.
Yep, looks like that’s part of the solution! This is helping me:
https://docs.bitnami.com/tutorials/dealing-with-json-with-non-homogeneous-types-in-go/
It is totally hypothetical. I know what you mean about addresses. My goal here is error correction and defaults for data generally and this was just an easy example.