Spark version: 2.2.0
Spotify/spark-bigquery version: 0.2.2
Hi,
I am trying to use the saveAsBigQueryTable function to write a DataFrame whose schema has an array of structs as a field. However, I am getting the following error:
The Apache Avro library failed to parse the header with the follwing error: Invalid namespace: .topic_scores
The offending field is:
{
  "type": [
    {
      "items": [
        {
          "namespace": ".topic_scores",
          "type": "record",
          "name": "topic_scores",
          "fields": [
            {
              "type": "int",
              "name": "index"
            },
            {
              "type": "float",
              "name": "score"
            }
          ]
        },
        "null"
      ],
      "type": "array"
    },
    "null"
  ],
  "name": "topic_scores"
}
You can see that the namespace field begins with a dot. My guess is that the issue stems from https://github.com/spotify/spark-bigquery/blob/master/src/main/scala/com/databricks/spark/avro/SchemaConverters.scala#L342-L346
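For context on why the leading dot is rejected: the Avro specification describes a namespace as a dot-separated sequence of names, each starting with a letter or underscore and followed by letters, digits, or underscores. A leading dot yields an empty first component. Here is a minimal sketch of that rule in plain Scala; the object and the helper `isValidNamespace` are hypothetical names of mine, not part of Avro or spark-bigquery:

```scala
object NamespaceCheck {
  // Pattern for one component of an Avro name, as described in the Avro specification.
  private val NamePart = "[A-Za-z_][A-Za-z0-9_]*"

  // Hypothetical helper: true if ns is empty (no namespace) or a
  // dot-separated sequence of valid name components.
  def isValidNamespace(ns: String): Boolean =
    ns.isEmpty || ns.matches(s"${NamePart}([.]${NamePart})*")
}

// NamespaceCheck.isValidNamespace(".topic_scores")             // false
// NamespaceCheck.isValidNamespace("com.databricks.spark.avro") // true
```

This only mirrors the naming rule; I have not checked how the parser on the BigQuery side implements it.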
I can't find a way to configure the recordNamespace value. According to the spark-avro documentation:
You can specify the record name and namespace like this:
import com.databricks.spark.avro._
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().master("local").getOrCreate()
val df = spark.read.avro("src/test/resources/episodes.avro")
val name = "AvroTest"
val namespace = "com.databricks.spark.avro"
val parameters = Map("recordName" -> name, "recordNamespace" -> namespace)
df.write.options(parameters).avro("/tmp/output")
I think this is the line that reads that option, and sets the value to an empty string if not provided: https://github.com/databricks/spark-avro/blob/branch-4.0/src/main/scala/com/databricks/spark/avro/DefaultSource.scala#L114
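To illustrate my guess, here is a minimal sketch of how an empty recordNamespace could turn into ".topic_scores". It assumes the converter builds a nested record's namespace as parent namespace + "." + field name; both function names below are hypothetical and this is not the library's actual code:

```scala
object NamespaceSketch {
  // Assumed behaviour: the nested namespace is always "<parent>.<field>",
  // even when the parent namespace is the empty default.
  def naiveChildNamespace(parent: String, field: String): String =
    s"$parent.$field"

  // Hypothetical guard: omit the separator when the parent namespace is empty.
  def guardedChildNamespace(parent: String, field: String): String =
    if (parent.isEmpty) field else s"$parent.$field"
}

// NamespaceSketch.naiveChildNamespace("", "topic_scores")   // ".topic_scores"
// NamespaceSketch.guardedChildNamespace("", "topic_scores") // "topic_scores"
```

If the assumption holds, either a guard like the second function or a non-empty recordNamespace would avoid the leading dot.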
These options are not exposed anywhere in the Spotify library. Has anyone seen this issue, or does anyone have a workaround? Thanks!