Click here to Skip to main content
15,742,619 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
I have a Parquet, that doesn't have all the column as in my database table.

All I did is I read Parquet into the DataTable, map the columns from parquet to columns in database table and do WriteToServer of the DataTable.

I get below error

Core Microsoft SqlClient Data Provider: Column count in target table does not match column count specified in input. If BCP command, ensure format file column count matches destination table. If SSIS data import, check column mappings are consistent with target.

I want to find a way to ignore the extra columns in the database and use its default value.

What I have tried:

DataTable tbl = new DataTable();

            var connectingStr = System.Environment.GetEnvironmentVariable($"SqlConnectionSynapse");
            SqlConnection con = new SqlConnection(connectingStr);
            //create object of SqlBulkCopy which help to insert  
            SqlBulkCopy objbulk = new SqlBulkCopy(con);

            //assign Destination table name  
            objbulk.DestinationTableName = "DestSampleTable";

            using (var parquetReader = new ParquetReader(inputFileStream))
                var dataFields = parquetReader.Schema.GetDataFields();

                // enumerate through row groups in this file
                for (int i = 0; i < parquetReader.RowGroupCount; i++)
                    // create row group reader
                    using (ParquetRowGroupReader groupReader = parquetReader.OpenRowGroupReader(i))
                        var columns = dataFields.Select(groupReader.ReadColumn).ToArray();
                        foreach (var column in columns)
                            var type = column.Field.DataType;
                            tbl.Columns.Add(new DataColumn(column.Field.Name));
                            objbulk.ColumnMappings.Add(column.Field.Name, column.Field.Name);

                        for (int j = 0; j < groupReader.RowCount; j++)
                            DataRow dr = tbl.NewRow();
                            foreach (var column in columns)
                                dr[column.Field.Name] = column.Data.GetValue(j);


            //insert bulk Records into DataBase.  
            objbulk.BulkCopyTimeout = 0;
Updated 18-Jul-22 1:49am
PIEBALDconsult 11-Jul-22 19:23pm    
What's a Parquet ?

I use SqlBulkCopy quite a bit, but I have never used ColumnMappings.
Praneeth-THECB 12-Jul-22 13:06pm    
It is a file format represented in columns format. ColumnMappings is used when order of columns or the count of source table don't match with destination table.
Gerry Schmitz 11-Jul-22 19:49pm    
Create the data table; then think about the bulk copy; you've got the two mixed together, confusing the whole extract and translate process. That's why it's called ETL.

1 solution

Have a look at the demo code at SqlBulkCopyColumnMapping.DestinationColumn Property (System.Data.SqlClient) | Microsoft Docs[^]

In particular note the comment against their example
Although the number of columns in the destination matches the number of columns in the source, the column names and ordinal positions do not match.
Only the columns for which you have created mappings will be copied over. In other words, your intent
I want to find a way to ignore the extra columns in the database and use its default value.
is achieved by using mappings. You may still run into issues if there is no default value for a target column that is not supplied and it is defined as NOT NULL.

As Gerry has said, your use of tbl is just confusing matters. It looks more like one of the columns in your ParquetReader doesn't exist on the target table (or may have a different type). You are blindly creating a mapping for everything on the reader
objbulk.ColumnMappings.Add(column.Field.Name, column.Field.Name);
You should compare the schema of the ParquetReader to the target table and only add the mapping if the target column exists
Share this answer
Maciej Los 18-Jul-22 14:57pm    

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900