I have an Athena table with many columns that loads data from an S3 bucket location. Say the data stored in the table is 1 GB.
I want to query the table by a particular id, so for N ids Athena has to scan N × 1 GB of data.
To avoid this and reduce cost, I'd like to partition the table on the id column.
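For context, the queries are of this shape (the id value 123 is just an illustration, not a real id from my data):

```sql
-- Without partitioning, Athena scans the whole table for every id queried
SELECT * FROM newtable WHERE id = 123;
```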
CREATE EXTERNAL TABLE `newtable`(
  `abc` int,
  `bcd` string,
  `cde` int,
  `def` int,
  `efg` timestamp,
  `egh` int)
PARTITIONED BY (
  `id` int)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://bucket/folder'
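My understanding (an assumption on my part) is that automatic partition discovery only works when the files under the table LOCATION sit in Hive-style key=value prefixes such as s3://bucket/folder/id=1/; otherwise each partition has to be registered by hand, something like:

```sql
-- Hypothetical example: register one partition explicitly,
-- pointing at the S3 prefix that holds the files for that id
ALTER TABLE newtable ADD PARTITION (id = 1)
LOCATION 's3://bucket/folder/id=1/';
```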
After creating the table with partitioning, I load all partitions. But when I then query the data, it shows no records found.
What I have tried:
MSCK REPAIR TABLE seatdata_cas;
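To verify what the repair actually did, I believe this should list the partitions Athena knows about:

```sql
-- Check which partitions were registered in the metastore
SHOW PARTITIONS seatdata_cas;
```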