Click here to Skip to main content
16,002,118 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
I have a athena table with many columns which loads data from a s3 bucket location. Lets say the data size stored in athena table is 1 gb .

I want to query the table data based on a particular id. so for N number of id, i have to scan N* 1 gb amount of data.

To avoid this situation and reduce cost. I'd like to partition the table based on the column name id.

CREATE EXTERNAL TABLE `newtable`(
  `abc` int, 
  `bcd` string, 
  `cde` int, 
  `def` int, 
  `efg` timestamp,  
  `egh` int)
PARTITIONED BY ( 
  `id` int)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://bucket/folder'


After creating the table with partitioning, i load all partitions. When i try to load the data.
It shows no records found.

What I have tried:

MSCK REPAIR TABLE seatdata_cas;
Posted

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900