Click here to Skip to main content
15,907,874 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
New python user here, looking to learn more about Pandas.
I have an output as a .csv file from our ERP containing the following relevant info:

Item No. --- Sales Order No. --- Quantity
123 ABCD 2
456 EFGH 5
789 ABCD 8

What I'd like to do is determine on what percentage of sales orders does each item match up with each other item, filter out everything below X%, and output the high match pairs. The goal is to provide Sales with guidance regarding potential sales packages to boost production efficiency and reduce inventory costs.

What I have tried:

We made an excel VBA program but it takes too long and tends to crash the system.
Updated 26-Apr-22 6:00am
CHill60 26-Apr-22 9:10am    
The "What I have tried:" section is where you are meant to put the Python code that you have already tried to write. You should also include the expected results from the sample data provided and clarify what you mean by "match up with each other item" - i.e. match on what?
Christopher Bogart 26-Apr-22 9:13am    
I have not tried python code because I'm not really sure where to begin. I'm trying to figure out an output such as "item 123 & item 789 match over 90% of all sales orders for item 123."
CHill60 26-Apr-22 9:32am    
Why figure out the output if there is an existing VBA program that does what you need? But again - what does "match over 90% of all sales orders" actually mean? What matches what?
Christopher Bogart 26-Apr-22 9:54am    
The VBA code does not work unfortunately. Match means they exist on the same Sales Order; sold together often. Example data would be Item 123 & Item 456 both have one instance of Sales Order ABC. That's 1 "match". The goal is to compile this over all combinations of items & sales orders, then filter out the highest frequency pairs.

This is the approach I would probably take
1. identify all of the pairs of items that are possible - have a look at itertools.Combinations[^] - use r = 2. E.g.
2. Generate pivoted data from the CSV to get [Sales Order No.] [list of items] e.g.
ABCD 123|789|
EFGH 456|
3. Compare the two lists to get the counts of matches e.g. see The Best Ways to Compare Two Lists in Python[^] - although to be honest by this stage I would probably be harnessing a database or Power Query

Edit: My step 3 is too simplistic - I'd probably do a count where Part 1 of my combo is "somewhere" in the list of items per order AND Part 2 of my combo is "somewhere" in the list of items per order, and therefore would not need the reversed combinations in the initial list i.e. would just need
Share this answer

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900