15,907,874 members
0.00/5 (No votes)
See more:
New python user here, looking to learn more about Pandas.
I have an output as a .csv file from our ERP containing the following relevant info:

Item No. --- Sales Order No. --- Quantity
123 ABCD 2
456 EFGH 5
789 ABCD 8

What I'd like to do is determine on what percentage of sales orders does each item match up with each other item, filter out everything below X%, and output the high match pairs. The goal is to provide Sales with guidance regarding potential sales packages to boost production efficiency and reduce inventory costs.

What I have tried:

We made an excel VBA program but it takes too long and tends to crash the system.
Posted
Updated 26-Apr-22 6:00am
Comments
CHill60 26-Apr-22 9:10am
The "What I have tried:" section is where you are meant to put the Python code that you have already tried to write. You should also include the expected results from the sample data provided and clarify what you mean by "match up with each other item" - i.e. match on what?
Christopher Bogart 26-Apr-22 9:13am
I have not tried python code because I'm not really sure where to begin. I'm trying to figure out an output such as "item 123 & item 789 match over 90% of all sales orders for item 123."
CHill60 26-Apr-22 9:32am
Why figure out the output if there is an existing VBA program that does what you need? But again - what does "match over 90% of all sales orders" actually mean? What matches what?
Christopher Bogart 26-Apr-22 9:54am
The VBA code does not work unfortunately. Match means they exist on the same Sales Order; sold together often. Example data would be Item 123 & Item 456 both have one instance of Sales Order ABC. That's 1 "match". The goal is to compile this over all combinations of items & sales orders, then filter out the highest frequency pairs.

## Solution 2

This is the approach I would probably take
1. identify all of the pairs of items that are possible - have a look at itertools.Combinations[^] - use r = 2. E.g.
```123|456
123|789
456|789
456|123
789|123
789|456```
2. Generate pivoted data from the CSV to get [Sales Order No.] [list of items] e.g.
```ABCD 123|789|
EFGH 456|```
3. Compare the two lists to get the counts of matches e.g. see The Best Ways to Compare Two Lists in Python[^] - although to be honest by this stage I would probably be harnessing a database or Power Query

Edit: My step 3 is too simplistic - I'd probably do a count where Part 1 of my combo is "somewhere" in the list of items per order AND Part 2 of my combo is "somewhere" in the list of items per order, and therefore would not need the reversed combinations in the initial list i.e. would just need
```123|456
123|789
456|789```

v2

## Solution 1

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Top Experts
Last 24hrsThis month
 OriginalGriff 95 Richard Deeming 60 Pete O'Hanlon 55 Dave Kreskowiak 40 Peter Klemm 2024 34
 Pete O'Hanlon 1,805 OriginalGriff 1,602 Richard Deeming 610 Dave Kreskowiak 533 Richard MacCutchan 355

CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900