How often do 2 items exist on the same sales order?

Question

New Python user here, looking to learn more about pandas.
I have a .csv export from our ERP containing the following relevant info:

Item No.   Sales Order No.   Quantity
123        ABCD              2
456        EFGH              5
789        ABCD              8

What I'd like to do is determine, for each item, what percentage of its sales orders also contain each other item, filter out everything below X%, and output the high-match pairs. The goal is to give Sales guidance on potential sales packages that would boost production efficiency and reduce inventory costs.

What I have tried:

We wrote an Excel VBA program, but it takes too long and tends to crash the system.

Posted 26-Apr-22 3:04am

Christopher Bogart

Updated 26-Apr-22 6:00am

Comments

CHill60 26-Apr-22 9:10am

The "What I have tried:" section is where you are meant to put the Python code that you have already tried to write. You should also include the expected results from the sample data provided and clarify what you mean by "match up with each other item" - i.e. match on what?

Christopher Bogart 26-Apr-22 9:13am

I have not tried Python code because I'm not really sure where to begin. I'm trying to produce an output such as "item 123 & item 789 match over 90% of all sales orders for item 123."

CHill60 26-Apr-22 9:32am

Why figure out the output if there is an existing VBA program that does what you need? But again - what does "match over 90% of all sales orders" actually mean? What matches what?

Christopher Bogart 26-Apr-22 9:54am

The VBA code does not work, unfortunately. "Match" means the items exist on the same Sales Order, i.e. they are often sold together. For example, if Item 123 & Item 456 both appear on Sales Order ABC, that's 1 "match". The goal is to compile this over all combinations of items & sales orders, then filter out the highest-frequency pairs.

2 solutions

Solution 1

See pandas - Python Data Analysis Library and maybe The Python Tutorial (Python 3.10.4 documentation).

Posted 26-Apr-22 3:26am

Richard MacCutchan

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

CHill60 · Accepted Answer · 2022-04-26T06:00:00

This is the approach I would probably take:
1. Identify all of the pairs of items that are possible. Have a look at itertools.combinations and use r = 2.

2. Generate pivoted data from the CSV to get [Sales Order No.] [list of items], e.g.

ABCD 123|789|
EFGH 456|

3. Compare the two lists to get the counts of matches, e.g. see The Best Ways to Compare Two Lists in Python, although to be honest by this stage I would probably be harnessing a database or Power Query.
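Steps 1 and 2 might look roughly like the sketch below. It assumes the CSV columns are named exactly as in the question ("Item No.", "Sales Order No.", "Quantity") and builds the three sample rows by hand; with a real export the DataFrame would presumably come from pd.read_csv instead. The variable names are illustrative only.

```python
from itertools import combinations

import pandas as pd

# The three sample rows from the question (assumed column names).
df = pd.DataFrame(
    {
        "Item No.": [123, 456, 789],
        "Sales Order No.": ["ABCD", "EFGH", "ABCD"],
        "Quantity": [2, 5, 8],
    }
)

# Step 1: every unordered pair of distinct items (r = 2).
items = sorted(df["Item No."].unique().tolist())
all_pairs = list(combinations(items, 2))

# Step 2: "pivot" the rows into one set of items per sales order.
items_per_order = df.groupby("Sales Order No.")["Item No."].apply(set)

print(all_pairs)
print(items_per_order)
```

Using a set per order (rather than a delimited string such as 123|789|) makes the "is this item somewhere on the order" check a simple membership test.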

Edit: My step 3 is too simplistic. I'd probably count the orders where Part 1 of my combo is "somewhere" in the list of items for that order AND Part 2 of my combo is also "somewhere" in that list. I would therefore not need the reversed combinations in the initial list, i.e. I would just need

123|456
123|789
456|789
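Putting the counting idea together, here is a minimal sketch rather than a definitive implementation. It is a variation on the steps above: instead of testing every possible pair against every order, it generates the pairs found within each order and tallies them with a Counter, which should give the same counts for pairs that occur at least once. The column names are assumed to match the question, and the function name pair_match_rates, the min_share parameter, and the output column names are made up for illustration.

```python
from collections import Counter
from itertools import combinations

import pandas as pd


def pair_match_rates(df, min_share):
    """For each item, find the items that share at least min_share of its orders."""
    # One set of items per sales order; a set ignores repeated lines for the same item.
    items_per_order = df.groupby("Sales Order No.")["Item No."].apply(set)

    orders_per_item = Counter()  # number of orders each item appears on
    orders_per_pair = Counter()  # number of orders each unordered pair shares
    for items in items_per_order:
        ordered = sorted(items)
        orders_per_item.update(ordered)
        orders_per_pair.update(combinations(ordered, 2))

    rows = []
    for (a, b), together in orders_per_pair.items():
        # Report both directions, because the percentage depends on
        # which item's order count is used as the denominator.
        for base, other in ((a, b), (b, a)):
            share = together / orders_per_item[base]
            if share >= min_share:
                rows.append((base, other, together, share))
    return pd.DataFrame(
        rows,
        columns=["item", "paired_with", "orders_together", "share_of_item_orders"],
    )


# The three sample rows from the question.
df = pd.DataFrame(
    {
        "Item No.": [123, 456, 789],
        "Sales Order No.": ["ABCD", "EFGH", "ABCD"],
        "Quantity": [2, 5, 8],
    }
)

result = pair_match_rates(df, min_share=0.9)
print(result)
```

On the sample data only items 123 and 789 share an order, so they are the only pair reported. Each unordered pair is counted once, but the share is reported in both directions because "90% of the orders for item 123" and "90% of the orders for item 789" can differ.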