|
Yes, we have just recently adopted a data masking policy for client personal data when copying Production data to Test - their names, addresses and tax identifiers are changed to random values. However, in this recent case that policy was not followed. I'm thinking mainly of our legal duty of care to provide accurate data for the Tax Office, but you're right - we have violated our own data masking policy as well. I'm not sure if data masking is enforced by law here, but it is company policy. We do have privacy laws, so we can't make our clients' data public, but the law mainly covers movement of data in and out of the organisation.
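The masking step described above can be sketched in a few lines. This is a hypothetical illustration in Python (not the poster's actual tooling), with invented field names, showing the general shape: copy each record and overwrite the sensitive fields with random values before the data lands in Test.

```python
import random
import string

# Fields the policy says must never reach Test unmasked.
# These names are invented for illustration.
SENSITIVE_FIELDS = ("name", "address", "tax_id")

def mask_record(record, seed=None):
    """Return a copy of *record* with sensitive fields randomised."""
    rng = random.Random(seed)
    masked = dict(record)
    for field in SENSITIVE_FIELDS:
        if field in masked:
            masked[field] = "".join(
                rng.choices(string.ascii_uppercase + string.digits, k=10))
    return masked

client = {"id": 42, "name": "Jane Doe", "tax_id": "123-456-789"}
safe = mask_record(client, seed=1)
```

Non-sensitive fields (like the id) pass through untouched, so referential integrity in the test copy is preserved while the personal data is not.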
|
|
|
|
|
johnsyd wrote: provide accurate data for the Tax Office Data for the Tax Office are not "live data". When you do your VAT report, you report past data: e.g. on March 10 you send a report for February. Hence that won't be a problem.
|
|
|
|
|
Data for tax *is* a problem in the context we are in, despite the fact that it is past data. The mirrored reporting database is an excellent idea. It can be protected from alteration. However we don't have it and are unlikely to get it any time soon.
We downloaded production data into a testing database. The data (including past data) is unprotected. It could be changed by anyone in between the download and running the report. So could the code to generate the report.
Management told me there was nothing wrong with that particular approach - my posting was intended to see what the members here thought about it.
|
|
|
|
|
Is this a temporary scenario or business as usual?
If it's BAU that's bad; they should look to improve things so that they're in a better scenario long term; perhaps a simple compromise would be creating a staging database which they can use for these tasks (i.e. created with this purpose in mind, so it's not going to be affected by test activities, but gives flexibility of deployments alongside protecting production).
If it's a one off because of some reason then go with what's pragmatic; these things are OK as long as everyone understands it's a bad thing and accepts that it's "just this once" in an emergency situation... so long as you don't find you're always in an emergency situation...
|
|
|
|
|
It was intended to be BAU, but I said that I would go elsewhere if it continued. So it has stopped. Our CIO maintains that change is the enemy, so releases to Production are now only once a month. To create a staging database would probably be a 6 to 12 month project with a budget of at least $500,000, if it were approved at all. To open a port on a firewall takes 6 weeks. To increase the size of a database by 150GB takes from 4 to 8 weeks. To develop a system to FTP files from an external server takes 6 months. Yes, I know anyone could knock up a script in half an hour! This is what happens when an organisation becomes terrified of change.
|
|
|
|
|
Identical databases aren't. Running reports from a Test environment for a specific purpose is OK as long as all parties are aware of what's been done and what are the potential limitations of the reporting process. The phrase "informed consent" comes to mind. Sometimes close enough is close enough.
|
|
|
|
|
This is not best practice.
By definition, there is no telling what state the test system would be in. For example, despite taking a copy of the live database, how can they be sure all the report application code, DLLs, configuration, etc. are as per live too? The report could be erroneous and they wouldn't even know it!
If updating the Production system is too risky then there is probably an underlying problem with architecture, configuration management or quality control.
You could suggest that using a test system to produce 'live' reports is inappropriate and inefficient, and that they should consider a dedicated 'live' reporting server.
The reporting server would either hold a copy of the production database, periodically sync'd in some fashion, or use the production DB remotely. The report server would house only the application code for the reports, so you could argue that updating that server poses a low risk to the Production server's operation, and a more relaxed set of release procedures can be employed.
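The periodic-sync flavour of that idea can be sketched end to end. This is a minimal stand-in using SQLite files (invented schema and paths, not a real RDBMS setup): the "sync" is a file copy, and the report runs against the copy opened read-only, so the report code can never alter production data.

```python
import os
import shutil
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
prod_path = os.path.join(tmp, "prod.db")
report_path = os.path.join(tmp, "report.db")

# Stand-in production data (invented for illustration).
conn = sqlite3.connect(prod_path)
conn.execute("CREATE TABLE sales (month TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("2024-01", 100.0), ("2024-02", 250.0)])
conn.commit()
conn.close()

# The periodic sync: copy production to the reporting server.
shutil.copyfile(prod_path, report_path)

# The report itself only ever sees the read-only copy.
ro = sqlite3.connect(f"file:{report_path}?mode=ro", uri=True)
total = ro.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
ro.close()
```

With a real RDBMS the copy step would be replication, log shipping or a snapshot, but the key property is the same: the reporting side is physically unable to modify what it reports on.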
|
|
|
|
|
This is "reporting"; not "file maintenance".
Instead of catastrophizing, you should "prove" the two reports will vary.
Perhaps the managers actually know what's going on here (for once).
In any event, I'm all for a distinction between "operational" (OLTP) systems versus "informational" (DW) systems (which one tears down and builds up all the time).
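The "prove the two reports will vary" suggestion above is easy to make concrete: generate the same report from both environments and compare a digest of the output. A sketch, with invented sample rows, might look like this:

```python
import hashlib

def report_digest(rows):
    """Order-independent SHA-256 digest of a report's rows."""
    h = hashlib.sha256()
    for row in sorted(rows):
        h.update(repr(row).encode("utf-8"))
    return h.hexdigest()

# Hypothetical report output pulled from each environment.
prod_rows = [("2024-02", 250.0), ("2024-01", 100.0)]
test_rows = [("2024-01", 100.0), ("2024-02", 250.0)]

# Same data in a different order still matches; a changed value won't.
match = report_digest(prod_rows) == report_digest(test_rows)
```

If the digests match run after run, the managers' position gains some evidence; the first mismatch is the catastrophizer's proof.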
|
|
|
|
|
As a rule of thumb:
- Reporting should always be from a replica/mirror/snapshot of the production DB.
- The code that generates the report has to be exactly the same as in production.
- The reporting environment should pass synthetic tests every time before going ahead and generating the report.
- Report generation should be autonomous, without user intervention.
If you meet these 4 conditions, it's okay whether you name the environment "Test", "Report" or anything you like.
In your case, live reporting from test code and a test database may or may not alter the quality of the reports, as it is not clear what business logic goes into generating that report or what changes are in the test code that are not in production.
But, yeah, you should avoid this practice.
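The synthetic-test and autonomy rules above can be sketched as a simple gate: run a set of checks first, and only generate the report when every one of them passes, with no human in the loop. The check names and status fields here are invented for illustration.

```python
def synthetic_checks(status):
    """Yield (name, passed) pairs for each pre-report check."""
    yield ("replica is fresh", status["lag_seconds"] < 300)
    yield ("row counts match production", status["rows"] == status["prod_rows"])
    yield ("report code matches production",
           status["code_hash"] == status["prod_code_hash"])

def generate_report(status):
    failures = [name for name, ok in synthetic_checks(status) if not ok]
    if failures:
        return None, failures  # refuse to generate rather than guess
    return f"report over {status['rows']} rows", []

healthy = {"lag_seconds": 12, "rows": 1000, "prod_rows": 1000,
           "code_hash": "abc", "prod_code_hash": "abc"}
stale = dict(healthy, lag_seconds=900)

report, failures = generate_report(healthy)
blocked, why = generate_report(stale)
```

The point of refusing outright, instead of generating with a warning, is that an autonomous pipeline has no one watching to read the warning.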
tf
|
|
|
|
|
No, this is not a good practice: you can copy the database, but you cannot copy the rest of the environment, such as the IIS configuration and firewall settings.
|
|
|
|
|
Not a good practice.
Yet where I am now, we're doing something potentially worse. Because (as in your case) deployment of code changes takes a while, and the report is "Very Important! It goes all the way up to the CIO!", the report developers argued that in order to provide the most accurate numbers, it has to be run against the most up-to-date code -- and that means running the production report on the DEV database*. So while I'm trying to improve the quality of the data I provide to the reporting team, they keep complaining that the data is changing while they try to run the report -- of course it's changing! It's DEV!
So they want to run in DEV so they can receive changes in the data quickly, but they don't want the data to change. I'm reminded of the line from "The Twelve Chairs" -- "Hurry home, but don't gallop."
To make matters worse, I suspect that the reporting team is now referring to DEV as PROD!
* This is basically just a warehouse of data that has been ETLed from other databases within the enterprise.
<kvetch>
Oh, and I forgot the latest wrinkle -- my ETLs fill staging, then someone else reads staging to populate a Data Vault, then someone else reads that to populate a cube -- and lastly a report is run. And now the reporting team wants me to tell them every change I make that will impact their report -- I have no idea what their report entails, I have no idea what parts of the cube are involved, I have no idea what parts of the Data Vault the cube uses, I have no idea what parts of staging go where in the Data Vault, yet I'm supposed to know exactly how this data is used three levels downstream of me?! I don't even know what most of the data in staging means; I just copy it from other places. And, get this, they want me to tell them this so they don't have to waste their time investigating fluctuations they spot in the report.
I'm so glad I'm on vacation this week.
</kvetch>
|
|
|
|
|
Not a good sign that the website says:
Deprecated: mysql_connect(): The mysql extension is deprecated and will be removed in the future: use mysqli or PDO instead in /home/blahblahblahdotphp on line 19
I won't mention the name, to protect somebody.
|
|
|
|
|
It's always fun when a website returns an error and lets you know exactly which technology was used to implement it. That happened at our college (in the early days), and some students took advantage of it.
|
|
|
|
|
Happened to stumble upon a similar message yesterday:
Deprecated: Function set_magic_quotes_runtime() is deprecated in /www/htdocs/w0076eaa/(redacted)/counter/counter.php on line 61
chCounter: MySQL error!
Error number: 1045
Access denied for user 'd004ed82'@'localhost' (using password: YES)
Script stopped.
|
|
|
|
|
I'm confused about the password: is it "YES", or does that just mean the user supplied a password?
|
|
|
|
|
But he won't catch me so easily.
I've passed along links to Parsing Html The Cthulhu Way[^] many times so I always have the issue in mind. I usually read HTML with an XmlDocument (when I can) or the WinForms WebBrowser control, and I've seen others recommending the HTML Agility Pack.
This week I received a bunch of large HTML files to scrape.
They're not well-formed XML -- no surprise there.
So I decided that this would be a good opportunity to try the HTML Agility Pack.
It was able to read a sample, but it complained about “Start tag <td> was not found” -- which was surprising.
The problem? Several elements like this:
<th style="width: 5%"><!--</td>
The WinForms WebBrowser control is also able to read it, but the two tools treat it slightly differently and my initial feeling is that the WebBrowser handles it a little better.
So, the next time you encounter a developer who insists on consuming HTML with RegEx, pass them a sample like that, sit back, and watch the fun.
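Out of curiosity, you can feed that malformed fragment to any lenient parser and watch how it recovers. This sketch uses Python's stdlib `html.parser` purely as a stand-in for the .NET tools discussed above (HtmlAgilityPack, AngleSharp, the WebBrowser control); the point is the same -- every parser copes with the stray `<!--` differently.

```python
from html.parser import HTMLParser

class TagRecorder(HTMLParser):
    """Record every start tag, end tag, and text run the parser reports."""
    def __init__(self):
        super().__init__()
        self.starts, self.ends, self.data = [], [], []

    def handle_starttag(self, tag, attrs):
        self.starts.append(tag)

    def handle_endtag(self, tag):
        self.ends.append(tag)

    def handle_data(self, d):
        self.data.append(d)

p = TagRecorder()
p.feed('<th style="width: 5%"><!--</td>')
p.close()
# html.parser treats the unterminated comment (and the </td> inside it)
# as plain data, so no close tag is ever reported for the <th>.
```

A regex-based "parser", of course, would cheerfully match that `</td>` and produce something different again -- which is rather the point of the exercise.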
|
|
|
|
|
My favorite is AngleSharp[^]
What do you get when you cross a joke with a rhetorical question?
The metaphorical solid rear-end expulsions have impacted the metaphorical motorized bladed rotating air movement mechanism.
Do questions with multiple question marks annoy you???
|
|
|
|
|
We're moving off the AgilityPack onto AngleSharp.
cheers
Chris Maunder
|
|
|
|
|
AngleSharp is easily one of the best parsers out there.
And it seems Firefox doesn't think parsers is a word and wants it to be passer or parers.
|
|
|
|
|
I'm beginning to think that the HtmlAgilityPack uses RegularExpressions.
I'll have to try AngleSharp. Oh, look, an article...
|
|
|
|
|
A quick look at the HAP source code and it seems they parse it character by character.
I guess that's why it was so slow (it spent over three minutes 'parsing') when I tested it on a 1298-line HTML file (I can't remember where I found that file).
AngleSharp parsed the same file much faster (in a few seconds).
|
|
|
|
|
One of my earliest gigs was writing an XML, and then an HTML, parser.
I learned why browsers treat HTML so differently, but never learned why browser writers were so pig-headed in their insistence on sticking to clearly ludicrous decisions when ambiguity in the "spec" surfaced. As it did often back then.
So every time I see an HTML parser I give a solemn nod to the author. And then wish them the speediest exit possible from that gig.
cheers
Chris Maunder
|
|
|
|
|
Somehow, I immediately thought of this when I saw the title of your post.
Enjoy[^]
|
|
|
|
|
Thanks for the listen, man!
I now want to kick ass this morning.
|
|
|
|
|
PIEBALDconsult wrote: the next time you encounter a developer who insists on consuming HTML with RegEx, pass them a sample like that, sit back, and watch the fun
You're a cruel, cruel man.
I like it.
Bad command or file name. Bad, bad command! Sit! Stay! Staaaay...
|
|
|
|