Very Large aow_processed and aow_processed_aow_actions tables.

Very Large aow_processed and aow_processed_aow_actions tables.

I was making another dev copy of my crm to test the new 7.7 upgrade. So as I went to copy the DB I was prepared to move about 17GB of data. Then I started thinking, why don’t I just export with --no-data for mysqldump. But I still need info like the users, roles, etc, to test. Then I though “well then I’ll simply do a separate mysql dump of the tables I need data in”.

As I go to browse, curiously, I thought, hmm let me sort by largest to the smallest table by size and see what the culprit is. Sure enough aow_processed and aow_processed_aow_actions where, in total, about ~15gb.

I never noticed this whole time that the prune DB scheduler was never running. On top of that, I have a master slave db setup as backup and performance. So I wanted to include how I manually pruned the DB with those 2 tables. I cannot use Truncate as it will not be replicated on my slave.

First I will CREATE a copy of the table, and keep the data for the last day. I went to cli and logged into mysql


mysql -u root -p

Then I needed to select my database. Make a copy of the aow_processed table, keeping just today’s data, and then renaming the tables.


USE mycrmdb;
CREATE TABLE aow_processed_new AS SELECT * FROM aow_processed WHERE date_entered >= '2016-09-20 00:00:00';
ALTER TABLE aow_processed RENAME aow_processed_old;
ALTER TABLE aow_processed_new RENAME aow_processed;

Then I simply need to DROP the old aow_processed DB. NOW please do not do this if you do not know what you are doing. Do not go dropping tables by accident without backups and killing your data. OK, warning over.


DROP TABLE aow_processed_old;

That is the quickest method of pruning that TABLE. I had about 50 million records in aow_processed and it took about 10GB. Yikes.
Repeat for aow_processed_aow_actions table and you are all set. No more enormous DB.

4 Likes

for aow_processed_aow_actions table use date_modified instead of date_entered

thank you.

my table had over 14million records!

Hi everyone.

Mine had 9 million in just 2 months or so… I’ve since learned to use scheduled workflows sparingly :slight_smile:

Just an addition to make here. When you use the method proposed by angrymob you lose the indexes when copying the tables. After some research I’d suggest these queries for copying the tables instead: (see here: http://stackoverflow.com/a/3280042)

USE mycrmdb;
CREATE TABLE aow_processed_new LIKE aow_processed;
INSERT aow_processed_new SELECT * FROM aow_processed WHERE date_entered >= '2017-03-27 12:00:00';

USE mycrmdb;
CREATE TABLE aow_processed_aow_actions_new LIKE aow_processed_aow_actions;
INSERT aow_processed_aow_actions_new SELECT * FROM aow_processed_aow_actions WHERE date_modified >= '2017-03-27 12:00:00';

ALTER TABLE aow_processed RENAME aow_processed_old;
ALTER TABLE aow_processed_new RENAME aow_processed;
ALTER TABLE aow_processed_aow_actions RENAME aow_processed_aow_actions_old;
ALTER TABLE aow_processed_aow_actions_new RENAME aow_processed_aow_actions;

DROP TABLE aow_processed_old;
DROP TABLE aow_processed_aow_actions_old;

Also I’ve added one index each for the two date fields used for pruning process (assuming I’ll be doing this regularly in the future). i.e. date_entered for aow_processed, date_modified for aow_processed_aow_actions…

If anybody knows off-hand how to integrate this code as a pruning script into scheduled tasks (to let it run monthly or whatever), let me know. That would be the next logical step imho.

Looks like you have to do a REPAIR TABLE after copying the table to refresh the indexes. So here’s the complete set of queries again:

USE mycrmdb;

CREATE TABLE aow_processed_new LIKE aow_processed;
INSERT aow_processed_new SELECT * FROM aow_processed WHERE date_entered >= '2017-03-27 12:00:00';
REPAIR TABLE aow_processed_new; 

CREATE TABLE aow_processed_aow_actions_new LIKE aow_processed_aow_actions;
INSERT aow_processed_aow_actions_new SELECT * FROM aow_processed_aow_actions WHERE date_modified >= '2017-03-27 12:00:00';
REPAIR TABLE aow_processed_aow_actions_new;

ALTER TABLE aow_processed RENAME aow_processed_old;
ALTER TABLE aow_processed_new RENAME aow_processed;
ALTER TABLE aow_processed_aow_actions RENAME aow_processed_aow_actions_old;
ALTER TABLE aow_processed_aow_actions_new RENAME aow_processed_aow_actions;

DROP TABLE aow_processed_old;
DROP TABLE aow_processed_aow_actions_old;
1 Like

Does anyone know if the Prune Database job is trying to handle this or not?

What I mean is, are all of you people with this problem having errors in the Prune job, or for some of you it’s running, but not doing anything to these tables?

It would also be nice to know why the Prune job is failing, if it’s failing.

I had a short look at the pruning script, seemed to me like all it does is delete rows marked as “deleted”.

I also let my pruning script run over the weekend. It worked fine (did a few random checks and could confirm that “deleted”-marked records were deleted from the tables. => no effect on aow_processed tables.

Just for the sake of completeness: I created an issue (suggestion) for this on github.
https://github.com/salesagility/SuiteCRM/issues/3328

It has now been closed and moved to Trello. But I wanted to add this after Matt explained the relevance of the aow_processed records. Namely: Workflows that are supposed to only run once rely on these aos_processed records to decide whether to fire or not. If you delete the records some Workflows may fire again, that are set up to only fire once per record.

One (imperfect) approach would be to change my query (from older posts here) to only delete records that are related to workflows that are set up to run repeatedly. That way, the history would be preserved for the WF that are currently set up to only fire once.

Or only only delete aow_processed records for WF that are setup to run via scheduler. That would get rid of the main portion of the bloat. In my case probably 99.99% of the records, without destroying any needed WF log.

1 Like

Hello,

I’m having same problems, there is approx 71.Mil records only in aow_processed folder, 20+GB in size. Workflows that I have created are running on modified records and only when somebody hits save button in CRM.

I was wondering how can I purge or trim data in this table and create scheduler automatically to do so and how safe is this against the old data that is already processed. Would I have double records or perhaps repeated entries for old records after doing that.

I’m running 7.8.2 with MySQL.

Best Regards,

Slaven.

You can check this issue where Matt Lorimer comments. He developed this, if I’m not mistaken.

Thank you pgr,

I have read that, but I’m looking for the way on how to create that scheduled task under Admin-Schuduler in CRM. What would be the steps to create custom scheduled task. Should I copy mysql commands in a text file somewhere in CRM and then create a scheduler for that? I never done that before with CRM.

Another problem is that I have created custom workflow that converts Accounts to Contacts with all of it’s records, and this workflow is indeed set to transfer Account to Contact but only once per record. So I would probably have to have a script that rules out all of the workflows that are triggered to run only once per record.

Best Regards,

Slaven.

I don’t have a clear idea of what the “clearing up” steps should be, because of that warning Matt gives about old items and unpredictable results.

That’s a previous problem, if you don’t know how to delete safely, what exactly are you going to put into a Scheduler Job?

Another preliminary concern (and another thing I don’t fully understand) is what exactly are the inappropriate uses of “repeated runs”? It seems if this is incorrectly used, the huge database sizes are aggravated…

But if you have all that solved, and you know your configuration is ok, and know how to purge the database safely, then all you have to do is follow some tutorial to create a custom Scheduler Job.

Possibly the best thing to do, if you’re going to spend time developing for this, would be to change the code in SuiteCRM to increment a count on the database, instead of repeating the row, like Matt suggests…

Thanks pgr, for guide on how to create a custom scheduler task. Yes, counting only works for repeated runs, not for single one. So I guess only thing that you can do safely is to delete records for the clients that are deleted or for the no longer active or deleted workflow processes.

Best Regards,

Slaven.

Ok, that sounds reasonable. When you’re done, please come back here and post some of the queries you developed for the deletions, and tell us how much data you were able to trim. I think this will be useful for more people. Thanks.

Hey all.

Here’s my updated pruning queries taking into consideration, that the log entries should never be deleted for workflows that are supposed to only run once.

 USE crm;
CREATE TABLE aow_processed_new LIKE aow_processed;
# retain all aow_processed rows that belong to WF that run only once per record
INSERT aow_processed_new SELECT wp.* FROM aow_processed AS wp JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id AND w.multiple_runs = 0;
# prune aow_processed rows that belong to WF that run multiple times, based on date
INSERT aow_processed_new SELECT wp.* FROM aow_processed AS wp JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id AND w.multiple_runs = 1 AND wp.date_entered >= '2017-10-15 00:00:00';
REPAIR TABLE aow_processed_new; 


CREATE TABLE aow_processed_aow_actions_new LIKE aow_processed_aow_actions;
# retain all aow_processed_aow_actions rows that belong to WF that run only once per record
INSERT aow_processed_aow_actions_new SELECT wpa.* FROM aow_processed_aow_actions AS wpa JOIN (SELECT wp.* FROM aow_processed AS wp JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id AND w.multiple_runs = 0) as wp ON wpa.aow_processed_id = wp.id;
# prune aow_processed_aow_actions rows that belong to WF that run multiple times, based on date
INSERT aow_processed_aow_actions_new SELECT wpa.* FROM aow_processed_aow_actions AS wpa JOIN (SELECT wp.* FROM aow_processed AS wp JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id AND w.multiple_runs = 1) as wp ON wpa.aow_processed_id = wp.id AND wpa.date_modified >= '2017-10-15 00:00:00';
REPAIR TABLE aow_processed_aow_actions_new;

ALTER TABLE aow_processed RENAME aow_processed_old;
ALTER TABLE aow_processed_new RENAME aow_processed;
ALTER TABLE aow_processed_aow_actions RENAME aow_processed_aow_actions_old;
ALTER TABLE aow_processed_aow_actions_new RENAME aow_processed_aow_actions;

DROP TABLE aow_processed_old;
DROP TABLE aow_processed_aow_actions_old;

This does not prune the aow_log for Workflows that have been deleted. But that is not necessary for my use case. Deleting old logs by date takes care of that for me…

3 Likes

Hi John,

thanks for these scripts!

Maybe it would be a good idea to post them also on that github issue:

https://github.com/salesagility/SuiteCRM/issues/3328

I have some hope that they might be integrated in the “Prune database” Scheduler job…

EDIT: maybe that will help get the Issue reopened and labeled as a bug :slight_smile:

John you never got around to doing what I asked, but since this issue keeps coming up, I encourage you to post your SQL up on the GitHub Issue, and I’ll try to get it reopened. Thanks!

done

1 Like

Thanks a lot, let’s hope that can be added soon!

BTW, I recently wrote some Documentation for Workflows. Specifically, the beginning of this page

https://docs.suitecrm.com/user/advanced-modules/workflow/

where it explains the very important fields “Run”, “Run on” and “Repeated Runs”, which didn’t have any proper explanations yet.

If you can take a look tell me if it makes sense to you, and if you feel there is something that could be added for clarity or completion. Thanks!

Hey pgr. Looks like real progress! thanks.

Typo at Run “On Scheduler”:
“resticted by the Run on property”

Question:
In the “Run On” section you write:

How exactly does this behave when the workflow runs only “on scheduler”? I had always assumed that these two options only apply to “on save”.