Oracle clusterware 11gR2 & mysql 5.1 & ACFS

by Matteo Durighetto - 11 May 2010

I'm very excited to announce that I have tested MySQL 5.1.47 clustered as a service/application on Oracle Clusterware 11.2.0.1, using the new ACFS filesystem as a shared filesystem between 2 nodes. The trick is to create two virtual resources, mysql.rg and mysql.res, to enclose the dependencies on all the resources, like this (→ means "depends on"):

mysql.rg  → mysql.db → acfs → vip → mysql.res
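
As a rough illustration of one link in such a chain (not the author's actual profiles, which the post does not show), a hard dependency of mysql.rg on mysql.db might be declared with crsctl add resource as sketched below. The action script paths, the CHECK_INTERVAL value and the add_resource helper are assumptions made for illustration; START_DEPENDENCIES and STOP_DEPENDENCIES are the attribute names that appear in the resource profile quoted in the comments. The sketch only prints the commands unless CRSCTL points at a real crsctl binary.

```shell
#!/bin/sh
# Dry run by default: the commands are only printed.
# Set CRSCTL to the path of the real crsctl binary to execute them.
CRSCTL="${CRSCTL:-echo crsctl}"

# Hypothetical action script paths; the post does not show the real ones.
DB_SCRIPT="${DB_SCRIPT:-/path/to/mysql_db_action_script}"
RG_SCRIPT="${RG_SCRIPT:-/path/to/mysql_rg_action_script}"

add_resource() {
    # $1 = resource name, $2 = comma-separated attribute list
    $CRSCTL add resource "$1" -type cluster_resource -attr "$2"
}

# mysql.db: the resource that the virtual resource mysql.rg depends on.
add_resource mysql.db "ACTION_SCRIPT=$DB_SCRIPT,CHECK_INTERVAL=30"

# mysql.rg: starts only after mysql.db, and mysql.db stops only after it.
add_resource mysql.rg "ACTION_SCRIPT=$RG_SCRIPT,START_DEPENDENCIES=hard(mysql.db),STOP_DEPENDENCIES=hard(mysql.db)"
```

The remaining links (the ACFS filesystem, the VIP and mysql.res) would be wired the same way, with resource names that depend on the local cluster configuration.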

It works very well: performance is good on a SAN, and switching between nodes is very fast!

Recommended!

7 Comments on “Oracle clusterware 11gR2 & mysql 5.1 & ACFS”

  1. casaprocida writes:

    Matteo,
    I'm trying to cluster a third-party application on Oracle Clusterware 11.2.0.1, but it doesn't work. The VIP resource works well, but when I try to start the resource (the third-party application) I get this result:

    [root@jb00 bin]# ./crsctl start resource apptest
    CRS-2672: Attempting to start 'apptest' on 'jb00'
    CRS-2674: Start of 'apptest' on 'jb00' failed
    CRS-2679: Attempting to clean 'apptest' on 'jb00'
    CRS-2678: 'apptest' on 'jb00' has experienced an unrecoverable failure
    CRS-0267: Human intervention required to resume its availability.
    CRS-4000: Command Start failed, or completed with errors.

    Reading crsd.log, it seems that Clusterware does not access the ActionScript correctly. I get the same error on the AIX 6.1 system (the production environment) and on the Red Hat 5.3 system (the lab environment).
    Could you please help me by sending the configuration of your resource, so I can find where the error in my configuration is?
    Thanks in advance.
    Marco

  2. Mat writes:

    Hello,
    first you have to prepare the action script in Perl ;) and then create the profile with its dependencies :)

    Could you post your Perl scripts? I have no problems with Apache, MySQL 5.1.47 & PostgreSQL 8.4.4 ;)

    Mat

  3. casaprocida writes:

    Hello Mat,
    thanks for your help.
    Here is the shell script I use as the action script:
    [root@jb00 TEST]# cat test.scr
    echo "----------------------------------------------" 1>> /tmp/TEST/test.log
    date 1>> /tmp/TEST/test.log
    echo "Entro nello script" 1>> /tmp/TEST/test.log
    case $1 in
    'start')
    echo "sono qui prima dopo start" 1>> /tmp/TEST/test.log
    exit 0
    ;;
    'stop')
    echo "sono qui prima dopo stop " 1>> /tmp/TEST/test.log
    exit 0
    ;;
    'check')
    echo "sono qui prima dopo checkk" 1>> /tmp/TEST/test.log
    exit 0
    ;;
    'clean')
    echo "sono qui prima dopo clean" 1>> /tmp/TEST/test.log
    exit 0
    ;;
    *)
    echo "il parametro "$1" non è ammesso" 1>> /tmp/TEST/test.log
    ;;
    esac
    [root@jb00 TEST]# ls -l
    total 12
    -rwxrwxrwx 1 root root 557 Jun 21 00:29 test.scr

    If I try to start the resource:
    [root@jb00 bin]# ./crsctl start resource apptest
    CRS-2679: Attempting to clean 'apptest' on 'jb00'
    CRS-2680: Clean of 'apptest' on 'jb00' failed
    CRS-4000: Command Start failed, or completed with errors.
    [root@jb00 bin]#

    I have this in the crsd.log
    2010-06-24 11:20:06.116: [UiServer][2572053392] Container [ Name: ORDER
    MESSAGE:
    TextMessage[CRS-2679: Attempting to clean 'apptest' on 'jb00']
    MSGTYPE:
    TextMessage[3]
    OBJID:
    TextMessage[apptest]
    WAIT:
    TextMessage[0]
    ]
    2010-06-24 11:20:06.121: [ AGFW][2586762128] Agfw Proxy Server received the message: RESOURCE_CLEAN[apptest 1 1] ID 4100:2699
    2010-06-24 11:20:06.121: [ AGFW][2586762128] Starting the agent: /u01/app/grid/bin/scriptagent with user id: root and incarnation:3
    2010-06-24 11:20:06.399: [ AGFW][2586762128] Starting the HB [Interval = 30000, misscount = 6kill allowed=1] for agent: /u01/app/grid/bin/scriptagent_root
    2010-06-24 11:20:06.400: [ AGFW][2586762128] Could not forward message [RESOURCE_CLEAN[apptest 1 1] ID 4100:2699] to agent. /u01/app/grid/bin/scriptagent_root is not running
    2010-06-24 11:20:06.401: [ AGFW][2586762128] Starting of the agent: /u01/app/grid/bin/scriptagent with user id root is already in progress.
    2010-06-24 11:20:06.816: [CLSFRAME][2593065872] New IPC Member:{Relative|Node:0|Process:6|Type:3}:AGENT
    2010-06-24 11:20:06.816: [CLSFRAME][2593065872] New process connected to us ID:{Relative|Node:0|Process:6|Type:3} Info:AGENT
    2010-06-24 11:20:06.853: [ AGFW][2586762128] Agfw Proxy Server received the message: AGENT_HANDSHAKE[Proxy] ID 20484:14
    2010-06-24 11:20:06.853: [ AGFW][2586762128] Agent /u01/app/grid/bin/scriptagent_root with pid:11356 connected to server.
    2010-06-24 11:20:06.854: [ AGFW][2586762128] Agfw Proxy Server sending message: RESTYPE_ADD[cluster_resource] ID 8196:2715 to the agent /u01/app/grid/bin/scriptagent_root
    2010-06-24 11:20:06.889: [ AGFW][2586762128] Agfw Proxy Server sending message: RESTYPE_ADD[local_resource] ID 8196:2717 to the agent /u01/app/grid/bin/scriptagent_root
    2010-06-24 11:20:06.904: [ AGFW][2586762128] Agfw Proxy Server sending message: RESTYPE_ADD[ora.cluster_resource.type] ID 8196:2719 to the agent /u01/app/grid/bin/scriptagent_root
    2010-06-24 11:20:06.914: [ AGFW][2586762128] Agfw Proxy Server sending message: RESTYPE_ADD[ora.local_resource.type] ID 8196:2721 to the agent /u01/app/grid/bin/scriptagent_root
    2010-06-24 11:20:06.920: [ AGFW][2586762128] Agfw Proxy Server sending message: RESTYPE_ADD[ora.oc4j.type] ID 8196:2723 to the agent /u01/app/grid/bin/scriptagent_root
    2010-06-24 11:20:06.928: [ AGFW][2586762128] Agfw Proxy Server sending message: RESOURCE_ADD[apptest 1 1] ID 4356:2725 to the agent /u01/app/grid/bin/scriptagent_root
    2010-06-24 11:20:06.931: [ AGFW][2586762128] Agfw Proxy Server forwarding the message: RESOURCE_CLEAN[apptest 1 1] ID 4100:2699 to the agent /u01/app/grid/bin/scriptagent_root
    2010-06-24 11:20:06.934: [ AGFW][2586762128] Agfw Proxy Server replying to the message: AGENT_HANDSHAKE[Proxy] ID 20484:14
    2010-06-24 11:20:07.074: [ AGFW][2586762128] Received the reply to the message: RESTYPE_ADD[cluster_resource] ID 8196:2715 from the agent /u01/app/grid/bin/scriptagent_root
    2010-06-24 11:20:07.075: [ AGFW][2586762128] Received the reply to the message: RESTYPE_ADD[local_resource] ID 8196:2717 from the agent /u01/app/grid/bin/scriptagent_root
    2010-06-24 11:20:07.076: [ AGFW][2586762128] Received the reply to the message: RESTYPE_ADD[ora.cluster_resource.type] ID 8196:2719 from the agent /u01/app/grid/bin/scriptagent_root
    2010-06-24 11:20:07.077: [ AGFW][2586762128] Received the reply to the message: RESTYPE_ADD[ora.local_resource.type] ID 8196:2721 from the agent /u01/app/grid/bin/scriptagent_root
    2010-06-24 11:20:07.078: [ AGFW][2586762128] Received the reply to the message: RESTYPE_ADD[ora.oc4j.type] ID 8196:2723 from the agent /u01/app/grid/bin/scriptagent_root
    2010-06-24 11:20:07.079: [ AGFW][2586762128] Received the reply to the message: RESOURCE_ADD[apptest 1 1] ID 4356:2725 from the agent /u01/app/grid/bin/scriptagent_root
    2010-06-24 11:20:07.082: [ AGFW][2586762128] Received the reply to the message: RESOURCE_CLEAN[apptest 1 1] ID 4100:2726 from the agent /u01/app/grid/bin/scriptagent_root
    2010-06-24 11:20:07.082: [ AGFW][2586762128] Agfw Proxy Server sending the reply to PE for message:RESOURCE_CLEAN[apptest 1 1] ID 4100:2699
    2010-06-24 11:20:07.082: [ CRSPE][2576255888] Received reply to action [Clean] message ID: 2699
    2010-06-24 11:20:07.082: [ CRSPE][2576255888] Clean action failed with error code: 0
    2010-06-24 11:20:07.083: [ CRSRPT][2574154640] Publishing event: Cluster Resource Action Failed Event : 0xa210aa70
    2010-06-24 11:20:07.083: [ CRSRPT][2574154640] Publish to eons buffered event : 0xa210aa70
    2010-06-24 11:20:07.163: [ AGFW][2586762128] Received the reply to the message: RESOURCE_CLEAN[apptest 1 1] ID 4100:2726 from the agent /u01/app/grid/bin/scriptagent_root
    2010-06-24 11:20:07.164: [ AGFW][2586762128] Agfw Proxy Server sending the last reply to PE for message:RESOURCE_CLEAN[apptest 1 1] ID 4100:2699
    2010-06-24 11:20:07.164: [ CRSPE][2576255888] Received reply to action [Clean] message ID: 2699
    2010-06-24 11:20:07.165: [ CRSPE][2576255888] CRS-2680: Clean of 'apptest' on 'jb00' failed

    2010-06-24 11:20:07.166: [ CRSPE][2576255888] Sequencer for [apptest 1 1] has completed with error: CRS-0216: Could not stop resource 'apptest'.

    2010-06-24 11:20:07.168: [ CRSPE][2576255888] PE Command [ Start Resource : 0xc4568d0 ] has completed
    2010-06-24 11:20:07.168: [ CRSPE][2576255888] UI Command [Start Resource : 0xc4568d0] is replying to sender.
    2010-06-24 11:20:07.170: [UiServer][2572053392] Container [ Name: ORDER
    MESSAGE:
    TextMessage[CRS-2680: Clean of 'apptest' on 'jb00' failed]
    MSGTYPE:
    TextMessage[1]
    OBJID:
    TextMessage[apptest]
    WAIT:
    TextMessage[0]
    ]
    2010-06-24 11:20:07.171: [UiServer][2572053392] Container [ Name: UI_DATA
    apptest:
    TextMessage[215]
    ]
    2010-06-24 11:20:07.171: [UiServer][2572053392] Done for ctx=0xa2103450
    2010-06-24 11:20:07.171: [ AGFW][2586762128] Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:2746
    2010-06-24 11:20:07.171: [ AGFW][2586762128] Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:2746
    2010-06-24 11:20:07.205: [ AGFW][2586762128] Agfw Proxy Server received the message: AGENT_SUICIDE[Proxy] ID 20486:42
    2010-06-24 11:20:07.206: [ AGFW][2586762128] Suicide request received from /u01/app/grid/bin/scriptagent_root
    2010-06-24 11:20:07.206: [ AGFW][2586762128] Agfw Proxy Server replying to the message: AGENT_SUICIDE[Proxy] ID 20486:42
    2010-06-24 11:20:07.220: [ CRSCOMM][2593065872][FFAIL] Couldnt clscreceive message, no message: 11
    2010-06-24 11:20:07.220: [ CRSCOMM][2593065872] Client disconnected.
    2010-06-24 11:20:07.221: [ CRSCOMM][2593065872][FFAIL] Listener got clsc error 11 for memNum. 6
    2010-06-24 11:20:07.221: [ CRSCOMM][2593065872] IPC listener connection to member 6 has been removed
    2010-06-24 11:20:07.221: [CLSFRAME][2593065872] Removing IPC Member:{Relative|Node:0|Process:6|Type:3}
    2010-06-24 11:20:07.221: [CLSFRAME][2593065872] Disconnected from AGENT process: {Relative|Node:0|Process:6|Type:3}
    2010-06-24 11:20:07.222: [ CRSPE][2576255888] Disconnected from server:
    2010-06-24 11:20:07.224: [ AGFW][2586762128] Agfw Proxy Server received process disconnected notification, count=1
    2010-06-24 11:20:07.224: [ AGFW][2586762128] /u01/app/grid/bin/scriptagent_root disconnected.
    2010-06-24 11:20:07.224: [ AGFW][2586762128] Agent /u01/app/grid/bin/scriptagent_root[11356] stopped!
    2010-06-24 11:20:07.224: [ CRSCOMM][2586762128] removeConnection: Member 6 does not exist.
    [root@jb00 crsd]#

    And this in the cluster_resource_root.log
    2010-06-24 11:20:06.981: [ AGFW][2810366864] AGFW assuming CLEAN entry point defined in script.
    2010-06-24 11:20:06.981: [ AGFW][2810366864] Added new restype: ora.local_resource.type
    2010-06-24 11:20:06.981: [ AGFW][2810366864] Agent sending last reply for: RESTYPE_ADD[ora.local_resource.type] ID 8196:2721
    2010-06-24 11:20:06.981: [ AGFW][2810366864] Agent received the message: RESTYPE_ADD[ora.oc4j.type] ID 8196:2723
    2010-06-24 11:20:06.982: [ AGFW][2810366864] Agent does not have the type: ora.oc4j.type
    2010-06-24 11:20:06.982: [ AGFW][2810366864] Agent do not have any action entries defined for type: ora.oc4j.type
    2010-06-24 11:20:06.982: [ AGFW][2810366864] Could not find the action entry: START
    2010-06-24 11:20:06.982: [ AGFW][2810366864] AGFW assuming START entry point defined in script.
    2010-06-24 11:20:06.982: [ AGFW][2810366864] Could not find the action entry: STOP
    2010-06-24 11:20:06.982: [ AGFW][2810366864] AGFW assuming STOP entry point defined in script.
    2010-06-24 11:20:06.982: [ AGFW][2810366864] Could not find the action entry: CHECK
    2010-06-24 11:20:06.982: [ AGFW][2810366864] AGFW assuming CHECK entry point defined in script.
    2010-06-24 11:20:06.982: [ AGFW][2810366864] Could not find the action entry: CLEAN
    2010-06-24 11:20:06.982: [ AGFW][2810366864] AGFW assuming CLEAN entry point defined in script.
    2010-06-24 11:20:06.982: [ AGFW][2810366864] Added new restype: ora.oc4j.type
    2010-06-24 11:20:06.982: [ AGFW][2810366864] Agent sending last reply for: RESTYPE_ADD[ora.oc4j.type] ID 8196:2723
    2010-06-24 11:20:06.983: [ AGFW][2810366864] Agent received the message: RESOURCE_ADD[apptest 1 1] ID 4356:2725
    2010-06-24 11:20:06.983: [ AGFW][2810366864] Added new resource: apptest 1 1 to the agfw
    2010-06-24 11:20:06.985: [ AGFW][2810366864] Agent sending last reply for: RESOURCE_ADD[apptest 1 1] ID 4356:2725
    2010-06-24 11:20:06.985: [ AGFW][2810366864] Agent received the message: RESOURCE_CLEAN[apptest 1 1] ID 4100:2726
    2010-06-24 11:20:06.985: [ AGFW][2810366864] Preparing CLEAN command for: apptest 1 1
    2010-06-24 11:20:06.985: [ AGFW][2810366864] apptest 1 1 state changed from: UNKNOWN to: CLEANING
    2010-06-24 11:20:07.017: [ AGFW][2877483920] Executing command: clean for resource: apptest 1 1
    2010-06-24 11:20:07.017: [ AGFW][2877483920] Entering script entry point…
    2010-06-24 11:20:07.017: [apptest][2877483920] [clean] Executing action script: /tmp/TEST/test.scr[clean]
    2010-06-24 11:20:07.079: [ AGFW][2877483920] Command: clean for resource: apptest 1 1 completed with invalid status: 209
    2010-06-24 11:20:07.080: [ AGFW][2810366864] Agent sending reply for: RESOURCE_CLEAN[apptest 1 1] ID 4100:2726
    2010-06-24 11:20:07.083: [ AGFW][2843925392] Executing command: check for resource: apptest 1 1
    2010-06-24 11:20:07.083: [ AGFW][2843925392] Entering script entry point…
    2010-06-24 11:20:07.083: [apptest][2843925392] [check] Executing action script: /tmp/TEST/test.scr[check]
    2010-06-24 11:20:07.091: [CRSTIMER][2743147408] Timer Thread Starting.
    2010-06-24 11:20:07.160: [ AGFW][2843925392] Received unknown resource status code: 209
    2010-06-24 11:20:07.160: [ AGFW][2843925392] check for resource: apptest 1 1 completed with status: UNKNOWN
    2010-06-24 11:20:07.160: [ AGFW][2810366864] apptest 1 1 state changed from: CLEANING to: UNKNOWN
    2010-06-24 11:20:07.161: [ AGFW][2810366864] Agent sending last reply for: RESOURCE_CLEAN[apptest 1 1] ID 4100:2726
    2010-06-24 11:20:07.161: [ AGFW][2810366864] Agent has no resources to be monitored.Sending suicide request.
    2010-06-24 11:20:07.161: [ AGFW][2810366864] Agent sending message to PE: AGENT_SUICIDE[Proxy] ID 20486:42
    2010-06-24 11:20:07.207: [ AGFW][2810366864] Agent is commiting suicide.
    2010-06-24 11:20:07.207: [ USRTHRD][2810366864] Script agent is exiting..

    2010-06-24 11:20:07.208: [ AGFW][2810366864] Agent is exiting with exit code: 1

    The configuration of the resource is the following

    [root@jb00 bin]# ./crsctl status resource apptest -f
    NAME=apptest
    TYPE=cluster_resource
    STATE=UNKNOWN
    TARGET=ONLINE
    ACL=owner:root:rwx,pgrp:root:rwx,other::rwx,user:oragrp:rwx,group:asmadmin:rwx,group:dba:rwx,group:asmdba:rwx
    ACTION_FAILURE_TEMPLATE=
    ACTION_SCRIPT=/tmp/TEST/test.scr
    ACTIVE_PLACEMENT=0
    AGENT_FILENAME=%CRS_HOME%/bin/scriptagent
    AUTO_START=restore
    CARDINALITY=1
    CARDINALITY_ID=0
    CHECK_INTERVAL=30
    CREATION_SEED=35
    CURRENT_RCOUNT=0
    DEFAULT_TEMPLATE=
    DEGREE=1
    DESCRIPTION=
    ENABLED=1
    FAILOVER_DELAY=0
    FAILURE_COUNT=0
    FAILURE_HISTORY=
    FAILURE_INTERVAL=0
    FAILURE_THRESHOLD=0
    HOSTING_MEMBERS=
    ID=apptest
    INCARNATION=0
    LAST_FAULT=0
    LAST_RESTART=0
    LAST_SERVER=
    LOAD=1
    LOGGING_LEVEL=5
    NOT_RESTARTING_TEMPLATE=
    OFFLINE_CHECK_INTERVAL=0
    PLACEMENT=restricted
    PROFILE_CHANGE_TEMPLATE=
    RESTART_ATTEMPTS=2
    SCRIPT_TIMEOUT=60
    SERVER_POOLS=apptest_sp
    START_DEPENDENCIES=hard(app.appvip)
    START_TIMEOUT=0
    STATE_CHANGE_TEMPLATE=
    STATE_CHANGE_VERS=0
    STATE_DETAILS=
    STOP_DEPENDENCIES=hard(app.appvip)
    STOP_TIMEOUT=0
    UPTIME_THRESHOLD=1h

    Sorry for being verbose, but I hope this gives you all the information you need.

    Thanks in advance for your attention

    Marco

  4. Mat writes:

    Hi Marco,
    as soon as possible I will publish a new article with the scripts and methods for correctly clustering third-party applications (at least MySQL, PostgreSQL & Apache) with the Perl scripts that I developed.
    Normally I don't use shell scripts. I think you should check:

    1) the permissions of the script and of the directory that contains the script
    2) whether the script runs correctly when invoked manually (for example, I don't see the #!/usr/bin/sh line that identifies it as a shell script)
    3) the script's own errors, by using "set -x" and redirecting them to the log you already use with "2>&1"

    The cluster seems to have tried to initialize the resource correctly, but the script returned an error, so the cluster tried to clean and check with the same script. Because the script also failed in clean and check, the status became UNKNOWN.
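
A minimal sketch of points 2 and 3, assuming a plain POSIX shell: the script starts with an interpreter line, sends stdout and stderr to the log, traces itself with "set -x", and returns an explicit exit code for every action. The state file is a hypothetical stand-in for a real application, the run_action helper is invented for illustration, and the exit codes follow the usual convention of 0 for success (or online) and non-zero otherwise; none of this is the Perl code mentioned above.

```shell
#!/bin/sh
# Hypothetical locations; override them through the environment if needed.
LOG="${LOG:-/tmp/TEST/test.log}"
STATE="${STATE:-/tmp/TEST/apptest.state}"   # stand-in for "the application is running"

# The body runs in a subshell, so "exit" only sets the function's status.
run_action() (
    exec >>"$LOG" 2>&1 || exit 1   # send stdout and stderr to the log
    set -x                         # trace every command into the log
    date
    case "$1" in
        start)      : >"$STATE"; exit 0 ;;
        stop|clean) rm -f "$STATE"; exit 0 ;;
        check)      if [ -f "$STATE" ]; then exit 0; else exit 1; fi ;;
        *)          echo "unsupported action: $1"; exit 2 ;;
    esac
)

# Clusterware passes the action name as the first argument.
if [ $# -gt 0 ]; then
    run_action "$1"
    exit $?
fi
```

Running the script by hand with start, check, stop and clean, and looking at the traced log after each call, shows quickly whether it fails before Clusterware ever invokes it.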

    Kind Regards

    Mat

  5. casaprocida writes:

    Matt,
    thanks for your help; you confirm that it may be a permission problem. It works fine from the command line; the problem appears when Clusterware tries to run it.
    I'm not skilled with Perl, but I can try "set -x" and changing the permissions and the script location.

    Ciao Marco

  6. casaprocida writes:

    Matt,
    it works now; I'm using Clusterware to manage CFT and the Beta 48 Agent.
    Here is my experience: http://casaprocida.blogspot.com/
    Thanks for your time

  7. Pokerspiel writes:

    It took me some time to read the whole article. The article is great, but the comments bring even more ideas to brainstorm. Thanks.

    - Johnson
