Problem - Kubeflow Pipelines' MySQL does not start after a Docker restart on the worker node
I removed the nvidia-device-plugin DaemonSets and installed gpu-operator, so I had to restart the Docker daemon to re-apply Docker's daemon.json.
- After I restarted Docker on my Kubeflow nodes, the gcr.io/ml-pipeline/mysql container no longer started and logged these errors:
2024-04-12T04:52:11.367223Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2024-04-12T04:52:11.578169Z 0 [ERROR] InnoDB: Ignoring the redo log due to missing MLOG_CHECKPOINT between the checkpoint 2392029097 and the end 2392029025.
2024-04-12T04:52:11.578214Z 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
2024-04-12T04:52:12.078809Z 0 [ERROR] Plugin 'InnoDB' init function returned error.
2024-04-12T04:52:12.078846Z 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2024-04-12T04:52:12.078856Z 0 [ERROR] Failed to initialize builtin plugins.
2024-04-12T04:52:12.078862Z 0 [ERROR] Aborting
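The line that matters is the redo log error: the checkpoint LSN is ahead of the end of the log that InnoDB could read. A minimal sketch that pulls both numbers out of such a line and prints the difference. The function name `redo_gap` is only illustrative and is not part of MySQL.

```shell
#!/usr/bin/env bash
# Illustrative helper (not a MySQL tool): read mysqld log lines on stdin and
# print the checkpoint LSN, the end LSN and their difference for every
# "missing MLOG_CHECKPOINT" error found.
redo_gap() {
  sed -n 's/.*missing MLOG_CHECKPOINT between the checkpoint \([0-9][0-9]*\) and the end \([0-9][0-9]*\).*/\1 \2/p' |
  while read -r checkpoint end; do
    echo "checkpoint=${checkpoint} end=${end} gap=$((checkpoint - end))"
  done
}

# Example with the error line from the log above.
echo '2024-04-12T04:52:11.578169Z 0 [ERROR] InnoDB: Ignoring the redo log due to missing MLOG_CHECKPOINT between the checkpoint 2392029097 and the end 2392029025.' | redo_gap
# prints: checkpoint=2392029097 end=2392029025 gap=72
```

A positive gap means the checkpoint points past the last redo record that was readable, which would be consistent with a write to the NFS-backed redo log that did not complete when Docker was restarted.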
Find the PVC and PV volume path
- I do this because the mysql pod mounts a PV as its datadir, and that PV is provisioned by the NFS storage class, which points to a NAS volume
apiVersion: v1
kind: PersistentVolumeClaim
...
    requests:
      storage: 20Gi
  storageClassName: nfs-client
  volumeMode: Filesystem
  volumeName: pvc-7f636ab1-0c3c-426e-81aa-884416119d36
--- pv
  nfs:
    path: /volume1/xxx-prod-storage/kubeflow-mysql-pv-claim-pvc-7f636ab1-0c3c-426e-81aa-884416119d36
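The same lookup can be scripted instead of reading the YAML by eye. A sketch, assuming the claim is named `mysql-pv-claim` in the `kubeflow` namespace (as in the mysql Deployment) and that the PV uses the `nfs` volume type. The function names are made up for this example.

```shell
#!/usr/bin/env bash
# Assumed names; override via environment variables if your cluster differs.
NS="${NS:-kubeflow}"
PVC="${PVC:-mysql-pv-claim}"

# Print the name of the PV bound to the claim.
pv_name() {
  kubectl -n "$NS" get pvc "$PVC" -o jsonpath='{.spec.volumeName}'
}

# Print "server:path" of the NFS export backing the given PV.
pv_nfs_location() {
  kubectl get pv "$1" -o jsonpath='{.spec.nfs.server}:{.spec.nfs.path}'
}
```

Calling `pv_nfs_location "$(pv_name)"` should print the NAS address and the directory to inspect over SSH.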
Check the mount path on the Synology NAS over SSH
bash-4.4# pwd
/volume1/xxx-prod-storage/kubeflow-mysql-pv-claim-pvc-7f636ab1-0c3c-426e-81aa-884416119d36
bash-4.4# ls
auto.cnf ca-key.pem client-cert.pem ib_buffer_pool ib_logfile0 ibtmp1 mlpipeline performance_schema public_key.pem server-key.pem
cachedb ca.pem client-key.pem ibdata1 ib_logfile1 metadb mysql private_key.pem server-cert.pem sys
- Move ib_logfile0 and ib_logfile1 aside as backups, in case of data loss
bash-4.4# mv ib_logfile0 ib_log.backup0
bash-4.4# mv ib_logfile1 ib_log.backup1
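A slightly more defensive variant of the same step moves every redo log file into a separate backup directory outside the datadir, so nothing extra is left where MySQL 5.7 might treat it as a database directory. This is a sketch: `backup_redo_logs` and both paths are hypothetical, and it should only be run while mysqld is not running.

```shell
#!/usr/bin/env bash
# Illustrative helper: move InnoDB redo logs (ib_logfile*) from a datadir
# into a backup directory. Usage: backup_redo_logs <datadir> <backupdir>
backup_redo_logs() {
  local datadir="$1" backup="$2"
  local f moved=0
  mkdir -p "$backup" || return 1
  for f in "$datadir"/ib_logfile*; do
    [ -e "$f" ] || continue   # the glob matched nothing
    mv -- "$f" "$backup"/ && moved=$((moved + 1))
  done
  echo "moved ${moved} file(s) to ${backup}"
}
```

For example, `backup_redo_logs . ../mysql-redo-backup` from inside the PV directory moves both files and leaves ibdata1 and the schema directories untouched.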
- Then restart the mysql Deployment
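One way to do the restart and watch the result, sketched with standard kubectl subcommands. The `kubeflow` namespace and the Deployment name `mysql` match the manifest in these notes; the function names and the 120 second timeout are arbitrary choices for this example.

```shell
#!/usr/bin/env bash
NS="${NS:-kubeflow}"   # assumed namespace

# Restart the mysql Deployment and wait until the rollout settles or times out.
restart_mysql() {
  kubectl -n "$NS" rollout restart deployment/mysql &&
  kubectl -n "$NS" rollout status deployment/mysql --timeout=120s
}

# Show the most recent log lines from a pod of the mysql Deployment.
mysql_logs() {
  kubectl -n "$NS" logs deployment/mysql --tail=50
}
```

Run `restart_mysql` first; if the rollout does not become ready, `mysql_logs` shows why the container is failing.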

- Now the log output is different, but a log sequence number (LSN) error still occurs: pages in the tablespace carry LSNs ahead of the current system LSN
mysql 2024-04-12T05:00:25.976661Z 0 [ERROR] InnoDB: Page [page id: space=0, page number=465] log sequence number 2391995870 is in the future! Current system log sequence number 1981303345.
mysql 2024-04-12T05:00:25.976665Z 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/5.7
mysql 2024-04-12T05:00:26.142008Z 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
mysql 2024-04-12T05:00:26.142038Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
Set MySQL InnoDB options using a ConfigMap
- I want to pass InnoDB options to MySQL at startup, so I'll create a ConfigMap
my.cnf (see the innodb_force_recovery options described at https://www.notion.so/kade93/MySQL-ERROR-InnoDB-Ignoring-the-redo-log-2024-6e0c23bd1ee04751b8e7d3a78cb322bd?pvs=4#d8049954c79a491e8f1607b82bad6a30)
[mysqld]
innodb_log_checksums = ON
innodb_force_recovery = 1 # Use this option with caution
kubectl create configmap pipeline-mysql-config --from-file=my.cnf=./my.cnf --dry-run=client -oyaml
apiVersion: v1
data:
  my.cnf: |+
    [mysqld]
    innodb_log_checksums = ON
    innodb_force_recovery = 1
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: pipeline-mysql-config
kubectl create configmap pipeline-mysql-config --from-file=my.cnf=./my.cnf --dry-run=client -oyaml > mysql-config.yaml
- Edit mysql-config.yaml to add the kubeflow namespace
apiVersion: v1
data:
  my.cnf: |
    [mysqld]
    innodb_log_checksums = ON
    innodb_force_recovery = 1
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: pipeline-mysql-config
  namespace: kubeflow
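The edited manifest still has to be applied to the cluster before the Deployment can mount it. A small sketch of that step plus a check of what was stored. The file name `mysql-config.yaml` and the ConfigMap name come from the steps above; the function names are illustrative.

```shell
#!/usr/bin/env bash
NS="${NS:-kubeflow}"   # assumed namespace

# Create or update the ConfigMap from the edited manifest.
apply_mysql_config() {
  kubectl apply -f "${1:-mysql-config.yaml}"
}

# Print the my.cnf content stored in the ConfigMap
# (the dot in the key name is escaped for JSONPath).
show_mysql_config() {
  kubectl -n "$NS" get configmap pipeline-mysql-config -o jsonpath='{.data.my\.cnf}'
}
```

After `apply_mysql_config`, the output of `show_mysql_config` should match the [mysqld] section written earlier.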
- Edit the mysql Deployment to mount the ConfigMap as a volume
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    app: mysql
    application-crd-id: kubeflow-pipelines
  name: mysql
  namespace: kubeflow
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: mysql
      application-crd-id: kubeflow-pipelines
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql
        application-crd-id: kubeflow-pipelines
    spec:
      containers:
      - args:
        - --ignore-db-dir=lost+found
        - --datadir
        - /var/lib/mysql
        env:
        - name: MYSQL_ALLOW_EMPTY_PASSWORD
          value: "true"
        image: gcr.io/ml-pipeline/mysql:5.7.37
        imagePullPolicy: IfNotPresent
        name: mysql
        ports:
        - containerPort: 3306
          name: mysql
          protocol: TCP
        resources:
          requests:
            cpu: "1"
            memory: 1Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/mysql
          name: mysql-persistent-storage
        - mountPath: /etc/mysql/conf.d/my-custom.cnf
          subPath: my.cnf
          name: config-volume
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: mysql
      serviceAccountName: mysql
      terminationGracePeriodSeconds: 30
      volumes:
      - name: mysql-persistent-storage
        persistentVolumeClaim:
          claimName: mysql-pv-claim
      - name: config-volume
        configMap:
          name: pipeline-mysql-config
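- Finally, MySQL starts successfully on Kubernetes. The log still reports a corrupted page, and writes are rejected while innodb_force_recovery is set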
2024-04-12T05:21:58.556526Z 0 [Note] mysqld: ready for connections.
Version: '5.7.37' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server (GPL)
2024-04-12T05:21:59.464324Z 0 [ERROR] InnoDB: Database page corruption on disk or a failed file read of page [page id: space=40, page number=3]. You may have to recover from a backup.
2024-04-12T05:21:59.464365Z 0 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
len 16384; hex 52263e4200000003ffffffffffffffff000000008e936ea245bf00000000000000000000002800021747800300000000008c00050000000100000000000000000000000000000000004a000000280000000200f200000028000000020032010
InnoDB: End of page dump
2024-04-12T05:21:59.505108Z 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 1378238018, calculated checksums for field1: crc32 1378238018/3377886394, innodb 1936845683, none 3735928559, stored
InnoDB: Page may be an update undo log page
InnoDB: Page may be an index page where index id is 74
2024-04-12T05:21:59.505126Z 0 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index pag
2024-04-12T05:22:00.530733Z 0 [Note] InnoDB: Buffer pool(s) load completed at 240412 5:22:00
2024-04-12T05:22:18.221555Z 3 [ERROR] InnoDB: innodb_force_recovery is on. We do not allow database modifications by the user. Shut down mysqld and edit my.cnf to set innodb_force_recovery=0
2024-04-12T05:22:18.239105Z 3 [Note] Aborted connection 3 to db: 'cachedb' user: 'root' host: '127.0.0.1' (Got an error reading communication packets)
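The last two log lines show that MySQL rejects writes while innodb_force_recovery is set, so Kubeflow Pipelines cannot use the database in this state. A possible next step, sketched under assumptions, is to dump whatever is still readable before setting innodb_force_recovery back to 0 in the ConfigMap. The sketch assumes root has an empty password (MYSQL_ALLOW_EMPTY_PASSWORD is "true" in the Deployment) and that mysqldump is available in the image. The function name and the default output file are hypothetical, and the dump may fail on tables that touch the corrupted page.

```shell
#!/usr/bin/env bash
NS="${NS:-kubeflow}"   # assumed namespace

# Dump all databases from the running mysql pod into a local file.
# Usage: dump_all_databases [output-file]
dump_all_databases() {
  kubectl -n "$NS" exec deployment/mysql -- \
    mysqldump -uroot --all-databases > "${1:-all-databases.sql}"
}
```

With a dump in hand, innodb_force_recovery can be removed from my.cnf and the Deployment restarted. If errors persist, the dump can be restored into a fresh datadir.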