Problem - Kubeflow Pipelines' MySQL fails to start after a Docker restart on the worker node
I removed the nvidia-device-plugin DaemonSets and installed gpu-operator, so I had to restart the Docker daemon to re-apply Docker's daemon.json.
- I restarted Docker on my Kubeflow nodes, after which the gcr.io/ml-pipeline/mysql container stopped running and showed these errors:
2024-04-12T04:52:11.367223Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2024-04-12T04:52:11.578169Z 0 [ERROR] InnoDB: Ignoring the redo log due to missing MLOG_CHECKPOINT between the checkpoint 2392029097 and the end 2392029025.
2024-04-12T04:52:11.578214Z 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
2024-04-12T04:52:12.078809Z 0 [ERROR] Plugin 'InnoDB' init function returned error.
2024-04-12T04:52:12.078846Z 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2024-04-12T04:52:12.078856Z 0 [ERROR] Failed to initialize builtin plugins.
2024-04-12T04:52:12.078862Z 0 [ERROR] Aborting
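The two LSNs in the redo log error above are worth comparing: the end of the redo log (2392029025) is lower than the checkpoint (2392029097). Below is a minimal shell sketch that pulls both numbers out of the error line and prints the difference. The helper name redo_log_gap is hypothetical, not part of MySQL.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the checkpoint LSN and the end LSN from the
# "missing MLOG_CHECKPOINT" error line and print (checkpoint - end).
redo_log_gap() {
  local line="$1" checkpoint end
  local re='.*between the checkpoint \([0-9][0-9]*\) and the end \([0-9][0-9]*\).*'
  checkpoint=$(printf '%s\n' "$line" | sed -n "s/$re/\1/p")
  end=$(printf '%s\n' "$line" | sed -n "s/$re/\2/p")
  # Fail if the line is not the expected error message.
  [ -n "$checkpoint" ] && [ -n "$end" ] || return 1
  echo $(( checkpoint - end ))
}

line='2024-04-12T04:52:11.578169Z 0 [ERROR] InnoDB: Ignoring the redo log due to missing MLOG_CHECKPOINT between the checkpoint 2392029097 and the end 2392029025.'
redo_log_gap "$line"
```

For the line above it prints 72. One plausible reading, which the log itself does not confirm, is that the last redo log writes never reached the NAS before Docker was restarted.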
Find the PVC and PV volume path
- I did this because the mysql pod mounts a PV as its datadir, and that PV lives on NFS storage (which points to a NAS volume).
apiVersion: v1
kind: PersistentVolumeClaim
...
    requests:
      storage: 20Gi
  storageClassName: nfs-client
  volumeMode: Filesystem
  volumeName: pvc-7f636ab1-0c3c-426e-81aa-884416119d36
---
pv
  nfs:
    path: /volume1/xxx-prod-storage/kubeflow-mysql-pv-claim-pvc-7f636ab1-0c3c-426e-81aa-884416119d36
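The PVC-to-PV-to-NFS-path lookup above can also be done without reading the full YAML. This is a sketch that assumes kubectl is already pointed at the cluster; pvc_nfs_path is a hypothetical helper name, and the PVC name mysql-pv-claim in the usage comment is the claim name from the mysql Deployment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve a PVC to the NFS server and path of its bound PV.
pvc_nfs_path() {
  local ns="$1" pvc="$2" pv
  # Step 1: the PVC records the name of the PV it is bound to.
  pv=$(kubectl -n "$ns" get pvc "$pvc" -o jsonpath='{.spec.volumeName}') || return 1
  [ -n "$pv" ] || return 1
  # Step 2: the PV (cluster-scoped, so no namespace) records the NFS export.
  kubectl get pv "$pv" -o jsonpath='{.spec.nfs.server}:{.spec.nfs.path}'
}

# Usage against a live cluster:
#   pvc_nfs_path kubeflow mysql-pv-claim
```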
Check the mount path on the Synology NAS over SSH
bash-4.4# pwd
/volume1/xxx-prod-storage/kubeflow-mysql-pv-claim-pvc-7f636ab1-0c3c-426e-81aa-884416119d36
bash-4.4# ls
auto.cnf ca-key.pem client-cert.pem ib_buffer_pool ib_logfile0 ibtmp1 mlpipeline performance_schema public_key.pem server-key.pem
cachedb ca.pem client-key.pem ibdata1 ib_logfile1 metadb mysql private_key.pem server-cert.pem sys
- Move ib_logfile0 and ib_logfile1 aside as backups, in case of data loss
bash-4.4# mv ib_logfile0 ib_log.backup0
bash-4.4# mv ib_logfile1 ib_log.backup1
- Then restart the mysql deployment
- Now I see different logs, but a log sequence number error still occurs
mysql 2024-04-12T05:00:25.976661Z 0 [ERROR] InnoDB: Page [page id: space=0, page number=465] log sequence number 2391995870 is in the future! Current system log sequence number 1981303345.
mysql 2024-04-12T05:00:25.976665Z 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/5.7
mysql 2024-04-12T05:00:26.142008Z 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
mysql 2024-04-12T05:00:26.142038Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
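The redo log move above could be wrapped in a small function that keeps the original file names and puts the files in a separate backup directory. This is only a sketch: backup_redo_logs and the paths in the usage comment are hypothetical, and it assumes mysqld is not running against the datadir while the files are moved.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move every ib_logfile* out of the datadir into a backup
# directory and print how many files were moved.
backup_redo_logs() {
  local datadir="$1" backup_dir="$2" f moved=0
  mkdir -p "$backup_dir" || return 1
  for f in "$datadir"/ib_logfile*; do
    # Skip the unexpanded pattern when no redo log files exist.
    [ -e "$f" ] || continue
    mv -- "$f" "$backup_dir"/ || return 1
    moved=$((moved + 1))
  done
  echo "$moved"
}

# Usage on the NAS (paths are placeholders; keep the backup directory outside
# the datadir so MySQL does not list it as a database):
#   backup_redo_logs /path/to/datadir /path/to/redo-log-backup
```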
Set MySQL InnoDB options with a ConfigMap
- I want to pass InnoDB options to MySQL at startup, so I'll create a ConfigMap
my.cnf (see the innodb_force_recovery options at https://www.notion.so/kade93/MySQL-ERROR-InnoDB-Ignoring-the-redo-log-2024-6e0c23bd1ee04751b8e7d3a78cb322bd?pvs=4#d8049954c79a491e8f1607b82bad6a30)
[mysqld]
innodb_log_checksums = ON
innodb_force_recovery = 1
# Use these options with caution
kubectl create configmap pipeline-mysql-config --from-file=my.cnf=./my.cnf --dry-run=client -oyaml
apiVersion: v1
data:
  my.cnf: |+
    [mysqld]
    innodb_log_checksums = ON
    innodb_force_recovery = 1
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: pipeline-mysql-config
kubectl create configmap pipeline-mysql-config --from-file=my.cnf=./my.cnf --dry-run=client -oyaml > mysql-config.yaml
- Edit mysql-config.yaml: add the kubeflow namespace and use | instead of |+ for my.cnf
apiVersion: v1
data:
  my.cnf: |
    [mysqld]
    innodb_log_checksums = ON
    innodb_force_recovery = 1
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: pipeline-mysql-config
  namespace: kubeflow
- Edit the mysql Deployment to mount the config volume
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    app: mysql
    application-crd-id: kubeflow-pipelines
  name: mysql
  namespace: kubeflow
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: mysql
      application-crd-id: kubeflow-pipelines
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql
        application-crd-id: kubeflow-pipelines
    spec:
      containers:
      - args:
        - --ignore-db-dir=lost+found
        - --datadir
        - /var/lib/mysql
        env:
        - name: MYSQL_ALLOW_EMPTY_PASSWORD
          value: "true"
        image: gcr.io/ml-pipeline/mysql:5.7.37
        imagePullPolicy: IfNotPresent
        name: mysql
        ports:
        - containerPort: 3306
          name: mysql
          protocol: TCP
        resources:
          requests:
            cpu: "1"
            memory: 1Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/mysql
          name: mysql-persistent-storage
        - mountPath: /etc/mysql/conf.d/my-custom.cnf
          subPath: my.cnf
          name: config-volume
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: mysql
      serviceAccountName: mysql
      terminationGracePeriodSeconds: 30
      volumes:
      - name: mysql-persistent-storage
        persistentVolumeClaim:
          claimName: mysql-pv-claim
      - name: config-volume
        configMap:
          name: pipeline-mysql-config
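The notes do not show the apply step itself, so here is a rough sketch of how the ConfigMap and the edited Deployment might be rolled out and checked. The function name apply_mysql_recovery_config and the file name mysql-deployment.yaml are assumptions; mysql-config.yaml is the file edited earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the ConfigMap and the edited Deployment, wait for
# the rollout, then print the most recent mysql log lines.
apply_mysql_recovery_config() {
  local ns="${1:-kubeflow}"
  kubectl apply -f mysql-config.yaml || return 1
  # mysql-deployment.yaml is an assumed file name for the edited manifest.
  kubectl apply -f mysql-deployment.yaml || return 1
  kubectl -n "$ns" rollout status deployment/mysql --timeout=120s || return 1
  kubectl -n "$ns" logs deployment/mysql --tail=50
}

# Usage against a live cluster:
#   apply_mysql_recovery_config kubeflow
```

Because the Deployment uses the Recreate strategy, the old pod is terminated before the new one starts, so only one mysqld touches the datadir at a time.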
- Finally, MySQL started successfully on k8s
2024-04-12T05:21:58.556526Z 0 [Note] mysqld: ready for connections.
Version: '5.7.37' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server (GPL)
2024-04-12T05:21:59.464324Z 0 [ERROR] InnoDB: Database page corruption on disk or a failed file read of page [page id: space=40, page number=3]. You may have to recover from a backup.
2024-04-12T05:21:59.464365Z 0 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
len 16384; hex 52263e4200000003ffffffffffffffff000000008e936ea245bf00000000000000000000002800021747800300000000008c00050000000100000000000000000000000000000000004a000000280000000200f200000028000000020032010
InnoDB: End of page dump
2024-04-12T05:21:59.505108Z 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 1378238018, calculated checksums for field1: crc32 1378238018/3377886394, innodb 1936845683, none 3735928559, stored
InnoDB: Page may be an update undo log page
InnoDB: Page may be an index page where index id is 74
2024-04-12T05:21:59.505126Z 0 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index pag
2024-04-12T05:22:00.530733Z 0 [Note] InnoDB: Buffer pool(s) load completed at 240412 5:22:00
2024-04-12T05:22:18.221555Z 3 [ERROR] InnoDB: innodb_force_recovery is on. We do not allow database modifications by the user. Shut down mysqld and edit my.cnf to set innodb_force_recovery=0
2024-04-12T05:22:18.239105Z 3 [Note] Aborted connection 3 to db: 'cachedb' user: 'root' host: '127.0.0.1' (Got an error reading communication packets)
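The last error above says that writes stay blocked until innodb_force_recovery is set back to 0. Here is a minimal sketch of that edit on the local my.cnf, after which the ConfigMap would need to be regenerated and applied again. The helper name disable_force_recovery is hypothetical, and the sketch assumes any data worth keeping has already been dumped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the given my.cnf with innodb_force_recovery set
# to 0, leaving every other line unchanged. The input file is not modified.
disable_force_recovery() {
  local cnf="$1"
  [ -f "$cnf" ] || return 1
  sed 's/^[[:space:]]*innodb_force_recovery[[:space:]]*=.*/innodb_force_recovery = 0/' "$cnf"
}

# Usage:
#   disable_force_recovery ./my.cnf > ./my.cnf.new && mv ./my.cnf.new ./my.cnf
#   kubectl create configmap pipeline-mysql-config --from-file=my.cnf=./my.cnf --dry-run=client -oyaml > mysql-config.yaml
```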