sentence level prompt learning fix bugs (!9) · Merge requests · exampasser / nlp

Qumeng Sun requested to merge prompt_and_smart into main Sep 01, 2023
================================================================================
JobID = 4883063
User = bzkurs51, Account = bzkurs51
Partition = grete:shared, Nodelist = ggpu135
================================================================================
Submitting job with sbatch from directory: /home/bzkurs51/qumeng
Home directory: /home/bzkurs51
Working directory: /home/bzkurs51/qumeng
Current node: ggpu135
Python 3.8.17
Collecting environment information...
PyTorch version: 2.0.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Rocky Linux 8.7 (Green Obsidian) (x86_64)
GCC version: (GCC) 8.5.0 20210514 (Red Hat 8.5.0-16)
Clang version: Could not collect
CMake version: version 3.20.2
Libc version: glibc-2.28

Python version: 3.8.17 (default, Jul  5 2023, 21:04:15)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.18.0-425.19.2.el8_7.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 12.0.140
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB
Nvidia driver version: 530.30.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  2
Core(s) per socket:  32
Socket(s):           2
NUMA node(s):        8
Vendor ID:           AuthenticAMD
CPU family:          25
Model:               1
Model name:          AMD EPYC 7513 32-Core Processor
Stepping:            1
CPU MHz:             2600.000
CPU max MHz:         3681.6399
CPU min MHz:         1500.0000
BogoMIPS:            5199.50
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            32768K
NUMA node0 CPU(s):   0-7,64-71
NUMA node1 CPU(s):   8-15,72-79
NUMA node2 CPU(s):   16-23,80-87
NUMA node3 CPU(s):   24-31,88-95
NUMA node4 CPU(s):   32-39,96-103
NUMA node5 CPU(s):   40-47,104-111
NUMA node6 CPU(s):   48-55,112-119
NUMA node7 CPU(s):   56-63,120-127
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca

Versions of relevant libraries:
[pip3] numpy==1.24.2
[pip3] torch==2.0.0+cu117
[pip3] torch-tb-profiler==0.4.1
[pip3] torchaudio==2.0.1+cu117
[pip3] torchvision==0.15.1+cu117
[conda] torch                     2.0.0+cu117              pypi_0    pypi
[conda] torchaudio                2.0.1+cu117              pypi_0    pypi
[conda] torchvision               0.15.1+cu117             pypi_0    pypi
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
Loaded the tokenizer in advance. Time taken: 40.08 seconds
Loaded 8544 train examples from data/ids-sst-train.csv
Loaded 141498 train examples from data/quora-train.csv
Loaded 6040 train examples from data/sts-train.csv
Loaded 1101 train examples from data/ids-sst-dev.csv
Loaded 20212 train examples from data/quora-dev.csv
Loaded 863 train examples from data/sts-dev.csv
Loaded all datasets. Time taken: 1.39 seconds
Loaded multi-task data sentence. Time taken: 0.02 seconds
Loaded all dataloaders. Time taken: 0.00 seconds
Before train acc(sts) :: 0.544
------Phase 3: Supervised Multi-task Training with SST, Quora, STS------
Starting multitask training...
Starting multi-task training...
Training Epoch number 0 begins
train acc => 
Paraphrase detection accuracy: 0.753
Sentiment classification accuracy: 0.260
Semantic Textual Similarity correlation: 0.759
dev acc => 
Paraphrase detection accuracy: 0.751
Sentiment classification accuracy: 0.262
Semantic Textual Similarity correlation: 0.752
Saved the model to finetune-5-3e-05-multitask.pt
Higher (avg)accuracy achieved at step 100: 0.589, saved model to finetune-5-3e-05-multitask.pt
train acc => 
Paraphrase detection accuracy: 0.776
Sentiment classification accuracy: 0.410
Semantic Textual Similarity correlation: 0.844
dev acc => 
Paraphrase detection accuracy: 0.774
Sentiment classification accuracy: 0.407
Semantic Textual Similarity correlation: 0.812
Saved the model to finetune-5-3e-05-multitask.pt
Higher (avg)accuracy achieved at step 200: 0.664, saved model to finetune-5-3e-05-multitask.pt
train acc => 
Paraphrase detection accuracy: 0.778
Sentiment classification accuracy: 0.514
Semantic Textual Similarity correlation: 0.870
dev acc => 
Paraphrase detection accuracy: 0.776
Sentiment classification accuracy: 0.462
Semantic Textual Similarity correlation: 0.818
Saved the model to finetune-5-3e-05-multitask.pt
Higher (avg)accuracy achieved at step 300: 0.685, saved model to finetune-5-3e-05-multitask.pt
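Judging from the numbers, the "(avg)accuracy" that drives checkpointing is the unweighted mean of the three dev metrics. For step 200 above, (0.774 + 0.407 + 0.812) / 3 ≈ 0.664. Below is a minimal Python sketch of that selection rule. It assumes an equal-weight mean and saves only on strict improvement. The helper names `average_score` and `should_save` are hypothetical and do not come from the project's code. Because the log prints rounded metrics, recomputing from them can be off in the last digit: step 100 gives 0.588, while the log shows 0.589.

```python
from typing import Optional


def average_score(para_acc: float, sst_acc: float, sts_corr: float) -> float:
    """Equal-weight mean of the three dev metrics (assumed scoring rule)."""
    return (para_acc + sst_acc + sts_corr) / 3


def should_save(best: Optional[float], current: float) -> bool:
    """Save a checkpoint only when the averaged score strictly improves."""
    return best is None or current > best


# Dev (paraphrase acc, sentiment acc, STS corr) logged for epoch 0, steps 100-300.
epoch0_dev = [(0.751, 0.262, 0.752), (0.774, 0.407, 0.812), (0.776, 0.462, 0.818)]

best: Optional[float] = None
for step, metrics in zip((100, 200, 300), epoch0_dev):
    score = average_score(*metrics)
    if should_save(best, score):
        best = score
        print(f"step {step}: new best {score:.3f}, saving checkpoint")
```

Every epoch 0 evaluation improves on the previous one, so under this rule each of them triggers a save. That matches the three "Saved the model" lines above.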
Training Epoch number 1 begins
train acc => 
Paraphrase detection accuracy: 0.815
Sentiment classification accuracy: 0.575
Semantic Textual Similarity correlation: 0.894
dev acc => 
Paraphrase detection accuracy: 0.812
Sentiment classification accuracy: 0.484
Semantic Textual Similarity correlation: 0.843
Saved the model to finetune-5-3e-05-multitask.pt
Higher (avg)accuracy achieved at step 100: 0.713, saved model to finetune-5-3e-05-multitask.pt
train acc => 
Paraphrase detection accuracy: 0.819
Sentiment classification accuracy: 0.607
Semantic Textual Similarity correlation: 0.911
dev acc => 
Paraphrase detection accuracy: 0.813
Sentiment classification accuracy: 0.505
Semantic Textual Similarity correlation: 0.848
Saved the model to finetune-5-3e-05-multitask.pt
Higher (avg)accuracy achieved at step 200: 0.722, saved model to finetune-5-3e-05-multitask.pt
train acc => 
Paraphrase detection accuracy: 0.829
Sentiment classification accuracy: 0.635
Semantic Textual Similarity correlation: 0.925
dev acc => 
Paraphrase detection accuracy: 0.819
Sentiment classification accuracy: 0.514
Semantic Textual Similarity correlation: 0.853
Saved the model to finetune-5-3e-05-multitask.pt
Higher (avg)accuracy achieved at step 300: 0.729, saved model to finetune-5-3e-05-multitask.pt
Training Epoch number 2 begins
train acc => 
Paraphrase detection accuracy: 0.799
Sentiment classification accuracy: 0.728
Semantic Textual Similarity correlation: 0.933
dev acc => 
Paraphrase detection accuracy: 0.790
Sentiment classification accuracy: 0.512
Semantic Textual Similarity correlation: 0.856
train acc => 
Paraphrase detection accuracy: 0.835
Sentiment classification accuracy: 0.753
Semantic Textual Similarity correlation: 0.942
dev acc => 
Paraphrase detection accuracy: 0.824
Sentiment classification accuracy: 0.499
Semantic Textual Similarity correlation: 0.859
train acc => 
Paraphrase detection accuracy: 0.843
Sentiment classification accuracy: 0.767
Semantic Textual Similarity correlation: 0.947
dev acc => 
Paraphrase detection accuracy: 0.831
Sentiment classification accuracy: 0.510
Semantic Textual Similarity correlation: 0.853
Saved the model to finetune-5-3e-05-multitask.pt
Higher (avg)accuracy achieved at step 300: 0.731, saved model to finetune-5-3e-05-multitask.pt
Training Epoch number 3 begins
train acc => 
Paraphrase detection accuracy: 0.840
Sentiment classification accuracy: 0.824
Semantic Textual Similarity correlation: 0.947
dev acc => 
Paraphrase detection accuracy: 0.824
Sentiment classification accuracy: 0.526
Semantic Textual Similarity correlation: 0.856
Saved the model to finetune-5-3e-05-multitask.pt
Higher (avg)accuracy achieved at step 100: 0.735, saved model to finetune-5-3e-05-multitask.pt
train acc => 
Paraphrase detection accuracy: 0.844
Sentiment classification accuracy: 0.834
Semantic Textual Similarity correlation: 0.956
dev acc => 
Paraphrase detection accuracy: 0.829
Sentiment classification accuracy: 0.537
Semantic Textual Similarity correlation: 0.862
Saved the model to finetune-5-3e-05-multitask.pt
Higher (avg)accuracy achieved at step 200: 0.742, saved model to finetune-5-3e-05-multitask.pt
train acc => 
Paraphrase detection accuracy: 0.849
Sentiment classification accuracy: 0.859
Semantic Textual Similarity correlation: 0.963
dev acc => 
Paraphrase detection accuracy: 0.834
Sentiment classification accuracy: 0.548
Semantic Textual Similarity correlation: 0.863
Saved the model to finetune-5-3e-05-multitask.pt
Higher (avg)accuracy achieved at step 300: 0.748, saved model to finetune-5-3e-05-multitask.pt
Training Epoch number 4 begins
train acc => 
Paraphrase detection accuracy: 0.855
Sentiment classification accuracy: 0.866
Semantic Textual Similarity correlation: 0.957
dev acc => 
Paraphrase detection accuracy: 0.835
Sentiment classification accuracy: 0.514
Semantic Textual Similarity correlation: 0.850
train acc => 
Paraphrase detection accuracy: 0.855
Sentiment classification accuracy: 0.900
Semantic Textual Similarity correlation: 0.967
dev acc => 
Paraphrase detection accuracy: 0.838
Sentiment classification accuracy: 0.512
Semantic Textual Similarity correlation: 0.855
train acc => 
Paraphrase detection accuracy: 0.859
Sentiment classification accuracy: 0.909
Semantic Textual Similarity correlation: 0.969
dev acc => 
Paraphrase detection accuracy: 0.841
Sentiment classification accuracy: 0.506
Semantic Textual Similarity correlation: 0.854
Accuracy didn't improve for 3 consecutive evaluations, early stopping!
Epoch 4: train loss :: 0.000, train acc :: 0.912, dev acc :: 0.734
Time taken: 5180.34 seconds
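The stop message fits a patience of 3 evaluations, counted against the best averaged dev score. Epoch 2 never triggers it. Its first two evaluations (0.719, 0.727) fall short of the epoch 1 best (0.729), but the third (0.731) beats it and resets the counter. By contrast, none of the three epoch 4 evaluations beats the 0.748 from epoch 3, step 300. The sketch below replays the logged dev metrics under that assumed rule. `run_early_stopping` and its counter-reset behaviour are illustrative assumptions, not the project's code. The sketch also explains the summary line's dev acc: the mean of the last evaluation is (0.841 + 0.506 + 0.854) / 3 ≈ 0.734.

```python
from typing import List, Optional, Tuple

# Dev (paraphrase acc, sentiment acc, STS corr) for every evaluation in the log:
# three evaluations per epoch, epochs 0-4.
DEV_METRICS: List[Tuple[float, float, float]] = [
    (0.751, 0.262, 0.752), (0.774, 0.407, 0.812), (0.776, 0.462, 0.818),
    (0.812, 0.484, 0.843), (0.813, 0.505, 0.848), (0.819, 0.514, 0.853),
    (0.790, 0.512, 0.856), (0.824, 0.499, 0.859), (0.831, 0.510, 0.853),
    (0.824, 0.526, 0.856), (0.829, 0.537, 0.862), (0.834, 0.548, 0.863),
    (0.835, 0.514, 0.850), (0.838, 0.512, 0.855), (0.841, 0.506, 0.854),
]


def run_early_stopping(
    metrics: List[Tuple[float, float, float]], patience: int = 3
) -> Tuple[Optional[int], int, float, int]:
    """Return (stop_index, best_index, best_score, n_saves).

    stop_index is None if no early stop happens within the given evaluations.
    """
    best_score = float("-inf")
    best_index = -1
    bad_evals = 0
    n_saves = 0
    for i, triple in enumerate(metrics):
        score = sum(triple) / len(triple)  # equal-weight mean (assumed)
        if score > best_score:
            best_score, best_index = score, i
            bad_evals = 0  # improvement resets the patience counter
            n_saves += 1  # corresponds to a "Saved the model" line
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return i, best_index, best_score, n_saves
    return None, best_index, best_score, n_saves


stop, best_i, best, saves = run_early_stopping(DEV_METRICS)
print(f"stopped at eval {stop}, best eval {best_i} ({best:.3f}), {saves} saves")
```

The replay stops at the 15th evaluation (index 14) and keeps evaluation index 11 (epoch 3, step 300) as the best. It records 10 saves, the same count as the "Saved the model" lines in the log.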
Loaded model to test from finetune-5-3e-05-multitask.pt
Loaded 2210 test examples from data/ids-sst-test-student.csv
Loaded 40429 test examples from data/quora-test-student.csv
Loaded 1379 test examples from data/sts-test-student.csv
Loaded 1101 dev examples from data/ids-sst-dev.csv
Loaded 20212 dev examples from data/quora-dev.csv
Loaded 863 dev examples from data/sts-dev.csv
Paraphrase detection accuracy: 0.834
Sentiment classification accuracy: 0.548
Semantic Textual Similarity correlation: 0.863
dev sentiment acc :: 0.548
dev paraphrase acc :: 0.834
dev sts corr :: 0.863
============ Job Information ===================================================
Submitted: 2023-09-01T05:55:55
Started: 2023-09-01T05:55:56
Ended: 2023-09-01T07:25:52
Elapsed: 90 min, Limit: 900 min, Difference: 810 min
CPUs: 8, Nodes: 1
Estimated NPL:
================================================================================
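The job report's figures are consistent with each other. From Started (05:55:56) to Ended (07:25:52) is 89 min 56 s, which the report shows as 90 min, and 900 - 90 = 810 min of the limit went unused. The training script's own "Time taken: 5180.34 seconds" is about 86.3 min. The remaining ~216 s presumably covers environment reporting, tokenizer and data loading, and the final test pass. Rounding to whole minutes is an assumption about how the report computes "Elapsed".

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S"
started = datetime.strptime("2023-09-01T05:55:56", FMT)
ended = datetime.strptime("2023-09-01T07:25:52", FMT)

elapsed_s = (ended - started).total_seconds()
elapsed_min = round(elapsed_s / 60)  # assumed: report rounds to whole minutes
limit_min = 900
unused_min = limit_min - elapsed_min

training_s = 5180.34  # "Time taken" printed by the training script
overhead_s = elapsed_s - training_s  # time spent outside the timed training loop

print(f"elapsed {elapsed_s:.0f} s (~{elapsed_min} min), unused {unused_min} min")
print(f"training {training_s / 60:.1f} min, other work {overhead_s:.0f} s")
```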
Edited Sep 01, 2023 by Qumeng Sun
