Pre
## Loss ##
loss = C1 * pixel_loss + C2 * depth_loss
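
A minimal sketch of the weighted sum above, in plain Python. The function names and the example loss magnitudes are assumptions made for illustration; only the formula and the weights [C1, C2] = [1, 0.25] come from these notes. `balancing_weight` shows one possible way to pick C2 so that both terms contribute the same numerical value.

```python
def combined_loss(pixel_loss, depth_loss, c1=1.0, c2=0.25):
    """Weighted sum of the two terms: C1 * pixel_loss + C2 * depth_loss."""
    return c1 * pixel_loss + c2 * depth_loss


def balancing_weight(pixel_loss, depth_loss, c1=1.0):
    """Return the C2 for which C2 * depth_loss equals C1 * pixel_loss."""
    if depth_loss == 0:
        raise ValueError("depth_loss must be non-zero to compute a balancing weight")
    return c1 * pixel_loss / depth_loss


# Hypothetical magnitudes (not measured values): depth loss about 4x the pixel loss.
pixel_loss, depth_loss = 0.5, 2.0
c2 = balancing_weight(pixel_loss, depth_loss)                  # weight that equalizes both terms
total = combined_loss(pixel_loss, depth_loss, c1=1.0, c2=c2)   # both terms now contribute 0.5
```

In a training loop the same arithmetic would be applied to the framework's loss tensors; the weights stay fixed constants and are not learned.
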
exp4 | Nope 💔
[C1, C2] = [1, 0.25], chosen so that both losses have the same numerical value (i.e. the same impact)
- Replace the last ReLU with Sigmoid + scaling
Train DT = Train | Val DT = Val | lr = 0.001

| Audio Model | Audio | Image Model | Image | Others |
|---|---|---|---|---|
| Net1D | 1D | ResNet 18 | 2q mask Image | Batch Size = 4 |
| Decoder | | | | |
| temporal upconv. | | | | |

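
The "Sigmoid + scaling" change tried in exp4 can be illustrated with a small plain-Python sketch. The scaling factor `max_value` is a hypothetical parameter (the notes do not say which factor was used), and the function names are made up for this example. The point is only the difference in output range: ReLU is unbounded above, while a scaled sigmoid is bounded by `max_value`.

```python
import math


def relu(x):
    """ReLU: clamps negatives to 0, unbounded above."""
    return max(0.0, x)


def sigmoid(x):
    """Numerically stable logistic sigmoid (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def scaled_sigmoid(x, max_value=1.0):
    """Sigmoid followed by scaling, so the output lies between 0 and max_value."""
    return max_value * sigmoid(x)


# Compare both activations on a few raw (pre-activation) outputs.
raw_outputs = [-3.0, 0.0, 3.0, 50.0]
relu_out = [relu(v) for v in raw_outputs]
sigmoid_out = [scaled_sigmoid(v, max_value=10.0) for v in raw_outputs]
```

With ReLU the last layer can output arbitrarily large values, whereas the scaled sigmoid saturates near `max_value`, which also shrinks its gradient for large inputs.
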
exp5 | Sigmoid is bad
[C1, C2] = [1, 0.25]
- Fusion FC layer: last ReLU → Sigmoid
Train DT = Train | Val DT = Val | lr = 0.0001

| Audio Model | Audio Input | Image Model | Image Input | Others |
|---|---|---|---|---|
| ResNet 18 | 1D | ResNet 18 | 2q mask Image | Batch Size = 4 |
| Not Freeze, None | | Freeze, mp3d | | |

exp6 | Sigmoid is bad
[C1, C2] = [1, 0.25]
Train DT = Train | Val DT = Val | lr = 0.0001