Bug about weight sharing in AutoFormer #232

@variant-star

    def sample_weight(weight, sample_in_dim, sample_out_dim):
        sample_weight = weight[:, :sample_in_dim]
        sample_weight = torch.cat([sample_weight[i:sample_out_dim:3, :] for i in range(3)], dim=0)
        return sample_weight

I think there is something wrong with the way weight sharing is done here. I think the `torch.cat` line should be replaced with:

    N = weight.size(0) // 3
    sample_weight = torch.cat([sample_weight[i*N:i*N+sample_out_dim//3, :] for i in range(3)], dim=0)

To make this more intuitive, I drew a schematic diagram showing how the 4-head and 5-head self-attention (SA) configurations share Linear.weight.

[Image: Snipaste_2024-03-28_22-05-19]

Maybe I have misunderstood the implementation here. Could you help check it?
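
To make the difference between the two slicing schemes concrete, here is a minimal sketch that tracks only which row indices of the full weight each scheme selects. It uses plain Python list slicing, which follows the same start:stop:step semantics as the tensor row slicing in the snippets. The function names `interleaved_rows` and `blocked_rows` and the toy sizes (12 output rows, 6 sampled) are assumptions made for illustration; they are not part of AutoFormer.

```python
def interleaved_rows(total_out_dim, sample_out_dim):
    """Row indices picked by the quoted code: rows[i:sample_out_dim:3] for i = 0, 1, 2."""
    rows = list(range(total_out_dim))
    return [r for i in range(3) for r in rows[i:sample_out_dim:3]]


def blocked_rows(total_out_dim, sample_out_dim):
    """Row indices picked by the proposed code: the leading rows of three equal blocks."""
    n = total_out_dim // 3   # size of each block in the full weight (N in the issue)
    k = sample_out_dim // 3  # rows taken from each block
    rows = list(range(total_out_dim))
    return [r for i in range(3) for r in rows[i * n:i * n + k]]


# Hypothetical toy sizes: the full weight has 12 output rows, the sub-network samples 6.
print(interleaved_rows(12, 6))  # [0, 3, 1, 4, 2, 5]
print(blocked_rows(12, 6))      # [0, 1, 4, 5, 8, 9]
```

Under both schemes a smaller sub-network reuses a subset of the rows used by a larger one; they differ in which rows of the full weight make up each third of the sampled output. Which layout is appropriate depends on how the rest of the model splits the output into its three parts, which this sketch does not cover.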
