Federated Learning (FL) is widely adopted for traffic forecasting over large-scale IoT-enabled sensor data because its decentralized nature preserves data providers' privacy. However, when state-of-the-art deep learning-based traffic predictors are deployed in FL systems, existing FL frameworks incur excessive communication overhead in transmitting these models' parameter updates, since the models' depth and breadth entail an enormous number of parameters. In this paper, we propose a practical FL scheme, namely, Clustering-based hierarchical and Two-step-optimized FL (CTFed), to tackle this issue. The proposed scheme follows a divide et impera strategy that clusters the clients into multiple groups according to the similarity between their local models' parameters. We integrate the particle swarm optimization algorithm and devise a two-step approach for local model optimization. Under this scheme, only one representative local model update from each cluster is uploaded to the central server, thereby reducing the communication overhead of transmitting model updates in FL. CTFed is orthogonal to gradient compression- and sparsification-based approaches, so it can be combined with them to further reduce communication overhead. Extensive case studies on three real-world datasets and three state-of-the-art models demonstrate the proposed scheme's high training efficiency, accurate prediction performance, and robustness to unstable network environments.
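To make the cluster-then-upload-one-representative idea concrete, the following is a minimal sketch, not the paper's actual algorithm: it assumes plain k-means over flattened parameter vectors as the similarity-based clustering, and picks as each cluster's representative the client whose update lies closest to the cluster centroid. All names here (`cluster_representatives`, the fixed iteration count) are illustrative assumptions.

```python
import numpy as np

def cluster_representatives(updates, k, seed=0):
    """Group flattened client parameter updates into k clusters by
    similarity (plain k-means, an assumed stand-in for the paper's
    clustering step) and return, per cluster, the index of the client
    whose update is closest to the centroid. Only these representative
    updates would then be uploaded to the central server."""
    X = np.stack(updates)                          # (n_clients, n_params)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(20):                            # fixed-iteration k-means
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)              # nearest-centroid labels
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    reps = []                                      # one representative per cluster
    for j in range(k):
        idx = np.where(labels == j)[0]
        if len(idx):
            d = np.linalg.norm(X[idx] - centroids[j], axis=1)
            reps.append(int(idx[d.argmin()]))
    return labels, reps
```

With n clients and k clusters, only k updates cross the client-server link per round instead of n, which is the source of the communication savings the abstract claims.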