Padavan自己增加frp并排除故障

顺便记录排除frp报错EOF(i/o deadline)的过程

spark1e ·

本文的应用环境是指在使用他人已编译好,且不方便自己重新再进行编译的Padavan路由系统下,如何添加软件。

本文涉及的frp故障排查版本为0.51.2,如果后续更新可能会有不同,仅供参考。

Padavan添加软件

Padavan的系统(或者说目前刷路由器,非软路由,可用的系统里)大部分目录都是重启即丢失的。

但需要注意的是,由于Padavan系统的特殊性,关键目录以外的目录均为 tmpfs ,可以理解为是把数据暂存在内存上。因此在/etc/storage目录里所做的修改,如果没有执行保存脚本,就并没有真正的写入 Rom 里,重启之后文件还会丢失。 —— panFake的文章

那么我们的第一个目标就是把frp添加到/etc/storage目录下,并且烧录,使其重启可用。

路由器还是能插U盘的更好一些

我的是红米AC2100,刷入Padavan后,SSH到路由器可见,可用空间大概也就几M,而最新版本的frp大概需要10M,空间严重不足。如果你的路由器能够插U盘,那么方案就变成了:

  • 把frp放在U盘指定目录
  • 路由器启动时自动挂载U盘
  • 拷贝可执行文件到/tmp目录并启动(或者直接在U盘上启动)

不过上述方案不适合我,接下来介绍我的处理。

首要条件需要开启Padavan的SSH功能,这里不赘述,自行搜索。

UPX,好用

本文没有使用UPX的最新版,个人使用的3.95

经过自己的测试,UPX运行的平台与需要修改的可执行文件平台无关,因此如果压缩有问题可能是你的原本的可执行文件的平台不对。

红米AC2100使用的是联发科的MT7621A,个人测试使用的frp版本应为misple版本。

利用UPX工具,使用--best参数,将frpc从10M压缩到了4M左右,如果你的空间还是不足,可以尝试压缩更旧版本的frpc,我自己压缩的0.39版本可以压缩到2M左右。

压缩好后将frpc和配置文件放在/etc/storage目录下,我自己放在了/etc/storage/frp/下,下文基于此。

放置好后建议先测试一下能否正常连接,因为每次重启后会还原到烧录的版本,如果配置的有问题,还需要重新烧录。

如果连接不上,是打不开frpc的管理面板的,我觉得这一点上frpc可以优化一下,至少可以判断是否启动成功。

如果路由器开启了防火墙,记得放行管理面板的端口和与服务端连接用的端口。不建议直接iptables,在管理面板里找Upnp配置即可。

如果测试没问题可以正常连接你的frps服务端,那么到系统设置里可以准备烧录了,这里借用一张恩山老哥的图片。

微信图片_20230815105635

选择“保存/etc/storage/内容到闪存”的提交,即可。

注意检查保存时的log,如果出现类似于:

Aug 18 02:37:15 Storage save: Save storage files to MTD partition "/dev/mtdblock6"

Aug 18 02:37:26 Storage save: Invalid storage final data size: 4240811

一般是指/etc/storage容量不足

提交完毕,在系统设置 - 脚本,里找到“路由器启动时”(或者类似的字样),增加一行nohup启动脚本:

# 可执行文件更改为你自己的目录
nohup /etc/storage/frp/frpc -c /etc/storage/frp/frpc.ini >/dev/null 2>&1 &

选择输出到/dev/null是因为我在frpc.ini里配置了日志输出目录,建议输出到tmp避免闪存写满。

可以重启路由器测试frpc是否自动启动好并且连接上了。

frp故障排除

然而我在测试时遭遇了frp用户最常见的错误:

2023/08/14 13:53:31 [I] [service.go:296] EOF

EOF这种错误一般出现在读取返回体没有正确读取的情况,可能跟conn等都有关系,只有一个EOF这难排查。

找到frp的源码,在client/service.go下增加log

附上我自己debug用的service.go代码:

// Copyright 2017 fatedier, [email protected]
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package client

import (
	"context"
	"crypto/tls"
	"fmt"
	"io"
	"math/rand"
	"net"
	"runtime"
	"strconv"
	"strings"
	"sync"
	"sync/atomic"
	"time"

	"github.com/fatedier/golib/crypto"
	libdial "github.com/fatedier/golib/net/dial"
	fmux "github.com/hashicorp/yamux"
	quic "github.com/quic-go/quic-go"

	"github.com/fatedier/frp/assets"
	"github.com/fatedier/frp/pkg/auth"
	"github.com/fatedier/frp/pkg/config"
	"github.com/fatedier/frp/pkg/msg"
	"github.com/fatedier/frp/pkg/transport"
	"github.com/fatedier/frp/pkg/util/log"
	utilnet "github.com/fatedier/frp/pkg/util/net"
	"github.com/fatedier/frp/pkg/util/util"
	"github.com/fatedier/frp/pkg/util/version"
	"github.com/fatedier/frp/pkg/util/xlog"
)

func init() {
	crypto.DefaultSalt = "frp"
	// TODO: remove this when we drop support for go1.19
	rand.Seed(time.Now().UnixNano())
}

// Service is a client service.
type Service struct {
	// uniq id got from frps, attach it in loginMsg
	runID string

	// manager control connection with server
	ctl   *Control
	ctlMu sync.RWMutex

	// Sets authentication based on selected method
	authSetter auth.Setter

	cfg         config.ClientCommonConf
	pxyCfgs     map[string]config.ProxyConf
	visitorCfgs map[string]config.VisitorConf
	cfgMu       sync.RWMutex

	// The configuration file used to initialize this client, or an empty
	// string if no configuration file was used.
	cfgFile string

	exit uint32 // 0 means not exit

	// service context
	ctx context.Context
	// call cancel to stop service
	cancel context.CancelFunc
}

func NewService(
	cfg config.ClientCommonConf,
	pxyCfgs map[string]config.ProxyConf,
	visitorCfgs map[string]config.VisitorConf,
	cfgFile string,
) (svr *Service, err error) {
	svr = &Service{
		authSetter:  auth.NewAuthSetter(cfg.ClientConfig),
		cfg:         cfg,
		cfgFile:     cfgFile,
		pxyCfgs:     pxyCfgs,
		visitorCfgs: visitorCfgs,
		ctx:         context.Background(),
		exit:        0,
	}
	return
}

func (svr *Service) GetController() *Control {
	svr.ctlMu.RLock()
	defer svr.ctlMu.RUnlock()
	return svr.ctl
}

func (svr *Service) Run(ctx context.Context) error {
	ctx, cancel := context.WithCancel(ctx)
	svr.ctx = xlog.NewContext(ctx, xlog.New())
	svr.cancel = cancel

	xl := xlog.FromContextSafe(svr.ctx)

	// set custom DNSServer
	if svr.cfg.DNSServer != "" {
		dnsAddr := svr.cfg.DNSServer
		if _, _, err := net.SplitHostPort(dnsAddr); err != nil {
			dnsAddr = net.JoinHostPort(dnsAddr, "53")
		}
		// Change default dns server for frpc
		net.DefaultResolver = &net.Resolver{
			PreferGo: true,
			Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
				return net.Dial("udp", dnsAddr)
			},
		}
	}

	// login to frps
	for {
		conn, cm, err := svr.login()
		if err != nil {
			xl.Warn("login to server failed: %v", err)

			// if login_fail_exit is true, just exit this program
			// otherwise sleep a while and try again to connect to server
			if svr.cfg.LoginFailExit {
				return err
			}
			util.RandomSleep(5*time.Second, 0.9, 1.1)
		} else {
			// login success
			ctl := NewControl(svr.ctx, svr.runID, conn, cm, svr.cfg, svr.pxyCfgs, svr.visitorCfgs, svr.authSetter)
			ctl.Run()
			svr.ctlMu.Lock()
			svr.ctl = ctl
			svr.ctlMu.Unlock()
			break
		}
	}

	go svr.keepControllerWorking()

	if svr.cfg.AdminPort != 0 {
		// Init admin server assets
		assets.Load(svr.cfg.AssetsDir)

		address := net.JoinHostPort(svr.cfg.AdminAddr, strconv.Itoa(svr.cfg.AdminPort))
		err := svr.RunAdminServer(address)
		if err != nil {
			log.Warn("run admin server error: %v", err)
		}
		log.Info("admin server listen on %s:%d", svr.cfg.AdminAddr, svr.cfg.AdminPort)
	}
	<-svr.ctx.Done()
	// service context may not be canceled by svr.Close(), we should call it here to release resources
	if atomic.LoadUint32(&svr.exit) == 0 {
		svr.Close()
	}
	return nil
}

func (svr *Service) keepControllerWorking() {
	xl := xlog.FromContextSafe(svr.ctx)
	maxDelayTime := 20 * time.Second
	delayTime := time.Second

	// if frpc reconnect frps, we need to limit retry times in 1min
	// current retry logic is sleep 0s, 0s, 0s, 1s, 2s, 4s, 8s, ...
	// when exceed 1min, we will reset delay and counts
	cutoffTime := time.Now().Add(time.Minute)
	reconnectDelay := time.Second
	reconnectCounts := 1

	for {
		<-svr.ctl.ClosedDoneCh()
		if atomic.LoadUint32(&svr.exit) != 0 {
			return
		}

		// the first three attempts with a low delay
		if reconnectCounts > 3 {
			util.RandomSleep(reconnectDelay, 0.9, 1.1)
			xl.Info("wait %v to reconnect", reconnectDelay)
			reconnectDelay *= 2
		} else {
			util.RandomSleep(time.Second, 0, 0.5)
		}
		reconnectCounts++

		now := time.Now()
		if now.After(cutoffTime) {
			// reset
			cutoffTime = now.Add(time.Minute)
			reconnectDelay = time.Second
			reconnectCounts = 1
		}

		for {
			if atomic.LoadUint32(&svr.exit) != 0 {
				return
			}

			xl.Info("try to reconnect to server...")
			conn, cm, err := svr.login()
			if err != nil {
				xl.Warn("reconnect to server error: %v, wait %v for another retry", err, delayTime)
				util.RandomSleep(delayTime, 0.9, 1.1)

				delayTime *= 2
				if delayTime > maxDelayTime {
					delayTime = maxDelayTime
				}
				continue
			}
			// reconnect success, init delayTime
			delayTime = time.Second

			ctl := NewControl(svr.ctx, svr.runID, conn, cm, svr.cfg, svr.pxyCfgs, svr.visitorCfgs, svr.authSetter)
			ctl.Run()
			svr.ctlMu.Lock()
			if svr.ctl != nil {
				svr.ctl.Close()
			}
			svr.ctl = ctl
			svr.ctlMu.Unlock()
			break
		}
	}
}

// login creates a connection to frps and registers it self as a client
// conn: control connection
// session: if it's not nil, using tcp mux
func (svr *Service) login() (conn net.Conn, cm *ConnectionManager, err error) {
	xl := xlog.FromContextSafe(svr.ctx)
	cm = NewConnectionManager(svr.ctx, &svr.cfg)

	if err = cm.OpenConnection(); err != nil {
		xl.Info("Throwed from line:249, the origin error msg is: ")
		xl.Info(err.Error())
		return nil, nil, err
	}

	defer func() {
		if err != nil {
			cm.Close()
		}
	}()

	conn, err = cm.Connect()
	if err != nil {
		xl.Info("Throwed from line:261, the origin error msg is: ")
		xl.Info(err.Error())
		return
	}

	loginMsg := &msg.Login{
		Arch:      runtime.GOARCH,
		Os:        runtime.GOOS,
		PoolCount: svr.cfg.PoolCount,
		User:      svr.cfg.User,
		Version:   version.Full(),
		Timestamp: time.Now().Unix(),
		RunID:     svr.runID,
		Metas:     svr.cfg.Metas,
	}

	// Add auth
	if err = svr.authSetter.SetLogin(loginMsg); err != nil {
		xl.Info("Throwed from line:280, the origin error msg is: ")
		xl.Info(err.Error())
		return
	}

	if err = msg.WriteMsg(conn, loginMsg); err != nil {
		xl.Info("Throwed from line:286, the origin error msg is: ")
		xl.Info(err.Error())
		return
	}

	var loginRespMsg msg.LoginResp
	_ = conn.SetReadDeadline(time.Now().Add(60 * time.Second))
	if err = msg.ReadMsgInto(conn, &loginRespMsg); err != nil {
		xl.Info("Throwed from line:294, the origin error msg is: ")
		xl.Info(err.Error())
		return
	}
	_ = conn.SetReadDeadline(time.Time{})

	if loginRespMsg.Error != "" {
		err = fmt.Errorf("%s", loginRespMsg.Error)
		xl.Error("%s", loginRespMsg.Error)
		return
	}

	svr.runID = loginRespMsg.RunID
	xl.ResetPrefixes()
	xl.AppendPrefix(svr.runID)

	xl.Info("login to server success, get run id [%s]", loginRespMsg.RunID)
	return
}

func (svr *Service) ReloadConf(pxyCfgs map[string]config.ProxyConf, visitorCfgs map[string]config.VisitorConf) error {
	svr.cfgMu.Lock()
	svr.pxyCfgs = pxyCfgs
	svr.visitorCfgs = visitorCfgs
	svr.cfgMu.Unlock()

	svr.ctlMu.RLock()
	ctl := svr.ctl
	svr.ctlMu.RUnlock()

	if ctl != nil {
		return svr.ctl.ReloadConf(pxyCfgs, visitorCfgs)
	}
	return nil
}

func (svr *Service) Close() {
	svr.GracefulClose(time.Duration(0))
}

func (svr *Service) GracefulClose(d time.Duration) {
	atomic.StoreUint32(&svr.exit, 1)

	svr.ctlMu.RLock()
	if svr.ctl != nil {
		svr.ctl.GracefulClose(d)
		svr.ctl = nil
	}
	svr.ctlMu.RUnlock()

	if svr.cancel != nil {
		svr.cancel()
	}
}

type ConnectionManager struct {
	ctx context.Context
	cfg *config.ClientCommonConf

	muxSession *fmux.Session
	quicConn   quic.Connection
}

func NewConnectionManager(ctx context.Context, cfg *config.ClientCommonConf) *ConnectionManager {
	return &ConnectionManager{
		ctx: ctx,
		cfg: cfg,
	}
}

func (cm *ConnectionManager) OpenConnection() error {
	xl := xlog.FromContextSafe(cm.ctx)

	// special for quic
	if strings.EqualFold(cm.cfg.Protocol, "quic") {
		var tlsConfig *tls.Config
		var err error
		sn := cm.cfg.TLSServerName
		if sn == "" {
			sn = cm.cfg.ServerAddr
		}
		if cm.cfg.TLSEnable {
			tlsConfig, err = transport.NewClientTLSConfig(
				cm.cfg.TLSCertFile,
				cm.cfg.TLSKeyFile,
				cm.cfg.TLSTrustedCaFile,
				sn)
		} else {
			tlsConfig, err = transport.NewClientTLSConfig("", "", "", sn)
		}
		if err != nil {
			xl.Warn("fail to build tls configuration, err: %v", err)
			return err
		}
		tlsConfig.NextProtos = []string{"frp"}

		conn, err := quic.DialAddr(
			cm.ctx,
			net.JoinHostPort(cm.cfg.ServerAddr, strconv.Itoa(cm.cfg.ServerPort)),
			tlsConfig, &quic.Config{
				MaxIdleTimeout:     time.Duration(cm.cfg.QUICMaxIdleTimeout) * time.Second,
				MaxIncomingStreams: int64(cm.cfg.QUICMaxIncomingStreams),
				KeepAlivePeriod:    time.Duration(cm.cfg.QUICKeepalivePeriod) * time.Second,
			})
		if err != nil {
			return err
		}
		cm.quicConn = conn
		return nil
	}

	if !cm.cfg.TCPMux {
		return nil
	}

	conn, err := cm.realConnect()
	if err != nil {
		return err
	}

	fmuxCfg := fmux.DefaultConfig()
	fmuxCfg.KeepAliveInterval = time.Duration(cm.cfg.TCPMuxKeepaliveInterval) * time.Second
	fmuxCfg.LogOutput = io.Discard
	fmuxCfg.MaxStreamWindowSize = 6 * 1024 * 1024
	session, err := fmux.Client(conn, fmuxCfg)
	if err != nil {
		return err
	}
	cm.muxSession = session
	return nil
}

func (cm *ConnectionManager) Connect() (net.Conn, error) {
	if cm.quicConn != nil {
		stream, err := cm.quicConn.OpenStreamSync(context.Background())
		if err != nil {
			return nil, err
		}
		return utilnet.QuicStreamToNetConn(stream, cm.quicConn), nil
	} else if cm.muxSession != nil {
		stream, err := cm.muxSession.OpenStream()
		if err != nil {
			return nil, err
		}
		return stream, nil
	}

	return cm.realConnect()
}

func (cm *ConnectionManager) realConnect() (net.Conn, error) {
	xl := xlog.FromContextSafe(cm.ctx)
	var tlsConfig *tls.Config
	var err error
	tlsEnable := cm.cfg.TLSEnable
	if cm.cfg.Protocol == "wss" {
		tlsEnable = true
	}
	if tlsEnable {
		sn := cm.cfg.TLSServerName
		if sn == "" {
			sn = cm.cfg.ServerAddr
		}

		tlsConfig, err = transport.NewClientTLSConfig(
			cm.cfg.TLSCertFile,
			cm.cfg.TLSKeyFile,
			cm.cfg.TLSTrustedCaFile,
			sn)
		if err != nil {
			xl.Warn("fail to build tls configuration, err: %v", err)
			return nil, err
		}
	}

	proxyType, addr, auth, err := libdial.ParseProxyURL(cm.cfg.HTTPProxy)
	if err != nil {
		xl.Error("fail to parse proxy url")
		return nil, err
	}
	dialOptions := []libdial.DialOption{}
	protocol := cm.cfg.Protocol
	switch protocol {
	case "websocket":
		protocol = "tcp"
		dialOptions = append(dialOptions, libdial.WithAfterHook(libdial.AfterHook{Hook: utilnet.DialHookWebsocket(protocol, "")}))
		dialOptions = append(dialOptions, libdial.WithAfterHook(libdial.AfterHook{
			Hook: utilnet.DialHookCustomTLSHeadByte(tlsConfig != nil, cm.cfg.DisableCustomTLSFirstByte),
		}))
		dialOptions = append(dialOptions, libdial.WithTLSConfig(tlsConfig))
	case "wss":
		protocol = "tcp"
		dialOptions = append(dialOptions, libdial.WithTLSConfigAndPriority(100, tlsConfig))
		// Make sure that if it is wss, the websocket hook is executed after the tls hook.
		dialOptions = append(dialOptions, libdial.WithAfterHook(libdial.AfterHook{Hook: utilnet.DialHookWebsocket(protocol, tlsConfig.ServerName), Priority: 110}))
	default:
		dialOptions = append(dialOptions, libdial.WithTLSConfig(tlsConfig))
	}

	if cm.cfg.ConnectServerLocalIP != "" {
		dialOptions = append(dialOptions, libdial.WithLocalAddr(cm.cfg.ConnectServerLocalIP))
	}
	dialOptions = append(dialOptions,
		libdial.WithProtocol(protocol),
		libdial.WithTimeout(time.Duration(cm.cfg.DialServerTimeout)*time.Second),
		libdial.WithKeepAlive(time.Duration(cm.cfg.DialServerKeepAlive)*time.Second),
		libdial.WithProxy(proxyType, addr),
		libdial.WithProxyAuth(auth),
	)
	conn, err := libdial.DialContext(
		cm.ctx,
		net.JoinHostPort(cm.cfg.ServerAddr, strconv.Itoa(cm.cfg.ServerPort)),
		dialOptions...,
	)
	return conn, err
}

func (cm *ConnectionManager) Close() error {
	if cm.quicConn != nil {
		_ = cm.quicConn.CloseWithError(0, "")
	}
	if cm.muxSession != nil {
		_ = cm.muxSession.Close()
	}
	return nil
}

修改后编译一个debug版本,再放到路由器内测试。

https://github.com/fatedier/frp/blob/dev/package.sh 即这个项目的编译脚本,直接执行即可得到所需的比如misple版本。

可以看出错误位置在于:

2023/08/14 13:53:31 [I] [service.go:295] Throwed from line:294, the origin error msg is:
2023/08/14 13:53:31 [I] [service.go:296] EOF

考虑到在参考其他人方案改为kcp模式时也会出现i/o deadline问题,这里我修改了读取的时间,从原来的10秒改为60秒:

_ = conn.SetReadDeadline(time.Now().Add(60 * time.Second))
	if err = msg.ReadMsgInto(conn, &loginRespMsg); err != nil {
		xl.Info("Throwed from line:294, the origin error msg is: ")
		xl.Info(err.Error())
		return
	}

修改时间这个方式不一定是最后跑通的关键,建议先往后看。

修改后再编译执行,发现还是EOF,而且跟时间无关,日志仍然是秒报错。

2023/08/14 14:27:41 [I] [root.go:220] start frpc service for config file [./frpc.ini]
2023/08/14 14:27:41 [I] [service.go:295] Throwed from line:294, the origin error msg is:
2023/08/14 14:27:41 [I] [service.go:296] EOF
2023/08/14 14:27:41 [W] [service.go:133] login to server failed: EOF

这个时候我(大概,可能,记不太清了)注意到了有这么一行注释 提到了Mux配置,我立即检查了我的服务端,发现我的服务端配置的tcp_mux选项(文档链接)我设置成了false,而客户端默认没有使用时则为true,这里不匹配导致了conn的连接问题。

都更改为true之后,EOF问题解决,成功连接。

Padavan自己增加frp并排除故障
本文作者
spark1e
发布于
2023-08-15
许可协议
转载或引用本文时请遵守许可协议,注明出处、不得用于商业用途!
评论区 - Powered by Giscus