The best way to learn filesystems is to write one yourself. This post uses FUSE + Go to implement an in-memory filesystem with basic read/write operations, deep-diving into core concepts like inode and block.
1. Why Build Your Own Filesystem?
1.1 Learning Goals
Understanding filesystems is foundational for:
- Container storage volumes
- Distributed storage (Ceph, GlusterFS)
- Object storage (S3 FUSE)
- Database storage engines
1.2 FUSE Advantages
FUSE (Filesystem in Userspace) allows implementing filesystems in userspace:
- No kernel modules needed
- Easy development and debugging
- Can use high-level languages (Go, Python, Rust)
2. Filesystem Core Concepts
2.1 Inode
Inode (Index Node) is the core data structure storing file metadata:
1
2
3
4
5
6
7
8
9
10
11
12
| type Inode struct {
ID uint64 // inode number
Type InodeType // file or directory
Size uint64 // file size
Mode os.FileMode // permissions
Uid, Gid uint32 // owner
Atime time.Time // access time
Mtime time.Time // modification time
Ctime time.Time // creation time
Links uint32 // hard link count
Blocks []uint64 // data block indices
}
|
Key points:
- Filename is NOT in inode! Filenames are in directory entries
- One inode can have multiple filenames (hard links)
ls -i shows inode numbers
2.2 Block
Block is the basic storage unit, typically 4KB:
1
2
3
4
5
| const BlockSize = 4096
type Block struct {
Data [BlockSize]byte
}
|
File contents are split into multiple Blocks.
2.3 Directory
A directory is essentially a file whose contents are a list of “directory entries”:
1
2
3
4
5
6
| type DirEntry struct {
Name string // filename
Inode uint64 // points to inode
}
// Directory content is []DirEntry
|
2.4 Relationship Diagram
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
| ┌─────────────────────────────────────────┐
│ Superblock │
│ (FS metadata: block count, inode count) │
└─────────────────────────────────────────┘
│
┌────────────────────────┼────────────────────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Inode 1 │ │ Inode 2 │ │ Inode 3 │
│ (root) │ │ (file a)│ │ (file b)│
│ Type=Dir│ │ Type=Reg│ │ Type=Reg│
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Block 0 │ │ Block 1 │ │ Block 2 │
│ DirEntry│ │ content │ │ content │
│ a→Inode2│ │ "hello"│ │ "world" │
│ b→Inode3│ └─────────┘ └─────────┘
└─────────┘
|
3. FUSE Architecture
3.1 How It Works
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
| ┌──────────────────────────────────────────────────────────┐
│ User Space │
│ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │
│ │ Your App │───▶│ glibc │───▶│ FUSE userspace │ │
│ │ (cat, ls)│ │ open/read│ │ (your code) │ │
│ └──────────┘ └────┬─────┘ └────────▲─────────┘ │
│ │ │ │
└───────────────────────│────────────────────│──────────────┘
│ VFS │ /dev/fuse
▼ │
┌───────────────────────────────────────────────────────────┐
│ Kernel │
│ ┌──────────────────────────────────────────────────┐ │
│ │ FUSE kernel module │ │
│ │ (forwards VFS calls to userspace, returns result)│ │
│ └──────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────┘
|
3.2 Operations to Implement
| Operation | System Call | Description |
|---|
| Lookup | open(dir) | Find file in directory |
| Getattr | stat | Get file attributes |
| Readdir | readdir | List directory contents |
| Read | read | Read file content |
| Write | write | Write file content |
| Create | creat/open | Create new file |
| Mkdir | mkdir | Create directory |
| Unlink | unlink | Delete file |
| Rmdir | rmdir | Delete directory |
4. Go Implementation
4.1 Dependencies
Using bazil.org/fuse library:
4.2 Data Structures
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
| package main
import (
"os"
"sync"
"time"
)
type InodeType int
const (
TypeFile InodeType = iota
TypeDir
)
type MemFS struct {
mu sync.RWMutex
inodes map[uint64]*Inode
data map[uint64][]byte // inode -> file content
nextIno uint64
}
type Inode struct {
ID uint64
Type InodeType
Mode os.FileMode
Size uint64
Atime time.Time
Mtime time.Time
Children map[string]uint64 // dir entries: name -> inode
}
func NewMemFS() *MemFS {
fs := &MemFS{
inodes: make(map[uint64]*Inode),
data: make(map[uint64][]byte),
nextIno: 2,
}
// Create root directory (inode 1)
fs.inodes[1] = &Inode{
ID: 1,
Type: TypeDir,
Mode: os.ModeDir | 0755,
Atime: time.Now(),
Mtime: time.Now(),
Children: make(map[string]uint64),
}
return fs
}
|
4.3 Implementing FUSE Interface
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
| import (
"bazil.org/fuse"
"bazil.org/fuse/fs"
"context"
)
// Dir implements directory node
type Dir struct {
fs *MemFS
inode *Inode
}
func (d *Dir) Attr(ctx context.Context, attr *fuse.Attr) error {
attr.Inode = d.inode.ID
attr.Mode = d.inode.Mode
attr.Atime = d.inode.Atime
attr.Mtime = d.inode.Mtime
return nil
}
func (d *Dir) Lookup(ctx context.Context, name string) (fs.Node, error) {
d.fs.mu.RLock()
defer d.fs.mu.RUnlock()
childIno, ok := d.inode.Children[name]
if !ok {
return nil, fuse.ENOENT
}
child := d.fs.inodes[childIno]
if child.Type == TypeDir {
return &Dir{fs: d.fs, inode: child}, nil
}
return &File{fs: d.fs, inode: child}, nil
}
func (d *Dir) ReadDirAll(ctx context.Context) ([]fuse.Dirent, error) {
d.fs.mu.RLock()
defer d.fs.mu.RUnlock()
var entries []fuse.Dirent
for name, ino := range d.inode.Children {
child := d.fs.inodes[ino]
var t fuse.DirentType
if child.Type == TypeDir {
t = fuse.DT_Dir
} else {
t = fuse.DT_File
}
entries = append(entries, fuse.Dirent{
Inode: ino,
Name: name,
Type: t,
})
}
return entries, nil
}
func (d *Dir) Create(ctx context.Context, req *fuse.CreateRequest, resp *fuse.CreateResponse) (fs.Node, fs.Handle, error) {
d.fs.mu.Lock()
defer d.fs.mu.Unlock()
ino := d.fs.nextIno
d.fs.nextIno++
now := time.Now()
inode := &Inode{
ID: ino,
Type: TypeFile,
Mode: req.Mode,
Atime: now,
Mtime: now,
}
d.fs.inodes[ino] = inode
d.fs.data[ino] = []byte{}
d.inode.Children[req.Name] = ino
file := &File{fs: d.fs, inode: inode}
return file, file, nil
}
func (d *Dir) Mkdir(ctx context.Context, req *fuse.MkdirRequest) (fs.Node, error) {
d.fs.mu.Lock()
defer d.fs.mu.Unlock()
ino := d.fs.nextIno
d.fs.nextIno++
now := time.Now()
inode := &Inode{
ID: ino,
Type: TypeDir,
Mode: req.Mode | os.ModeDir,
Atime: now,
Mtime: now,
Children: make(map[string]uint64),
}
d.fs.inodes[ino] = inode
d.inode.Children[req.Name] = ino
return &Dir{fs: d.fs, inode: inode}, nil
}
|
4.4 File Operations
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
| // File implements file node
type File struct {
fs *MemFS
inode *Inode
}
func (f *File) Attr(ctx context.Context, attr *fuse.Attr) error {
attr.Inode = f.inode.ID
attr.Mode = f.inode.Mode
attr.Size = f.inode.Size
attr.Atime = f.inode.Atime
attr.Mtime = f.inode.Mtime
return nil
}
func (f *File) Read(ctx context.Context, req *fuse.ReadRequest, resp *fuse.ReadResponse) error {
f.fs.mu.RLock()
defer f.fs.mu.RUnlock()
data := f.fs.data[f.inode.ID]
if req.Offset >= int64(len(data)) {
return nil
}
end := req.Offset + int64(req.Size)
if end > int64(len(data)) {
end = int64(len(data))
}
resp.Data = data[req.Offset:end]
return nil
}
func (f *File) Write(ctx context.Context, req *fuse.WriteRequest, resp *fuse.WriteResponse) error {
f.fs.mu.Lock()
defer f.fs.mu.Unlock()
data := f.fs.data[f.inode.ID]
// Extend file size
newLen := int(req.Offset) + len(req.Data)
if newLen > len(data) {
newData := make([]byte, newLen)
copy(newData, data)
data = newData
}
copy(data[req.Offset:], req.Data)
f.fs.data[f.inode.ID] = data
f.inode.Size = uint64(len(data))
f.inode.Mtime = time.Now()
resp.Size = len(req.Data)
return nil
}
|
4.5 Main Program
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
| func main() {
mountpoint := "/tmp/memfs"
os.MkdirAll(mountpoint, 0755)
c, err := fuse.Mount(mountpoint, fuse.FSName("memfs"), fuse.Subtype("memfs"))
if err != nil {
log.Fatal(err)
}
defer c.Close()
fmt.Printf("Mounted at %s\n", mountpoint)
fmt.Println("Press Ctrl+C to exit")
memfs := NewMemFS()
// Handle signals for graceful unmount
go func() {
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
<-sigChan
fuse.Unmount(mountpoint)
}()
err = fs.Serve(c, &FS{memfs: memfs})
if err != nil {
log.Fatal(err)
}
}
type FS struct {
memfs *MemFS
}
func (f *FS) Root() (fs.Node, error) {
return &Dir{fs: f.memfs, inode: f.memfs.inodes[1]}, nil
}
|
5. Test Run
1
2
3
4
5
6
7
8
9
10
11
12
13
| # Build and run
go build -o memfs
./memfs &
# Test file operations
cd /tmp/memfs
echo "hello world" > test.txt
cat test.txt
mkdir subdir
ls -la
# Unmount
fusermount -u /tmp/memfs
|
6. Advanced Extensions
6.1 Persist to Disk
1
2
3
4
5
6
7
8
9
10
11
12
13
14
| // Serialize inodes and data to disk
func (fs *MemFS) SaveToDisk(path string) error {
f, _ := os.Create(path)
defer f.Close()
return gob.NewEncoder(f).Encode(fs)
}
func LoadFromDisk(path string) (*MemFS, error) {
f, _ := os.Open(path)
defer f.Close()
var fs MemFS
err := gob.NewDecoder(f).Decode(&fs)
return &fs, err
}
|
6.2 Network Filesystem
1
2
3
4
5
6
7
8
9
10
11
| // Forward read/write to remote server
func (f *File) Read(ctx context.Context, req *fuse.ReadRequest, resp *fuse.ReadResponse) error {
// Read from remote via gRPC
data, err := f.client.ReadFile(ctx, &pb.ReadRequest{
Inode: f.inode.ID,
Offset: req.Offset,
Size: int64(req.Size),
})
resp.Data = data.Content
return err
}
|
7. Summary
| Concept | Implementation |
|---|
| Inode | Struct storing metadata |
| Block | byte slice storing data |
| Directory | map[name]inode |
| FUSE | Implement Attr/Read/Write interfaces |
Takeaways:
- Deep understanding of inode and directory entry relationship
- Learned VFS to FUSE call chain
- Foundation for understanding distributed storage
Related Posts