Wednesday, May 8, 2024

asm injection stub

Let's check what this stub should do after being injected into some Linux process via __malloc_hook/__free_hook (btw this implicitly means that you cannot use this dirty hack for processes linked with musl or uClibc - they just don't have those hooks)
  • because our stub can be called from two different hooks, we should store somewhere which entry point we were called through
  • restore the old hook values
  • call dlopen/dlsym and then the target function (passing it the address of the injection stub for a delayed munmap. No, you can't free that memory directly in your target function - try to guess why)
  • pick the right old hook and jump to it if one was installed, or just return to the code somewhere in libc that called __malloc_hook

So I collected all the parameters needed for the job in a table dtab consisting of 6 pointers

  1. __malloc_hook address
  2. old value of __malloc_hook
  3. __free_hook address
  4. old value of __free_hook
  5. pointer to dlopen
  6. pointer to dlsym
After this table we also have a couple of string constants: the full path of injected.so and the function name. Also, because we must set up 2 entry points, I decided to put 1 byte holding the distance between the first and the second (to make the injection logic more universal) right after dtab. Sounds easy, so let's check how this logic can be implemented on some still-living processors (given that alpha, sparc, hp-pa etc. are RIP)
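
To make the layout and the order of operations concrete, here is a rough C model of what the stub does with dtab. It is only a sketch: the real stub is written in assembly, and the field names, stub_body, the demo_* stand-ins and the symbol name inject_entry are all my assumptions for illustration. The return value models the address the asm stub would jump to (NULL meaning "just return to the caller in libc").

```c
#include <dlfcn.h>   /* only for the RTLD_NOW constant */
#include <stddef.h>
#include <string.h>

typedef void *(*dlopen_fn)(const char *path, int flags);
typedef void *(*dlsym_fn)(void *handle, const char *name);
typedef void (*target_fn)(void *stub_addr);

/* The six pointers of dtab, in the order listed above (field names are made up). */
struct dtab {
    void    **malloc_hook_addr; /* 1. address of __malloc_hook */
    void     *old_malloc_hook;  /* 2. old value of __malloc_hook */
    void    **free_hook_addr;   /* 3. address of __free_hook */
    void     *old_free_hook;    /* 4. old value of __free_hook */
    dlopen_fn dl_open;          /* 5. pointer to dlopen */
    dlsym_fn  dl_sym;           /* 6. pointer to dlsym */
};

/* Which of the two entry points we were called through. */
enum entry_point { VIA_MALLOC_HOOK, VIA_FREE_HOOK };

/* C model of the stub body; returns the old hook to continue with, or NULL. */
void *stub_body(const struct dtab *t, enum entry_point via,
                const char *so_path, const char *fn_name, void *stub_addr)
{
    /* Restore both hooks first so the stub is not entered again. */
    *t->malloc_hook_addr = t->old_malloc_hook;
    *t->free_hook_addr   = t->old_free_hook;

    /* Load the library, resolve the target and pass it the stub address. */
    void *handle = t->dl_open(so_path, RTLD_NOW);
    if (handle != NULL) {
        target_fn fn = (target_fn)t->dl_sym(handle, fn_name);
        if (fn != NULL)
            fn(stub_addr);
    }

    /* Continue with the old hook that belongs to our entry point. */
    return via == VIA_MALLOC_HOOK ? t->old_malloc_hook : t->old_free_hook;
}

/* Stand-ins so the model can be exercised without a real injected.so. */
void *seen_stub_addr;
void demo_target(void *stub_addr) { seen_stub_addr = stub_addr; }
void *demo_dlopen(const char *path, int flags) { (void)flags; return (void *)path; }
void *demo_dlsym(void *handle, const char *name)
{
    (void)handle;
    return strcmp(name, "inject_entry") == 0 ? (void *)demo_target : NULL;
}
```

Calling stub_body with a table whose dl_open/dl_sym point to the demo stand-ins walks the same path as the list above: hooks restored, library "loaded", target called with the stub address, old hook returned.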

Thursday, May 2, 2024

yet another linux process injection

As you may know, there are two methods

1) using LD_PRELOAD, which doesn't work if you want to inject into an already running (perhaps for many days) process

2) ptrace. It has the following inherent disadvantages

  • target process can already be ptraced by somebody else
  • victim program can detect ptrace
  • you just want to avoid log entries like ptrace attach of "./a.out"[PID] was attempted by "XXX"

So I developed very rough analogs of the famous VirtualAllocEx/VirtualProtectEx + a simple hook to hijack execution onto shellcode written in assembly that calls dlopen/dlsym. Currently only x86_64 is supported because I am too lazy to rewrite this asm stub

Prerequisites
You must have root privileges and be able to build and load kernel modules. I tested the code on kernels 6.8 and 5.15; it can probably also work on 4.x, but I'm not sure about older versions

Let's start shedding light on the dirty details in reverse order

Saturday, April 27, 2024

gcc: placing strings in arbitrary section

As you may know, gcc always places string literals in section .rodata. Let's assume that we want to change this outrageous behavior - for example for shellcode literals used in function init_module (contained in section .init.text)

We can start by dumping gcc RTL - for something like printf("some string") the RTL will be symbol_ref <var_decl addr *.LC1>, and in the .S file this looks like

.section        .rodata
.LC1:
        .string "some string"

 
That unnamed VAR_DECL has the DECL_IN_CONSTANT_POOL attribute. It is probably possible to make a gcc plugin that collects such literals referenced from functions inside a specific section and patches their section attribute instead of DECL_IN_CONSTANT_POOL. However this requires too much labour, so let's try something lazier

A possible solution is to explicitly set the section via gcc __attribute__:

#define RSection __attribute__ ((__section__ (".init.text")))
#define _RN(name) static const char rn_##name##__[] RSection =
#define _GN(name) rn_##name##__
...
_RN(dummy_str) "some string";
printf("%s\n", _GN(dummy_str));

Looks very ugly. And even worse - this raises a compilation error:

error: ‘rn_dummy_str__’ causes a section type conflict with ‘init_module’
   11 | #define _RN(name) static const char rn_##name##__[] __attribute__ ((section (".init.text"))) =

How can we fix this problem? My first thought was to write a quick and dirty Perl script to scan sources for _RN markers and produce a .S file where all strings are placed in the right section. But then I decided to overcome my laziness and made a patch for gcc - it just checks if the passed declaration is initialized with a STRING_CST value. Surprisingly, it works!
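
For comparison, the same macro trick compiles fine with stock gcc when the target section holds only data. As far as I understand, the conflict above appears because .init.text already contains code and gcc refuses to mix objects with different section flags. The sketch below is not the patch described above and does not put anything into .init.text; the section name .shstr and the macro/function names are made up for illustration, and the section syntax assumes an ELF target.

```c
#include <stddef.h>
#include <string.h>

/* ".shstr" is a hypothetical data-only section used just for this sketch. */
#define STR_SECTION   __attribute__((section(".shstr")))
#define DEF_STR(name) static const char str_##name[] STR_SECTION =
#define GET_STR(name) str_##name

/* The literal becomes a named array living in .shstr instead of .rodata. */
DEF_STR(dummy) "some string";

/* Accessors so other code does not need to know the mangled array name. */
const char *get_dummy(void) { return GET_STR(dummy); }
size_t dummy_len(void) { return sizeof(GET_STR(dummy)) - 1; }
```

Running readelf -S on the resulting object file should show the extra section next to .rodata.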

Sunday, March 31, 2024

netfilter hooks

They can be used to run a shell when some magic packet is received: 1 2 3. As usual there is no tool to show installed netfilter hooks, so I added dumping them (and at the same time netfilter loggers) to my lkcd
 
Let's check where these hooks live inside the kernel. As a starting point we can review the source of the main hook-installing function nf_register_net_hooks, which leads to nf_hook_entry_head. We can notice that there are lots of locations for hooks:
  1. field nf_hooks_ingress in struct net_device (when CONFIG_NETFILTER_INGRESS is enabled)
  2. on newer kernels also field nf_hooks_egress in struct net_device (when CONFIG_NETFILTER_EGRESS is enabled)
  3. lots of fields in struct netns_nf:
    • hooks_ipv4
    • hooks_ipv6
    • hooks_arp (CONFIG_NETFILTER_FAMILY_ARP)
    • hooks_bridge (CONFIG_NETFILTER_FAMILY_BRIDGE)
    • hooks_decnet (CONFIG_DECNET)
    Also on old kernels (before 4.16) there was a single array hooks in netns_nf
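
To show how one (family, hook number) pair maps to one of these locations, here is a simplified userland model of the selection done in nf_hook_entry_head. This is not kernel code: all model_* and FAM_* names are made up, the decnet array is left out, and only the array sizes (5 for the inet/bridge arrays, 3 for ARP) follow the kernel constants.

```c
#include <stddef.h>

#define MODEL_INET_NUMHOOKS 5   /* same count as NF_INET_NUMHOOKS */
#define MODEL_ARP_NUMHOOKS  3   /* same count as NF_ARP_NUMHOOKS */

struct hook_entries;            /* opaque stand-in for struct nf_hook_entries */

/* Per-namespace hook heads (model of the fields in struct netns_nf). */
struct model_netns_nf {
    struct hook_entries *hooks_ipv4[MODEL_INET_NUMHOOKS];
    struct hook_entries *hooks_ipv6[MODEL_INET_NUMHOOKS];
    struct hook_entries *hooks_arp[MODEL_ARP_NUMHOOKS];
    struct hook_entries *hooks_bridge[MODEL_INET_NUMHOOKS];
};

/* Per-device hook heads (model of the fields in struct net_device). */
struct model_net_device {
    struct hook_entries *nf_hooks_ingress;
    struct hook_entries *nf_hooks_egress;
};

enum model_family {
    FAM_IPV4, FAM_IPV6, FAM_ARP, FAM_BRIDGE, FAM_DEV_INGRESS, FAM_DEV_EGRESS
};

/* Returns the slot holding the hook list, or NULL for an invalid request. */
struct hook_entries **model_hook_head(struct model_netns_nf *nf,
                                      struct model_net_device *dev,
                                      enum model_family fam, unsigned hooknum)
{
    switch (fam) {
    case FAM_IPV4:
        return hooknum < MODEL_INET_NUMHOOKS ? &nf->hooks_ipv4[hooknum] : NULL;
    case FAM_IPV6:
        return hooknum < MODEL_INET_NUMHOOKS ? &nf->hooks_ipv6[hooknum] : NULL;
    case FAM_ARP:
        return hooknum < MODEL_ARP_NUMHOOKS ? &nf->hooks_arp[hooknum] : NULL;
    case FAM_BRIDGE:
        return hooknum < MODEL_INET_NUMHOOKS ? &nf->hooks_bridge[hooknum] : NULL;
    case FAM_DEV_INGRESS:
        return dev ? &dev->nf_hooks_ingress : NULL;
    case FAM_DEV_EGRESS:
        return dev ? &dev->nf_hooks_egress : NULL;
    }
    return NULL;
}
```

A dumper has to walk every one of these slots: each array element per network namespace plus the ingress/egress fields of every device.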
 
results
lkmem -c -n ../unpacked/101 /boot/System.map-5.15.0-101-generic
...
2 nf hooks:
   [0] type 02 IPV4 idx 0 0xffffffffa7b84dd0 - kernel!apparmor_ipv4_postroute
   [1] type 10 IPV6 idx 0 0xffffffffa7b84e10 - kernel!apparmor_ipv6_postroute

Friday, March 8, 2024

Profiling shared libraries on linux

Disclaimer: the proposed approach uses dirty hacks & patches and was tested on x86_64 only, so use it at your own risk. Also no ChatGPT or other Artificial Idiots were used for this research

Let's assume that we have a shared library (for example an R extension or a Python module) and we want to know where and why it spends many hours and consumes megawatts of electricity. There is even a semi-official way to do this:

  1. compile the shared library with the -pg option
  2. set envvar LD_PROFILE_OUTPUT to the directory where you want to store profiling data
  3. set envvar LD_PROFILE to the filename of the library to profile
  4. run your program. Well, it sounds like you need to do lots of things before this step, and you can't set up profiling dynamically
  5. run sprof on the profiling log
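
Steps 2-4 can also be done from a small launcher instead of the shell. A minimal sketch, assuming nothing beyond the two environment variables named above; the function names setup_ld_profile and run_profiled are made up, and the library name and output directory are whatever the caller passes in:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* Steps 2 and 3: export the variables the dynamic loader looks at.
   Returns 0 on success, -1 if either setenv call fails. */
int setup_ld_profile(const char *lib_name, const char *out_dir)
{
    if (setenv("LD_PROFILE", lib_name, 1) != 0)
        return -1;
    if (setenv("LD_PROFILE_OUTPUT", out_dir, 1) != 0)
        return -1;
    return 0;
}

/* Step 4: replace the current process with the program to be profiled.
   Returns only if the setup or the exec itself failed. */
int run_profiled(const char *lib_name, const char *out_dir, char *const argv[])
{
    if (setup_ld_profile(lib_name, out_dir) != 0)
        return -1;
    execvp(argv[0], argv);
    return -1;
}
```

The variables must be in place before the loader starts the target, which is why they have to be set before the exec and cannot be flipped on in an already running process.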

Unfortunately this method just doesn't work - sprof fails with a cryptic message
Inconsistency detected by ld.so: dl-open.c: 890: _dl_open: Assertion `_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT' failed!

It seems that this long-lived bug has been known since 2017 and is still not fixed
Let's try to discover some more reliable way and start by inspecting the code generated for profiling

Sunday, January 14, 2024

failed attempts to draw graphs

CSES has several really hard graph-related tasks, for example

It would be a good idea to visualize those graphs. One well-known tool for this is Graphviz, so I wrote a simple Perl script to render a graph from CSES plain text into its DSL. On small graphs all goes well and we can enjoy something like

But it seems that on big graphs with 200k nodes dot just can't finish rendering, and after ~2 hours of hard work it met the OOM killer. Let's think how we can reduce the size of the graph

Thursday, January 4, 2024

Distinct Colors

I've solved yet another very funny CSES task - it looks very similar to another task called "Reachable Nodes" (my solution for it). The only difference is that we are asked to count not unique nodes but the colors of nodes. What can go wrong?

And this is where the funny part begins - my patched solution crashed. gdb didn't show anything interesting. However I remembered a scary cryptic command to show stack usage:

print (char *)_environ - (char *)$sp
$1 = 8384904

Very close to the default 8 MB (check ulimit -s). Wait, WHAT? Do we really have stack exhaustion? Let's check - 8 * 1024 * 1024 = 8388608 bytes. The tree can have 200000 nodes. 8388608 / 200000 = ~42 bytes for each recursive DFS call. Seems to be true - in each call we store the return address + stack frame RBP + 3 registers holding args (this, indexes of node and parent) - so at least 5 * 8 = 40 bytes. It so happened that some tests contain a tree with a very long stem from root to end, so yes - recursive DFS cannot visit all nodes in such a tree. The solution is simple - we can emulate recursion with std::stack. As a bonus, for all nodes in the stack we can use a single bit mask to save space
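
A minimal sketch of the explicit-stack idea, written in C with a plain heap array standing in for std::stack and a bit mask for the "already pushed" flags. This is not my actual solution (it only counts visited nodes); the function name dfs_count and the flat edge-list format are made up for the example.

```c
#include <stdlib.h>

/* Iterative DFS over an undirected graph with nodes 0..n-1.
   edges holds m pairs: edges[2*i] and edges[2*i+1] are the ends of edge i.
   Returns the number of nodes reachable from root, or -1 if malloc fails. */
long dfs_count(int n, int m, const int *edges, int root)
{
    long visited = -1;
    int *head  = malloc((size_t)n * sizeof *head);           /* first edge of a node */
    int *next  = malloc(((size_t)2 * m + 1) * sizeof *next); /* next edge in the list */
    int *to    = malloc(((size_t)2 * m + 1) * sizeof *to);   /* other end of the edge */
    int *stack = malloc((size_t)n * sizeof *stack);          /* lives on the heap */
    unsigned char *seen = calloc(((size_t)n + 7) / 8, 1);    /* one bit per node */

    if (head && next && to && stack && seen) {
        for (int i = 0; i < n; i++)
            head[i] = -1;
        /* Every undirected edge is stored twice, once per direction. */
        for (int i = 0; i < m; i++) {
            int a = edges[2 * i], b = edges[2 * i + 1];
            to[2 * i] = b;     next[2 * i] = head[a];     head[a] = 2 * i;
            to[2 * i + 1] = a; next[2 * i + 1] = head[b]; head[b] = 2 * i + 1;
        }
        int top = 0;
        visited = 0;
        stack[top++] = root;
        seen[root >> 3] |= 1u << (root & 7);
        while (top > 0) {
            int v = stack[--top];
            visited++;
            for (int e = head[v]; e != -1; e = next[e]) {
                int w = to[e];
                /* Mark on push, so each node enters the stack at most once. */
                if (!(seen[w >> 3] & (1u << (w & 7)))) {
                    seen[w >> 3] |= 1u << (w & 7);
                    stack[top++] = w;
                }
            }
        }
    }
    free(head); free(next); free(to); free(stack); free(seen);
    return visited;
}
```

Because a node is marked when it is pushed, the stack never needs more than n slots, and a 200000-node chain costs well under a megabyte of heap instead of all 8 MB of the call stack.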

Another unpleasant observation is that trees in tests aren't BINARY trees. A picture is worth a thousand words:

The degree of node 2 is 4. This is the main reason why function dfs has a separate branch for processing joint nodes with only 2 descendants - because initially method is_fork returned only left and right

Source